FindinSite-JS: Search engine for a Java server website

 

findinsite-js indexing


Introduction

Before you can search your web site, you must index it to build a search database.
  • findinsite-js can index HTML, PDF, DOC, DOCX, XLS, XLSX, PPT/PPS, PPTX, TXT and JPEG files. It can reindex on a regular schedule and email you the indexing results.

Setting the findinsite-js Search database

findinsite-js can search one or more search databases. To tell findinsite-js which search database to use, first make your search database as described below. Then go to the Searching section of the configuration screen and either:
  • add your new search database, or
  • change a search database to your new one and press Make Changes.

After an indexing run successfully rebuilds a current findinsite-js search database, findinsite-js automatically reloads the new search database. You can confirm the index dates and times for the current search databases by looking in the configuration screen Searching section.

findinsite-js Indexing

The findinsite-js indexer only builds one search database at a time; any other pending indexing runs are held in an Immediate queue. The indexer checks its Scheduled list frequently to see if it has any work to do.
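The queueing rules above can be illustrated with a small Java sketch: a single worker thread drains a first-in, first-out Immediate queue, while a timer frequently moves due scheduled runs onto that queue. This is not findinsite-js's actual code; the class and method names (`IndexerSketch`, `runNow`, `markDue`, `checkScheduledList`) and the one-minute check interval are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one worker thread, so only one search database is built at a time.
final class IndexerSketch {
    // Pending runs wait here in FIFO order (the "Immediate queue").
    private final BlockingQueue<String> immediateQueue = new LinkedBlockingQueue<>();
    // Scheduled runs whose start time has arrived but which are not yet queued.
    private final List<String> dueScheduledRuns = new ArrayList<>();
    // Names of finished runs, in completion order.
    final List<String> completed = Collections.synchronizedList(new ArrayList<>());

    // "Run now": append a run to the Immediate queue.
    void runNow(String runName) { immediateQueue.add(runName); }

    // Record that a scheduled run has become due (time-of-day logic omitted here).
    synchronized void markDue(String runName) { dueScheduledRuns.add(runName); }

    // The frequent check: move every due scheduled run onto the Immediate queue.
    synchronized void checkScheduledList() {
        immediateQueue.addAll(dueScheduledRuns);
        dueScheduledRuns.clear();
    }

    int queued() { return immediateQueue.size(); }

    // Index the run at the front of the queue, if any; returns false when the queue is empty.
    boolean processOne() {
        String run = immediateQueue.poll();
        if (run == null) return false;
        index(run);
        return true;
    }

    // Start the single worker plus a timer that checks the Scheduled list every minute.
    ScheduledExecutorService start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) index(immediateQueue.take()); // blocks until a run is queued
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // the worker stops when interrupted
            }
        }, "indexer");
        worker.setDaemon(true);
        worker.start();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::checkScheduledList, 0, 1, TimeUnit.MINUTES);
        return timer;
    }

    private void index(String run) {
        // Placeholder: crawl the site and write this run's search database files.
        completed.add(run);
    }
}
```

Using exactly one worker thread, rather than a pool, is what limits the indexer to building one search database at a time; the timer only moves runs into the queue and never indexes anything itself.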

findinsite-js's indexing is controlled from the online configuration screen. The Indexing section of the configuration screen lets you set up immediate or regular indexing runs. Each indexing run builds one search database by indexing one web site.

The screenshot shows the Indexing section of the configuration screen menu. Click on one of these options:

  • Indexing: shows the indexer status, and lets you set up indexing limits and email reporting.
  • Immediate queue: shows a list of indexing runs that are queued, in progress or recently run.
  • Scheduled list: shows the schedule of regular indexing runs.
  • Create new: runs the wizard to set up a new indexing run.
[Screenshot: Config screen Indexing menu, with links to Status, Limits, Emails; Immediate queue; Scheduled list; Create new wizard]

Creating a new Indexing Run

The Create new indexing run wizard has five steps:
  1. Select time to run: "now" or a regular schedule (see Regular indexing times below)
  2. Choose a filename for the search database - see below
       Either: Reindex an existing search database
       Or: Build a new search database
  3. Enter URL of web site to index.  Check the file types that you want indexed.
  4. Enter any Advanced Options
  5. Confirm indexing run: Store in the schedule and optionally run it now
Regular indexing times:
  • Hourly, on the hour
  • Daily, at a specified hour
  • Weekly, on a specified week-day at a specified hour
  • Monthly, on the first specified week-day of the month at a specified hour
  • Monthly, on a specified day of the month at a specified hour
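To make the five regular indexing times concrete, here is a hypothetical Java sketch that computes the next start time for each kind of schedule using `java.time`. The names (`Frequency`, `Schedule`, `nextRunAfter`) are invented, the 1-28 day-of-month restriction is an assumption that avoids months without a 29th, 30th or 31st, and all times are treated as server local time.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

// Hypothetical model of the five regular indexing schedules.
enum Frequency { HOURLY, DAILY, WEEKLY, MONTHLY_FIRST_WEEKDAY, MONTHLY_DAY }

final class Schedule {
    final Frequency frequency;
    final int hour;          // 0-23, ignored for HOURLY
    final DayOfWeek weekDay; // used by WEEKLY and MONTHLY_FIRST_WEEKDAY, otherwise may be null
    final int dayOfMonth;    // used by MONTHLY_DAY; limited to 1-28 so it exists in every month

    Schedule(Frequency frequency, int hour, DayOfWeek weekDay, int dayOfMonth) {
        if (hour < 0 || hour > 23) throw new IllegalArgumentException("hour must be 0-23");
        if (frequency == Frequency.MONTHLY_DAY && (dayOfMonth < 1 || dayOfMonth > 28))
            throw new IllegalArgumentException("dayOfMonth must be 1-28");
        this.frequency = frequency;
        this.hour = hour;
        this.weekDay = weekDay;
        this.dayOfMonth = dayOfMonth;
    }

    // First scheduled start time strictly after 'now' (server local time, no time zone handling).
    LocalDateTime nextRunAfter(LocalDateTime now) {
        LocalDate today = now.toLocalDate();
        LocalDateTime t;
        switch (frequency) {
            case HOURLY:
                return now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
            case DAILY:
                t = today.atTime(hour, 0);
                return t.isAfter(now) ? t : t.plusDays(1);
            case WEEKLY:
                t = today.with(TemporalAdjusters.nextOrSame(weekDay)).atTime(hour, 0);
                return t.isAfter(now) ? t : t.plusWeeks(1);
            case MONTHLY_FIRST_WEEKDAY:
                t = today.with(TemporalAdjusters.firstInMonth(weekDay)).atTime(hour, 0);
                if (t.isAfter(now)) return t;
                return today.plusMonths(1).with(TemporalAdjusters.firstInMonth(weekDay)).atTime(hour, 0);
            case MONTHLY_DAY:
                t = today.withDayOfMonth(dayOfMonth).atTime(hour, 0);
                return t.isAfter(now) ? t : t.plusMonths(1);
            default:
                throw new IllegalStateException("Unknown frequency " + frequency);
        }
    }
}
```

For example, at 14:30 on Wednesday 10 January 2024 a "first Monday at 01:00" schedule has already passed for January, so the sketch returns Monday 5 February 2024 at 01:00.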

Search database filenames and the findinsite-js Work directory

Filenames
A search database is actually stored in many files, each with the basic filename you choose, but with a different extension, eg for index1 the actual files are index1.his, index1.hi1, etc.
findinsite-js may also use the basic filename with an underscore appended, eg index1_, giving files index1_.his, index1_.hi1, etc.

When findinsite-js remakes an existing search database it chooses the older of the two filenames to remake, eg index1 or index1_.
This ensures that a good search database is still available if an indexing run fails, eg because network access to the site fails.
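The alternating filename scheme can be sketched as follows. This hypothetical Java helper (`DatabaseSlots.basenameToRebuild` is an invented name) assumes that a set's age is the modification time of its `.his` file and that a missing set counts as oldest; findinsite-js may decide differently internally.

```java
import java.io.File;

// Hypothetical helper, assuming each set's age is given by its .his file's modification time.
final class DatabaseSlots {
    // For base "index1", returns whichever of "index1" and "index1_" should be rebuilt:
    // the older set, so the newer (last good) set stays usable if this run fails.
    static String basenameToRebuild(File workDir, String base) {
        File plain = new File(workDir, base + ".his");
        File underscored = new File(workDir, base + "_.his");
        // File.lastModified() returns 0L for a missing file, so a missing set is always picked first.
        return plain.lastModified() <= underscored.lastModified() ? base : base + "_";
    }
}
```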

Work directory
findinsite-js always puts its search database files in its work directory. Check that this is a suitable location in the General section of the configuration screen. To change the work directory, you must change the work init parameter.

Scheduled list of indexing runs

Clicking on the Scheduled list option in the configuration screen Indexing section displays a list of your regular indexing runs like this:

[Screenshot: Config screen Indexing Scheduled list]

The list shows a summary of each indexing run, with various control options, as described in the next section. If an indexing run has completed then its summary also shows the run output; see the Immediate queue screenshot below for an example.

Indexing run control options

When an indexing run summary is displayed on screen, click on the appropriate icon for the following options:

  • Details: display a full description of the indexing run and the last output it generated.
  • Edit: start the Create new wizard to edit the indexing run.
    Note that the indexing run is removed from the scheduled list when you start an edit, so you must complete the wizard if you want to keep your indexing run.
  • Run: put the indexing run in the Immediate queue so that it runs when it reaches the front of the queue.
  • Stop/Remove: the action of this option depends on the state of the indexing run:
       If in progress, stop the run
       If in the completed runs list, remove it from this list
       If in the scheduled list, remove and delete the run
In each case, you are asked to confirm the action first.
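Because the Stop/Remove icon does different things depending on where the run is, its behaviour amounts to a simple state-to-action mapping followed by a confirmation step. The Java enums and methods below (`RunState`, `StopRemoveAction`, `StopRemoveIcon.click`) are hypothetical names used only to illustrate that mapping.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model: the single Stop/Remove icon acts differently depending on the run's state.
enum RunState { IN_PROGRESS, COMPLETED, SCHEDULED }

enum StopRemoveAction { STOP_RUN, REMOVE_FROM_COMPLETED_LIST, REMOVE_AND_DELETE_RUN }

final class StopRemoveIcon {
    static StopRemoveAction actionFor(RunState state) {
        switch (state) {
            case IN_PROGRESS: return StopRemoveAction.STOP_RUN;
            case COMPLETED:   return StopRemoveAction.REMOVE_FROM_COMPLETED_LIST;
            case SCHEDULED:   return StopRemoveAction.REMOVE_AND_DELETE_RUN;
            default: throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    // Every action is confirmed first; an empty result means the user cancelled.
    static Optional<StopRemoveAction> click(RunState state, Predicate<StopRemoveAction> confirm) {
        StopRemoveAction action = actionFor(state);
        return confirm.test(action) ? Optional.of(action) : Optional.empty();
    }
}
```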

Immediate queue of indexing runs

Clicking on the Immediate queue option in the configuration screen Indexing section displays this information:
  • The indexing run in progress
  • The list of indexing runs queued waiting for execution
  • The list of recent completed indexing runs (within the last 20 minutes)
If any indexing runs are in progress, then the display updates every 20 seconds to show you the latest status.
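The two timing rules for the Immediate queue screen (completed runs are listed for 20 minutes; the display refreshes every 20 seconds only while a run is in progress) can be sketched in Java with `java.time`. The class `RecentRuns` and its methods are hypothetical; they are not part of findinsite-js.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the Immediate queue screen's "recent completed runs" list.
final class RecentRuns {
    static final Duration KEEP_FOR = Duration.ofMinutes(20);      // completed runs stay listed this long
    static final Duration REFRESH_EVERY = Duration.ofSeconds(20); // screen refresh while a run is in progress

    private static final class Finished {
        final String name;
        final Instant at;
        Finished(String name, Instant at) { this.name = name; this.at = at; }
    }

    private final List<Finished> finished = new ArrayList<>();

    void recordCompletion(String name, Instant at) { finished.add(new Finished(name, at)); }

    // Drops runs that finished more than 20 minutes before 'now'; returns the rest in recorded order.
    List<String> recent(Instant now) {
        Instant cutoff = now.minus(KEEP_FOR);
        finished.removeIf(f -> f.at.isBefore(cutoff));
        List<String> names = new ArrayList<>();
        for (Finished f : finished) names.add(f.name);
        return names;
    }

    // The screen only auto-refreshes while something is being indexed.
    static Optional<Duration> refreshInterval(boolean runInProgress) {
        return runInProgress ? Optional.of(REFRESH_EVERY) : Optional.empty();
    }
}
```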

The example screenshot below shows an indexing run in progress and one recently completed. Notice how each summary lists the number of pages and words found, and that the completed run found 17 problems; click on the Details (i) icon for more information on these problems.

[Screenshot: Config screen Indexing Immediate list]

Indexing status and general options

Clicking on Indexing in the Indexing section of the configuration screen displays the current indexing status and settings, and lets you change your general indexing configuration.

The Indexing limits apply to all your indexing runs: use them if you do not want a run to take too long or a search database to grow too large.

If you complete all the Indexing Email reporting values then findinsite-js will email you details of each completed indexing run, a useful way to keep an eye on findinsite-js.

Press the Make Changes button if you alter any settings.

Option / Description
Current status
  Immediate queue: A summary of the size of the indexing queue, whether an indexing run is in progress, and the number of completed indexing runs.
  Scheduled list: The number of regular scheduled indexing runs.
Indexing Limits
  Time limit: The maximum duration of an indexing run in minutes, or 0 to have no limit.
  File limit: The maximum number of files for an indexing run, or 0 to have no limit.
Indexing Email reporting
  All the following boxes must be completed to enable email reporting of indexing results.
  SMTP send mail server: The name of your mail server, eg mail.mycompany.com
  From name: The name of the email sender, eg Julie Wilson
  From email address: The email address of the sender, eg julie@mycompany.com
  To email address: The email address of the recipient, eg julie@mycompany.com
  Send email if findinsite-js restarted: If this box is checked, findinsite-js sends an email whenever it is started by the servlet engine.
  Send test email: Click to send a test email. Make sure that you press "Make Changes" first if you have just entered any changes.
If any of the above options are in grey boxes then you cannot change them; the value has been set by your webmaster or servlet administrator.
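The limit and email rules in the table above can be summarised in code: a limit of 0 disables that limit, and email reporting is only active when all four email boxes are filled in. This is a hypothetical Java sketch (the `IndexingSettings` class and its field names are assumptions), and it assumes a limit counts as reached once the elapsed minutes or indexed files equal the configured maximum.

```java
import java.time.Duration;

// Hypothetical sketch of the general indexing settings; a limit of 0 means "no limit".
final class IndexingSettings {
    int timeLimitMinutes; // 0 = no time limit
    int fileLimit;        // 0 = no file limit
    String smtpServer, fromName, fromAddress, toAddress;

    // True once the run has used up its time or file allowance.
    boolean limitReached(Duration elapsed, int filesIndexed) {
        boolean outOfTime = timeLimitMinutes > 0 && elapsed.toMinutes() >= timeLimitMinutes;
        boolean tooManyFiles = fileLimit > 0 && filesIndexed >= fileLimit;
        return outOfTime || tooManyFiles;
    }

    // Email reporting is enabled only when all four boxes are filled in.
    boolean emailReportingEnabled() {
        return notBlank(smtpServer) && notBlank(fromName) && notBlank(fromAddress) && notBlank(toAddress);
    }

    private static boolean notBlank(String s) { return s != null && !s.trim().isEmpty(); }
}
```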

Startup considerations

If findinsite-js is not running then its indexer cannot check whether any indexing runs are due, so these runs may be missed. findinsite-js must be running if you want it to reindex your site regularly.

If the servlet engine stops and restarts then findinsite-js might not be restarted automatically.

Any access to the findinsite-js servlet will start findinsite-js. To ensure that a reindex runs every day, for example, use a scheduling tool to 'ping' a findinsite-js URL regularly, ie request a findinsite-js page such as the localised search page http://localhost/findinsitejs/search.
Remember to take any time zone differences into account.
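Any scheduling tool will do for the regular 'ping', eg cron or the Windows Task Scheduler. As one illustration, here is a small Java 11+ program using the standard `java.net.http.HttpClient` that requests the search page once an hour. The class name `FindinsitePinger`, the hourly interval and the timeouts are assumptions; the default URL is the example search page above, so adjust it for your server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical keep-alive pinger: requesting a findinsite-js page starts the servlet if it is stopped.
final class FindinsitePinger {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Sends one GET and returns the HTTP status code, or -1 if the request failed.
    int ping(URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // Pings 'uri' immediately and then every 'period' until the returned executor is shut down.
    ScheduledExecutorService pingEvery(URI uri, long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> ping(uri), 0, period, unit);
        return timer;
    }

    public static void main(String[] args) {
        URI search = URI.create(args.length > 0 ? args[0] : "http://localhost/findinsitejs/search");
        new FindinsitePinger().pingEvery(search, 1, TimeUnit.HOURS);
    }
}
```

Pass your own search page URL as the first argument. Pinging more often than your most frequent indexing schedule keeps findinsite-js running before each scheduled run falls due.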

Note that accessing the findinsite-js welcome page will usually NOT start the servlet, eg: http://localhost/findinsitejs.

  All site Copyright © 1996-2011 PHD Computer Consultants Ltd, PHDCC

Last modified: 19 October 2009.
