findinsite-js indexing
Introduction
Before you can search your web site, you must index it to build a
search database.
findinsite-js can index HTML, PDF, DOC, DOCX, XLS, XLSX, PPT/PPS, PPTX, TXT and JPEG files.
It supports a regular indexing schedule and can email you the results of each indexing run.
Setting the findinsite-js Search database
findinsite-js can search one or more search databases.
To tell findinsite-js which search database to use,
first make your search database as described below.
Then go to the configuration screen
Searching section and either:
- add your new search database, or
- change a search database to your new one and press Make Changes.
After an indexing run successfully rebuilds a current findinsite-js search database,
findinsite-js automatically reloads the new search database.
You can confirm the index dates and times for the current search databases by
looking in the configuration screen Searching section.
findinsite-js Indexing
The findinsite-js indexer only builds one search database at a time;
any other pending indexing runs are held in an Immediate queue.
The indexer checks its Scheduled list frequently to see if it
has any work to do.
findinsite-js's indexing is controlled from the online
configuration screen. The Indexing section
of the configuration screen lets you set up immediate or regular indexing runs.
Each indexing run builds one search database by indexing one web site.
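As an illustration only, the queueing behaviour described above can be modelled in a few lines of Python. This is a sketch of the idea, not findinsite-js code: the class and method names are hypothetical, and it assumes that queued runs are started in first-in, first-out order.

```python
from collections import deque


class IndexerModel:
    """Toy model: one run in progress, all other pending runs wait in a queue."""

    def __init__(self):
        self.immediate_queue = deque()  # runs waiting for their turn
        self.in_progress = None         # at most one run at a time
        self.completed = []             # runs that have finished

    def submit(self, run_name):
        """Queue a run; it starts at once if the indexer is idle."""
        self.immediate_queue.append(run_name)
        self._start_next()

    def finish_current(self):
        """Mark the current run as completed and start the next queued run."""
        if self.in_progress is not None:
            self.completed.append(self.in_progress)
            self.in_progress = None
        self._start_next()

    def _start_next(self):
        # Assumption: the run at the front of the queue goes first.
        if self.in_progress is None and self.immediate_queue:
            self.in_progress = self.immediate_queue.popleft()
```

In this model, submitting two runs leaves the first in progress and the second queued until finish_current() is called.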
The screenshot on the right shows the Indexing section of the configuration
screen menu. Click on one of these options...
- Indexing: shows the indexer status, and lets you set up indexing limits and
email reporting.
- Immediate queue: shows a list of indexing runs queued, in progress or run recently.
- Scheduled list: shows a schedule of the regular indexing runs.
- Create new: runs the wizard to set up a new indexing run.
Creating a new Indexing Run
The Create new indexing run wizard has five steps:
- Select time to run: "now" or a regular schedule (see Regular indexing times below)
- Choose a filename for the search database (see below), and either reindex an existing search database or build a new search database
- Enter the URL of the web site to index, and check the file types that you want indexed
- Enter any Advanced Options
- Confirm the indexing run: store it in the schedule and optionally run it now
Regular indexing times:
- Hourly, on the hour
- Daily, at a specified hour
- Weekly, at a specified week-day and hour
- Monthly, on the first specified week-day of the month, at a specified hour
- Monthly, at a specified day and hour
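To make the schedule options concrete, here is a minimal Python sketch of how the next run time could be worked out for the hourly, daily and weekly options. It is an illustration under stated assumptions, not the findinsite-js scheduler: the function names are hypothetical, the next run is assumed to be strictly later than the current time, time zones are ignored, and the two monthly options are left out for brevity.

```python
from datetime import datetime, timedelta


def next_hourly(now):
    """Next run on the hour, strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def next_daily(now, hour):
    """Next run at `hour`:00, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_weekly(now, weekday, hour):
    """Next run on `weekday` (Monday=0 ... Sunday=6) at `hour`:00, strictly after `now`."""
    candidate = next_daily(now, hour)
    # Move forward to the requested week-day (0 days if already on it).
    days_ahead = (weekday - candidate.weekday()) % 7
    return candidate + timedelta(days=days_ahead)
```

For example, next_daily(datetime(2024, 1, 3, 14, 30), 2) gives 02:00 on 4 January 2024, because 02:00 on 3 January has already passed.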
Search database filenames and the findinsite-js Work directory
- Filenames: A search database is actually stored in many files,
  each with the basic filename you choose, but with a different extension,
  eg for index1 the actual files are index1.his, index1.hi1, etc.
  In addition, findinsite-js may also use the basic filename with
  an underscore appended, eg index1_, ie files index1_.his, index1_.hi1, etc.
  When findinsite-js remakes an existing search database it chooses
  the oldest filename to remake, eg index1 or index1_.
  This ensures that a good search database is still available in the event that
  an indexing run fails, eg because network access to the site fails.
- Work directory: findinsite-js always puts its search database files in its work directory.
  Make sure that this is in a suitable location by looking at the
  configuration screen General section.
  If you decide to change the work directory then you must
  change the work init parameter.
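The alternating filename scheme can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it assumes that the "oldest" filename is judged by the modification time of its .his file, which this page does not specify.

```python
import os


def choose_base_to_rebuild(work_dir, base):
    """Return `base` or `base + "_"`, whichever has the older (or missing) .his file."""

    def age_key(name):
        path = os.path.join(work_dir, name + ".his")
        # Assumption: a filename with no .his file yet counts as the oldest.
        return os.path.getmtime(path) if os.path.exists(path) else float("-inf")

    # On a tie (eg neither file exists) the plain filename is chosen.
    return min([base, base + "_"], key=age_key)
```

Because the newer of the two filenames is never the one chosen, the most recently built search database stays intact if the rebuild fails.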
Scheduled list of indexing runs
Clicking on the Scheduled list option in the configuration screen Indexing
section displays a list of your regular indexing runs like this:
The list shows a summary of each indexing run, with various control options, as described in the
next section. If an indexing run has completed then it will also show a summary of the
run output - see the Immediate queue screen below for an example.
Indexing run control options
When an indexing run summary is displayed on screen, click on the appropriate icon
for the following options:
- Details: Display a full description of the Indexing run and the Last output it generated.
- Edit: Start the Create new wizard to edit the indexing run.
  Note that the indexing run will be removed from the scheduled list when
  you start an edit; therefore you must complete the wizard if you want to
  store your indexing run.
- Run: Put the indexing run in the Immediate queue so that it is run
  when it comes to the front of the queue.
- Stop/Remove: The action of this option depends on the state of the indexing run:
  - If in progress, then stop the run
  - If in the completed runs list, then remove from this list
  - If in the scheduled list, then remove and delete the run
  In each case, you are asked to confirm the action first.
Immediate queue of indexing runs
Clicking on the Immediate queue option in the configuration screen Indexing
section displays this information:
- The indexing run in progress
- The list of indexing runs queued waiting for execution
- The list of recently completed indexing runs (within the last 20 minutes)
If any indexing runs are in progress, then the display updates every 20 seconds
to show you the latest status.
The example screenshot below shows an indexing run in progress and one recently completed.
Notice how each summary lists the number of pages and words found, and that the completed
run found 17 problems; click on the
Details (i) icon
for more information on these problems.
Indexing status and general options
Clicking on the Indexing option in the configuration screen Indexing
section displays the current indexing status and settings, with an option to make changes
to your general configuration.
The Indexing limits values apply to all your indexing runs; set them
if you do not want a run to take too long or a search database to grow too large.
If you set all the Indexing Email reporting values
then findinsite-js will email you with details of each indexing run completed -
useful to keep an eye on findinsite-js.
Press the Make Changes button if you alter any settings.
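The option descriptions for this screen state that a limit of 0 means no limit. A minimal Python sketch of that rule is shown here; the function and parameter names are hypothetical, and whether findinsite-js stops a run exactly when a limit is reached is an assumption.

```python
def limit_reached(elapsed_minutes, files_indexed, time_limit=0, file_limit=0):
    """Return True once either limit is hit; a limit of 0 means 'no limit'."""
    if time_limit > 0 and elapsed_minutes >= time_limit:
        return True
    if file_limit > 0 and files_indexed >= file_limit:
        return True
    return False
```

With both limits left at 0, limit_reached() never returns True, so a run continues until the whole site has been indexed.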
Option | Description
Current status |
Immediate queue | A summary of the size of the indexing queue, whether an indexing run is in progress, and the number of completed indexing runs.
Scheduled list | The number of regular scheduled indexing runs.
Indexing Limits |
Time limit | The maximum number of minutes for an indexing run, or 0 to have no limit.
File limit | The maximum number of files for an indexing run, or 0 to have no limit.
Indexing Email reporting | All the following boxes must be completed to enable email reporting of index results.
SMTP send mail server | The name of your mail server, eg mail.mycompany.com
From name | The name of the email sender, eg Julie Wilson
From email address | The email address of the sender, eg [email protected]
To email address | The email address of the recipient, eg [email protected]
Send email if findinsite-js restarted | If this box is checked, findinsite-js will send an email whenever it is started by the servlet engine.
Send test email | Click to send a test email. Make sure that you press "Make Changes" first if you have just entered any changes.
If any of the above options are in grey boxes then you cannot change them; the value has been set by your webmaster or servlet administrator.
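To show how the four email settings relate to an outgoing message, here is a hedged Python sketch using the standard smtplib and email modules. It is not how findinsite-js sends its reports: the function names and the subject line are hypothetical, and the sketch assumes a mail server that accepts unauthenticated SMTP on the default port.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def reporting_enabled(smtp_server, from_name, from_addr, to_addr):
    """Email reporting is only enabled when all four boxes are filled in."""
    return all(value.strip() for value in (smtp_server, from_name, from_addr, to_addr))


def build_report(from_name, from_addr, to_addr, summary):
    """Build a plain-text report message from the From/To settings."""
    msg = EmailMessage()
    msg["From"] = formataddr((from_name, from_addr))
    msg["To"] = to_addr
    msg["Subject"] = "Indexing run report"  # hypothetical subject line
    msg.set_content(summary)
    return msg


def send_report(smtp_server, msg):
    """Send the message through the configured SMTP server."""
    with smtplib.SMTP(smtp_server) as smtp:
        smtp.send_message(msg)
```

If a test email does not arrive, a sketch like this can help you check the mail server name and addresses independently of findinsite-js.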
Startup considerations
If findinsite-js is not running then its indexer will not be able to check to see
if any indexing runs are scheduled to run; these runs may therefore be missed.
It is important that findinsite-js is running if you want it to reindex your site regularly.
If the servlet engine stops and restarts then findinsite-js might not be restarted automatically.
Any access to the findinsite-js servlet will start findinsite-js.
To ensure that a reindex runs every day, for example,
use a scheduling tool to 'ping' a findinsite-js URL regularly,
ie access a findinsite-js page, eg the findinsite-js localised search page:
http://localhost/findinsitejs/search
Remember to take into account any time zone differences.
Note that accessing the findinsite-js welcome page will usually NOT start the servlet, eg:
http://localhost/findinsitejs.
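A regular 'ping' can be done with any tool that fetches a URL. The Python sketch below is one possibility, offered as an example rather than a recommended setup: the function name is hypothetical, the URL is the example search page address given above and must be changed to match your installation, and the opener parameter exists only so that the function can be tried without a running server.

```python
import sys
import urllib.request

# Assumed URL: the example search page address from this page.
SEARCH_URL = "http://localhost/findinsitejs/search"


def ping(url=SEARCH_URL, timeout=30, opener=urllib.request.urlopen):
    """Request the page once; return True if the server answered with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"ping failed: {err}", file=sys.stderr)
        return False
```

Call ping() from a job run by your operating system's scheduler (eg cron or Windows Task Scheduler), timed shortly before the hour at which your indexing run is scheduled.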