FindinSite-CD: Search engine for CD/DVD
 

 

Meta field searching in FindinSite-CD


Introduction

Information about a file (or parts of a file) is called meta-data.  For example, the HTML TITLE tag defines meta-data for a web page - a "title" field.  Similarly, this HTML can be used to define an "author" field for a page:
<META NAME=author CONTENT="Chris Cant">
FindinSite and Findex add this meta-data field information to the search database, so a search for "Chris Cant" will find this page.

Field searches

As well as ordinary searches, FindinSite-CD can also do field searches where you search specific field(s). For example, searching the "author" field for "Chris Cant" might give better results than an ordinary search.

But note:

  • Field searches are only possible if the search database was built using Findex (or FindinSite-JS or FindinSite-MS); FindinSite-CD-Wizard cannot generate field search databases.
  • Field searches do not work in all browsers - check compatibility here.

For the example above, a FindinSite-CD "author" field search for "Chris Cant" would find the web page.  An ordinary search for "Chris Cant" would also find the page because the words in the field are also stored in the main word list.

Limitations:
In Windows, you (and your users) need Internet Explorer 4+, Navigator 4+, Opera 7+ or similar to do field searches.  On the Mac, currently only Netscape 7 on Mac OS 9 supports field searches. Latest compatibility information.

Example

See the field search example.

Field searches are carried out using a standard HTML form in conjunction with the usual FindinSite-CD window.  JavaScript is used to send field search text to FindinSite-CD.  See below for full instructions on how to set up a field search web page.

FindinSite-CD searches fields in exactly the same way as normal searches, so parentheses, boolean operators, wild cards etc are available.  Your JavaScript determines what field search text is sent to FindinSite-CD.  For example, to implement a multiple-selection list box, like those on the field search example, your JavaScript would combine all selected options together with OR operators, surrounded by parentheses, eg if "English" and "Chinese" languages are selected then the JavaScript sets the "lang" field to "( en OR zh )".
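
The OR combination described above can be sketched as a small helper.  This is only an illustration: the function name buildOrQuery is hypothetical and is not part of FindinSite-CD.  The resulting string is what your JavaScript would hand to FindinSite-CD for the field.

```javascript
// Hypothetical helper (not part of FindinSite-CD): combine the selected
// option values into one field search expression.
function buildOrQuery(values) {
    if (values.length === 0)
        return null;                          // nothing selected: no search text for this field
    return "( " + values.join(" OR ") + " )"; // OR the values, wrapped in parentheses
}

var langQuery = buildOrQuery(["en", "zh"]);   // "( en OR zh )"
```

Keeping the string building in its own function means it can be checked without a browser or the applet.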

Note, however, that the words in each field are not automatically included in the main word list.  In the field search example, the word "summary" is in the field "DC:Subject".  However the word "summary" is not used on the page itself, so an ordinary search for "summary" will find no results.

Searching field values

FindinSite-CD puts no interpretation on the field values; they are just treated as a series of characters.  For example, a search for "1996" would not match a "date" field value of "10/10/96".

However various bodies have gone to great lengths to define standard field names and values.

  • For example, RFC 1766 recommends that language ("lang") field values start with a two-letter lower-case language code (from ISO-639), optionally followed by a hyphen and a two-letter upper-case country code (from ISO-3166).
  • As another example, the Dublin Core initiative primarily specifies standard field names for describing documents.  Associated initiatives go further by defining industry-specific schemas, ie a standard set of possible values for certain fields.  See below for more details.
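
If you generate "lang" values yourself, a quick check of the shape described in the first bullet might look like the sketch below.  The function name looksLikeLangTag is hypothetical, the pattern covers only the simplified "xx" or "xx-YY" form described here rather than the full RFC 1766 grammar, and FindinSite-CD itself performs no such validation.

```javascript
// Illustrative check for the simplified "xx" or "xx-YY" shape only.
// Assumption: lower-case language code, optional upper-case country code.
function looksLikeLangTag(value) {
    return /^[a-z]{2}(-[A-Z]{2})?$/.test(value);
}

looksLikeLangTag("en");      // true
looksLikeLangTag("zh-TW");   // true
looksLikeLangTag("EN-us");   // false: letter case does not follow the convention
```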

Indexing

FindinSite and Findex index files to build a search database. They all extract meta-data information and store it in the search database.

Findex, FindinSite-JS and FindinSite-MS also store the meta-data information in an extra part of the search database which is used to support field searches.

Indexing HTML meta-data

For HTML files, the following meta-data information is found:

HTML                                              Field   Value
<META name=nnn content=xxx>                       nnn     xxx
<META http-equiv=content-language content=xxx>    lang    xxx
<BODY lang=xxx>                                   lang    xxx
<TITLE>xxx</TITLE>                                title   xxx
<IMG alt=xxx>                                     img     xxx

Note that some field names are hard-coded.
Fields can be repeated; all information is stored.
The letter case of field names is ignored, so field "Author" is the same as "AUTHOR".
For the IMG tag, the field is associated with the page not the image, so a hit will display the page, not the image.
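
To make the mapping in the table concrete, here is a rough sketch that collects field/value pairs from a string of HTML.  It is only an illustration and not how Findex is implemented: the function names getAttr and extractFields are hypothetical, and a regular-expression scan like this is not a real HTML parser.

```javascript
// Return the value of one attribute (quoted or unquoted) from a tag's text.
function getAttr(tag, name) {
    var re = new RegExp("\\b" + name +
        "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))", "i");
    var m = re.exec(tag);
    if (!m) return null;
    if (m[1] !== undefined) return m[1];
    if (m[2] !== undefined) return m[2];
    return m[3];
}

// Collect { field, value } pairs following the table above.
// Repeated fields are all kept; field names are lower-cased.
function extractFields(html) {
    var fields = [];
    var title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    if (title) fields.push({ field: "title", value: title[1] });

    var tagRe = /<(meta|body|img)\b[^>]*>/gi;
    var m;
    while ((m = tagRe.exec(html)) !== null) {
        var tag = m[0];
        var kind = m[1].toLowerCase();
        if (kind === "meta") {
            var content = getAttr(tag, "content");
            var name = getAttr(tag, "name");
            var equiv = getAttr(tag, "http-equiv");
            if (content === null) continue;
            if (name !== null)
                fields.push({ field: name.toLowerCase(), value: content });
            else if (equiv !== null && equiv.toLowerCase() === "content-language")
                fields.push({ field: "lang", value: content });
        } else if (kind === "body") {
            var lang = getAttr(tag, "lang");
            if (lang !== null) fields.push({ field: "lang", value: lang });
        } else {
            var alt = getAttr(tag, "alt");
            if (alt !== null) fields.push({ field: "img", value: alt });
        }
    }
    return fields;
}
```

For example, passing the string '<META NAME=author CONTENT="Chris Cant">' yields a single pair with field "author" and value "Chris Cant".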

Some field names contain a period (.).  It is modern practice to use colons (:) instead.  Therefore Findex converts periods to colons, eg field name "DC.Title" is stored as "dc:title".
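
The two naming rules (letter case is ignored, periods become colons) can be pictured as a single normalising step.  The sketch below is illustrative only; the function name normalizeFieldName is hypothetical and Findex's own code is not shown in this documentation.

```javascript
// Illustrative only: lower-case the field name and turn periods into colons.
function normalizeFieldName(name) {
    return name.toLowerCase().replace(/\./g, ":");
}

normalizeFieldName("DC.Title");   // "dc:title"
normalizeFieldName("AUTHOR");     // "author"
```

This is why the JavaScript later on this page sets the field "dc:creator" rather than "DC.Creator".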

Indexing meta-data from other file types

Meta-data is also extracted from some other file types.  Typically the indexer finds the file title, description and keywords.  For example, the PDF file "subject" document property is stored as a "description" field.

See the file types page for details of the meta-data information stored in fields for each file type.  The RDF file type is special because it exists solely to define meta-data for other files - see here for details.

Standard HTML meta-data

As described above, the indexers find any META/name/content meta-data.  The following is a list of standard or commonly-used META field names:

abstract author changed copyright created date description distribution doc-version formatter generator indexgenerator keywords originator platform product progid robots template

Dublin Core meta-data

Dublin Core is a minimal set of descriptive elements that facilitate the description and the automated indexing of document-like networked objects, in a manner similar to a library card catalog.

Dublin Core fields can be defined in HTML META/name/content tags.  (They can also be put in RDF/XML format, which can be indexed by the Findex phdccRDF parser.)  RFC 2731 says that Dublin Core elements should be given a DC. prefix when put in HTML META fields, so the "Title" element should be given META name "DC.Title", eg:

<META NAME="DC.Title" CONTENT="Research into Crocodile eating habits">

The following is a list of standard Dublin Core HTML meta-data field names.  Related initiatives define industry-specific schemas, ie a standard set of possible values for certain fields.

DC.Title DC.Creator DC.Subject DC.Description DC.Publisher DC.Contributor DC.Date DC.Type DC.Format DC.Identifier DC.Source DC.Language DC.Relation DC.Coverage DC.Rights

Findex finds Dublin Core meta-data from META/name/content tags in the normal way.  Note that, as described above, all periods (.) in field names are converted to colons (:) to use the standard RDF terminology, eg field DC.Title is renamed DC:Title.


Making field search pages

Field searches are carried out using a standard HTML form in conjunction with the usual FindinSite-CD window (or an invisible FindinSite-CD with HTML results).

JavaScript is used to send field search text to FindinSite-CD.  There is no way to specify field search text within the FindinSite-CD window.

There are two aspects to setting up a field search:

  • The HTML form and applet
  • The JavaScript

HTML Form and Applet

Use an HTML form to specify the field options that you want the user to see.  The following example uses a TABLE to line up the field prompts and options.

The form is called fields.  If the user presses Enter in the form, then the onSubmit JavaScript event handler is run; this returns the value returned from JavaScript function submitFieldsForm() - we will show this function later.

Each row of the TABLE defines form field values.  In this case there is one free text INPUT field called creator, and one multiple selection list-box called language.  You could also have any other form field type, such as radio buttons or check boxes.

The final section of the example defines the FindinSite-CD window in the standard way, with the APPLET named fisCD as usual.  The only new code defines the setFieldsFn parameter as setFields.  This tells FindinSite-CD to call JavaScript function setFields() when it needs to find what field search values the user has specified.

<FORM NAME="fields" onSubmit="return submitFieldsForm()">
<TABLE>
<TR>
<TD VALIGN=top><B>DC:Creator:</B></TD>
<TD VALIGN=top>
    <INPUT TYPE=text NAME=creator MAXLENGTH=40>
</TD>
</TR>
<TR>
<TD VALIGN=top><B>Language:</B></TD>
<TD VALIGN=top>
    <SELECT MULTIPLE NAME="language" SIZE="3">
    <OPTION VALUE="any" SELECTED>Any
    <OPTION VALUE="en">English
    <OPTION VALUE="zh">Chinese
    <OPTION VALUE="zh TW">Chinese Traditional
    <OPTION VALUE="de">German
    <OPTION VALUE="ja">Japanese
    </SELECT>
</TD>
</TR>
</TABLE>
</FORM>

<APPLET CODE=fisCD NAME=fisCD WIDTH=350 HEIGHT=200 ARCHIVE="fiscd.jar" MAYSCRIPT>
<PARAM NAME=index1 VALUE="fields,en">
<PARAM NAME=rules VALUE="rulesen.txt">
<PARAM NAME=setFieldsFn VALUE="setFields">
<PARAM NAME=target VALUE="_blank">

Sorry, your browser is not set up to run Java applets.<BR>
<A TARGET=_blank HREF="http://www.phdcc.com/getjvm.htm">How to get a Java VM</A>.

</APPLET>

JavaScript

JavaScript is the glue that sends the user's field search options to FindinSite-CD.

In the example below, submitFieldsForm() is called when the user presses Enter in the form.

  • submitFieldsForm() calls the FindinSite-CD Search function; the null parameter means that the search text in the FindinSite-CD window is not changed.
  • submitFieldsForm() returns false to ensure that the default form action does not occur.

The setFields() function is called by FindinSite-CD whenever it needs to find out the field search values.  (Remember that we used the "setFieldsFn" parameter to tell FindinSite-CD the name of this function.)

  • setFields() delegates its field setting jobs to two functions: creatorChange() and langChange().
  • setFields() returns true to tell FindinSite-CD that it has set the fields.

The creatorChange() function gets the "creator" form field value.  If the value is empty, the search text is set to the wild card *.  Otherwise the value is wrapped in protective parentheses.  The result is then passed to FindinSite-CD using the FindinSite-CD SetField function.

The langChange() function does a similar job for the more complicated case.  If there is no selection or "Any" language is set then the FindinSite-CD "lang" field is set to null so that the "lang" field is not searched.  Otherwise langChange() goes through all possible language options, and - if selected - adds the language value to the string to be passed to FindinSite-CD.  Note how each language value is wrapped in parentheses before being ORed together.  As before, the FindinSite-CD SetField function is used to set the field value.  This code is complicated by the fact that not all browsers allow the "lang" field settings to be accessed in the same way.

<SCRIPT LANGUAGE=JavaScript>
<!--

function submitFieldsForm()
{
    document.fisCD.Search(null);
    return false;
}

function setFields()
{
    creatorChange();
    langChange();
    return true;
}

function creatorChange()
{
    var creator = document.fields.creator.value;
    if( creator=="")
        creator = "*";
    else
        creator = "( "+creator+" )";
    document.fisCD.SetField("dc:creator",creator);
}

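function langChange()
{
    // In browsers that support it, value holds the first selected option's value
    var lang = document.fields.language.value;
    if( lang=="" || lang=="any")
        document.fisCD.SetField("lang",null);   // null: do not search the "lang" field
    else
    {
        var langcount = document.fields.language.options.length;
        var langs = "";
        var options = document.fields.language.options;
        // Use options.item(n) where the browser provides it; otherwise call options(n)
        var optionsitem = options.item;
        // Start at 1 to skip the "Any" option at index 0
        for( var langno=1; langno<langcount; langno++)
        {
            var selected;
            if( optionsitem)
                selected = options.item(langno).selected;
            else
                selected = options(langno).selected;
            if( selected)
            {
                // Join the selected values with OR, each in its own parentheses
                if( langs!="")
                    langs += " OR ";
                if( optionsitem)
                    langs += "( "+options.item(langno).value+" )";
                else
                    langs += "( "+options(langno).value+" )";
            }
        }
        // Wrap the whole expression in parentheses before sending it
        if( langs!="")
            langs = "( "+langs+" )";
        document.fisCD.SetField("lang",langs);
    }
}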

<!-- -->
</SCRIPT>
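
In creatorChange() above, an entry consisting only of spaces is not empty, so it is wrapped in parentheses like any other text.  One possible variant, shown here purely as a sketch, moves the text handling into a helper that treats blank input as empty.  The helper name creatorQuery is hypothetical and is not part of FindinSite-CD.

```javascript
// Hypothetical helper: build the search text for the "dc:creator" field.
function creatorQuery(text) {
    // Strip leading/trailing white space without relying on String.trim()
    var trimmed = text.replace(/^\s+|\s+$/g, "");
    if (trimmed === "")
        return "*";                 // same wild card that creatorChange() uses
    return "( " + trimmed + " )";   // protective parentheses
}

// In the page it would be used like this (needs the form and the applet):
// document.fisCD.SetField("dc:creator", creatorQuery(document.fields.creator.value));
```

Because the helper touches neither the form nor the applet, it can be tested on its own.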
All site Copyright © 1996-2013 PHD Computer Consultants Ltd, PHDCC

Last modified: 1 February 2013.
