findinsite-ms Search API web service
findinsite-ms has an application programming interface (API) for searches:
- Send search text to the findinsite-ms search web service.
- Receive the requested results: either all of them or just a range, eg the first 10.
- Display the results in your desired layout.
- Reorder or change the results as required.
- Cache the results locally to avoid further requests.
The Search API at the phdcc web site is at this URL:
http://www.phdcc.com/findinsite/SearchService.asmx
You can access this URL from your site at any time - it only searches the phdcc web site.
If you are going to use this service URL regularly or as an example, please contact us for permission.
Once you have installed findinsite-ms at your site, the Search API will be available
at a similar URL - the exact URL depends on your web site and installation directory.
The Search API web service can be turned off completely in the findinsite-ms
Control Panel. Alternatively, you can restrict access to your own web site
by disabling external access to the Search API.
If you are using the Free license, the URL, HighlightURL and Filename properties of each hit
in fisSearchResult will be null for hits in files after the first 60 files.
Search API Methods
Note: The GetSubsets() and GetFieldnames() methods will probably be reworked
soon to provide more information about the main index and each subset search database.
Examples
The following examples are written as ASP.NET .aspx web form pages.
The examples will therefore only work when run on an ASP.NET site.
If you are viewing the documentation as a file on your local disk, then the "locally" links will not work.
To try out the examples, follow the link to the phdcc web site.
Making these examples work locally
If you have installed the findinsite-ms development kit, the documentation
appears on the Start menu. However, these Search API examples will not work from here.
If you are running IIS (eg if you have Windows XP Pro, Vista, 7, Server 2003, Server 2008 or similar) and have .NET installed,
then you can make the examples work by making an IIS virtual directory that maps to the documentation directory.
To do this, you have to...
- For IIS6, select Start+All Programs+Administrative Tools+Internet Information Services
- Expand the tree to get the "Default Web Site".
- Right click on "Default Web Site", and select New+Virtual Directory...
- Click Next
- Enter a new directory name in the Alias box, eg findinsite, and click Next
- Select a directory (eg C:\Program Files\PHD\findinsite\), and click Next
- Do not change the permissions, and click Next
- Click Next to finish the wizard
- The findinsite-ms documentation should now be visible at a URL like this: http://localhost/findinsite/
- Follow the documentation links to run these examples.
- For IIS7, go into [Control Panel] then [Administrative Tools] then [Internet Information Services]
Information flows
This diagram shows typical information flows between:
- The user's browser
- Your web site with the search form
- The findinsite-ms Search API web service
(Note that a user application program can access findinsite-ms directly -
however a user browser cannot access the Search API directly.)
- The user enters search text into your search form and presses "Go Search".
The HTTP request is sent to your web site to be handled, eg by your ASP.NET web form page.
- The web page processes the form by sending a search request to findinsite-ms,
typically using the SOAP protocol.
- findinsite-ms does the search and returns the results to your web site,
again typically using SOAP.
- Your web page formats the received search results and returns the information to the user using an HTTP response.
When using the findinsite-ms Search API you need to be aware of these information flows.
Your web site and findinsite-ms will typically be on the same server,
so SOAP transmission delays will be very short. However, if they are on different servers then
transmission times could be significant. In particular, a request that returns many hits could take a
long time to transmit.
There are three possible strategies to cope with long transmission delays:
- Put your web site and findinsite-ms on the same server
- Once you have received a search result, store it in a Session variable for your web page.
If the user then asks for more of the same results, eg the next 10 hits, retrieve them from the Session variable.
See example 2 for details of this.
- Only request the number of hits that you can process quickly.
A findinsite-ms search request can set the range of hit numbers required.
Example 3 shows how to use this approach.
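As a rough sketch of the third strategy, the helper below converts a page number into the HitsFrom and HitsTo values for a request. The HitRange class and its method names are hypothetical and not part of fisClient. It assumes that hit numbers start at 1 (as ResultNo does) and that HitsTo is inclusive.

```csharp
using System;

// Hypothetical helper, not part of fisClient: works out which hit numbers
// to request for one page of results.
public static class HitRange
{
    // page is 1-based; pageSize is the number of hits shown per page.
    // Assumption: hit numbers start at 1 and HitsTo is inclusive.
    public static int From(int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return (page - 1) * pageSize + 1;
    }

    // Last hit number on the given page.
    public static int To(int page, int pageSize)
    {
        return From(page, pageSize) + pageSize - 1;
    }
}
```

For the second page of 10 hits, HitRange.From(2, 10) and HitRange.To(2, 10) give 11 and 20, which would be assigned to the request's HitsFrom and HitsTo fields before calling Search().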
Note that the findinsite-ms Search API is set up so that a request and its response are cached
in the server for 60 seconds. This means that a repeat request (within 60 seconds) does not have to be processed
by findinsite-ms. Instead, the result is returned straight away from the ASP.NET cache.
However the result will still have to be sent over the network.
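The second strategy can be sketched as a small cache placed in front of the search call. This is an illustration only: ResultCache is a hypothetical class, a Dictionary stands in for the ASP.NET Session object, and the search delegate stands in for whatever code calls the Search API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers the result for each search text so that a
// repeat request does not have to cross the network again.
public class ResultCache<TResult>
{
    // Stand-in for the Session object; keyed by the search text.
    private readonly Dictionary<string, TResult> store = new Dictionary<string, TResult>();
    private readonly Func<string, TResult> search;

    // search is whatever performs the real request, eg a wrapper round Search().
    public ResultCache(Func<string, TResult> search)
    {
        if (search == null) throw new ArgumentNullException("search");
        this.search = search;
    }

    public TResult Get(string text)
    {
        TResult cached;
        if (store.TryGetValue(text, out cached))
            return cached;              // served locally, no request sent
        TResult fresh = search(text);   // first time: ask the service
        store[text] = fresh;
        return fresh;
    }
}
```

In a web form the same idea applies with a Session variable in place of the Dictionary, so that each user keeps their own results.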
Interface technology
The findinsite-ms Search API is implemented as an ASP.NET XML Web Service.
It can be accessed by any SOAP-compliant technology, and it also responds to HTTP GET/POST requests.
The Web Services Description Language (WSDL) service description is made available, as at the phdcc web site:
http://www.phdcc.com/findinsite/SearchService.asmx?WSDL
The search API can also be inspected by a web browser, as at the phdcc web site:
http://www.phdcc.com/findinsite/SearchService.asmx
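For a method with simple parameters, an ASP.NET web service conventionally accepts an HTTP GET request of the form ServiceURL/MethodName?parameter=value. The sketch below builds such a URL for the GetFieldnames() method. The WebMethodUrl class is hypothetical, and whether GET requests are accepted depends on how the server is configured. Search() takes a structured request object, so SOAP is likely to be the more practical route for it.

```csharp
using System;

// Hypothetical helper: builds the HTTP GET URL for a web method that takes
// one string parameter, following the usual ASP.NET .asmx convention.
public static class WebMethodUrl
{
    public static string Build(string serviceUrl, string method,
                               string paramName, string paramValue)
    {
        // Escape the value so that spaces, '&' and similar characters
        // survive in the query string.
        return serviceUrl.TrimEnd('/') + "/" + method
            + "?" + paramName + "=" + Uri.EscapeDataString(paramValue);
    }
}
```

The resulting URL can be pasted into a browser or fetched with any HTTP client; the response is expected to be XML.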
Referencing the web service in ASP.NET
To use the Search API from ASP.NET, you first need to provide a Web Reference to the web service, or provide a class that does the same job.
This can be done in one of three ways using the .NET SDK or Visual Studio .NET:
- In your Visual Studio project, add a Web Reference to the Search API web service,
eg to use the phdcc web site search API, add a reference to
http://www.phdcc.com/findinsite/SearchService.asmx
- You can make a local source file equivalent to a Web Reference using the .NET SDK
wsdl.exe tool.
For example, this command line generates a C# file SearchService.cs for the phdcc web site:
wsdl.exe http://www.phdcc.com/findinsite/SearchService.asmx?WSDL
Include this file in your project.
- The recommended approach is to use the fisClient class library, supplied in
phdcc.fis.fisClient.dll .
This library contains a class fisClient.SearchService based on the code generated by the above method, but
with the following enhancements:
- The web service URL is set in the SearchService() constructor.
- The results classes use properties which make them easy to use in a DataGrid.
Add a Reference to phdcc.fis.fisClient.dll , copying this file
into your project application /bin directory.
Search API - Search() method
The Search API is described in terms of its use in ASP.NET using the supplied fisClient class library in C#.
This section describes the Search() method.
First, you need to instantiate a new SearchService object by calling its constructor,
passing in the URL string of the search service:
C# |
public SearchService( string ServiceURL)
|
eg for the phdcc web site:
C# |
fisClient.SearchService fis = new fisClient.SearchService("http://www.phdcc.com/findinsite/SearchService.asmx");
|
VB |
Dim fis As New fisClient.SearchService("http://www.phdcc.com/findinsite/SearchService.asmx")
|
Searches are performed with a single method:
C# |
public fisSearchResult Search(fisSearchRequest SearchRequest)
|
VB |
Public Function Search( ByVal SearchRequest As fisSearchRequest) As fisSearchResult
|
ie you call the Search() method, passing in a fisSearchRequest object as a parameter,
and you receive a fisSearchResult object as the result.
The fisSearchRequest object has a simple constructor that lets you do a simple text search, returning all results, eg:
C# |
fisClient.fisSearchRequest request = new fisClient.fisSearchRequest("ASP.NET");
fisClient.fisSearchResult result = fis.Search(request);
|
VB |
Dim request As New fisClient.fisSearchRequest("ASP.NET")
Dim result As fisClient.fisSearchResult
result = fis.Search(request)
|
The returned fisSearchResult object has a Message string field that is set if there are any errors.
The Count integer field is set to the number of hits found.
The Hits field is an array of fisHit objects, one for each hit.
Each fisHit has properties that describe a hit in a file, eg Title and URL.
Search request options specified in fisSearchRequest
The search request options are specified in the fisSearchRequest class:
C# |
using System;
using System.Collections;

namespace fisClient
{
    public class fisSearchRequest
    {
        public string Text = null;
        public string Subsets = null;       // null = all subsets
        public int HitsFrom = 0;
        public int HitsTo = 0;              // HitsFrom and HitsTo both 0 = all results
        public fisHitAttributes HitElems = fisHitAttributes.All;
        public ArrayList SearchFields;      // fisSearchField-s
        public string UserParam = null;
        public fisSearchRequest() { }
        public fisSearchRequest(string text) { Text = text; }
    }

    public class fisSearchField
    {
        public string FieldName;
        public string FieldValue;
        public fisSearchField() { }
        public fisSearchField(string FieldName, string FieldValue)
        {
            this.FieldName = FieldName;
            this.FieldValue = FieldValue;
        }
    }

    [FlagsAttribute]
    public enum fisHitAttributes : short
    {
        ResultNo = 1,
        Title = 2,
        URL = 4,
        HighlightURL = 8,
        Abstract = 16,
        Filename = 32,
        WordCount = 64,
        Date = 128,
        Indexed = 256,
        Size = 512,
        Snippet = 1024,
        PageWordCount = 2048,
        All = ResultNo | Title | URL | HighlightURL | Abstract | Filename | WordCount
            | Date | Indexed | Size | Snippet | PageWordCount,
    }
}
|
VB |
Namespace fisClient
Public Class fisSearchRequest
Public Sub New()
Public Sub New(ByVal [text] As String)
Public HitElems As fisHitAttributes
Public HitsFrom As Integer
Public HitsTo As Integer
Public SearchFields As ArrayList
Public Subsets As String
Public [Text] As String
Public UserParam As String
End Class
Public Class fisSearchField
Public Sub New()
Public Sub New(ByVal FieldName As String, ByVal FieldValue As String)
Public FieldName As String
Public FieldValue As String
End Class
Public Enum fisHitAttributes
ResultNo = 1
Title = 2
URL = 4
HighlightURL = 8
Abstract = 16
Filename = 32
WordCount = 64
Date = 128
Indexed = 256
Size = 512
Snippet = 1024
PageWordCount = 2048
All = 4095
End Enum
End Namespace
|
This table summarises the fisSearchRequest fields:
Name | Type | Description | Default
Text | string | The text to search for |
Subsets | string | The list of subsets to search, comma-separated | null = all subsets
HitsFrom | integer | The first result number required | 0
HitsTo | integer | The last result number required | 0 (if HitsFrom and HitsTo are both 0 then all results are returned)
HitElems | enum short | The result elements required, ORed together | fisHitAttributes.All
SearchFields | ArrayList | The list of fields and field values to search for |
UserParam | string | Any value, to be returned in fisSearchResult |
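Because fisHitAttributes is a flags enum, HitElems values are combined with the | operator and tested with the & operator. The sketch below repeats a cut-down copy of the enum so that it compiles on its own; in a real project the fisClient definition would be used instead. The HitElemsDemo class is purely illustrative.

```csharp
using System;

// Cut-down local copy of the fisClient enum, for illustration only.
[Flags]
public enum fisHitAttributes : short
{
    ResultNo = 1,
    Title = 2,
    URL = 4,
    HighlightURL = 8,
    Abstract = 16
}

public static class HitElemsDemo
{
    // Request only what a simple list of links needs.
    public static fisHitAttributes ForLinkList()
    {
        return fisHitAttributes.ResultNo | fisHitAttributes.Title | fisHitAttributes.URL;
    }

    // True if every bit of 'element' is present in 'requested'.
    public static bool Includes(fisHitAttributes requested, fisHitAttributes element)
    {
        return (requested & element) == element;
    }
}
```

Assigning a value like this to the request's HitElems field asks for only those fisHit properties to be filled in, which should keep the response smaller than the default of fisHitAttributes.All.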
Search results returned in fisSearchResult
If you are using the Free license, the URL, HighlightURL and Filename properties of each hit
in fisSearchResult will be null for hits in files after the first 60 files.
The search results are returned in the fisSearchResult class:
C# |
using System;

namespace fisClient
{
    public class fisSearchResult
    {
        public int Count;
        public string Message;
        public object[] Hits;   // fisHit-s
        public string UserParam;
    }

    public class fisHit
    {
        public string Abstract { get; set; }
        public DateTime Date { get; set; }
        public string Filename { get; set; }
        public string HighlightURL { get; set; }
        public DateTime Indexed { get; set; }
        public int PageWordCount { get; set; }
        public int ResultNo { get; set; }
        public int Size { get; set; }
        public string Snippet { get; set; }
        public string Title { get; set; }
        public string URL { get; set; }
        public int WordCount { get; set; }
    }
}
|
VB |
Namespace fisClient
Public Class fisSearchResult
Public Sub New()
Public Count As Integer
Public Hits As Object() ' fisHit-s
Public Message As String
Public UserParam As String
End Class
Public Class fisHit
Public Sub New()
Public Property Abstract As String
Public Property [Date] As DateTime
Public Property Filename As String
Public Property HighlightURL As String
Public Property Indexed As DateTime
Public Property PageWordCount As Integer
Public Property ResultNo As Integer
Public Property Size As Integer
Public Property Snippet As String
Public Property Title As String
Public Property URL As String
Public Property WordCount As Integer
End Class
End Namespace
|
This table summarises the fisSearchResult fields:
Name | Type | Description
Count | integer | The total number of hits for the search request
Message | string | An error or information message
Hits | array | An array of fisHit objects, one for each requested hit, if available
UserParam | string | The value specified in fisSearchRequest.UserParam
This table summarises the fisHit properties.
Note that the property is only set if the appropriate bit in fisSearchRequest.HitElems is set.
Name | Type | Description
Abstract | string | The abstract of the file
Date | DateTime | The date/time of the file
Filename | string | The filename component of the URL
HighlightURL | string | A URL that shows the hit with search words highlighted
Indexed | DateTime | The date/time when the file was indexed
PageWordCount | integer | The count of words in the file
ResultNo | integer | The hit number, ie from 1 to fisSearchResult.Count
Size | integer | The file size in bytes
Snippet | string | Extracts from the file mentioning the search words (with search words highlighted using a span with class hilite)
Title | string | The file title
URL | string | The full URL to the file
WordCount | integer | The count of search words in the file
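When displaying hits, two details matter: URL can be null (for example under the Free license), and Snippet already contains HTML span markup. The sketch below formats one hit with both in mind. HitFormatter is a hypothetical helper that takes plain strings rather than a fisHit so that it compiles without the fisClient library. It assumes that the snippet returned by the service can be trusted as HTML, and System.Net.WebUtility needs .NET 4.0 or later.

```csharp
using System;
using System.Net;

// Hypothetical helper: turns the Title, URL and Snippet of one hit into HTML.
public static class HitFormatter
{
    public static string ToHtml(string title, string url, string snippet)
    {
        string safeTitle = WebUtility.HtmlEncode(title ?? "(untitled)");

        // No URL available: show the title as plain text instead of a broken link.
        string heading = (url == null)
            ? safeTitle
            : "<a href=\"" + WebUtility.HtmlEncode(url) + "\">" + safeTitle + "</a>";

        // Assumption: the snippet is trusted HTML, so it is inserted
        // unencoded to keep its highlighting spans working.
        return "<p>" + heading + "<br>" + (snippet ?? "") + "</p>";
    }
}
```

A style rule for the hilite class in your page then controls how the highlighted search words appear.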
Search API - GetSubsets() method
The GetSubsets() method returns a string array containing a list of
all the search database subsets supported by findinsite-ms at this site.
C# |
public string[] GetSubsets()
|
VB |
Public Function GetSubsets() As String()
|
Search API - GetFieldnames() method
The GetFieldnames() method returns a string array containing a list of
all the field names for the specified search database subset.
C# |
public string[] GetFieldnames(string subsetname)
|
VB |
Public Function GetFieldnames(ByVal subsetname As String) As String()
|
Search API Technical Details
Returned messages are always in English. Only English boolean operators are recognised in the search text, ie AND, OR and NOT.
If requested, the Search API could be updated to let you set the language for each request.
As described earlier, all Search API requests and their responses are cached on the server for 60 seconds.
Consequently, a repeat request within that time is answered from the cache without running the search again.
However, the response still has to be sent over the network, so it is recommended that responses
be saved (eg in a Session variable) if they are likely to be used again.
All characters are sent and received using the UTF-8 character set in XML.
If the generated XML contains control code characters then it will cause an error in a calling ASP.NET application;
therefore these control characters are removed from all returned strings: codes less than \u0020, surrogate codes between \uD800 and \uDFFF, and codes \uFFFE and \uFFFF.
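The filtering rule can be written out as below. This is a sketch of the rule as described, not the findinsite-ms source, and XmlSafeText is a hypothetical name. The same check may be useful for text that your own code places into XML.

```csharp
using System.Text;

// Hypothetical helper that applies the character rule described above.
public static class XmlSafeText
{
    // Removes codes below \u0020, surrogate codes \uD800 to \uDFFF,
    // and the codes \uFFFE and \uFFFF. Null input is returned as null.
    public static string Strip(string s)
    {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            bool control = c < 0x0020;
            bool surrogate = c >= 0xD800 && c <= 0xDFFF;
            bool nonCharacter = c == 0xFFFE || c == 0xFFFF;
            if (!control && !surrogate && !nonCharacter)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Note that, taken literally, the rule also removes tab, carriage return and line feed, since these are below \u0020.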
findinsite-ms checks the search text for cross-site scripting attacks -
an error is returned if the validation fails.
The Search API should not normally throw any exceptions. However, your code should be prepared to cope with any that do arise,
eg due to network failures.
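One way to prepare for such failures is to wrap the call so that the page receives an error message instead of an unhandled exception. The sketch below is a minimal illustration: SafeSearch is a hypothetical helper, and the search delegate stands in for a call such as fis.Search(request).

```csharp
using System;
using System.Net;

// Hypothetical helper: runs a search call and reports failures as text.
public static class SafeSearch
{
    // Returns true and sets result on success; otherwise returns false
    // and sets error to a message suitable for logging or display.
    public static bool TryRun<T>(Func<T> search, out T result, out string error)
    {
        try
        {
            result = search();
            error = null;
            return true;
        }
        catch (WebException ex)
        {
            // Typically a network problem: server unreachable, timeout etc.
            result = default(T);
            error = "Network error: " + ex.Message;
            return false;
        }
        catch (Exception ex)
        {
            // Anything else, including SOAP faults raised by the proxy class.
            result = default(T);
            error = "Search failed: " + ex.Message;
            return false;
        }
    }
}
```

Even when the call succeeds, the Message field of the returned fisSearchResult should still be checked, since errors such as failed search text validation are reported there.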