findinsite-ms word highlighting
findinsite-ms can highlight search words in result web pages.
Highlighting is on by default but can be turned off in the Control Panel
Look and Feel screen.
See below for details of how to change the highlighting HTML.
It is recommended that you only use highlighting if you are displaying hits on your own web site.
Word highlighting is very useful because it lets users see their search words on the page straight away,
making the search process much friendlier. findinsite-ms does not use cached web pages
when highlighting; it always uses the live page.
The Search API supports word highlighting by returning a
HighlightURL field for each hit.
For example, if you search for
brown car,
the hits are listed as normal. If you click on a hit that is an HTML web page,
findinsite-ms displays the page with all search words and their variants highlighted,
and with contiguous search words run together into a single highlight. The page is scrolled to show the first highlight.
Josie jumped out of the car
and landed in the brown mud.
Brown cars came past
and splashed her.
"Brown car, go away!" shouted Josie.
By default, a header is added to the page (just after the <body> tag) to tell the user
that it has been highlighted by findinsite-ms.
This can be switched off in the Control Panel, if desired. Example header:
In the results list, clicking on the hit link will show the page with highlighting. If you right-click and choose
any option (eg Open in New Window) then the hit page is shown without highlighting.
The highlighting process will work across domain boundaries, so findinsite-ms
on the phdcc web site
http://www.phdcc.com/findinsite/ can highlight pages on your domain, eg
Warning: While the highlighting method (see below) should resolve links correctly, it is
possible that mischievous code in a highlighted page could interfere with the findinsite-ms web site,
because the highlighted page is served from the findinsite-ms domain.
It is therefore recommended that you only highlight pages on your own web site.
In a very small number of cases, highlighted pages are not shown correctly if the findinsite-ms domain
is different from the site domain.
If there is a problem, either switch off highlighting or run findinsite-ms
on the site domain.
The only case that we have found is an unusual frameset, where a frameset is embedded in a larger page and created
dynamically after the main page has been loaded. The problem is that the browser loads the FRAME SRC from the
findinsite-ms site, not from the searched site.
The cached page shown by a major search engine also suffers from this problem.
Word Highlighting Technical Details
findinsite-ms highlights words in a result web page by:
1. Reading the result web page
2. Adding in highlighting HTML
3. Returning the amended web page to the user
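Step 2 can be illustrated with a minimal Python sketch. This is not findinsite-ms's own code: it assumes the default highlighting HTML with closing tags </B></FONT></SPAN>, treats a trailing "s" as the only word variant, and highlights a run of adjacent search words as one span, mirroring how findinsite-ms runs contiguous search words together. Text inside tags is left untouched, but unlike a production highlighter it does not skip script or style contents, nor does it handle ">" inside attribute values. The names highlight, HIGHLIGHT_START and HIGHLIGHT_END are hypothetical.

```python
import re

# Default highlighting HTML; the closing tags are assumed to mirror the opening ones.
HIGHLIGHT_START = "<SPAN style='background: yellow;'><FONT COLOR=red><B>"
HIGHLIGHT_END = "</B></FONT></SPAN>"

# Splitting on a capturing group keeps the tags: text parts land at even
# indices and tags at odd indices.
TAG_RE = re.compile(r"(<[^>]*>)")


def highlight(page: str, words: list[str]) -> str:
    """Wrap search words in highlighting HTML, leaving tags untouched.

    A trailing "s" is the only variant matched (a crude stand-in), and a
    run of search words separated by spaces or tabs becomes one highlight.
    """
    if not words:
        return page
    word = "(?:%s)s?" % "|".join(re.escape(w) for w in words)
    run_re = re.compile(r"\b%s\b(?:[ \t]+%s\b)*" % (word, word), re.IGNORECASE)

    parts = TAG_RE.split(page)
    for i in range(0, len(parts), 2):
        # Only text between tags is searched and wrapped.
        parts[i] = run_re.sub(
            lambda m: HIGHLIGHT_START + m.group(0) + HIGHLIGHT_END, parts[i])
    return "".join(parts)
```

For example, highlight("Brown car, go away!", ["brown", "car"]) wraps "Brown car" in a single highlight and leaves ", go away!" unchanged.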
When the user clicks on a highlighting link in the results list,
show.aspx is called.
The link URL includes parameters that tell
show.aspx which page to highlight and which words to highlight.
show.aspx retrieves the requested page, adds in the highlighting HTML and returns the result to the browser.
Normally this process would not work, because the browser would resolve all the page's relative links against the findinsite-ms
site instead of the original site. findinsite-ms gets round this problem by adding a <base> tag at the top of the page. For example, for this page online,
it would add the following, together with the explanatory header:
<base href='http://www.phdcc.com/findinsite/highlite.htm' />
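The <base> and header insertion could look like the following Python sketch. The function name add_base_and_header and the header_html parameter are hypothetical. The sketch puts <base> straight after the opening <head> tag (or at the very start if there is none) so that it comes before any relative URL, and puts the header just after the opening <body> tag. It does not deal with pages that already contain their own <base> tag.

```python
import re
from html import escape


def add_base_and_header(page: str, page_url: str, header_html: str) -> str:
    """Insert a <base> tag and an explanatory header into a fetched HTML page."""
    # Attribute-escape the URL (& becomes &amp;, ' becomes &#x27;).
    base_tag = "<base href='%s' />" % escape(page_url, quote=True)

    # Place <base> straight after the opening <head> tag so it comes before
    # any relative URL in the page; if there is no <head>, put it first.
    page, count = re.subn(r"<head\b[^>]*>", lambda m: m.group(0) + base_tag,
                          page, count=1, flags=re.IGNORECASE)
    if count == 0:
        page = base_tag + page

    # Place the explanatory header just after the opening <body> tag.
    page = re.sub(r"<body\b[^>]*>", lambda m: m.group(0) + header_html,
                  page, count=1, flags=re.IGNORECASE)
    return page
```

Using a callable replacement rather than a replacement string means backslashes in the URL or header are copied literally instead of being treated as group references.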
findinsite-ms passes all HTTP request headers to the requested page, and returns all
received HTTP response headers in its response. This ensures that session state using cookies is maintained.
If there are any problems in the above process, then findinsite-ms aborts highlighting
and redirects the browser to show the page without highlighting.
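The header pass-through and the fall-back redirect can be sketched in Python by injecting the HTTP fetch as a function, which keeps the example self-contained. The Response shape, the fetch and transform parameters, and the rule that only a 200 response with an HTML Content-Type is highlighted are assumptions for illustration, not documented findinsite-ms behaviour. The sketch also drops Content-Length, because the highlighted body no longer has the original length.

```python
from typing import Callable, Dict, Tuple

# Simplified (status, headers, body) response shape; an assumption for this sketch.
Response = Tuple[int, Dict[str, str], str]
# Hypothetical fetch(url, request_headers) -> Response, standing in for a real HTTP client.
Fetcher = Callable[[str, Dict[str, str]], Response]


def show(page_url: str, request_headers: Dict[str, str],
         fetch: Fetcher, transform: Callable[[str], str]) -> Response:
    """Fetch the live page and highlight it, or redirect to it on any problem."""
    try:
        # Forward the browser's request headers (including Cookie) unchanged.
        status, headers, body = fetch(page_url, dict(request_headers))
        content_type = headers.get("Content-Type", "").lower()
        if status != 200 or "html" not in content_type:
            raise ValueError("not a highlightable HTML page")
        # Return the page's own response headers (including Set-Cookie),
        # minus Content-Length, which no longer matches the amended body.
        out_headers = {k: v for k, v in headers.items()
                       if k.lower() != "content-length"}
        return status, out_headers, transform(body)
    except Exception:
        # Abort highlighting and send the browser to the unhighlighted page.
        return 302, {"Location": page_url}, ""
```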
Note that the above process works with any URL that produces HTML output, including dynamically generated pages
produced by ASP, ASPX or PHP scripts. Also note that findinsite-ms always requests
the live page; it does not use a cached copy, which could be out of date.
Changing the Highlighting HTML
findinsite-ms highlights found search words by inserting HTML before and after
the found words. The default highlighting uses a yellow background with bold red text.
This is achieved by inserting the following HTML before the highlighted words:
<SPAN style='background: yellow;'><FONT COLOR=red><B>
and the matching closing tags after them:
</B></FONT></SPAN>
You cannot change the highlighting definitions in the Control Panel because HTML entry
is disallowed for safety reasons. To change the highlighting HTML, you must therefore edit the
findinsite-ms settings file
findinsite.xml in the work
findinsite.xml stores various settings in XML format. The <highlightStart> and
<highlightEnd> elements store the HTML that starts and ends highlighting. In the XML file,
< and > must be encoded as &lt; and &gt;.
The default values are stored as follows:
<highlightStart>&lt;SPAN style='background: yellow;'&gt;&lt;FONT COLOR=red&gt;&lt;B&gt;</highlightStart>
<highlightEnd>&lt;/B&gt;&lt;/FONT&gt;&lt;/SPAN&gt;</highlightEnd>
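If you want to produce or check these encoded values before uploading, Python's standard library can do the encoding and decoding. In this sketch the function names are hypothetical, and the <settings> root element used in the demo is a stand-in, since the real structure of findinsite.xml is not described here; the .//highlightStart search finds the elements at any depth.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_highlight_settings(start_html: str, end_html: str) -> str:
    """Return <highlightStart>/<highlightEnd> elements with the HTML entity-encoded."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return ("<highlightStart>%s</highlightStart>\n"
            "<highlightEnd>%s</highlightEnd>" % (escape(start_html), escape(end_html)))


def read_highlight_settings(xml_text: str) -> tuple[str, str]:
    """Parse settings XML and return the decoded (start, end) highlighting HTML."""
    root = ET.fromstring(xml_text)
    return (root.findtext(".//highlightStart", ""),
            root.findtext(".//highlightEnd", ""))


if __name__ == "__main__":
    fragment = encode_highlight_settings(
        "<SPAN style='background: yellow;'><FONT COLOR=red><B>", "</B></FONT></SPAN>")
    print(fragment)
    # "settings" is a stand-in root element, not necessarily the one in findinsite.xml.
    print(read_highlight_settings("<settings>%s</settings>" % fragment))
```

Reading the values back through an XML parser is a quick way to confirm that the edited file is still well-formed and decodes to the HTML you intended.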
Edit findinsite.xml directly and carefully: for example, use FTP software to download a copy of this file,
edit it in Notepad or a similar plain-text editor, then upload it again using FTP. findinsite-ms only
reads its settings file when it restarts. Either wait for a restart, or force one by updating the
Web.Config file. Once findinsite-ms has restarted, check in the Look and Feel screen
that the highlighting has been set as you wish.