FindinSite-CD Japanese support


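  • Find pages that contain the words anywhere on the page
  • Use double quotes to show words that are next to each other
  • Use single quotes to show words whose beginning matches
  • Select more pages from the search list
  • Pick "Show page" to display the page
  • The box below shows the first text of the selected page
  • Use wildcards (* or ?) to search for matching words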
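The search tips above describe a small query syntax. As a rough sketch, here is how a single query term might be matched against one indexed word, assuming that single quotes mean "the word starts with this", that * and ? are ordinary glob wildcards, and that matching ignores case. The helper name `term_matches` is hypothetical and this is not FindinSite-CD's own code; double-quoted phrases are not handled here.

```python
from fnmatch import fnmatchcase


def term_matches(term: str, word: str) -> bool:
    """Return True if one query term matches one indexed word.

    Hypothetical helper: a sketch of how the search tips could be read,
    not FindinSite-CD's actual matching code.
    """
    term, word = term.lower(), word.lower()
    # 'term' in single quotes: match words that start with the term.
    if len(term) >= 2 and term[0] == term[-1] == "'":
        return word.startswith(term[1:-1])
    # * matches any run of characters, ? matches exactly one character.
    # Note: fnmatchcase also treats [...] as a character class.
    if "*" in term or "?" in term:
        return fnmatchcase(word, term)
    # Otherwise the whole word must match.
    return word == term
```

For example, `'find'` matches "FindinSite", `find?` matches "finds" but not "findex", and `f*x` matches "Findex".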
Screenshot of FindinSite-CD running in Japanese

Japanese support

FindinSite and Findex support Japanese characters.
  • FindinSite and Findex can scan web pages in the Shift JIS, JIS and EUC character sets (shift_jis, x-sjis, iso-2022-jp, euc-jp)
  • FindinSite-CD-Wizard and Findex can scan MS-Word, MS-Excel and MS-PowerPoint files containing Japanese characters.
  • However, FindinSite-CD-Wizard may not be able to scan PDF files containing Japanese characters.
  • FindinSite has a Japanese user interface, provided by a Japanese language file.

To see this in action, your computer and browser must support Japanese character sets.

FindinSite-CD-Wizard Windows set up tool

FindinSite-CD-Wizard screenshot: editing Japanese

FindinSite-CD-Wizard and Findex can scan web pages that use Japanese character sets even if your computer does not have Japanese character sets installed.

If you are running on a Japanese PC, you will be able to view and edit Japanese text in FindinSite-CD-Wizard. If not, you can still edit the search database, provided you take care. See the Character sets page for full details of viewing and editing.

Read the Character sets page for details of how to set up Windows 2000 and XP to view and edit in Japanese.

FindinSite-CD Java applet

FindinSite-CD is the Java applet that you distribute to your customers on CD-ROM.

FindinSite-CD has a Japanese user interface language file and will work with Japanese characters.

Your customers must have a computer with Japanese character set support to see the Japanese characters. They must also have a browser whose Java implementation supports Japanese. See the Character sets page for details of how to set up Internet Explorer and Netscape Communicator to display Japanese characters.

Implementation details

See the Character sets page for full details of the supported Japanese character sets.

Japanese characters are translated from the supported Japanese web character sets (eg Shift JIS) into Unicode. These Unicode characters are stored in the FindinSite search database in UTF-8 format.
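A minimal Python sketch of this translation, assuming the page bytes and their declared charset label are already known. The `WEB_CHARSETS` table and `to_utf8` helper are hypothetical; the table is there because Python's codec registry uses names such as `iso2022_jp` and does not recognise every web label (eg `x-sjis`).

```python
# Hypothetical mapping from the web charset labels listed on this page
# to Python codec names.
WEB_CHARSETS = {
    "shift_jis": "shift_jis",
    "x-sjis": "shift_jis",
    "iso-2022-jp": "iso2022_jp",
    "euc-jp": "euc_jp",
}


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode page bytes from a Japanese web charset and re-encode as UTF-8."""
    codec = WEB_CHARSETS[charset.lower()]
    text = raw.decode(codec)      # Japanese charset bytes -> Unicode string
    return text.encode("utf-8")   # Unicode string -> UTF-8 bytes for storage
```

For 日本語, each character takes 2 bytes in Shift JIS and 3 bytes in UTF-8, so the stored form is 9 bytes long.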

Japanese full-width Western characters are translated into their base Western character codes. Similarly, all half-width Katakana and Hangul characters are translated into their standard-width character codes. Other useful character code translations are also done.
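A similar width folding can be sketched with Unicode NFKC normalisation from Python's standard `unicodedata` module. This is a swapped-in technique, not necessarily what FindinSite does: NFKC handles full-width Western and half-width Katakana as described above (including joining ｶ and ﾞ into ガ), but it also rewrites other compatibility characters (eg ligatures, circled digits), and it maps half-width Hangul letters to conjoining jamo rather than to the standard-width compatibility letters. The helper name `fold_width` is hypothetical.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold full-width Western and half-width Katakana to standard forms.

    Uses NFKC as a stand-in; it performs more compatibility mappings
    than the translation described on this page.
    """
    return unicodedata.normalize("NFKC", text)
```

For example, `fold_width("ＡＢＣ１２３")` gives `"ABC123"` and `fold_width("ｶﾀｶﾅ")` gives `"カタカナ"`.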

All non-Western characters are treated as single words by FindinSite. For example, the three characters in the word "Japanese" (日本語) are separate words: 日, 本 and 語. However, if you search for 日本語, FindinSite effectively puts double quotes around these characters, so that only instances of these three characters together are found. If you want to find all instances of 日, 本 and 語 on a page, search for 日 本 語, ie with spaces in between.
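The word splitting and the two query forms can be sketched as follows. This is a toy in-memory matcher, not FindinSite's implementation; it assumes "Western" means U+0000 to U+00FF (as stated for tag names below), that runs of Western letters and digits form one word, and that every other non-space character is a one-character word. The names `tokenize`, `contains_phrase` and `page_matches` are hypothetical.

```python
import re

# Assumption: Western letters/digits (up to U+00FF) form multi-character
# words; any other non-space character is a one-character word.
TOKEN_RE = re.compile(r"[0-9A-Za-z\u00C0-\u00FF]+|[^\s\u0000-\u00FF]")


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased words."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if phrase occurs as a run of consecutive tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def page_matches(page_text: str, query: str) -> bool:
    """Every space-separated query term must be present on the page.

    A term that splits into several words (eg 日本語) is treated as an
    implicit phrase, so its characters must appear next to each other.
    """
    tokens = tokenize(page_text)
    phrases = [p for p in (tokenize(term) for term in query.split()) if p]
    return bool(phrases) and all(contains_phrase(tokens, p) for p in phrases)
```

With this sketch, `page_matches("本は日の語", "日本語")` is false because the three characters are not adjacent, while `page_matches("本は日の語", "日 本 語")` is true.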

Note that all HTML tag names and HTML tag attribute names must be in Western characters, ie in the Unicode range \u0000 to \u00FF inclusive. Likewise, all web page names and target frame names must be in English. For example, FindinSite and Findex accept the following line:
<META NAME="description" CONTENT="日本語">
In this example, META is a tag name, and NAME and CONTENT are tag attribute names.
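To illustrate this rule, here is a hypothetical checker (not part of FindinSite or Findex) built on Python's standard `html.parser`. It reports any tag or attribute name containing a character above U+00FF; attribute values such as `CONTENT="日本語"` are allowed. One limitation: `HTMLParser` only recognises tags whose names start with an ASCII letter, so a tag name that begins with a Japanese character is read as text and is not reported.

```python
from html.parser import HTMLParser


def is_western(name: str) -> bool:
    """True if every character is in U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in name)


class NameChecker(HTMLParser):
    """Collect tag and attribute names outside U+0000..U+00FF (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also reach here via handle_startendtag.
        names = [tag] + [name for name, _ in attrs]
        self.problems.extend(n for n in names if not is_western(n))


def non_western_names(html: str) -> list[str]:
    """Return the offending tag/attribute names found in an HTML string."""
    checker = NameChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

The example line above passes (`non_western_names` returns an empty list), whereas `<META 名前="description">` is reported because of the attribute name 名前.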

Currently there is no Japanese stop word file.

