How many files have the major search engines found at my site?
The simplest way to find out how many files a search engine has found on your website is to enter site:domain as the search text,
e.g. site:www.phdcc.com. Note that search engines usually use the word "about" when giving the site size,
e.g. "Results 61 - 70 of about 780".
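If you want to run the same site: query repeatedly, you can build the search URL in a script. The sketch below assumes the engine takes its query text in a URL parameter (Google's https://www.google.com/search uses q). The function name site_query_url is hypothetical, and each engine's base URL and parameter name are assumptions you would need to check yourself.

```python
from urllib.parse import urlencode


def site_query_url(base: str, domain: str, param: str = "q") -> str:
    """Build a search URL whose query text is site:<domain>.

    base and param are assumptions about the engine's query interface;
    urlencode percent-encodes the colon in "site:".
    """
    return f"{base}?{urlencode({param: 'site:' + domain})}"


url = site_query_url("https://www.google.com/search", "www.phdcc.com")
print(url)  # https://www.google.com/search?q=site%3Awww.phdcc.com
```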
Use the following links to assess how many pages these search engines think are on the phdcc.com site.
The numbers in the right-hand column of the table show how many files each engine reported on the date given:
Search: site:www.phdcc.com | Number of files reported (8 June 2005)
Google | 763
MSN | 2721...810...250
Yahoo | 1000...952...958...927
findinsite | 712
As can be seen, the number reported by MSN was initially 2721, but this fell to 250 by the time the third
or fourth page of results was shown. For Yahoo, the reported number generally fell as each results page was shown.
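Because the estimate changes from one results page to the next, it can help to record the figure shown on every page rather than only the first. Here is a minimal sketch that pulls the numbers out of a header worded like the "Results 61 - 70 of about 780" example quoted earlier. The regular expression and the function names are hypothetical, and other engines may word their headers differently.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed header wording, e.g. "Results 61 - 70 of about 780";
# "about" is optional and thousands separators are allowed.
HEADER_RE = re.compile(
    r"Results\s+(\d[\d,]*)\s*-\s*(\d[\d,]*)\s+of\s+(?:about\s+)?(\d[\d,]*)",
    re.IGNORECASE,
)


def parse_results_header(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (first, last, estimated_total), or None if no header matches."""
    m = HEADER_RE.search(text)
    if m is None:
        return None
    first, last, total = (int(g.replace(",", "")) for g in m.groups())
    return first, last, total


def estimates(headers: Iterable[str]) -> List[int]:
    """Collect the estimated total reported on each results page, in order."""
    parsed = (parse_results_header(h) for h in headers)
    return [p[2] for p in parsed if p is not None]


print(parse_results_header("Results 61 - 70 of about 780"))  # (61, 70, 780)
```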
phdcc's site search engine, findinsite,
reported 712 files. Here are some reasons we have found for the numbers differing (27 May 2005):
- Google does not include this page in its index (http://www.phdcc.com/dircvs.html), even though this
page has been available since at least 1999. Both MSN and Yahoo have found this page.
- Google does not include this file in its index (http://www.phdcc.com/fis/phdcc_fis_intro.ppt),
probably because it has not re-indexed this portion of our website for a few days; the file went online 3 days ago.
Other files are also not indexed for the same reason, e.g. http://www.phdcc.com/brightdayler/access.htm.
MSN and Yahoo have not included these files either.
- Google does include an SWF Flash file. findinsite does not index this file type.
- findinsite erroneously double-counts some directory URLs, e.g. it counts
http://www.phdcc.com/findinsite/ and http://www.phdcc.com/findinsite/default.htm as separate
files even though they are the same.
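Reasons like these come from comparing the lists of URLs that each engine returns. Here is a minimal sketch of such a comparison. It first maps a directory URL and its default document to the same key, which avoids the double-counting described in the last item. The list of default document names is an assumption, since a server's actual configuration may differ. The functions normalize and compare and the example URL lists are hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; check your own server's configuration.
DEFAULT_DOCS = ("default.htm", "default.html", "index.htm", "index.html")


def normalize(url: str) -> str:
    """Map a URL to a canonical key so that dir/ and dir/default.htm match."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    path = path or "/"
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_DOCS:
        path = path[: -len(last)]  # strip the default document name
    return urlunsplit((scheme.lower(), netloc.lower(), path, query, ""))


def compare(found_a: Iterable[str], found_b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (URLs only in found_a, URLs only in found_b) after normalizing."""
    a = {normalize(u) for u in found_a}
    b = {normalize(u) for u in found_b}
    return sorted(a - b), sorted(b - a)


# Illustrative lists only, not real index data.
fis = [
    "http://www.phdcc.com/findinsite/",
    "http://www.phdcc.com/findinsite/default.htm",
    "http://www.phdcc.com/dircvs.html",
]
other = ["http://www.phdcc.com/findinsite/"]
print(len({normalize(u) for u in fis}))  # 2 unique files, not 3
print(compare(fis, other))
```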
Search: site:www.swlg.org.uk | Number of files reported (7 June 2005)
Google | 141
MSN | 644...63
Yahoo | 106...105...104
findinsite | 142
Google seems to have performed poorly for this site. Is this because the site is based on frames?
Google's old list of 9 hits included various files that have not existed on the site for a long time.
Also, if you search Google for something relevant (such as "dinghy sailing yorkshire"), it does not find the Yeadon site at all.