The truth may be out there on the World Wide Web, but according to recent studies you are unlikely to find it. Most Web users rely on search engines to sift the immense quantity of information available, and therein lies the problem.
Focusing on the popular Alta Vista search engine, researchers Tschera Harkness Connell and Jennifer Tipple of Kent State University found that the engine uncovered Web pages with the correct answer to queries 27 per cent of the time and the wrong answer nine per cent of the time. Right answers outnumbering wrong ones three to one says something positive about the reliability of information on the Web, but the staggering result from their study was that the remaining 64 per cent of pages either returned no answer or had disappeared altogether since being indexed by the search engine.
This study highlights just one of the problems facing traditional search engines in covering the rapidly expanding information base on the Web. Two other US researchers, Steve Lawrence and Lee Giles, published a study of search engines in the July issue of the journal Nature that stated clearly what many Web users have known for some time - search engines are falling rapidly behind in indexing the Web. The study found that not only do search engines index sites unevenly, in some cases they fail to index new sites for months on end. The most significant result, however, was that no single search engine returned more than 16 per cent of the pages then on the Web. Even with all 11 engines surveyed combined, the total was still just over 38 per cent of the Web. The problem has worsened since July. The original study dealt with the 800 million pages available online at the time, and according to Lawrence: "Based on our 800 million estimate and the previous rate of growth of the Web, we get a current estimate of about 1.5 billion pages for the publicly indexable Web."
The problem does not end there, says Lawrence: "The Web has been increasing in size faster than the search engines, so relatively speaking, most engines have been indexing a smaller fraction of the Web over time in recent history." Is there any way to bridge the gap between what the Web makes available and what we can find? "For well-known information," says Lawrence, "directories like Yahoo or engines like Google and DirectHit can be very useful." Subject directories, like Yahoo and Lycos, involve a more human touch in that sites are categorised into subjects by humans rather than being found and indexed by computers. However, the extra time this process involves means they cover a smaller percentage of the Web than the automated search engines do, and their search methods are no more refined than those of the ordinary engines.
Specialist directories can provide other search options. Doras is probably the best-known Irish directory, where searchers can find Irish-themed sites more easily than they would with traditional search engines.
According to Lawrence and Giles's report in Nature, this approach could be beneficial for scientific sites: "The high value of scientific information on the Web . . . suggest that an index of all the scientific information on the Web would be feasible and very valuable." Whatever the subject, specialised subject directories offer a filtered sample of what the Web has to offer and are usually quite reliable. Sites like About.com (formerly The Mining Company) and the WWW Virtual Library (www.vlib.org) are worthwhile for the reviewed topics they offer. Probably the best way to find information is to spread searches over several search engines, hedging your bets on which one will come up with what you want.
"For harder to find or recent information, it can be very useful to use several search engines, because the different engines tend to index different sets of pages and are updated at different intervals. This can be done automatically with meta-search engines like MetaCrawler, SavvySearch, and Copernic." Meta-search engines are programs that search several traditional search engines at once, increasing the chance of a valid hit. They include Web-based meta-searches like Dogpile (www.dogpile.com) and Ask Jeeves (www.askjeeves.com), and PC-based software which performs the same function, like Copernic (available for free download at www.copernic.com). The meta-engines can also be used to search through Usenet groups or newswires, and they generally offer a much wider view of the Web than that of any single search engine. The meta-engines are limited, however, to searching what the other search engines have indexed - still only 38 per cent of the Web.
Connell offers probably the best advice of all on researching anything on the World Wide Web: "Know the strengths and weaknesses of the search engines one uses; use multiple search engines and use the Web as one of many complementary sources of information." Steve Lawrence uses a custom internal search engine developed at NEC to search the Web, but he is quick to admit that "search is far from a solved problem - new services will continue to appear, and improvements will be made by the new and existing services in terms of how they rank results, how much they index, and how quickly they index new or modified information."
As for the future of the search engine, Lawrence forecasts that "the use of popularity information, as with engines like Google and Direct Hit, will become more common, and search engines that personalise results for different users will become available."
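In its simplest form, the "popularity information" Lawrence mentions means that a page linked to by many other pages, especially by pages that are themselves popular, is pushed up the rankings. The toy Python sketch below runs a simplified score-passing iteration over a made-up link graph to show the principle; it is not a description of how Google or Direct Hit actually rank results, and the pages and damping value are illustrative assumptions.

```python
# Toy popularity ranking: each page repeatedly shares its score with the
# pages it links to, so heavily linked pages end up with higher scores.
# The link graph and damping factor are invented for illustration only.

LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def popularity_scores(links: dict[str, list[str]],
                      damping: float = 0.85,
                      iterations: int = 50) -> dict[str, float]:
    """Iteratively redistribute each page's score along its outgoing links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
        score = new
    return score

if __name__ == "__main__":
    ranked = sorted(popularity_scores(LINKS).items(),
                    key=lambda item: -item[1])
    for page, value in ranked:
        print(f"{page}: {value:.3f}")
```

Running the sketch ranks c.html first, since three of the four pages link to it, which is exactly the intuition behind popularity-based ranking.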
fionnor@hotmail.com