Djoerd Hiemstra (2015-03-31 13:45 - 14:30 in ZI-2042)
When Google founders Sergey Brin and Larry Page wrote their seminal paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, they added an appendix about the scalability of Google in which they argued that Google's scalability is limited by their choice of a single, centralized index. A truly scalable solution would require a drastic redesign. They write the following:
“Of course a distributed systems like Gloss or Harvest will often be the most efficient and elegant technical solution for indexing, but it seems difficult to convince the world to use these systems because of the high administration costs of setting up large numbers of installations. Of course, it is quite likely that reducing the administration cost drastically is possible. If that happens, and everyone starts running a distributed indexing system, searching would certainly improve drastically.” (Brin and Page 1998)
In this presentation, I review such drastic redesigns of web search engines by discussing important challenges of distributed search systems: resource selection, vertical selection, and results merging. I will discuss results from more than 30 research groups that used the evaluation data of the TREC Federated Web Search track, which was organized by the University of Twente in cooperation with Ghent University, Yahoo Research, and the US National Institute of Standards and Technology (NIST).