BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Query-Driven Indexing in Large-Scale Distributed Systems
DTSTART:20090130T170000
DTSTAMP:20260406T172832Z
UID:b25c03b916883af26009977a4828d4f8a97d675d6bbeb7ec00b26042
CATEGORIES:Thesis defenses
DESCRIPTION:Gleb Skobeltsyn\nEfficient and effective search in large-scale
  data repositories requires complex indexing solutions deployed on a large
  number of servers. Web search engines such as Google and Yahoo! already r
 ely upon complex systems to be able to return relevant query results and k
 eep processing times within the comfortable sub-second limit. Nevertheless
 \, the exponential growth of the amount of content on the Web poses seriou
 s challenges with respect to scalability. Coping with these challenges req
 uires novel indexing solutions that not only remain scalable but also pres
 erve the search accuracy.\nIn this thesis we introduce and explore the con
 cept of query-driven indexing\, an index construction strategy that uses c
 aching techniques to adapt to the querying patterns expressed by users. W
 e propose abandoning the strict distinction between indexing and caching\,
  and building a distributed indexing structure\, or a distributed cache\,
  that is optimized for the current query load.\nOur experimental and
  theoretical analysis shows that employing query-driven indexing is especi
 ally beneficial when the content is (geographically) distributed in a Peer
 -to-Peer network. In such a setting\, extensive bandwidth consumption has
  been identified as one of the major obstacles to efficient large-scale s
 earch. Our indexing mechanisms combat this problem by maintaining query p
 opularity statistics and by indexing (caching) intermediate query results 
 that are requested frequently. We present several indexing strategies for 
 processing multi-keyword and XPath queries over distributed collections of
  textual and XML documents\, respectively. Experimental evaluations show
  a significant reduction in overall traffic compared to state-of-the-art
  approaches.\nWe also study possible query-driven optimizations for Web
  search engine architectures. In contrast to the Peer-to-Peer setting\,
  Web search engi
 nes use centralized caching of query results to reduce the processing load
  on the main index. We analyze real search engine query logs and show that
  the changes in query traffic that such a results cache induces fundamenta
 lly affect indexing performance. In particular\, we study its impact on in
 dex pruning efficiency. We show that the combination of both techniques
  enables an efficient reduction of query processing costs and is thus
  practical for use in Web search engines.
LOCATION:BC 410 https://plan.epfl.ch/?room==BC%20410
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
