Statistical Language Modeling for Information Access
Presenters: Maarten de Rijke, Edgar Meij
About a decade ago, statistical language models were introduced in information retrieval (IR). They have proved to be an attractive and effective framework for IR, partly because of their well-defined statistical properties, which are built on solid theoretical foundations. Since their introduction to IR, many new models, techniques, and applications have emerged. Language models have, for example, been applied effectively to tasks such as question answering, cross-lingual IR, expert finding, retrieval of semi-structured information, and topic detection and tracking. Other developments include leveraging various sources of information, such as feedback documents, thesauri, external corpora, and syntactic clues, for the estimation of a query model.
The purpose of this tutorial is to systematically explain the use of statistical language models in information retrieval, with an emphasis on the underlying principles and framework, on empirically effective models, and on language models developed for a broad range of retrieval tasks, both traditional and non-traditional. These tasks include semi-structured document retrieval, expert finding, question answering, cross-language IR, blog retrieval, and topic detection and tracking. Participants can expect to learn the major principles and methods of applying statistical language models to a range of information retrieval tasks, as well as the outstanding problems in this area, and to obtain comprehensive pointers to the research literature and available toolkits.
No background in information retrieval is required, but some basic familiarity with statistics and data-driven approaches to language processing is assumed.



