By ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA, czhai@cs.uiuc.edu
Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling nontraditional retrieval problems. In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. The purpose of this survey is to systematically and critically review the existing work in applying statistical language models to information retrieval, summarize their contributions, and point out outstanding challenges.
Statistical Language Models for Information Retrieval reviews the development of this language modeling approach. It surveys a wide range of retrieval models based on language modeling and attempts to connect this new family of models to traditional retrieval models. It also summarizes the progress made so far and points out the challenges that remain to be solved to further increase the impact of these models. The book is written for readers who already have some basic knowledge of information retrieval. Some knowledge of probability and statistics, such as the maximum likelihood estimator, is helpful but not a prerequisite for understanding the high-level discussion.