By Emmanuel Abbe, Princeton University, USA, eabbe@princeton.edu
The stochastic block model (SBM) is a random graph model with different group of vertices connecting differently. It is widely employed as a canonical model to study clustering and community detection, and provides a fertile ground to study the information-theoretic and computational tradeoffs that arise in combinatorial statistics and more generally data science. This monograph surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational tradeoffs, and for various recovery requirements such as exact, partial and weak recovery. The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal SNR-mutual information tradeoff for partial recovery, and the gap between information-theoretic and computational thresholds.
The field of community detection has been expanding greatly since the 1980s, with a remarkable diversity of models and algorithms developed in different communities like machine learning, computer science, network science, social science, and statistical physics. Various fundamental questions remain nonetheless unsettled, such as: Are there really communities? Algorithms may output community structures, but are these meaningful or artefacts? Can we always extract the communities when they are present; fully, partially? And what is a good benchmark to measure the performance of algorithms, and how good are the current algorithms?
This monograph describes recent developments aiming at answering these questions in the context of block models. Addressing the issues from an information-theoretic view-point, the author gives a comprehensive description of the historical and recent work that has led to key new concepts in the various recovery requirements for community detection.
The monograph provides a compact introduction to community detection, which enables the reader to apply these techniques in applications such as understanding sociological behavior, protein to protein interactions; gene expressions; recommendation systems; medical prognosis; DNA 3D folding; image segmentation, natural language processing, product-customer segmentation, webpage sorting, and many more.
Update | 0100000067_Update20190617.pdf
In this updated version, the author has added some references, corrected some equations and made small edits to the text.