By Doris Hoogeveen, University of Melbourne, Australia, dhoogeveen@student.unimelb.edu.au | Li Wang, Evernote, USA, li@liwang.info | Timothy Baldwin, University of Melbourne, Australia, tb@ldwin.net | Karin M. Verspoor, University of Melbourne, Australia, karin.verspoor@unimelb.edu.au
This survey presents an overview of information retrieval, natural language processing and machine learning research that makes use of forum data, including both discussion forums and community question-answering (cQA) archives. The focus is on automated analysis, with the goal of gaining a better understanding of the data and its users. We discuss the different strategies used for both retrieval tasks (post retrieval, question retrieval, and answer retrieval) and classification tasks (post type classification, question classification, post quality assessment, subjectivity, and viewpoint classification) at the post level, as well as at the thread level (thread retrieval, solvedness and task orientation, discourse structure recovery and dialogue act tagging, QA-pair extraction, and thread summarisation). We also review work on forum users, including user satisfaction, expert finding, question recommendation and routing, and community analysis. The survey includes a brief history of forums, an overview of the different kinds of forums, a summary of publicly available datasets for forum research, and a short discussion on the evaluation of retrieval tasks using forum data. The aim is to give a broad overview of the different kinds of forum research, a summary of the methods that have been applied, some insights into successful strategies, and potential areas for future research.
Web Forum Retrieval and Text Analytics: A Survey presents an overview of information retrieval, natural language processing and machine learning research that makes use of forum data, including both discussion forums and community question-answering (cQA) archives. The focus is on automated analysis, with the goal of providing the reader with a better understanding of the data and its users. It discusses the different strategies used for both retrieval tasks (post retrieval, question retrieval, and answer retrieval) and classification tasks (post type classification, question classification, post quality assessment, subjectivity, and viewpoint classification) at the post level, as well as at the thread level (thread retrieval, solvedness and task orientation, discourse structure recovery and dialogue act tagging, QA-pair extraction, and thread summarisation). It also reviews work on forum users, including user satisfaction, expert finding, question recommendation and routing, and community analysis.
Web Forum Retrieval and Text Analytics: A Survey includes a brief history of forums, an overview of the different kinds of forums, a summary of publicly available datasets for forum research, and a short discussion on the evaluation of retrieval tasks using forum data. Covering 450 papers, it provides the reader with a broad overview of the different kinds of forum research, a summary of the methods that have been applied, some insights into successful strategies, and potential areas for future research.