By Cees G. M. Snoek, University of Amsterdam, The Netherlands, cgmsnoek@uva.nl | Marcel Worring, University of Amsterdam, The Netherlands, worring@uva.nl
In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives, most of which are concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as those carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.
Concept-Based Video Retrieval reviews 300 references on video retrieval, indicating when the text-only solutions of present-day video search engines are unsatisfactory and showing the promising alternatives, which are primarily concept-based. Central to the discussion, therefore, is the fundamental notion of a semantic concept: an objective linguistic description of an observable entity. The book aims to motivate and explain how the automated detection of semantic concepts, their selection under uncertainty, and their interactive usage might solve the major scientific problem for video retrieval: the semantic gap. In striving to bridge this gap, the authors structure their review by laying down the anatomy of a concept-based video search engine. They present a component-wise decomposition and evaluation of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human-computer interaction. Concept-Based Video Retrieval is aimed primarily at researchers and developers in the broad area of information retrieval. It will also be an invaluable reference for students in computer and information science at the (post)graduate level, as well as for industrial practitioners.