Speaker recognition aims to determine the identity of the person speaking. After decades of research, current speaker recognition systems achieve satisfactory performance and have been deployed in a wide range of practical applications. However, a large body of evidence shows that these systems are susceptible to malicious fake actions in real applications. To address this issue, the research community has developed dedicated countermeasures that aim to defend against fake actions. Several recent reviews and surveys describe the state of the art in this research. However, they generally rely on a canonical taxonomy that categorizes spoofing attacks and the corresponding countermeasures from a technology-oriented perspective. This paper provides a new taxonomy from an application-oriented perspective and extends it to two major fake forms: spoofing attack and disguise cheating. The taxonomy starts from the applications of speaker recognition technology, e.g., access control, surveillance, and forensics, and then regroups the two fake forms according to application scenario: a spoofing attack imitates the voice of an authorized speaker to gain access to the target system, whereas disguise cheating alters a person's voice to make him/her unrecognizable. Furthermore, for each fake form, finer-grained categories and related countermeasures are presented. Finally, this paper discusses future research directions in this area and suggests that the research community should not only focus on the technical view but also connect it with application scenarios.
APSIPA Transactions on Signal and Information Processing Special Issue - Multi-Disciplinary Dis/Misinformation Analysis and Countermeasures: Articles Overview