Voice Biometrics: Technology, trust and security
2: Intelligent Voice Ltd, CNRS-SAMOVAR, Institut Mines-Telecom, Evry, France
Voice biometrics are being implemented globally in large-scale applications such as remote banking, government e-services, transportation and building security access, autonomous vehicles, and healthcare. They have been integrated into numerous apps, often coupled with face biometrics and artificial intelligence methods. Voice biometrics products and solutions must meet three key requirements for successful deployment: they must be highly trustworthy with regard to privacy protection, easy to use, and always available. This edited book presents the state of the art in voice biometrics research and technologies, including implementation and deployment challenges in terms of interoperability, scalability, performance, and security. The team of editors and chapter authors combines a wealth of expertise from academia and industry. Topics covered include the fundamentals of voice biometrics; design of countermeasures for replay attack; attacker's perspective for voice biometrics; privacy in paralinguistic and extralinguistic tasks for health applications; speaker de-identification; performance evaluation of voice biometrics solutions; standardization of voice biometrics technology; industry perspectives; joining forces of voice and facial biometrics; and future trends and challenges in voice biometrics. Providing comprehensive coverage of the field of voice biometrics, this authoritative volume will be of great interest to researchers, scientists, engineers, practitioners and advanced students involved in the fields of security, biometrics, forensic sciences, human-computer interaction, speech processing, acoustics, multimedia, pattern recognition, privacy preservation, digital signal processing, and speech technologies. It will also be of interest to researchers and professionals working in law and criminology.
Inspec keywords: speaker recognition; speech processing; biometrics (access control); data privacy; learning (artificial intelligence)
Other keywords: speech processing; neural nets; computer vision; voice biometrics; speaker recognition; data privacy; cloud computing; learning (artificial intelligence); speech synthesis; biometrics (access control); Gaussian processes
Subjects: General and management topics; Speech recognition and synthesis; Data security; Neural nets; General electrical engineering topics; Speech processing techniques
- Book DOI: 10.1049/PBSE012E
- Chapter DOI: 10.1049/PBSE012E
- ISBN: 9781785619007
- e-ISBN: 9781785619014
- Page count: 267
- Format: PDF

Front Matter
p. (1)

1 Introduction
p. 1–5 (5)
The aim of the book is to present to the reader, whether a student, an engineer, an entrepreneur, or anyone interested or working in biometrics, the state of the art in voice biometrics research and technology. However, we will not talk only about research and technology. Biometrics is now a well-established term, used not only in the academic environment but also by the general public. Biometric systems, including voice biometric ones, are already implemented in mass-use applications. These are mature technologies that can be integrated into products and solutions because they meet three key requirements for successful deployment: they are highly trustworthy with regard to privacy protection, easy to use (ergonomics is a key factor in their design), and always available (reliability). In the book we will also deal with these and other aspects of implementation and deployment (for instance, interoperability and scalability), which deserve as much attention as the performance of the biometric recognition algorithm itself.

2 Fundamentals of voice biometrics: classical and machine learning approaches
p. 7–37 (31)
In this chapter, the main state-of-the-art research approaches to speaker recognition will be described. Techniques range from the first successful Gaussian mixture model (GMM)-universal background model (UBM) systems (late 1990s) and the well-established i-vectors (since mid-2000s) to the recent introduction of deep learning approaches, which have revolutionized this field and many others such as computer vision or speech processing in general. We will focus on the use of deep neural networks (DNNs) and different architectures as feature extractors, as well as their use to replace other modules in traditional systems, such as the computation of posterior probabilities or sufficient statistics estimation (instead of the UBM), ending with the most recent trend of developing end-to-end systems based on deep learning techniques. In this way, the evolution of automatic systems for speaker recognition will be reviewed. We will highlight the difficulties intrinsic to the task of disentangling the speaker information from the nuisance variability contained in the speech signal, and how automatic systems have been designed to deal with it. We will also present different approaches to both text-dependent and text-independent speaker recognition and the importance of obtaining calibrated outputs from the systems.
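As a rough illustration of the GMM-UBM idea mentioned in this abstract, the sketch below scores an utterance by the average per-frame log-likelihood ratio between a speaker model and a UBM. It is not taken from the book: the function names, the diagonal-covariance assumption, and the toy 2-D models are all hypothetical, and real systems use many more mixture components trained on acoustic features.

```python
import math

def gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        comp_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(gmm_loglik(f, *speaker_gmm) - gmm_loglik(f, *ubm) for f in frames)
    return total / len(frames)

# Toy, made-up models: (weights, means, variances). The speaker model is the
# UBM with its first component mean shifted, loosely mimicking adaptation.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
speaker = ([0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

target_frames = [[1.1, 0.9], [0.8, 1.2]]        # close to the speaker model
impostor_frames = [[-1.0, -1.0], [-0.5, -1.5]]  # better explained by the UBM

target_llr = llr_score(target_frames, speaker, ubm)
impostor_llr = llr_score(impostor_frames, speaker, ubm)
```

A positive score favors the claimed speaker and a negative score favors the background population; turning such raw scores into calibrated outputs is a separate step that this sketch does not attempt.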

3 Voice biometrics: attacker's perspective
p. 39–65 (27)
Voice biometric systems, also known as automatic speaker verification (ASV) systems, adopt specialized strategies to authenticate enrolled speakers by means of their claimed identities. In this context, many countermeasures against various spoofing attacks have been proposed during three recent challenge campaigns, namely ASVspoof 2015, ASVspoof 2017, and ASVspoof 2019. Nevertheless, boosting the resiliency of ASV systems just by focusing on the development of anti-spoofing countermeasures is not enough. Considering the attackers' perspectives has therefore become crucial. In particular, there are numerous ways to attack ASV systems, and their security can be improved if the attackers' perspectives are taken into account beforehand, uncovering possible vulnerabilities and loopholes. Thus, this chapter intends to provide insight into the attackers' possible perspectives, potentially helping to identify hidden weaknesses of ASV systems. We present details on different attacks based on the extent of the attackers' access to the ASV systems, considering direct and indirect attempts. Apart from the attacks, the threats due to unprotected speech corpora are also discussed. Unprotected speech corpora and ASV systems make it possible to search for information about a speaker on the Internet, and in this context, privacy-preserving techniques can prevent attackers from obtaining enrolled speakers' information. Finally, this chapter discusses various technological challenges that attackers face, which can be explored constructively to devise better defense mechanisms.

4 Voice biometrics: privacy in paralinguistic and extralinguistic tasks for health applications
p. 67–92 (26)
The widespread use of cloud computing applications has created a society-wide debate on how user privacy is handled by online service providers. Regulations such as the European Union's General Data Protection Regulation have put forward restrictions on how such services are allowed to handle user data. The field of privacy-preserving machine learning (PPML) is a response to this issue that aims to develop secure classifiers for remote prediction, where both the client's data and the server's model are kept private. This is particularly relevant in the case of speech and concerns not only the linguistic content but also the paralinguistic and extralinguistic information that may be extracted from the speech signal. In this chapter, we provide a brief overview of the current state of the art in paralinguistic and extralinguistic tasks for health, a major application area in terms of privacy concerns, along with an introduction to cryptographic methods commonly used in PPML. These will lay the groundwork for the review of the state of the art of privacy in paralinguistic and extralinguistic tasks for health applications. With this chapter we hope to raise awareness of the problem of preserving privacy in this type of task and provide an initial background for those who aim to contribute to this topic.

5 Voice privacy in biometrics: speaker de-identification
p. 93–120 (28)
Speech is becoming an important way of interacting with technologies such as intelligent cars, banking, and mobile phones. Some of these applications raise privacy and security issues: in e-health applications, privacy is important for the users, and transmitting speech via the Internet can allow malicious users to impersonate us using voice conversion, speech synthesis technologies, etc. This creates the need to remove the identity of the speaker from speech recordings. De-identification is a process by which a data custodian alters or removes an individual's identifying information from a dataset, making it harder for users of the data to determine the identity of the data subjects while allowing for data reuse. In the case of speech, it consists of removing the information about an individual's identity from the speech signal while preserving other features of interest that are present in the signal, such as the message and speaker state. This chapter presents the main research challenges for speaker de-identification. In addition, a comparison of state-of-the-art techniques is performed in a common experimental framework.

6 Performance evaluation of voice biometrics solutions
p. 121–138 (18)
This chapter is dedicated to performance evaluation of voice biometrics solutions. The specific aspects of voice biometrics, compared to other speech-based technologies, are presented, as well as their consequences in terms of performance evaluation. The main existing evaluation protocols and metrics are then presented and discussed, with a focus on the speaker verification part of voice biometrics systems. Other aspects such as calibration, diarization, forensics, and privacy are also introduced. Finally, some limits of performance evaluation are discussed and some guidelines are proposed.
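One widely used speaker verification metric is the equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a minimal approximation under stated assumptions, not the chapter's own protocol: the function names and toy scores are hypothetical, higher scores are assumed to mean "more likely the target speaker", and the threshold sweep only visits observed score values.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (FRR, FAR) when a trial is accepted if score >= threshold."""
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far

def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr, far = error_rates(target_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Toy, made-up verification scores.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(targets, impostors)
```

With so few trials the estimate is coarse; evaluations on real data typically interpolate along the error trade-off curve and report further metrics alongside the EER.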

7 Voice biometrics: how the technology is standardized
p. 139–162 (24)
This chapter reports on biometric standardization projects that are relevant to speaker recognition. The main focus is placed on activities within the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Both host the Joint Technical Committee (JTC1), which is composed of subcommittees (SCs). The Biometrics subcommittee, ISO/IEC JTC1/SC37, has been developing biometric standards since June 2002. Its goal is to ensure a focused and comprehensive worldwide approach for rapid development and approval of formal international biometric standards. While each biometric field (e.g., voice, face, and iris recognition) represents a community of its own with different best practices, SC37 harmonizes among the biometric communities. This encompasses, among others, a general system design, a concise biometric vocabulary, performance testing and reporting, detection of presentation attacks, data protection of biometric information, as well as interoperable interfaces and data interchange formats. This chapter discusses de facto best practices in speaker recognition in the light of biometric standards. Even though the former are driven by different fundamentals in performance assessment and signal processing, bridging the gaps appears promising, as it would enable non-experts to assess different biometric modalities with a common method, e.g., with respect to security risks, performance ranking, interoperability across vendors, data privacy, and consumer protection.

8 Voice biometrics: perspective from the industry
p. 163–185 (23)
This chapter includes contributions focusing on industry-related aspects of voice biometrics coming from a selection of companies that are actively deploying this technology. Each contribution has been written by a team belonging to a different company: LumenVox (automated self-service password reset application), Nuance Communications (testing related to commercial deployments), and Oxford Wave Research (forensic automatic speaker recognition).

9 Joining forces of voice and facial biometrics: a case study in the scope of NIST SRE19
p. 187–217 (31)
While other chapters of this book are devoted to voice biometrics, in this chapter we present an example of joining forces of voice biometrics with other modalities. The focus of this chapter is the combination (fusion) of voice and facial biometrics, also called audiovisual biometrics. It is well known that multi-biometric systems have the advantage of improving accuracy over single systems, providing increased security, and making spoofing attacks more difficult. Combining voice and facial biometrics has the particular advantage that both can be acquired simultaneously, without placing an additional burden on the user. Of course, besides the microphone, a camera is needed. With the increasing use of videos, it becomes more and more natural to process videos (sequences of images with sound). Combining voice and facial biometrics is therefore an attractive way to provide better biometric performance while making imposture more difficult. As an example, we can refer to the US National Institute of Standards and Technology (NIST) 2019 speaker recognition evaluation (SRE'19) challenge, which, besides the usual speaker recognition challenge, included an audiovisual challenge for the first time. Telecom SudParis (TSP) participated in this challenge. This chapter provides a description of the TSP speaker and face recognition systems as well as the advantages obtained by combining voice and facial biometrics.
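One simple way to combine two modalities is score-level fusion: normalize each system's score and take a weighted sum. The sketch below illustrates that generic idea only; it is not the TSP system described in the chapter, and the function names, the z-normalization on a development set, the default weight, and the toy scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def make_normalizer(dev_scores):
    """Build a z-normalizer from (hypothetical) development-set scores."""
    mu, sigma = mean(dev_scores), pstdev(dev_scores)
    return lambda score: (score - mu) / sigma

def fuse_scores(voice_score, face_score, voice_weight=0.5):
    """Weighted-sum fusion of two already-normalized scores."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must lie in [0, 1]")
    return voice_weight * voice_score + (1.0 - voice_weight) * face_score

# Toy, made-up development scores on very different scales per modality.
voice_norm = make_normalizer([0.0, 2.0, 4.0])
face_norm = make_normalizer([10.0, 20.0, 30.0])

# Fuse one trial: normalization puts both scores on a comparable scale first.
fused = fuse_scores(voice_norm(4.0), face_norm(30.0))
```

In practice the fusion weight would be tuned on held-out data, and trained fusion (for example, logistic regression over the two scores) is a common alternative to a fixed weighted sum.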

10 Voice biometrics: future trends and challenges ahead
p. 219–226 (8)
Voice has become woven into the fabric of everyday human-computer interactions via ubiquitous assistants like Siri, Alexa, Google, Bixby, Viv, etc. The use of voice will only accelerate as speech interfaces move to wearables, vehicles, and IoT devices and appliances. As a person's voice is used more to control real-world actions and access private information, voice biometrics will play an increasingly important role in protecting sensitive access and actions and in providing personalization for services and devices. This widespread use will bring many new application opportunities as well as challenges in addressing societal privacy concerns, securing systems against sophisticated attacks, and continuously improving the reliability of voice biometrics in more diverse acoustic environments. In this chapter we will present our assessment of future trends in these important areas.

Back Matter
p. (1)