APSIPA Transactions on Signal and Information Processing > Vol 5 > Issue 1

Adaptive feature truncation to address acoustic mismatch in automatic recognition of children's speech

Shweta Ghai, Indian Institute of Technology Guwahati, India, shweta.ghai@emory.edu; Rohit Sinha, Indian Institute of Technology Guwahati, India
 
Suggested Citation
Shweta Ghai and Rohit Sinha (2016), "Adaptive feature truncation to address acoustic mismatch in automatic recognition of children's speech", APSIPA Transactions on Signal and Information Processing: Vol. 5: No. 1, e15. http://dx.doi.org/10.1017/ATSIP.2016.16

Publication Date: 09 Aug 2016
© 2016 Shweta Ghai and Rohit Sinha
 
Keywords
Speech recognition, Children's speech, Acoustic mismatch, MFCC features, Cepstral truncation
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. MATERIAL AND METHODS 
III. ANALYSIS 
IV. EXPERIMENTAL RESULTS AND DISCUSSION 
V. CONCLUSIONS 

Abstract

An algorithm for adaptive truncation of Mel-frequency cepstral coefficient (MFCC) features is proposed to improve automatic speech recognition (ASR) performance under acoustically mismatched conditions. The algorithm exploits a relationship found between the truncation of MFCC base features and the degree of acoustic mismatch between a speech signal and the recognition models: it truncates the MFCC features of each test utterance individually to reduce that mismatch. Without any prior knowledge about the speaker of the test utterance, the proposed technique gives relative improvements over the baseline of 38% on a connected-digit recognition task and 36% on a continuous speech recognition task for children's speech recognized with models trained on adult speech. These improvements are additive to those obtained with vocal tract length normalization and/or constrained maximum likelihood linear regression. The generality and effectiveness of the algorithm are also validated for automatic recognition of children's and adults' speech under matched and mismatched conditions.
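
The idea of utterance-specific truncation can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state how the truncation length is selected, so the `mismatch` score in [0, 1], the linear mapping in `choose_truncation`, the assumption that greater mismatch calls for keeping fewer coefficients, and the default dimensions (13 base coefficients, a floor of 4) are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def truncate_mfcc(frames: Sequence[Sequence[float]], keep: int) -> List[List[float]]:
    """Keep only the first `keep` cepstral coefficients of every frame."""
    if not frames:
        return []
    n_coeffs = len(frames[0])
    if not 1 <= keep <= n_coeffs:
        raise ValueError("keep must lie between 1 and the number of coefficients")
    return [list(frame[:keep]) for frame in frames]


def choose_truncation(mismatch: float, full_dim: int = 13, min_dim: int = 4) -> int:
    """Hypothetical selection rule (an assumption, not taken from the paper).

    Maps a mismatch score in [0, 1] linearly to a number of coefficients:
    0 keeps all `full_dim` coefficients, 1 keeps only `min_dim`.
    """
    clipped = min(max(mismatch, 0.0), 1.0)
    return int(round(full_dim - clipped * (full_dim - min_dim)))


def adapt_utterance(frames: Sequence[Sequence[float]], mismatch: float) -> List[List[float]]:
    """Truncate one utterance's base features according to its own mismatch score."""
    if not frames:
        return []
    keep = choose_truncation(mismatch, full_dim=len(frames[0]))
    return truncate_mfcc(frames, keep)
```

In such a setup each test utterance would receive its own truncation length, and the recognition models would have to be evaluated on the correspondingly reduced feature dimensions; how that is done in the paper is not described in the abstract.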

DOI:10.1017/ATSIP.2016.16