APSIPA Transactions on Signal and Information Processing > Vol 4 > Issue 1

Optimized wavelet-domain filtering under noisy and reverberant conditions

Randy Gomez, Honda Research Institute Co., Ltd., Japan, r.gomez@jp.honda-ri.com; Tatsuya Kawahara, Kyoto University, Japan; Kazuhiro Nakadai, Honda Research Institute Co., Ltd., Japan
 
Suggested Citation
Randy Gomez, Tatsuya Kawahara and Kazuhiro Nakadai (2015), "Optimized wavelet-domain filtering under noisy and reverberant conditions", APSIPA Transactions on Signal and Information Processing: Vol. 4: No. 1, e3. http://dx.doi.org/10.1017/ATSIP.2015.5

Publication Date: 27 Jul 2015
© 2015 Randy Gomez, Tatsuya Kawahara and Kazuhiro Nakadai
 
Keywords
Automatic speech recognition, Dereverberation, Robustness
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. ENHANCEMENT CONCEPT 
III. WAVELET OPTIMIZATION FOR ENHANCEMENT 
IV. IDENTIFYING NOISE PROFILE AND REVERBERATION TIME 
V. EXPERIMENTAL EVALUATIONS 
VI. CONCLUSION 

Abstract

This paper addresses robust wavelet-based speech enhancement for automatic speech recognition in reverberant and noisy conditions. We propose a novel scheme for improving the speech, late-reflection, and noise power estimates obtained from the observed contaminated signal. The improved estimates are used to calculate the Wiener gain for filtering out the late reflections and additive noise. In the proposed scheme, the wavelet family and its parameters are optimized using an acoustic model (AM). In the offline mode, the optimal wavelet family is selected separately for the speech, late reflections, and background noise based on the AM likelihood. The parameters of the selected wavelet family are then optimized specifically for each signal subspace. As a result, we obtain wavelets that are each sensitive to the speech, the late reflections, or the additive noise, and that can independently and accurately estimate these signals directly from an observed contaminated signal. For speech recognition, the most suitable wavelet is identified from the pre-stored wavelets, and wavelet-domain filtering is applied to the noisy and reverberant speech signal. Experimental evaluations using real reverberant data demonstrate the effectiveness and robustness of the proposed method.
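
To make the role of the three power estimates concrete, the following is a minimal sketch of a Wiener-type gain applied to a sequence of coefficients. It assumes the textbook form, gain = speech power / (speech + late-reflection + noise power), which may differ from the exact formulation used in the paper. The function names `wiener_gain` and `apply_gain`, the `floor` parameter, and the example values are hypothetical and chosen only for illustration.

```python
def wiener_gain(speech_power, late_power, noise_power, floor=1e-12):
    """Per-coefficient Wiener-type gain (assumed textbook form).

    Each argument is a sequence of non-negative power estimates,
    one value per coefficient. Returns a list of gains in [0, 1].
    """
    gains = []
    for s, r, n in zip(speech_power, late_power, noise_power):
        total = s + r + n
        # Avoid division by (near) zero when all estimates vanish.
        gains.append(s / total if total > floor else 0.0)
    return gains


def apply_gain(coeffs, gains):
    """Scale each observed coefficient by its gain."""
    return [g * c for g, c in zip(coeffs, gains)]


# Illustrative values only (not taken from the paper).
speech = [1.0, 0.0]
late = [0.5, 0.0]
noise = [0.5, 0.0]
gains = wiener_gain(speech, late, noise)
filtered = apply_gain([2.0, 4.0], gains)
```

Coefficients dominated by late reflections or noise receive a gain near zero, while coefficients dominated by speech pass through nearly unchanged, which is why the accuracy of the three estimates matters for the filter.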

DOI:10.1017/ATSIP.2015.5