APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 3

BASPRO: A Balanced Script Producer for Speech Corpus Collection Based on the Genetic Algorithm

Yu-Wen Chen, The Research Center for Information Technology Innovation, Academia Sinica, Taiwan and The Department of Computer Science, Columbia University, USA

Hsin-Min Wang, The Institute of Information Science, Academia Sinica, Taiwan

Yu Tsao, The Research Center for Information Technology Innovation, Academia Sinica and Jointly Appointed Professor with the Department of Electrical Engineering, Chung Yuan Christian University, Taiwan, yu.tsao@citi.sinica.edu.tw

Suggested Citation

Yu-Wen Chen, Hsin-Min Wang and Yu Tsao (2023), "BASPRO: A Balanced Script Producer for Speech Corpus Collection Based on the Genetic Algorithm", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 3, e15. http://dx.doi.org/10.1561/116.00000155

Publication Date: 25 Apr 2023

Keywords

Corpus design, Mandarin Chinese speech corpus, phonetically balanced and rich corpus, recording script design, genetic algorithm

Open Access

This is published under the terms of CC BY-NC.

Abstract

The performance of speech-processing models is heavily influenced by the speech corpus used for training and evaluation. In this study, we propose the BAlanced Script PROducer (BASPRO) system, which automatically constructs a phonetically balanced and rich set of Chinese sentences for collecting Mandarin Chinese speech data. First, we used pretrained natural language processing systems to extract ten-character candidate sentences from a large corpus of Chinese news texts. Then, we applied a genetic algorithm-based method to select 20 phonetically balanced sentence sets, each containing 20 sentences, from the candidate sentences. Using BASPRO, we obtained a recording script called TMNews, which contains 400 ten-character sentences. TMNews covers 84% of the syllables used in the real world, and its syllable distribution has a cosine similarity of 0.96 to the real-world syllable distribution. We converted the script into a speech corpus using two text-to-speech systems. Using the designed speech corpus, we evaluated the performance of speech enhancement (SE) and automatic speech recognition (ASR), which are among the most important regression- and classification-based speech-processing tasks, respectively. The experimental results show that the SE and ASR models trained on the designed speech corpus outperform their counterparts trained on a randomly composed speech corpus.
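The two statistics quoted in the abstract (syllable coverage and cosine similarity to a reference syllable distribution) and the idea of selecting a sentence set with a genetic algorithm can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity alone as the fitness, the elitist selection with union-based crossover, and all parameter values. It also selects a single sentence set rather than the 20 sets that BASPRO produces. Each sentence is represented as a list of syllable strings.

```python
import math
import random
from collections import Counter


def syllable_counts(sentences):
    """Count syllables over an iterable of sentences (each a list of syllables)."""
    return Counter(syl for sent in sentences for syl in sent)


def cosine_similarity(p, q):
    """Cosine similarity between two syllable-count mappings."""
    dot = sum(p.get(k, 0) * q.get(k, 0) for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def coverage(p, reference):
    """Fraction of reference syllables that occur at least once in p."""
    return sum(1 for k in reference if p.get(k, 0) > 0) / len(reference)


def fitness(indices, candidates, reference):
    """Assumed fitness: similarity of the selected set to the reference."""
    counts = syllable_counts(candidates[i] for i in indices)
    return cosine_similarity(counts, reference)


def select_sentences(candidates, reference, set_size, pop_size=30,
                     generations=100, mutation_rate=0.2, seed=0):
    """Toy genetic algorithm: pick set_size distinct candidate indices."""
    rng = random.Random(seed)
    n = len(candidates)
    population = [rng.sample(range(n), set_size) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the fitter half survives unchanged.
        population.sort(key=lambda ind: fitness(ind, candidates, reference),
                        reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: draw the child from the union of both parents.
            pool = list(dict.fromkeys(a + b))
            child = rng.sample(pool, set_size)
            # Mutation: swap one sentence for an unused candidate.
            if rng.random() < mutation_rate:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(set_size)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return max(population, key=lambda ind: fitness(ind, candidates, reference))


if __name__ == "__main__":
    # Hypothetical toy data: pinyin syllables, not drawn from TMNews.
    toy_candidates = [["ma", "ma"], ["ni", "hao"], ["hao", "ma"],
                      ["ni", "ni"], ["shi", "jie"], ["jie", "ma"]]
    toy_reference = Counter({"ni": 2, "hao": 2, "ma": 2})
    best = select_sentences(toy_candidates, toy_reference, set_size=2,
                            pop_size=8, generations=20)
    chosen = syllable_counts(toy_candidates[i] for i in best)
    print(sorted(best), coverage(chosen, toy_reference),
          cosine_similarity(chosen, toy_reference))
```

Because the fitter half of the population is carried over unchanged, the best fitness never decreases from one generation to the next. A fitness that also rewards coverage, or one that balances several sentence sets jointly, would be a natural extension of this sketch.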

DOI:10.1561/116.00000155

Related publications

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Advanced Acoustic, Sound and Audio Processing Techniques and Their Applications
