
APSIPA Transactions on Signal and Information Processing > Vol 13 > Issue 5

End-to-End Singing Transcription Based on CTC and HSMM Decoding with a Refined Score Representation

Tengyu Deng, Kyoto University, Japan; Eita Nakamura, Kyushu University, Japan, nakamura@inf.kyushu-u.ac.jp; Ryo Nishikimi, NTT Communication Science Laboratories, Japan; Kazuyoshi Yoshii, Kyoto University, Japan

Suggested Citation

Tengyu Deng, Eita Nakamura, Ryo Nishikimi and Kazuyoshi Yoshii (2024), "End-to-End Singing Transcription Based on CTC and HSMM Decoding with a Refined Score Representation", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 5, e404. http://dx.doi.org/10.1561/116.20240016

Publication Date: 07 Oct 2024

Subjects

Deep learning, Audio signal processing, Statistical/Machine learning

Journal details

Open Access

This is published under the terms of CC BY-NC.



Abstract

This paper describes an end-to-end automatic singing transcription (AST) method that translates a music audio signal containing a vocal part into a symbolic musical score of sung notes. A common approach to sequence-to-sequence learning for this problem is to use connectionist temporal classification (CTC), where a target score is represented as a sequence of notes with discrete pitches and note values. However, if the note value of a single note is estimated incorrectly, the score times of all following notes are also estimated incorrectly, and the metrical structure of the estimated score collapses. To solve this problem, we propose a refined score representation that uses the metrical positions of note onsets. To decode a musical score from the output of a deep neural network (DNN), we use a hidden semi-Markov model (HSMM) that incorporates prior knowledge about musical scores and temporal fluctuation in human performance. We show that the proposed method achieves state-of-the-art performance and confirm the efficacy of the refined score representation and the decoding method.
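The error-propagation argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 16-tatum bar (4/4 time at sixteenth-note resolution), the function names, and the toy melody are all assumptions made for illustration, and the paper's actual score representation may differ in detail. The sketch compares how a single estimation error affects reconstructed onset times when notes are encoded by note values (durations) versus by metrical positions (onset position within the bar).

```python
from itertools import accumulate

TATUMS_PER_BAR = 16  # assumption: 4/4 time at sixteenth-note resolution


def onsets_from_note_values(note_values):
    """Absolute onsets (in tatums) implied by a sequence of note values.

    Each onset is the running sum of all preceding durations, so one
    wrong duration shifts every later onset.
    """
    return list(accumulate([0] + list(note_values[:-1])))


def onsets_from_metrical_positions(positions, tatums_per_bar=TATUMS_PER_BAR):
    """Absolute onsets implied by per-note positions within the bar.

    Simplifying assumption: a position that does not increase relative
    to the previous one marks the start of the next bar.
    """
    onsets, bar, prev = [], 0, None
    for p in positions:
        if prev is not None and p <= prev:
            bar += 1
        onsets.append(bar * tatums_per_bar + p)
        prev = p
    return onsets


def count_wrong(estimated, reference):
    """Number of onsets that differ from the reference."""
    return sum(e != r for e, r in zip(estimated, reference))


# Toy melody: two bars of quarter, quarter, half.
true_onsets = onsets_from_note_values([4, 4, 8, 4, 4, 8])

# One error in the note-value representation (second note: 2 instead of 4).
nv_onsets = onsets_from_note_values([4, 2, 8, 4, 4, 8])

# One error in the metrical-position representation (second note: 6 instead of 4).
mp_onsets = onsets_from_metrical_positions([0, 6, 8, 0, 4, 8])

print(count_wrong(nv_onsets, true_onsets))  # 4: every later onset is shifted
print(count_wrong(mp_onsets, true_onsets))  # 1: the error stays local
```

In this toy case a single wrong note value displaces all four subsequent onsets, whereas a single wrong metrical position corrupts only the note it belongs to, which is the motivation the abstract gives for the refined representation.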

DOI:10.1561/116.20240016

Related publications

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Invited Papers from APSIPA ASC 2023

Introduction
Musical Score Representations
Related Work
Proposed Method
Evaluation
Conclusion and Discussion
References
