APSIPA Transactions on Signal and Information Processing > Vol 7 > Issue 1

Chord-aware automatic music transcription based on hierarchical Bayesian integration of acoustic and language models

Yuta Ojima, Kyoto University, Japan; Eita Nakamura, Kyoto University, Japan; Katsutoshi Itoyama, Kyoto University, Japan; Kazuyoshi Yoshii, Kyoto University, Japan, yoshii@kuis.kyoto-u.ac.jp
 
Suggested Citation
Yuta Ojima, Eita Nakamura, Katsutoshi Itoyama and Kazuyoshi Yoshii (2018), "Chord-aware automatic music transcription based on hierarchical Bayesian integration of acoustic and language models", APSIPA Transactions on Signal and Information Processing: Vol. 7: No. 1, e14. http://dx.doi.org/10.1017/ATSIP.2018.17

Publication Date: 22 Oct 2018
© 2018 Yuta Ojima, Eita Nakamura, Katsutoshi Itoyama and Kazuyoshi Yoshii
 
Keywords
Automatic Music Transcription, Chord Estimation, Non-negative Matrix Factorization, Bayesian Inference
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. RELATED WORK 
III. GENERATIVE MODELING 
IV. POSTERIOR INFERENCE 
V. EVALUATION 
VI. CONCLUSION 

Abstract

This paper describes a method for automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes, such as chords, form the basis of harmony and are taken into account in music composition. Because chords and musical notes are closely linked, we propose joint pitch and chord estimation based on a hierarchical Bayesian model that consists of an acoustic model, which represents the generative process of a spectrogram, and a language model, which represents the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model that has chord labels as its latent variables and emits a piano roll, so that the sequential dependency of the piano roll is represented in the language model. The two models are integrated through the piano roll in a hierarchical Bayesian manner. All the latent variables and parameters are estimated using Gibbs sampling. The experimental results indicate the potential of the proposed method for unified music transcription and grammar induction.
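To make the two-layer structure in the abstract concrete, the following is a minimal generative sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not specify the priors, the emission distribution, or the observation likelihood, so the Bernoulli emissions, gamma-distributed activations, Poisson observations, and all function and variable names here are hypothetical choices. The sketch covers only forward sampling (chords, then piano roll, then spectrogram); the Gibbs sampling used for inference is not shown.

```python
import numpy as np

def sample_piano_roll(n_frames, trans, emit, rng):
    """Language-model side: chord Markov chain that emits a binary piano roll."""
    n_chords, n_pitches = emit.shape
    chords = np.empty(n_frames, dtype=int)
    chords[0] = rng.integers(n_chords)  # uniform initial chord (assumption)
    for t in range(1, n_frames):
        chords[t] = rng.choice(n_chords, p=trans[chords[t - 1]])
    # Each pitch is switched on independently given the chord
    # (Bernoulli emission; an assumption, not taken from the paper).
    probs = emit[chords].T  # shape: (n_pitches, n_frames)
    roll = (rng.random((n_pitches, n_frames)) < probs).astype(int)
    return chords, roll

def sample_spectrogram(roll, basis, rng):
    """Acoustic-model side: NMF-style spectrogram gated by the piano roll."""
    n_pitches, n_frames = roll.shape
    # Non-negative activations (gamma-distributed; assumption).
    activations = rng.gamma(shape=2.0, scale=1.0, size=(n_pitches, n_frames))
    # The binary piano roll masks the activations before reconstruction.
    mean = basis @ (activations * roll)
    # Poisson observation noise (assumption).
    return rng.poisson(mean), mean

rng = np.random.default_rng(0)
n_chords, n_pitches, n_bins, n_frames = 3, 12, 40, 16

# Self-transition-heavy chord dynamics, normalized so each row sums to one.
trans = np.full((n_chords, n_chords), 0.1) + 0.7 * np.eye(n_chords)
trans /= trans.sum(axis=1, keepdims=True)

emit = rng.uniform(0.05, 0.6, size=(n_chords, n_pitches))          # pitch-on probabilities per chord
basis = rng.gamma(shape=1.0, scale=1.0, size=(n_bins, n_pitches))  # one spectral basis per pitch

chords, roll = sample_piano_roll(n_frames, trans, emit, rng)
spec, mean = sample_spectrogram(roll, basis, rng)
```

The piano roll `roll` is the only variable shared by both functions, which mirrors the statement that the acoustic and language models are integrated through the piano roll.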

DOI:10.1017/ATSIP.2018.17