APSIPA Transactions on Signal and Information Processing > Vol 1 > Issue 1

Latent acoustic topic models for unstructured audio classification

Samuel Kim, 3710 S. McClintock Ave, U.S.A, worshipersam@gmail.com; Panayiotis Georgiou, 3710 S. McClintock Ave, U.S.A; Shrikanth Narayanan, 3710 S. McClintock Ave, U.S.A
 
Suggested Citation
Samuel Kim, Panayiotis Georgiou and Shrikanth Narayanan (2012), "Latent acoustic topic models for unstructured audio classification", APSIPA Transactions on Signal and Information Processing: Vol. 1: No. 1, e6. http://dx.doi.org/10.1017/ATSIP.2012.7

Publication Date: 10 Dec 2012
© 2012 Samuel Kim, Panayiotis Georgiou and Shrikanth Narayanan
 
Keywords
Unstructured Audio, Latent topic models, Audio information retrieval, text-like audio signal processing, Acoustic topic models
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. LATENT ACOUSTIC TOPIC MODEL 
III. AUDIO TAG CLASSIFICATION 
IV. EXPERIMENTS AND DISCUSSION 
V. CONCLUDING REMARKS 

Abstract

We propose the notion of latent acoustic topics to capture contextual information embedded within a collection of audio signals. The central idea is to learn, in an unsupervised manner, a probability distribution over a set of latent topics for a given audio clip, assuming that latent acoustic topics exist and that each audio clip can be described in terms of them. To this end, we use latent Dirichlet allocation (LDA) to implement the acoustic topic models over elemental acoustic units, referred to as acoustic words, and perform text-like audio signal processing. Experiments on audio tag classification with the BBC sound effects library demonstrate the usefulness of the proposed latent audio context modeling schemes. In particular, the proposed method is shown to be superior to other latent structure analysis methods, such as latent semantic analysis and probabilistic latent semantic analysis. We also demonstrate that topic models can be used as features complementary to content-based features, offering about 9% relative improvement in audio classification when combined with the traditional Gaussian mixture model (GMM)–support vector machine (SVM) technique.
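As a rough illustration of the text-like processing described in the abstract, the following minimal sketch maps feature frames to acoustic words by nearest-codeword quantization and then estimates a per-clip topic distribution with LDA. It is not the authors' implementation: the codebook values, the toy frames, the hyperparameters `alpha` and `beta`, and the function names `quantize` and `lda_gibbs` are all hypothetical, and a collapsed Gibbs sampler is used here purely for brevity, whatever inference procedure the article itself adopts.

```python
import random

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword
    (its 'acoustic word'), using squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]

def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed stand-in for the
    article's inference). Returns one topic distribution per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # counts n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts n(k, w)
    topic_total = [0] * num_topics                            # counts n(k)
    assign = []
    # Random initial topic assignment for every word token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed estimate of each document's topic proportions.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

# Hypothetical 2-D codebook; in practice it would be learned from data.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 6.0)]
# Two toy "audio clips", each a short list of 2-D feature frames.
clips = [
    [(0.1, 0.0), (0.9, 1.1), (0.2, -0.1), (1.0, 0.8)],
    [(5.1, 4.9), (6.2, 6.0), (4.8, 5.2), (5.9, 6.1)],
]
words = [quantize(clip, codebook) for clip in clips]
theta = lda_gibbs(words, vocab_size=len(codebook), num_topics=2)
for clip_words, clip_theta in zip(words, theta):
    print(clip_words, [round(p, 3) for p in clip_theta])
```

Each row of `theta` is a probability vector over topics for one clip. In a classification setting like the one the abstract describes, such vectors could serve as features for a downstream classifier, alone or alongside content-based features.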

DOI:10.1017/ATSIP.2012.7