In this paper, we propose an autoencoder-based method for completing missing data in multi-channel acoustic scene classification (ASC). Many deep-learning-based ASC methods that use multi-channel signals have been reported to perform robustly. The advantage of multi-channel data is that it captures both spatial and frequency information. However, when part of a multi-channel signal is missing, classification performance declines significantly. We therefore focus on completing the missing data by using an autoencoder as a preprocessor for ASC models. Because the positional relationships between the microphones are modeled in the latent space of the proposed autoencoder, the missing information can be reconstructed via the latent space from a multi-channel input that contains missing data. In an experiment, the proposed autoencoder completes the missing data, and using the completed multi-channel signals improves the accuracy of ASC systems.
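The idea of reconstructing a missing channel via a latent space can be illustrated with a minimal sketch. It is not the proposed autoencoder: it swaps in a linear, PCA-style decoder fitted on complete data, and it estimates the latent code by least squares from the observed channels only. The function names (`fit_linear_decoder`, `complete_missing`), the synthetic two-source, four-microphone data, the mixing gains, and the latent dimension are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_decoder(x_complete, latent_dim):
    """Fit a PCA-style linear decoder on complete multi-channel frames.

    x_complete has shape (n_frames, n_channels). Returns the channel mean and
    a (latent_dim, n_channels) matrix whose rows span the learned latent space.
    """
    mean = x_complete.mean(axis=0)
    # Rows of vt are the principal directions across channels.
    _, _, vt = np.linalg.svd(x_complete - mean, full_matrices=False)
    return mean, vt[:latent_dim]


def complete_missing(x, observed, mean, components):
    """Fill unobserved channels by decoding a latent code.

    The latent code of each frame is estimated by least squares from the
    observed channels only, then decoded back to all channels.
    `observed` is a boolean mask of shape (n_channels,).
    """
    w_obs = components[:, observed]  # (latent_dim, n_observed)
    rhs = (x[:, observed] - mean[observed]).T  # (n_observed, n_frames)
    z, *_ = np.linalg.lstsq(w_obs.T, rhs, rcond=None)  # (latent_dim, n_frames)
    x_hat = z.T @ components + mean
    out = x.copy()
    out[:, ~observed] = x_hat[:, ~observed]  # keep observed channels untouched
    return out


# Hypothetical synthetic data: two sources picked up by four microphones with
# position-dependent gains (the gain values are illustrative assumptions).
rng = np.random.default_rng(0)
sources = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.8, 0.5, 0.2],
                   [0.2, 0.5, 0.8, 1.0]])
clean = sources @ mixing + 0.01 * rng.normal(size=(500, 4))

# Fit on complete frames, then evaluate on held-out frames with one channel lost.
mean, components = fit_linear_decoder(clean[:400], latent_dim=2)
test_frames = clean[400:]
observed = np.array([True, True, False, True])  # channel 2 is missing
corrupted = test_frames.copy()
corrupted[:, ~observed] = 0.0

completed = complete_missing(corrupted, observed, mean, components)
err_zero = np.mean((corrupted[:, 2] - test_frames[:, 2]) ** 2)
err_completed = np.mean((completed[:, 2] - test_frames[:, 2]) ** 2)
print("MSE with missing channel zeroed:", err_zero)
print("MSE after latent-space completion:", err_completed)
```

In this toy setting the inter-channel structure is exactly linear, so a linear latent space suffices. A trained autoencoder would replace both the fitting step and the least-squares step with learned encoder and decoder networks.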
APSIPA Transactions on Signal and Information Processing Special Issue - Advanced Acoustic, Sound and Audio Processing Techniques and Their Applications