APSIPA Transactions on Signal and Information Processing > Vol 13 > Issue 1

Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer

Kanami Imamura, The University of Tokyo, Japan AND The National Institute of Advanced Industrial Science and Technology (AIST), Japan, kanami_imamura@ieee.org, Tomohiko Nakamura, The National Institute of Advanced Industrial Science and Technology (AIST), Japan, Kohei Yatabe, Tokyo University of Agriculture and Technology, Japan, Hiroshi Saruwatari, The University of Tokyo, Japan
 
Suggested Citation
Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe and Hiroshi Saruwatari (2024), "Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 1, e28. http://dx.doi.org/10.1561/116.20230082

Publication Date: 05 Nov 2024
© 2024 K. Imamura, T. Nakamura, K. Yatabe and H. Saruwatari
 
Subjects
Deep learning, Audio signal processing, Source separation
 
Keywords
Audio source separation, sampling-frequency-independent convolutional layer, Fourier feature mapping
 


Open Access

This is published under the terms of CC BY-NC.


In this article:
Introduction 
Related Works 
Proposed NAF Representation 
Experimental Evaluation in Music Source Separation 
Experimental Evaluation in Speech Separation 
Conclusion 
References 

Abstract

A sampling-frequency-independent (SFI) convolutional layer extends a convolutional layer so that a single deep neural network (DNN) can handle various sampling frequencies (SFs). The SFI convolutional layer treats a standard convolutional layer as a collection of digital filters designed from analog filters. Analogous to analog-to-digital filter conversion, it generates the layer weights from an SFI structure (a latent analog filter) according to the input SF. Using the SFI layer requires defining the mathematical forms of the latent analog filters before training. However, it is difficult to manually define appropriate forms for an arbitrary network architecture, and an inappropriate definition can degrade performance. To overcome this problem, we propose a neural representation of analog filters, which determines the forms of the latent analog filters in an end-to-end manner. The proposed method treats the latent analog filter as a function of continuous time or frequency and represents it using a DNN. In music source separation and speech separation experiments, the proposed method outperformed manually designed latent analog filters.
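
The weight-generation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a time-domain formulation in which a small, untrained MLP with a Fourier feature mapping stands in for the latent analog filter, and the digital weights are obtained by sampling that function at multiples of 1/SF. All names (`fourier_features`, `init_mlp`, `latent_analog_filter`, `generate_weights`) and all sizes and frequency values are hypothetical.

```python
import numpy as np


def fourier_features(t, freqs):
    """Map continuous times t (K,) to sin/cos features of shape (K, 2 * len(freqs))."""
    angles = 2.0 * np.pi * t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Random (untrained) parameters of a one-hidden-layer MLP; sizes are assumptions."""
    return {
        "W1": rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim),
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, out_dim)) / np.sqrt(hidden_dim),
        "b2": np.zeros(out_dim),
    }


def latent_analog_filter(params, t, freqs):
    """Evaluate the neural stand-in for the analog filters at times t (seconds).

    Returns an array of shape (len(t), num_channels).
    """
    hidden = np.tanh(fourier_features(t, freqs) @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def generate_weights(params, freqs, sample_rate, duration):
    """Sample the continuous-time filters at the given sampling frequency.

    The number of taps grows with sample_rate so that the kernel always
    spans the same duration (in seconds), centered at t = 0.
    """
    num_taps = int(round(duration * sample_rate)) + 1
    t = (np.arange(num_taps) - (num_taps - 1) / 2) / sample_rate
    return latent_analog_filter(params, t, freqs).T  # (num_channels, num_taps)


rng = np.random.default_rng(0)
freqs = np.linspace(100.0, 4000.0, 8)  # hypothetical feature frequencies in Hz
params = init_mlp(rng, in_dim=2 * len(freqs), hidden_dim=32, out_dim=4)

# One set of parameters yields kernels for two different sampling frequencies.
w_8k = generate_weights(params, freqs, sample_rate=8000, duration=0.004)
w_16k = generate_weights(params, freqs, sample_rate=16000, duration=0.004)
print(w_8k.shape, w_16k.shape)
```

In this sketch the same parameters produce a longer kernel at the higher sampling frequency, and every second tap of the 16 kHz kernel coincides with a tap of the 8 kHz kernel because both are samples of one continuous-time function. In an actual model the MLP parameters would be trained end to end together with the rest of the network.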

DOI:10.1561/116.20230082