APSIPA Transactions on Signal and Information Processing > Vol 13 > Issue 1

Speech Privacy-preserving Methods Using Secret Key for Convolutional Neural Network Models and Their Robustness Evaluation

Niwa Shoko, Tokyo Metropolitan University, Japan; Sayaka Shiota, Tokyo Metropolitan University, Japan, sayaka@tmu.ac.jp; Hitoshi Kiya, Tokyo Metropolitan University, Japan
 
Suggested Citation
Niwa Shoko, Sayaka Shiota and Hitoshi Kiya (2024), "Speech Privacy-preserving Methods Using Secret Key for Convolutional Neural Network Models and Their Robustness Evaluation", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 1, e26. http://dx.doi.org/10.1561/116.20240020

Publication Date: 23 Oct 2024
© 2024 N. Shoko, S. Shiota and H. Kiya
 
Subjects
Deep learning, Access control, Application security, Artificial intelligence methods in security and privacy, Privacy-preserving systems, Audio signal processing, Speech and spoken language processing
 
Keywords
Privacy-preserving, waveform encryption, spectrogram encryption, secret key
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
Privacy-preserving Scenario 
Proposed Methods 
Experiment 
Conclusion 
References 

Abstract

In this paper, we propose privacy-preserving methods that use a secret key for convolutional neural network (CNN)-based models in speech processing tasks. When untrusted third parties, such as cloud servers, provide CNN-based systems, protecting the privacy of speech queries is essential. We propose methods for encrypting speech queries with secret keys, together with a model structure that accepts encrypted queries without decryption. Our approach introduces three types of secret keys: Shuffling, Flipping, and random orthogonal matrix (ROM). In experiments, we demonstrate that identification performance does not degrade when the proposed methods are used with the correct key, whereas it decreases significantly when an incorrect key is used. In particular, we show that ROM maintains high privacy-preserving performance across many speech processing tasks even with a relatively small key space. Furthermore, various robustness evaluations demonstrate the difficulty of recovering the original speech from encrypted queries.
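
As a rough illustration of the three key types named above, the following Python sketch derives a shuffling key, a flipping key, and a random orthogonal matrix from an integer seed and applies each to one block of samples. The abstract does not state how the keys are generated or applied, so the block length, the seed-based key derivation, the Gram-Schmidt construction, and all function names here are assumptions made for illustration, not the authors' implementation. The model structure that accepts encrypted queries is not covered.

```python
import math
import random

BLOCK = 4  # hypothetical block length; the abstract does not specify one


def shuffle_key(seed, n=BLOCK):
    """Assumed form of a Shuffling key: a permutation of block positions."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def flip_key(seed, n=BLOCK):
    """Assumed form of a Flipping key: a vector of +1/-1 signs."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rom_key(seed, n=BLOCK):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip a numerically degenerate draw
            rows.append([a / norm for a in v])
    return rows


def apply_shuffle(block, perm):
    """Reorder the samples of one block according to the permutation."""
    return [block[i] for i in perm]


def apply_flip(block, signs):
    """Multiply each sample by its key sign; applying it twice undoes it."""
    return [s * x for s, x in zip(signs, block)]


def apply_rom(block, q):
    """Multiply the block by the orthogonal matrix: y = Q x."""
    return [sum(qij * x for qij, x in zip(row, block)) for row in q]


def invert_rom(block, q):
    """Undo apply_rom using the transpose, since Q^T Q = I."""
    n = len(q)
    return [sum(q[i][j] * block[i] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    x = [0.1, -0.4, 0.25, 0.9]
    q = rom_key(seed=42)
    y = apply_rom(x, q)
    print("encrypted block:", y)
    print("recovered with correct key:", invert_rom(y, q))
    print("recovered with wrong key:", invert_rom(y, rom_key(seed=7)))
```

All three transforms are orthogonal, so each preserves the energy of a block. Under the assumptions above, only the ROM draws its key from a continuous set rather than a finite set of permutations or sign patterns.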

DOI:10.1561/116.20240020