Dysarthria is a neuromotor disorder that causes imprecise articulation in speech. This paper presents an automatic analysis framework for dysarthric speech that uses a linguistically motivated representation based on distinctive features (DFs). The framework includes a seq2seq phonetic decoder for Cantonese dysarthric speech. Phones, whether transcribed manually or automatically, are mapped into a representation consisting of 21 DFs. The DFs of the transcribed phones are compared with those of the canonical phones to compute an articulatory error rate (AER) for each DF, which yields an AER profile for a given set of dysarthric recordings from a speaker. Experiments show that the difference between the AER profiles derived from manual and automatic phonetic transcriptions is relatively small, with a root mean squared error (RMSE) of 0.053 for the word-reading task and 0.085 for the sentence-reading task in CU DYS. In addition, the correlations between the AER profiles are high, at 0.97 and 0.95 for the two tasks respectively. These results support the viability of the proposed framework as an automated means of processing dysarthric speech to obtain articulatory analyses described by DFs. The AER profile is intuitive and interpretable, and helps pinpoint problem areas in articulation.
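To make the AER profile and its evaluation concrete, the following is a minimal sketch and not the paper's implementation. It assumes that canonical and transcribed phones are already aligned one-to-one (insertions and deletions are ignored), that each DF takes a binary value, and that the AER of a DF is the fraction of aligned phone pairs whose values for that DF differ. The three-feature inventory, the phone table, and the function names `aer_profile`, `rmse`, and `pearson` are hypothetical stand-ins for the 21 DFs used in the framework.

```python
import math

# Hypothetical toy inventory: 3 binary features instead of the paper's 21 DFs.
FEATURES = ("voiced", "nasal", "continuant")
PHONE_TO_DF = {
    "p": (0, 0, 0),
    "b": (1, 0, 0),
    "m": (1, 1, 0),
    "s": (0, 0, 1),
}


def aer_profile(aligned_pairs):
    """Per-DF error rate over aligned (canonical, transcribed) phone pairs.

    Assumed definition: the share of pairs whose DF values disagree.
    """
    errors = [0] * len(FEATURES)
    for canonical, transcribed in aligned_pairs:
        canon_df = PHONE_TO_DF[canonical]
        trans_df = PHONE_TO_DF[transcribed]
        for i, (c, t) in enumerate(zip(canon_df, trans_df)):
            if c != t:
                errors[i] += 1
    return [e / len(aligned_pairs) for e in errors]


def rmse(a, b):
    """Root mean squared error between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation; undefined if either profile is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up alignments for one speaker: manual vs. automatic transcription.
manual_pairs = [("p", "b"), ("m", "b"), ("s", "s"), ("p", "p")]
auto_pairs = [("p", "b"), ("m", "m"), ("s", "p"), ("p", "p")]

manual_profile = aer_profile(manual_pairs)
auto_profile = aer_profile(auto_pairs)
profile_rmse = rmse(manual_profile, auto_profile)
profile_corr = pearson(manual_profile, auto_profile)
```

A low `profile_rmse` together with a high `profile_corr` would indicate that the automatic transcription reproduces the manually derived profile, which is the comparison the reported RMSE and correlation figures summarize. The toy data here is only illustrative and is not meant to resemble the reported results.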
APSIPA Transactions on Signal and Information Processing Special Issue - Advanced Acoustic, Sound and Audio Processing Techniques and Their Applications