Creaky Voice in Adolescents with Autism Spectrum Disorder: An Acoustic, Quantitative Analysis

Poster Presentation
Saturday, May 4, 2019: 11:30 AM-1:30 PM
Room: 710 (Palais des congrès de Montréal)
E. Weed1, R. Fusaroli1, J. Mayo2 and I. M. Eigsti3, (1)Aarhus University, Aarhus, Denmark, (2)University of Connecticut, West Hartford, CT, (3)Psychological Sciences, University of Connecticut, Storrs, CT
Background: People with Autism Spectrum Disorder (ASD) have often been described as having unusual prosodic (e.g., "robotic", "flat", "monotone") and vocal (e.g., "harsh", "nasal" and "hoarse") qualities to their speech, but there is little consensus on exactly how their speech differs from typical speech (McCann & Peppé, 2003). There is growing interest in using acoustic measures of speech to quantify these subjective impressions (Fusaroli et al., 2016), but nearly all studies to date examine traditional aspects of prosody, such as pitch and rhythm, and ignore voice quality.

Objectives: To investigate the accuracy, sensitivity, and specificity of acoustic measures of prosody in combination with measures of voice quality for diagnostic classification.

Methods: We analyzed speech data (8 sentences per participant) from 15 adolescents diagnosed with ASD (mean age = 14.4 years, SD = 1.48) with IQ scores in the typical range, and 15 adolescents with typical development (TD; mean age = 14.1 years, SD = 1.91); groups did not differ on chronological age or full-scale IQ. Participants in both the ASD and the TD groups demonstrated average to high average performance on standardized language measures (see Mayo, 2015, for details). We extracted acoustic features from the audio files using the Covarep (Degottex et al., 2014) toolbox for Matlab and custom Praat (Boersma & Weenink, 2001) scripts, and computed RQA (recurrence quantification analysis, a measure of temporal dynamics) features for voice creak using the R nonlinearTseries package (Garcia, 2015). We built two logistic regression models. Model 1 incorporated, as predictors of diagnosis, acoustic measures of prosody previously employed in the literature (Fusaroli et al., 2016): mean F0, standard deviation of F0, pause duration, and speech rate. Model 2 included the above, as well as measures of voice creak: mean creak, SD of creak, and a recurrence measure of creak (RATIO). All models were 10-fold cross-validated, and reported statistics are averaged over 1000 iterations.
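
The evaluation scheme described above (logistic regression, 10-fold cross-validation, statistics averaged over repeated iterations) can be illustrated with a minimal Python sketch. This is not the authors' code: the data are synthetic, the two feature columns are hypothetical stand-ins for acoustic measures, the stratified fold assignment and the gradient-descent fitting routine are assumptions, and only 20 repeats are run instead of 1000 to keep the example fast.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent (bias stored last)."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, xi):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z > 0 else 0

def stratified_folds(y, k, rng):
    """Shuffle indices within each class and deal them round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def repeated_cv(X, y, k=10, repeats=20, seed=0):
    """Average accuracy, sensitivity, and specificity over repeated k-fold CV."""
    rng = random.Random(seed)
    totals = {"accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0}
    for _ in range(repeats):
        tp = tn = fp = fn = 0
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_X = [x for i, x in enumerate(X) if i not in held_out]
            train_y = [t for i, t in enumerate(y) if i not in held_out]
            w = fit_logistic(train_X, train_y)
            for i in test_idx:
                pred = predict(w, X[i])
                if y[i] == 1:
                    tp += pred == 1
                    fn += pred == 0
                else:
                    tn += pred == 0
                    fp += pred == 1
        totals["accuracy"] += (tp + tn) / (tp + tn + fp + fn)
        totals["sensitivity"] += tp / (tp + fn)
        totals["specificity"] += tn / (tn + fp)
    return {name: value / repeats for name, value in totals.items()}

# Synthetic, well-separated data: 15 "ASD" (label 1) and 15 "TD" (label 0)
# participants with two hypothetical standardized features each.
data_rng = random.Random(42)
X = [[data_rng.gauss(1.5, 1.0), data_rng.gauss(1.5, 1.0)] for _ in range(15)]
X += [[data_rng.gauss(-1.5, 1.0), data_rng.gauss(-1.5, 1.0)] for _ in range(15)]
y = [1] * 15 + [0] * 15

metrics = repeated_cv(X, y)
print(metrics)
```

Because every participant is held out exactly once per repeat, each repeat yields one confusion matrix over all 30 participants, and the three statistics are then averaged across repeats.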

Results: Model 1 (prosodic measures only) had an accuracy of 0.61 (CI: 0.60, 0.62), sensitivity of 0.57 (CI: 0.55, 0.59), and specificity of 0.66 (CI: 0.64, 0.67). Model 2 (prosodic and voice measures) had an accuracy of 0.71 (CI: 0.70, 0.72), sensitivity of 0.68 (CI: 0.66, 0.70), and specificity of 0.74 (CI: 0.72, 0.75). Adding the voice measures improved the model, even when taking the increased complexity of the model into consideration.
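
As a quick consistency check on these figures: with equal group sizes (15 ASD, 15 TD), accuracy is the mean of sensitivity and specificity. The short Python sketch below applies that identity to the reported values; the function name is hypothetical, and because the reported statistics are rounded averages, agreement is expected only to within rounding.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity for given group sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)

# Reported rates, with 15 participants per group.
model_1 = accuracy_from_rates(0.57, 0.66, 15, 15)  # about 0.615; reported 0.61
model_2 = accuracy_from_rates(0.68, 0.74, 15, 15)  # about 0.71; reported 0.71
print(model_1, model_2)
```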

Conclusions: Qualitative descriptions of the speech of people with ASD often allude to characteristics that include both prosodic and voice-quality aspects, but quantitative studies to date focus on measures of prosody. Human raters are still substantially more effective at diagnostic classification on the basis of speech: expert clinicians displayed sensitivity of .86 and specificity of .86 in classifying these samples (Eigsti, Mayo and Simmons, INSAR 2016). However, it remains unclear which acoustic features clinicians base their judgments on. Our results extend previous findings, showing that measures of voice creak improve acoustic diagnostic models. In future work, we plan to systematically combine voice quality and prosody measures with the aim of informing speech-language therapy and intervention.