The Potential of an Audio-Based Automated Autism Screen: The Result of a Blind Test Using Third-Party Data

Thursday, May 15, 2014
Atrium Ballroom (Marriott Marquis Atlanta)
D. Xu1,2, B. Boyd3, J. A. Richards1 and J. Gilkerson1,2, (1)LENA Foundation, Boulder, CO, (2)Department of Speech, Language and Hearing Sciences, University of Colorado, Boulder, CO, (3)University of North Carolina at Chapel Hill, Chapel Hill, NC
Background:  Our previous research demonstrated that naturalistic audio data for autism research can be collected conveniently with wearable recorders, and the algorithm developed for automated analysis of those data has demonstrated reliability and validity. Naturalistic daylong recordings and automated algorithms capture behaviors characteristic of children with autism across several areas of development, including social-emotional interaction, language and communication, and stereotyped behavior. The efficiency and effectiveness of this methodology make it a promising tool for autism screening. Our previous effort used 1363 in-house recordings from 106 typically developing children (TD), 49 children with language delay not related to autism (LD) and 71 children with autism (ASD), mainly 15-48 months old. “Leave-one-out” cross-validation on these data gave around 90% equal-sensitivity/specificity.
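
As a rough illustration of the leave-one-out procedure mentioned above, the following Python sketch holds out one sample at a time, fits on the rest, and predicts the held-out sample. Everything in it is an assumption made for illustration: the abstract does not say whether the held-out unit was a recording or a child, and the one-dimensional score with a midpoint threshold (`fit_midpoint`, `predict_midpoint`) is a toy stand-in, not the model used in the study.

```python
def leave_one_out(samples, fit, predict):
    """Hold out each (feature, label) sample in turn.

    Returns a list of (true_label, predicted_label) pairs, one per sample.
    """
    results = []
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        model = fit(train)
        results.append((y, predict(model, x)))
    return results


def fit_midpoint(train):
    """Toy model (hypothetical): threshold halfway between the class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    """Label a score as positive (1) when it reaches the threshold."""
    return 1 if x >= threshold else 0


# Made-up scores, for illustration only (1 = ASD, 0 = TD).
toy_samples = [(0.9, 1), (0.8, 1), (0.7, 1), (0.1, 0), (0.2, 0), (0.3, 0)]
loo_results = leave_one_out(toy_samples, fit_midpoint, predict_midpoint)
```

When a child contributes several recordings, holding out all of that child's recordings together would avoid testing on a child the model has already seen; whether the study did so is not stated here.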

Objectives:  Third-party data are desirable for further analysis, validation and improvement. This study asks three questions: whether the performance holds in a blind test with third-party data; whether the behavioral characteristics extracted from audio recordings remain consistent when applied to new data; and whether blind third-party data reveal any potential issues or improvements.

Methods:  Daylong audio recordings were collected using wearable LENA recorders. The automated algorithm detected segments of the key child, adults and other environmental sounds. Statistics of the sequence of sound categories in a child’s recording can reflect how the child interacts with the environment; for example, the co-vocalization rate between the child and caregivers can indicate the synchrony between them. Human voice segments were further processed with phone recognition or sound clustering algorithms, yielding frequencies of occurrence for phones, sound clusters and their sequences, which are highly correlated with language, phonetic and vocal development. Prosodic features such as duration, loudness and pitch are highly related to emotions and other behaviors. More than 100 features were analyzed and modeled with machine learning approaches to provide a risk score for autism. The algorithms were trained on the in-house data and tested on the third-party data.
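
To make the idea of sequence statistics concrete, here is a minimal Python sketch that counts how often each sound category occurs and how often one category follows another in a labeled segment sequence. The label names (`CHILD`, `ADULT`, `OTHER`) and the `alternation_rate` measure are hypothetical, chosen only for illustration; in particular, `alternation_rate` is a simple stand-in and not the co-vocalization rate as defined in the study.

```python
from collections import Counter


def sequence_stats(labels):
    """Count single categories and adjacent category pairs in a sequence."""
    unigrams = Counter(labels)
    bigrams = Counter(zip(labels, labels[1:]))
    return unigrams, bigrams


def alternation_rate(labels, a="CHILD", b="ADULT"):
    """Fraction of adjacent segment pairs that switch between a and b.

    Illustrative stand-in only; returns 0.0 for sequences with no pairs.
    """
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for x, y in pairs if {x, y} == {a, b})
    return hits / len(pairs)


# Made-up segment sequence, for illustration only.
toy_labels = ["CHILD", "ADULT", "CHILD", "OTHER", "ADULT"]
toy_unigrams, toy_bigrams = sequence_stats(toy_labels)
toy_rate = alternation_rate(toy_labels)
```

Counts like these could serve as entries in a per-recording feature vector, alongside phone, cluster and prosodic features.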

Results:  The third-party data came from three sources, all using the same type of recorder. Site-1 data comprised 59 daylong recordings from 31 children with autism (25-48 months); site-2 data comprised 125 recordings in preschool environments from 67 children with autism (36-68 months); and site-3 data comprised 115 daylong recordings from 40 typically developing children (11-22 months). Two methods were tested for autism risk, each evaluated on the 98 ASD children from sites 1 and 2 combined and the 40 TD children from site 3. For Method-1 with the trained cutoff threshold, 88 of the 98 ASD children were positive (90% sensitivity) and 38 of the 40 TD children were negative (95% specificity); varying the threshold gave 95% equal-sensitivity/specificity. For Method-2 with the trained threshold, 84 of the 98 ASD children were positive (86% sensitivity) and 36 of the 40 TD children were negative (90% specificity); the equal-sensitivity/specificity was 90%.
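
The reported percentages follow directly from the counts, as the short Python sketch below shows. The second function illustrates one way an equal-sensitivity/specificity point could be located by sweeping a threshold over risk scores. The function names and the toy scores are assumptions for illustration; the study's per-child scores are not given in this abstract, and the authors' exact procedure may differ.

```python
def rate(hits, total):
    """Return hits/total as a percentage rounded to the nearest integer."""
    return round(100 * hits / total)


# Counts reported in the abstract (98 ASD children = 31 + 67; 40 TD children).
method_1 = {"sensitivity": rate(88, 98), "specificity": rate(38, 40)}
method_2 = {"sensitivity": rate(84, 98), "specificity": rate(36, 40)}


def equal_sens_spec(asd_scores, td_scores):
    """Sweep candidate thresholds over the observed scores.

    A score >= threshold counts as a positive screen. Returns
    (threshold, sensitivity, specificity) where the two rates are closest.
    """
    best = None
    for t in sorted(set(asd_scores) | set(td_scores)):
        sens = sum(s >= t for s in asd_scores) / len(asd_scores)
        spec = sum(s < t for s in td_scores) / len(td_scores)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    return best[1:]


# Made-up risk scores, for illustration only.
toy_point = equal_sens_spec([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

With the toy scores, the sweep settles on the threshold where both rates are 75%; on real data the two rates rarely coincide exactly, so the closest point is reported.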

Conclusions:  The blind test confirmed performance of around 90% sensitivity/specificity on the third-party data, showing the great potential of the proposed method. The detailed features extracted from the audio recordings are discussed in relation to autism screening and are compared between the in-house data and the third-party data to guide further improvements.