Crowdsourced Validation of a Machine Learning Classification System for Autism and ADHD

Friday, May 12, 2017: 12:00 PM-1:40 PM
Golden Gate Ballroom (Marriott Marquis Hotel)
M. Duda1, N. Haber2 and D. Wall3, (1)University of Michigan, Ann Arbor, MI, (2)Stanford University, Stanford, CA, (3)Stanford University, Palo Alto, CA
Background: Autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD) together affect more than 10% of children in the United States, but considerable behavioral overlap between the two disorders can complicate differential diagnosis. Currently, no screening test is designed to differentiate between the two disorders, and with waiting times from initial suspicion to diagnosis often exceeding a year, methods to quickly and accurately assess risk for these and other developmental disorders are desperately needed.

Objectives: Our goal was to improve our previously published classification system for distinguishing ASD from ADHD using a small set of caregiver-directed behavioral questions. We aimed to train a model that generalized well to unseen, crowd-collected data comprising subjects of varying symptom severity, yielding a robust classifier capable of making accurate risk predictions as a real-world mobile screening tool.

Methods: As part of a large crowdsourcing effort, we electronically collected responses to 15 behavioral features from parents of children with ASD (n = 248) or ADHD (n = 174) to use in conjunction with our archival data set (n = 2925). We trained and tested five machine learning models on different train/test pairings of our data (archive/survey, survey/archive, archive/archive, survey/survey, mixed/mixed) to find the model that generalized best to both data sets. We used nested grid-search cross-validation for parameter optimization, and to overcome class imbalance in our training and testing trials we performed 100 random subsamplings of the majority class.
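The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the model (elastic-net-penalized logistic regression via scikit-learn), the parameter grid, the synthetic data, and the number of subsampling repeats are all assumptions chosen for brevity; the abstract's actual pipeline used five models and 100 subsamplings.

```python
# Sketch of nested grid-search cross-validation with repeated random
# subsampling of the majority class to balance the two diagnostic groups.
# Data here are synthetic placeholders for the 15 behavioral features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 248 "ASD" vs. 174 "ADHD" subjects, 15 features each.
X = rng.normal(size=(248 + 174, 15))
y = np.array([1] * 248 + [0] * 174)

def subsample_balanced(X, y, rng):
    """Randomly subsample the majority class down to the minority class size."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

# Inner loop: grid search tunes the regularization; the grid values are
# illustrative, not the abstract's actual search space.
inner = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
    param_grid={"C": [0.1, 1.0], "l1_ratio": [0.2, 0.8]},
    cv=3,
)

# Outer loop: repeat the balanced subsampling (100x in the abstract; fewer
# here for speed) and score the tuned model with outer cross-validation.
scores = []
for _ in range(5):
    Xb, yb = subsample_balanced(X, y, rng)
    scores.append(cross_val_score(inner, Xb, yb, cv=3, scoring="roc_auc").mean())

print(f"mean AUC over subsamples: {np.mean(scores):.2f}")
```

Because the inner grid search is refit independently inside each outer fold, the reported AUC is not biased by the parameter tuning, which is the point of nesting the two loops.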

Results: Due to the high variability in our crowdsourced survey data set, classification accuracy was lower for the survey data set than for the archive data set. However, two of our models (Elastic Net and Linear Discriminant Analysis) were especially robust in classifying the crowdsourced data, specifically when trained on a mixed set of archive and survey data, indicating that including clear ASD/ADHD examples in the training set improved the classification of more difficult cases. Our final models achieved an AUC of 0.89 ± 0.01 using only 15 questions.

Conclusions: These results support the potential of a quick, accurate, and widely accessible method for differentiating risk between ASD and ADHD, for use inside or outside of clinical settings. Our success in crowdsourcing indicates that mobile administration of this screening tool is feasible and would be well received by parents of children at risk. Furthermore, the simplicity of the approach would allow for real-time calculation of risk scores and rapid feedback to clinicians and/or parents. By combining this machine learning classifier with others, we hope to create a mobile screening system with the specificity to pinpoint ASD, ADHD, and other developmental delays. Such a mobile screening platform would provide actionable information to parents in need, regardless of geographic location or socio-economic status. Moreover, the data captured by this mobile approach would supplement the standard clinical encounter, giving clinicians an in-depth assessment of the patient before the clinical visit, speeding intake and accelerating the delivery of therapy.