Machine Learning and Autism Diagnostics: Promises and Potential Pitfalls

Friday, May 15, 2015: 11:30 AM-1:30 PM
Imperial Ballroom (Grand America Hotel)
D. K. Bone1, M. S. Goodwin2, M. P. Black3, C. C. Lee4, K. Audhkhasi1 and S. Narayanan1, (1)Signal Analysis and Interpretation Lab (SAIL), University of Southern California, Los Angeles, CA, (2)Northeastern University, Boston, MA, (3)Information Sciences Institute (ISI), University of Southern California, Marina del Rey, CA, (4)Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Background: Computational methods have immense potential to generate novel findings across target domains. Machine learning is increasingly utilized in autism research, with applications ranging from neurogenetic etiology to population stratification. One clear application of machine learning is the creation of objective, efficient, robust diagnostic algorithms from manual phenotypic coding instruments such as the Autism Diagnostic Observation Schedule (ADOS) or the Autism Diagnostic Interview-Revised (ADI-R), thereby reducing subjectivity in algorithm generation. However, machine learning (like statistical techniques in general) carries the potential for misuse and misinterpretation.

Objectives:  The objectives of this work are twofold. First, building on our previous work (Bone et al. in JADD, 2014), we critically evaluate two recent studies that claim to drastically reduce the time needed to diagnose autism via machine learning applied to the ADOS (Wall et al. in Translational Psychiatry 2(4):e100, 2012) and the ADI-R (Wall et al. in PloS One 7(8), 2012), supporting our arguments with empirical studies. Second, we aim to generate robust algorithms for Best Clinical Estimate (BCE) diagnosis through machine learning fusion of multiple diagnostic instruments (i.e., ADOS and ADI-R) across large, independent datasets.

Methods:  Our experiments are conducted using three expansive corpora containing in total approximately 3000 ADOS and 3000 ADI-R administration scores: the Autism Genetic Resource Exchange (AGRE), a Balanced Independent Dataset (BID), and the National Database for Autism Research (NDAR). In our first study, we utilize the ADTree classifier (Weka) to classify instrument diagnosis from instrument codes, as in the Wall et al. studies. We examine the proposed reduced code set for selection reliability and classification accuracy, as well as the effects of including the previously excluded middle-severity diagnostic group in classification. In our second study, we utilize various classifiers, including Support Vector Machines, to create an automatic algorithm that fuses behavioral codes from multiple diagnostic instruments into a single BCE diagnosis.
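The instrument-fusion step can be sketched as follows. This is an illustrative toy example, not the study's pipeline: the code names are hypothetical, and a simple perceptron stands in for the Support Vector Machine; the point is only that item-level codes from both instruments are concatenated into one feature vector before a single classifier is trained.

```python
def fuse_codes(ados_codes, adir_codes):
    """Concatenate item-level codes from both instruments into one
    feature vector, with a fixed (sorted) item order."""
    return [ados_codes[k] for k in sorted(ados_codes)] + \
           [adir_codes[k] for k in sorted(adir_codes)]

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Tiny linear classifier (a stand-in for the SVM used in the study).
    Labels y are 0/1; returns weights and bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            err = yi - pred  # -1, 0, or +1
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Hypothetical item codes from each instrument, fused into one vector:
fused = fuse_codes({"A1": 2, "B2": 3}, {"q1": 1})  # -> [2, 3, 1]
```

In practice the fused vectors for many administrations form the training matrix, and a single decision (here, the perceptron's sign; in the study, the SVM's) plays the role of the BCE label.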

Results:  In our first study, we identify several conceptual and methodological errors in the Wall et al. studies relating to: (i) interpretation of experimental results; (ii) exclusion of the critical middle-severity diagnostic group, leaving only extreme cases; (iii) classification of instrument (rather than BCE) diagnosis using instrument codes; (iv) inadequate data; and (v) insufficient reliability testing. We demonstrate empirically that the selected codes are highly variable (data-dependent) given the chosen methodology, and that performance degrades dramatically when the more confusable middle-severity group is included. In our second study, we examine site-dependent preferences for instrument and instrument codes when making a BCE diagnosis. Essential behavioral codes for differential diagnosis will be reported.
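The code-selection variability described above can be probed with a bootstrap stability check like the following sketch. All names and the scoring rule are illustrative assumptions (a simple between-class mean difference rather than the ADTree criterion used in the original studies): if the same top-ranked items are selected across most resamples, selection is stable; if membership churns, the reduced code set is data-dependent.

```python
import random
from collections import Counter

def top_k_features(X, y, k):
    """Rank features by absolute mean difference between the two classes
    (a deliberately simple stand-in for the study's selection criterion)."""
    n_feat = len(X[0])
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == 0]
    def mean(col, rows):
        return sum(r[col] for r in rows) / len(rows)
    scores = [abs(mean(j, pos) - mean(j, neg)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]

def selection_stability(X, y, k, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each feature index
    appears among the top-k selected features."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(X)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # need both classes present to score
            continue
        counts.update(top_k_features(Xb, yb, k))
    return {j: c / n_boot for j, c in counts.items()}
```

A strongly discriminative item should be selected in nearly every resample; items selected in only a minority of resamples are artifacts of the particular sample rather than robust markers.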

Conclusions:  Our experiments suggest both the promise and the potential pitfalls of applying machine learning to autism diagnosis. First, our inability to replicate the findings reported by Wall and colleagues using larger, more balanced data underscores the importance of combining domain and computational expertise. Second, we report on initial experiments that utilize machine learning to objectively fuse diagnostic instruments. Lastly, we propose certain best practices and draw attention to especially promising areas for collaboration between computational and behavioral scientists.