32228
Application of Supervised Learning Methods in Stratification of Autism Based on Mixed Measures of Eye Tracking and Electroencephalogram: Results from the ABC-CT Interim Analysis
Objectives: To investigate the feasibility of combining individual measures for ASD biomarker discovery, this study utilizes computational and machine-learning methods to examine how measures extracted from EEG and ET paradigms can be combined to achieve robust predictions of behavioral phenotype.
Methods: This study leverages derived variables from all ET paradigms (Biomotion Preference, Activity Monitoring (AM), Social Interactive (SI) Play, Static Social (SS) Scenes, and Pupillary Light Reflex (PLR)) and EEG paradigms (Visual Evoked Potential (VEP) and Faces vs Houses (ERP)) present in the interim data set (Summer 2018) of the Autism Biomarkers Consortium for Clinical Trials (ABC-CT), which was formed to investigate promising biomarkers for ASD. The dataset contained baseline measurements from 225 children with ASD (n=161) or typical development (n=64). K-nearest-neighbors was used to impute missing predictors. Using a variety of machine-learning techniques, with 10-fold cross-validation repeated 100 times, this study examines the Pearson’s correlation between predictions derived from 47 potential predictors and two outcomes: ADOS calibrated severity scale (CSS) and IQ. The complete set of predictors, LASSO, principal component analysis (PCA), elastic net (EN), and ridge decision tree (RDT) are used for feature selection; first- and second-order linear regression models (LM), random forest (RF), and support vector machines (SVM) are used for prediction (a total of 16 different machine-learning approaches). For comparison, the best single predictor is identified from the set of all predictors.
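The evaluation scheme described above (repeated k-fold cross-validation scored by Pearson's correlation between predictions and observed outcomes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the model is a single-predictor first-order linear regression, the data are synthetic, and the correlation is computed on pooled out-of-fold predictions within each repeat and then averaged across repeats. The abstract does not state how the study aggregated correlations across folds or repeats.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; 0.0 is a convention chosen here
    return sxy / math.sqrt(sxx * syy)


def fit_simple_lm(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return my - slope * mx, slope


def repeated_kfold_r(x, y, k=10, repeats=100, seed=0):
    """Mean (over repeats) Pearson r between out-of-fold predictions and y."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # new random fold assignment for every repeat
        preds = [0.0] * n
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            intercept, slope = fit_simple_lm([x[i] for i in train],
                                             [y[i] for i in train])
            for i in test:
                preds[i] = intercept + slope * x[i]  # prediction for unseen case
        rs.append(pearson_r(preds, y))
    return sum(rs) / len(rs)


# Synthetic data only (not ABC-CT data): one noisy predictor of a toy outcome.
demo_rng = random.Random(42)
toy_x = [demo_rng.gauss(0, 1) for _ in range(225)]
toy_y = [0.5 * v + demo_rng.gauss(0, 1) for v in toy_x]
print(round(repeated_kfold_r(toy_x, toy_y), 3))
```

Scoring held-out predictions rather than in-sample fits is what lets models with very different flexibility be compared on the same footing, since every prediction comes from a model that never saw that case.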
Results: 1. As a baseline, correlations between individual variables and outcome variables were examined.
- CSS: the highest correlations were observed for ET %head averaged across the AM, SI, and SS paradigms, %heads (AM), and %looking (SI), with r=0.548, 0.469, and 0.447, respectively. The average correlation across all variables was r=0.282.
- IQ: ET looking% (SI), EEG ERP (faces_good), and ET looking% (SS) showed the top correlations, with r=0.47, 0.45, and 0.43, respectively. Similar to CSS, the mean variable correlation was r=0.280.
2. Using all derived measures (EEG and ET), 2nd-order LM, 1st-order LM, SVM, and RF achieved, respectively:
- CSS: r=-0.000576, 0.545, 0.609, 0.584.
- IQ: r=0.00307, 0.549, 0.533, 0.484.
3. Combining PCA, LASSO, RDT, and EN with 1st-order LM and SVM achieved, respectively:
- CSS: LM: r=0.54, 0.59, 0.59, 0.60; SVM: r=0.601, 0.602, 0.612, 0.594.
- IQ: LM: r=-0.54, 0.568, 0.551, 0.564; SVM: r=0.574, 0.553, 0.504, 0.541.
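The single-variable baseline in item 1 (correlate each variable with the outcome, then report the top variables and the average) can be sketched as follows. This is a hedged illustration, not the study's code: the function and variable names are hypothetical, the numbers are toy values, and ranking and averaging by absolute correlation is an assumption, since the abstract does not say whether signed or absolute values were averaged.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; 0.0 is a convention chosen here
    return sxy / math.sqrt(sxx * syy)


def rank_single_predictors(predictors, outcome):
    """Rank variables by |r| with the outcome and return the mean |r|.

    predictors: dict mapping a variable name to its list of values.
    Returns (ranked, mean_abs_r), where ranked is a list of (name, r)
    pairs sorted from strongest to weakest absolute correlation.
    """
    scored = [(name, pearson_r(values, outcome))
              for name, values in predictors.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    mean_abs_r = sum(abs(r) for _, r in scored) / len(scored)
    return scored, mean_abs_r


# Toy values only (hypothetical variable names, not ABC-CT measures).
toy_outcome = [1, 2, 3, 4, 5]
toy_predictors = {
    "var_a": [2, 4, 6, 8, 10],
    "var_b": [1, 3, 2, 5, 4],
    "var_c": [3, 1, 4, 1, 5],
}
ranked, mean_abs_r = rank_single_predictors(toy_predictors, toy_outcome)
for name, r in ranked:
    print(name, round(r, 3))
print("mean |r|:", round(mean_abs_r, 3))
```

The top entry of the ranking plays the role of the "best single predictor" comparison, and the mean gives the "average single biomarker" reference point against which the combined models in items 2 and 3 are judged.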
Conclusions: These results suggest that a machine-learning approach improves associations above the average single-biomarker outcome and yields minor gains over the best single-biomarker outcome. However, variable selection and more advanced machine-learning techniques (SVM, RF) provided only limited improvements relative to straightforward first-order linear regression. This suggests that leveraging complex interactions between biomarkers may not provide substantial gains in matching biomarkers to clinical phenotypes, a finding that may be expected given that ABC-CT paradigms were all primarily designed to index social-communicative function. Additional nuances of this work, including the variables identified by machine-learning approaches as best describing phenotypic relationships, the robustness of machine learning in high-dimensional spaces, and machine-learning limitations, will be discussed in subsequent reports.