Exploring Heterogeneity in the ASD Blood Transcriptome: Machine-Learning Classification Accuracy Is Improved By Modeling Subgroups.

Thursday, May 11, 2017: 12:00 PM-1:40 PM
Golden Gate Ballroom (Marriott Marquis Hotel)
D. S. Tylee1, J. L. Hess1, T. P. Quinn1, B. Stamova2, F. R. Sharp3, I. Hertz-Picciotto4, S. V. V. Faraone5, S. W. Kong6 and S. J. Glatt1, (1)SUNY Upstate Medical University, Syracuse, NY, (2)UC Davis MIND Institute, Sacramento, CA, (3)Neurology, University of California, Davis School of Medicine, Sacramento, CA, (4)University of California at Davis, Davis, CA, (5)Psychiatry, SUNY Upstate Medical University, Syracuse, NY, (6)Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
Background:  Blood-based microarray studies comparing individuals affected by autism spectrum disorder (ASD) with typically developing individuals have helped characterize differences in circulating immune cell functions and offer potential biomarker signals. Heterogeneity is widely recognized within the ASD phenotype and at the level of etiology, yet relatively few studies have explicitly examined heterogeneity in the transcriptome.

Objectives:  We sought to examine heterogeneity in the ASD blood transcriptome.

Methods:  Recently, we combined the subject-level data from previously published blood microarray studies in order to perform a combined-samples mega-analysis. The present study used a subset of these data (samples from males self-identified as being of European ancestry; n = 417 ASD, n = 243 control). We identified genes and functional gene-sets that were differentially expressed within this sample. We then clustered ASD-affected samples into putative subgroups based on the expression of genes and gene-sets, as well as on expression principal components.

Results:  Machine-learning classification accuracy in withheld samples was significantly improved for subgroup-informed classification problems (e.g., ASD subgroup k vs. all comparison; overall accuracies ranging from 67 to 76%), as compared with the baseline classification problem (i.e., all ASD samples vs. all comparison; overall accuracies ranging from 61 to 63%); this effect was most pronounced for gene-set- and PCA-based subgroups. All subgroup solutions showed pronounced differences in leukocyte-specific marker genes, indicating that heterogeneity in cellular composition contributes critically to transcriptomic heterogeneity. Additionally, many of the subgroup solutions showed significant differences in domains of the Mullen Scales of Early Learning and in comorbid developmental conditions.
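
The contrast between the baseline problem (all ASD vs. all comparison) and a subgroup-informed problem (ASD subgroup k vs. all comparison) can be illustrated with a minimal toy sketch. It does not reproduce the study: the abstract does not name the clustering algorithm or the classifier, so k-means and a nearest-centroid classifier are stand-ins. The data are simulated, and the two opposite-shifted ASD subgroups, all sample sizes, and all function names are assumptions made for illustration only.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in); returns one cluster label per row."""
    centers = random.Random(seed).sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = centroid(members)
    return labels


def split(rows, test_frac, rng):
    """Shuffle a copy of rows and return (train, withheld) lists."""
    rows = list(rows)
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]


def holdout_accuracy(cases, controls, test_frac=0.3, seed=0):
    """Fit a nearest-centroid classifier (a stand-in); score withheld samples."""
    rng = random.Random(seed)
    case_train, case_test = split(cases, test_frac, rng)
    ctrl_train, ctrl_test = split(controls, test_frac, rng)
    c_case, c_ctrl = centroid(case_train), centroid(ctrl_train)
    test = [(r, True) for r in case_test] + [(r, False) for r in ctrl_test]
    hits = sum((sq_dist(r, c_case) < sq_dist(r, c_ctrl)) == y for r, y in test)
    return hits / len(test)


def simulate(n, shift, rng, n_features=5):
    """Gaussian toy 'expression' rows; only the first two features are shifted."""
    return [[rng.gauss(shift if f < 2 else 0.0, 1.0) for f in range(n_features)]
            for _ in range(n)]


# Simulated data (assumption): two ASD subgroups shifted in opposite directions.
rng = random.Random(42)
controls = simulate(60, 0.0, rng)
asd = simulate(40, 3.0, rng) + simulate(40, -3.0, rng)

# Baseline problem: all ASD samples vs. all comparison samples.
baseline_acc = holdout_accuracy(asd, controls)

# Subgroup-informed problems: ASD subgroup k vs. all comparison samples.
# Simplification: clustering uses every ASD row, including rows later withheld.
asd_labels = kmeans(asd, k=2)
subgroup_acc = {}
for k in sorted(set(asd_labels)):
    members = [r for r, lab in zip(asd, asd_labels) if lab == k]
    if len(members) >= 2:  # need at least one training and one withheld case
        subgroup_acc[k] = holdout_accuracy(members, controls)

print("baseline:", baseline_acc, "subgroup-informed:", subgroup_acc)
```

In this toy setup the pooled ASD centroid lies close to the comparison centroid, so a single case-control boundary has little to work with, whereas each subgroup on its own is easier to separate. Whether a comparable mechanism underlies the reported accuracies is not something the sketch can show.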

Conclusions:  These findings begin to shed light on heterogeneity within the ASD blood transcriptome.

See more of: Genetics