Exploring Heterogeneity in the ASD Blood Transcriptome: Machine-Learning Classification Accuracy Is Improved By Modeling Subgroups.
Objectives: We sought to examine heterogeneity in the ASD blood transcriptome.
Methods: Recently, we combined the subject-level data from previously published blood microarray studies in order to perform a combined-samples mega-analysis. The present study utilized a subset of these data (male samples self-identified as being of European ancestry; n_ASD = 417, n_control = 243). We identified genes and functional gene-sets that were differentially expressed within this sample. We then clustered ASD-affected samples into putative subgroups based on genes and gene-sets, as well as expression principal components.
Results: Machine-learning classification accuracy in withheld samples was significantly improved for subgroup-informed classification problems (e.g., ASD subgroup k vs. all comparison samples; overall accuracies ranging from 67 to 76%), as compared with the baseline classification problem (i.e., all ASD samples vs. all comparison samples; overall accuracies ranging from 61 to 63%); this effect was most pronounced for gene-set- and PCA-based subgroups. All subgroup solutions showed pronounced differences in leukocyte-specific marker genes, indicating that heterogeneity in cellular composition contributes critically to transcriptomic heterogeneity. Additionally, many of the subgroup solutions showed significant differences in domains of the Mullen Early Learning Scale and in comorbid developmental conditions.
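The contrast between the baseline problem and the subgroup-informed problems can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the abstract does not name the clustering or classification algorithms, so k-means and a nearest-centroid classifier are stand-in assumptions, and every function name, sample size, and parameter below is hypothetical.

```python
import random
from statistics import mean

def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Feature-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels

def holdout_accuracy(cases, controls, test_frac=0.3, seed=0):
    """Fit a nearest-centroid classifier on a training split; score withheld samples."""
    rng = random.Random(seed)
    data = [(x, 1) for x in cases] + [(x, 0) for x in controls]
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_frac))
    test, train = data[:n_test], data[n_test:]
    c1 = centroid([x for x, y in train if y == 1])
    c0 = centroid([x for x, y in train if y == 0])
    hits = sum((sqdist(x, c1) < sqdist(x, c0)) == (y == 1) for x, y in test)
    return hits / len(test)

# Synthetic "expression" data (assumption): two ASD subgroups shifted in opposite
# directions on one feature, so the pooled ASD mean sits close to the controls.
rng = random.Random(42)

def sample(n, shift):
    return [[rng.gauss(shift if i == 0 else 0.0, 1.0) for i in range(5)] for _ in range(n)]

asd = sample(60, 3.0) + sample(60, -3.0)
controls = sample(80, 0.0)

# Subgroups are assigned before the train/test split here, purely for brevity.
labels = kmeans_labels(asd, k=2)

# Baseline problem: all ASD samples vs. all comparison samples.
baseline_acc = holdout_accuracy(asd, controls)

# Subgroup-informed problems: ASD subgroup j vs. all comparison samples.
subgroup_acc = {
    j: holdout_accuracy([x for x, lab in zip(asd, labels) if lab == j], controls)
    for j in sorted(set(labels))
}

print("baseline:", round(baseline_acc, 2))
for j, acc in subgroup_acc.items():
    print("subgroup", j, "vs. controls:", round(acc, 2))
```

In this toy setup the pooled ASD group is hard to separate from controls because its subgroups deviate in opposite directions, which is one way heterogeneity could depress baseline accuracy; the real data need not behave this way.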
Conclusions: These findings begin to shed light on heterogeneity within the ASD blood transcriptome.