Combining Supervised and Unsupervised Learning to Subgroup Autism Spectrum Disorder According to Regional Brain Volumes

Chadaram, Rohit

Background: Autism Spectrum Disorder (ASD) is a highly heterogeneous condition with an unknown number of potentially unique neural phenotypes with distinct etiologies. One biological factor that may be important in efforts to fractionate the autism spectrum into more homogeneous subgroups is brain size. Recently, it has been shown that a proportion of individuals with ASD have persistent disproportionate megalencephaly (ASD-DM), i.e., brain volume disproportionate to body size. There is some evidence that individuals with ASD-DM have, on average, poorer outcomes. However, it is unknown whether certain brain regions in ASD-DM are disproportionately affected and how regional variation in brain volumes in ASD-DM may contribute to the severity of ASD phenotypes.

Objectives: Utilizing both supervised and unsupervised machine learning techniques, we aim to 1) identify the regional brain volumes most important for distinguishing ASD-DM from individuals with ASD but more typical brain volumes (ASD-N) and 2) use these identified brain regions to cluster individuals into groups with more homogeneous volumetric neural phenotypes.

Methods: We acquired structural magnetic resonance imaging (MRI) scans of 147 male preschool-aged children with ASD. ASD-DM was defined as a total cerebral volume more than 1.5 standard deviations above the mean of an established sample of age-matched typically developing (TD) children, resulting in 16 ASD-DM and 131 ASD-N cases. Volumetric data from 239 brain regions were extracted using an automated T1-segmentation pipeline (https://mricloud.org) and normalized for total brain volume. Regional brain volumes were input as features for classification of ASD-DM and ASD-N using a RandomForest model. Model accuracy (ACC), specificity (SP) and sensitivity (SN) were estimated using a 10-fold cross-validation scheme with SMOTE to account for class imbalance, and again in an independent sample of 7 ASD-DM and 36 ASD-N cases. Model significance was assessed via n=1000 permutations of the class labels. The most discriminative features were determined according to mean decrease in accuracy and Out-of-Bag (OOB) error. Hierarchical clustering was then performed on the entire sample using the most discriminative volumetric features in order to identify clusters of individuals with homogeneous volumetric neural phenotypes.
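
The labeling rule, the reported metrics, and the permutation test described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The function names, the `score_fn` callback (a stand-in for "refit the classifier under cross-validation and return its accuracy"), and the add-one p-value estimator are all assumptions made for illustration.

```python
import random

def is_asd_dm(tcv, td_mean, td_sd, threshold=1.5):
    """Assumed labeling rule: total cerebral volume (TCV) more than
    `threshold` standard deviations above the TD reference mean."""
    return (tcv - td_mean) / td_sd > threshold

def acc_sn_sp(y_true, y_pred):
    """Accuracy, sensitivity and specificity, with ASD-DM coded 1 and ASD-N coded 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sn = tp / (tp + fn)  # proportion of ASD-DM cases correctly identified
    sp = tn / (tn + fp)  # proportion of ASD-N cases correctly identified
    return acc, sn, sp

def permutation_p_value(labels, score_fn, n_permutations=1000, seed=0):
    """Permutation test on class labels.

    `score_fn(labels)` is a hypothetical callback that trains and evaluates
    the model on the given labels and returns a score such as accuracy.
    The p-value is the add-one estimate of how often a model trained on
    shuffled labels scores at least as well as the real one.
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_permutations + 1)
```

In a real analysis, `score_fn` would rerun the full cross-validated pipeline, with any oversampling applied inside the training folds only, so that synthetic cases never leak into the held-out data.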

Results: RandomForest classified ASD-DM versus ASD-N with a cross-validated ACC=89%/SN=74%/SP=90%. Similar results were observed when the model was tested on an independent sample (ACC=86%/SN=75%/SP=87%). Permutation testing showed all classification results to be significantly above chance level (p<0.05). After ranking features according to mean decrease in accuracy, selecting 28 features resulted in the lowest OOB error; the top 28 features, which included the superior frontal gyrus, middle temporal gyrus, and posterior cingulate gyrus, were therefore carried forward for hierarchical clustering of the entire sample. Cutting the resulting dendrogram at the second level yielded two clusters containing n=4 ASD-DM/110 ASD-N and n=12 ASD-DM/21 ASD-N, respectively.
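
To make the cluster composition easier to read, the short sketch below derives proportions from the four counts reported above. The counts come from the Results; the variable and function names are illustrative only.

```python
# Cluster composition as reported in the Results.
clusters = {
    "cluster_1": {"ASD-DM": 4, "ASD-N": 110},
    "cluster_2": {"ASD-DM": 12, "ASD-N": 21},
}

def dm_fraction(counts):
    """Fraction of a cluster's members that are ASD-DM."""
    return counts["ASD-DM"] / (counts["ASD-DM"] + counts["ASD-N"])

total_dm = sum(c["ASD-DM"] for c in clusters.values())  # 16 ASD-DM cases overall
total_n = sum(c["ASD-N"] for c in clusters.values())    # 131 ASD-N cases overall

for name, counts in clusters.items():
    size = counts["ASD-DM"] + counts["ASD-N"]
    print(
        f"{name}: n={size}, "
        f"ASD-DM share of cluster={dm_fraction(counts):.1%}, "
        f"share of all ASD-DM cases={counts['ASD-DM'] / total_dm:.1%}"
    )
# cluster_1: n=114, ASD-DM share of cluster=3.5%, share of all ASD-DM cases=25.0%
# cluster_2: n=33, ASD-DM share of cluster=36.4%, share of all ASD-DM cases=75.0%
```

Three quarters of the ASD-DM cases fall in the smaller cluster, although ASD-N cases still make up the majority of both clusters.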

Conclusions: Combining supervised and unsupervised machine learning techniques offers a powerful methodological framework for classifying and grouping individuals across the autism spectrum into more homogeneous, biologically based subgroups. Such techniques represent a valuable tool in future efforts to identify new ASD subgroups with shared biological features.

31265 Combining Supervised and Unsupervised Learning to Subgroup Autism Spectrum Disorder According to Regional Brain Volumes
