Unsupervised Data-Driven Stratification of Autism Based on ADI-R Symptom Domains

Friday, May 12, 2017: 12:00 PM-1:40 PM
Golden Gate Ballroom (Marriott Marquis Hotel)
M. V. Lombardo1,2, B. Auyeung3, E. Loth4, G. Dumas5 and M. C. Lai6, (1)University of Cambridge, Cambridge, United Kingdom, (2)University of Cyprus, Nicosia, Cyprus, (3)University of Edinburgh, Edinburgh, United Kingdom, (4)Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom, (5)Institut Pasteur, Paris, France, (6)Psychiatry, University of Toronto, Toronto, ON, CANADA

Autism spectrum disorders (ASD) are clinically and etiologically heterogeneous. Advancing research toward precision medicine goals requires a deeper understanding of how individuals and subgroups within ASD can be distinguished from one another. Such distinctions may enable greater precision in clinical research and practice (e.g., in assessment, diagnosis, prognosis, treatment planning, and monitoring). They may also pave the way for translational research, such as identifying etiological mechanisms and discovering novel treatment targets.


The objective was to use unsupervised, multivariate, data-driven tools to discover ASD subgroups. These subgroups are defined by different patterns of symptom severity across the social, nonverbal, verbal, and repetitive restricted behavior (RRB) domains of the ADI-R algorithm.


ADI-R data were identified for n=3,380 individuals with ASD across 72 independent datasets within the National Database for Autism Research (NDAR). Each individual had complete data for all algorithm items in the social, nonverbal, verbal, and RRB domains. Domain totals were computed as the sum of all algorithm items within each domain. The sample was randomly split in half within each of the 72 datasets to generate independent Discovery and Replication datasets (n=1,690 each). We then used a hierarchical clustering approach commonly applied in genomics and systems biology analyses, such as weighted gene co-expression network analysis (WGCNA). We applied it to the subject dimension of this dataset to identify ASD subgroups in an automated, unsupervised fashion (Lombardo et al., 2016, Scientific Reports). Hierarchical clustering used Euclidean distance as the distance metric and the Ward linkage method. Subgroups were identified automatically with a dynamic hybrid tree-cut algorithm, with the deepSplit parameter set to 1. Once subgroups were identified, we described symptom severity for each subgroup in each domain. Severity was expressed as a percentage of the maximum total score theoretically obtainable across all items in that domain.
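
The within-dataset split described above can be sketched as follows. This is a minimal Python illustration, not the authors' code. It assumes each subject's source dataset is available as a simple ID-to-dataset mapping, which is a hypothetical input structure and not NDAR's actual schema. The abstract does not say how odd-sized datasets were handled. Alternating the leftover subject between the two halves is our assumption. It yields exactly equal halves whenever the total sample size is even, as with n=3,380.

```python
import random
from collections import defaultdict


def split_half_within_datasets(subject_to_dataset, seed=0):
    """Randomly split subjects in half within each source dataset.

    subject_to_dataset: dict mapping a subject ID to its source dataset ID
    (hypothetical input structure, not NDAR's schema).
    Returns (discovery_ids, replication_ids).
    """
    rng = random.Random(seed)
    by_dataset = defaultdict(list)
    for subject, dataset in subject_to_dataset.items():
        by_dataset[dataset].append(subject)

    discovery, replication = [], []
    extra_to_discovery = True  # assumption: alternate leftovers from odd-sized datasets
    for dataset in sorted(by_dataset):
        subjects = sorted(by_dataset[dataset])  # sort first so the shuffle is reproducible
        rng.shuffle(subjects)
        half = len(subjects) // 2
        if len(subjects) % 2 == 1:
            if extra_to_discovery:
                half += 1
            extra_to_discovery = not extra_to_discovery
        discovery.extend(subjects[:half])
        replication.extend(subjects[half:])
    return discovery, replication
```

Splitting within each dataset keeps every site's contribution balanced across Discovery and Replication. A replicated subgroup therefore cannot simply reflect one half drawing more heavily from a particular source.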

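The clustering and severity-summary steps can be sketched as below. The sketch assumes the clustering input is the matrix of four domain totals per subject, although the abstract leaves this open. The helper names, the item-to-domain column layout, and any per-domain maximum scores passed in are hypothetical placeholders. SciPy's `linkage` provides Ward linkage on Euclidean distance. The dynamic hybrid tree cut, however, comes from R's dynamicTreeCut package (used by WGCNA) and has no SciPy counterpart. This sketch therefore substitutes a plain fixed-count cut with `fcluster(..., criterion="maxclust")`, which does not reproduce deepSplit=1.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def domain_totals_from_items(items, domain_columns):
    """Sum algorithm-item scores within each domain.

    items: (n_subjects, n_items) array of item scores.
    domain_columns: one list of column indices per domain (hypothetical layout).
    """
    items = np.asarray(items)
    return np.column_stack([items[:, cols].sum(axis=1) for cols in domain_columns])


def cluster_subjects(domain_totals, n_clusters):
    """Ward-linkage hierarchical clustering of subjects on Euclidean distance.

    The fixed n_clusters cut is a stand-in for the dynamic hybrid tree cut.
    """
    Z = linkage(np.asarray(domain_totals, dtype=float), method="ward", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def percent_of_max_by_cluster(domain_totals, labels, max_scores):
    """Mean severity per cluster and domain, as a percentage of each domain's maximum."""
    pct = 100.0 * np.asarray(domain_totals, dtype=float) / np.asarray(max_scores, dtype=float)
    labels = np.asarray(labels)
    return {int(c): pct[labels == c].mean(axis=0) for c in np.unique(labels)}
```

For example, `percent_of_max_by_cluster(totals, cluster_subjects(totals, 5), max_scores)` would return one four-element severity profile per subgroup. In this sketch the count of 5 is fixed by hand, whereas the dynamic tree cut determines the number of subgroups automatically.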

Across both the independent Discovery and Replication datasets, we found evidence for 5 replicable ASD subgroups with nearly identical patterns of symptom severity across domains. Four of the 5 subgroups differed, on average, mainly in overall degree of severity and showed relatively similar levels of severity across domains. One subgroup, however, was distinctly different from all others. It showed high social and nonverbal symptom severity but relatively large drop-offs in severity on the verbal and RRB domains.


Using unsupervised data-driven tools to discover replicable ASD subgroups based on ADI-R symptom severity reveals 5 distinct subgroups. We will next examine how some of these subgroups may be identified and dissociated in further independent datasets (ABIDE, MRC AIMS, EU-AIMS LEAP). These datasets include biological (e.g., resting-state fMRI) and cognitive data collected on the same individuals.