Topological Data Analysis Reveals Meaningful Subgroups in ASD Research Data Based on Neural Responsivity and Behavioral Measures

Friday, May 12, 2017: 10:00 AM-1:40 PM
Golden Gate Ballroom (Marriott Marquis Hotel)
T. McAllister1, A. Naples2, S. A. A. Chang3, S. Hasselmo2, M. J. Rolison2, J. A. Trapani4, S. M. Malak4, K. A. McNaughton2, T. C. Day4, T. Halligan2, B. Lewis2, E. Jarzabek4, K. S. Ellison4, K. Stinson5, J. Wolf6 and J. McPartland4, (1)Child Study Center, Yale University School of Medicine, New Haven, CT, (2)Child Study Center, Yale University School of Medicine, New Haven, CT, (3)Yale University, New Haven, CT, (4)Child Study Center, Yale School of Medicine, New Haven, CT, (5)Yale University Child Study Center, Milford, CT, (6)Yale Child Study Center, New Haven, CT

Modern neuroscience research increasingly collects large quantities of rich, multivariate data, yet analytic methods have remained largely unchanged. Without more sophisticated tools, much of these data will go underused. Although clinical and electroencephalography (EEG) datasets from Autism Spectrum Disorder (ASD) research routinely contain hundreds of variables, standard statistical practice incorporates relatively few of them. Topological Data Analysis (TDA) is an approach designed to explore the shape of high-dimensional datasets. The Mapper algorithm (Singh, Mémoli and Carlsson, 2007) reduces dimensionality while preserving structural features: it forms clusters in the full high-dimensional space and summarizes them as a graph. The resulting cluster visualizations can direct statistical investigation, complement current methods, and clarify complex interrelations among variables.
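
The core of Mapper can be illustrated with a minimal, dependency-free sketch. It assumes a one-dimensional lens (filter) function, a cover of the lens range by equal-length overlapping intervals, and single-linkage clustering at a fixed distance threshold `eps` as a stand-in for the clustering step. The lens, interval count, overlap fraction, and threshold are illustrative assumptions, not the parameters used in this study. Each cluster becomes a node, and nodes that share subjects are joined by an edge.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Distance in the full high-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, each widened on both sides by
    `overlap` (a fraction of one interval's length) so neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = overlap * length
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Connected components of the graph linking points within distance eps.
    (Assumed stand-in for the study's clustering step.)"""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if euclidean(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges join
    nodes that share at least one point."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        # Preimage of this interval under the lens, clustered in full space.
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# Usage: seven points one unit apart on a line; the lens is the first coordinate.
line = [(float(x), 0.0) for x in range(7)]
nodes, edges = mapper(line, lens=lambda p: p[0], n_intervals=2, overlap=0.25, eps=1.0)
# nodes: {0, 1, 2, 3} and {3, 4, 5, 6}; they share point 3, so edges == [(0, 1)]
```

The overlap between intervals is what lets a single subject fall into two clusters and thereby produce an edge; with zero overlap the output would be disconnected clusters with no graph structure.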


By implementing the Mapper algorithm, we aimed to (1) visualize both EEG and clinical characterization data from a sample of individuals with ASD and typically developing (TD) controls; (2) identify and describe subgroup clusters; and (3) assess the utility of TDA for high-dimensional clinical neuroscience datasets.


The Mapper algorithm was implemented in JavaScript and Python and used to analyze data from individuals with ASD and TD controls (ASD: n = 61, mean age = 14.08 years; TD: n = 40, mean age = 13.96 years). Results were visualized as 2D force-directed graphs in which the shading of each cluster indicated the percentage of its members diagnosed with ASD. Clinical variables included measures from the Child Behavior Checklist, the Social Responsiveness Scale, and the Vineland Adaptive Behavior Scales, Second Edition. Event-related potential (ERP) variables included the amplitude and latency of the P100 and N170 components in response to dynamic faces. Naive clustering grouped subjects by similarity in the high-dimensional space; edges then connected clusters that shared subjects.
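
The node shading and shared-subject edges described above can be sketched as follows, assuming each Mapper node is a set of subject IDs and each subject carries a diagnosis label of `"ASD"` or `"TD"`. The grayscale color map (white for 0% ASD, black for 100%) and the use of shared-subject counts as edge weights are illustrative assumptions; the abstract specifies neither. The output is in a form that a force-directed layout routine could consume.

```python
from itertools import combinations


def percent_asd(node, diagnosis):
    """Percentage of subjects in `node` whose diagnosis is "ASD"."""
    return 100.0 * sum(diagnosis[s] == "ASD" for s in node) / len(node)


def shade(pct):
    """Assumed color map: 0% ASD -> white (#ffffff), 100% ASD -> black (#000000)."""
    level = round(255 * (1 - pct / 100.0))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def graph_for_layout(nodes, diagnosis):
    """Node attributes plus weighted edges (weight = number of shared subjects)."""
    node_attrs = []
    for n in nodes:
        pct = percent_asd(n, diagnosis)
        node_attrs.append({"size": len(n), "pct_asd": pct, "color": shade(pct)})
    edges = [(u, v, len(nodes[u] & nodes[v]))
             for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return node_attrs, edges


# Usage with hypothetical subject IDs "a" to "d".
nodes = [frozenset({"a", "b"}), frozenset({"b", "c", "d"})]
diagnosis = {"a": "ASD", "b": "ASD", "c": "TD", "d": "TD"}
attrs, edges = graph_for_layout(nodes, diagnosis)
```

Here the first node is entirely ASD and is drawn black, the second is one-third ASD and is drawn light gray, and the two are joined by an edge of weight 1 because they share subject `"b"`.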


We created visualizations for behavioral data, ERP data, and the two combined. The resulting structures identified regions of diagnostic similarity, suggesting that high-dimensional clustering can differentiate diagnostic groups in a data-driven manner. Clusters containing both individuals with and without ASD suggest that there are further differentiable subgroups that parse heterogeneity within diagnostic categories. Figure 1 shows strong differentiation by diagnosis between cluster groups, based solely on 76 ERP variables. Two regions outline groups of similar subjects in which 100% and 27.27% of subjects, respectively, were diagnosed with ASD. Together, 79% (n = 73) of the eligible population (n = 94) fall into these groups, with 2 subjects appearing in both.
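
Region-level figures like those above (the percentage of subjects with ASD in each region and the share of the eligible population the regions cover) reduce to set arithmetic over Mapper nodes. The sketch below assumes that a region is a list of nodes, each a set of subject IDs, and counts a subject who appears in several nodes or regions only once when computing coverage; that counting convention is an assumption. The toy data are invented for illustration and are not the study's data.

```python
from collections import Counter


def region_summary(regions, diagnosis, eligible):
    """Summarize regions of a Mapper graph.

    regions:   list of regions; each region is a list of nodes (sets of subject IDs)
    diagnosis: dict mapping subject ID -> "ASD" or "TD"
    eligible:  set of subject IDs in the eligible population
    """
    # Distinct eligible subjects per region (a subject in several nodes counts once).
    region_subjects = [set().union(*nodes) & eligible for nodes in regions]
    # Number of regions each subject appears in.
    counts = Counter(s for subjects in region_subjects for s in subjects)
    per_region = []
    for subjects in region_subjects:
        n_asd = sum(diagnosis[s] == "ASD" for s in subjects)
        pct = 100.0 * n_asd / len(subjects) if subjects else 0.0
        per_region.append({"n": len(subjects), "pct_asd": pct})
    covered = len(counts)  # subjects in at least one region
    return {
        "regions": per_region,
        "covered": covered,
        "coverage_pct": 100.0 * covered / len(eligible),
        "in_multiple_regions": sum(1 for c in counts.values() if c > 1),
    }


# Usage with invented toy data: subjects 0-4 have ASD, 5-9 are TD.
diagnosis = {s: "ASD" if s < 5 else "TD" for s in range(10)}
regions = [
    [{0, 1}, {1, 2}],      # region A
    [{2, 5, 6}, {6, 7}],   # region B
]
summary = region_summary(regions, diagnosis, eligible=set(range(10)))
```

In this toy case region A is 100% ASD, region B is 25% ASD, and six of ten subjects are covered, with subject 2 counted in both regions but only once toward coverage.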


Our initial results revealed some subgroups of participants that are well differentiated by diagnosis on the basis of high-dimensional neural data, and others that appear more heterogeneous. Ongoing work seeks to identify which ERP measures discriminate most strongly between subgroups, with the aim of improving predictive models of differences in clinical phenotype. Further work will examine whether these subgroups can be used to stratify samples in clinical trials, and whether smaller subgroups within diagnoses differ meaningfully. These visualizations of latent structure within our data are a novel and valuable tool for exploring clinical datasets and for generating insights that motivate further research. Our findings have already identified meaningful subgroups based solely on ERP data. TDA warrants further development and refinement, particularly in its clustering methodology. With such refinement, this method will allow us to scale up to richer multimodal datasets, for example ones that combine clinical characterization with eye-tracking measures.
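
One simple way to ask which ERP measures discriminate most between two subgroups is to rank each measure by the absolute standardized mean difference (Cohen's d with a pooled standard deviation). This is an illustrative sketch only, not the analysis planned in this work; the measure names and values below are hypothetical.

```python
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD.
    Requires at least two values per group."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def rank_measures(group_a, group_b, measures):
    """Rank measures by |d| between two subgroups, largest first.
    Each group is a list of dicts mapping measure name -> value."""
    scores = {m: cohens_d([s[m] for s in group_a], [s[m] for s in group_b])
              for m in measures}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Usage with hypothetical measures and values for two subgroups.
group_a = [{"N170_latency": 170, "P100_amplitude": 5.0},
           {"N170_latency": 172, "P100_amplitude": 6.0},
           {"N170_latency": 174, "P100_amplitude": 7.0}]
group_b = [{"N170_latency": 180, "P100_amplitude": 5.5},
           {"N170_latency": 182, "P100_amplitude": 6.5},
           {"N170_latency": 184, "P100_amplitude": 7.5}]
ranked = rank_measures(group_a, group_b, ["N170_latency", "P100_amplitude"])
```

Here the latency measure ranks first (d = -5.0) and the amplitude measure second (d = -0.5). With hundreds of candidate measures, any such ranking would need correction for multiple comparisons or validation on held-out subjects.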