Topological Data Analysis Reveals Meaningful Subgroups in ASD Research Data Based on Neural Responsivity and Behavioral Measures
Modern neuroscience research collects increasingly large, rich, multivariable datasets, yet the analytical methods applied to them have changed little. Without more sophisticated tools, much of this data goes underused. Clinical and electroencephalography (EEG) datasets from Autism Spectrum Disorder (ASD) research routinely contain hundreds of variables, but standard statistical practice incorporates relatively few of them. Topological Data Analysis (TDA) is an approach designed to explore high-dimensional datasets. The Mapper algorithm (Singh, Mémoli and Carlsson, 2007) reduces dimensionality while preserving structural features, because clustering is performed in the full high-dimensional space. The resulting cluster visualizations offer insights that can direct statistical investigation, support current methods, and foster understanding of complex interrelations between variables.
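To make the Mapper idea concrete, the following is a minimal sketch rather than the pipeline used in this study. It assumes a one-dimensional lens (filter) function, a cover of overlapping intervals, and single-linkage clustering with a fixed distance threshold. The function names, the parameters (`n_intervals`, `overlap`, `eps`), and the toy points are all hypothetical; the abstract does not state which lens, cover, or clustering settings were used.

```python
import math
from itertools import combinations


def single_linkage(points, idx, eps):
    """Split the indices in `idx` into groups whose members are chained
    together by pairwise distances of at most `eps` (union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        # Distance is measured in the full feature space, not on the lens.
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against a constant lens
    pad = width * overlap                   # how far each interval is widened

    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - pad
        end = lo + (k + 1) * width + pad
        idx = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, idx, eps))

    edges = {
        (a, b)
        for a, b in combinations(range(len(nodes)), 2)
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0),
          (10.0, 0.0), (10.5, 0.0), (11.0, 0.0)]
nodes, edges = mapper(points, lens=lambda p: p[0])
print(nodes)  # two nodes, one per group
print(edges)  # no edges, because the groups share no points
```

In this sketch only the lens is low-dimensional. Clustering inside each interval uses distances between the full feature vectors, which is how a Mapper graph can summarize the data while still reflecting its high-dimensional structure.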
By implementing the Mapper algorithm, we: (1) visualize both EEG and clinical characterization data from a sample of individuals with ASD and typically developing (TD) controls; (2) identify and describe subgroup clusters; and (3) assess the utility of TDA for high-dimensional clinical neuroscience datasets.
We created visualizations for behavioral data, event-related potential (ERP) data, and combinations of both. The resulting structures identified areas of diagnostic similarity, suggesting that high-dimensional clustering can differentiate groups in a data-driven manner. Groups containing individuals both with and without ASD suggest that there are differentiable clusters that further parse heterogeneity within diagnostic categories. Figure 1 shows strong differentiation of diagnosis between cluster groups, based on 76 variables drawn from ERP data alone. Two regions outline groups of similar subjects, in which 100% and 27.27% of subjects, respectively, are diagnosed with ASD. Of the eligible population (n=94), 79% (n=73) fit into these groups, with 2 subjects found in both.
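The diagnostic composition of a region can be computed as sketched below. The subject indices, labels, and region memberships are invented for illustration and are not the study's data; 3 of 11 is simply one ratio that rounds to 27.27%. The helper name `asd_share` is likewise hypothetical.

```python
def asd_share(region, diagnosis):
    """Percentage of subjects in `region` whose diagnosis label is 'ASD'."""
    members = list(region)
    n_asd = sum(1 for i in members if diagnosis[i] == "ASD")
    return 100.0 * n_asd / len(members)


# Hypothetical labels: subjects 0-7 are ASD, subjects 8-15 are TD.
diagnosis = {i: "ASD" for i in range(8)}
diagnosis.update({i: "TD" for i in range(8, 16)})

region_a = {0, 1, 2, 3, 4}     # invented region containing only ASD subjects
region_b = set(range(5, 16))   # invented region: 3 ASD and 8 TD subjects

share_a = round(asd_share(region_a, diagnosis), 2)
share_b = round(asd_share(region_b, diagnosis), 2)
print(share_a, share_b)  # 100.0 27.27
```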
Our initial results indicate that some subgroups of participants are diagnostically well differentiated by high-dimensional neural data, while other subgroups appear more heterogeneous. Ongoing work seeks to determine which ERP measures discriminate most strongly between subgroups, and thus to improve predictive models of differences in clinical phenotype. Further work will examine the use of subgroups for stratifying samples in clinical trials, and whether smaller subgroups within diagnoses differ meaningfully. These visualizations of latent structure within our data are a novel and valuable tool for exploring clinical datasets and generating insights that motivate further research. Our findings have already identified meaningful subgroups based solely on ERP data. TDA warrants further development and refinement, particularly in clustering methodology. This method will allow us to scale up to richer multi-modal datasets, for example by including clinical characterization and eye-tracking measures.