31956
Regression Clustering for Discovering Multiple Types of Brain-Behaviour Associations in a Sample: Data from the Pond Network

Panel Presentation
Saturday, May 4, 2019: 11:45 AM
Room: 517C (Palais des congres de Montreal)
S. Panahandeh1, E. Anagnostou2, J. P. Lerch3 and A. Kushki4, (1)Institute of Biomaterials and Biomedical Engineering, University of Toronto, Vaughan, ON, Canada, (2)Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON, Canada, (3)Mouse Imaging Centre, Hospital for Sick Children, Toronto, ON, Canada, (4)Bloorview Research Institute, Toronto, ON, Canada
Background: The autism spectrum is associated with significant heterogeneity in etiology, biology, and phenotype. This heterogeneity challenges traditional statistical tools for examining brain-behaviour associations (e.g., linear regression), which do not account for the possible presence of subgroups characterized by different regression models. To address this challenge, we propose a novel data-driven, unsupervised approach to discover multiple types of brain-behaviour associations in a sample. The approach clusters the sample data into K groups, each with its own linear regression function. Unlike traditional clustering, which groups data points by their direct similarity to one another, the proposed approach groups data points by their relative similarity to a regression line.
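To make the idea of grouping points by their fit to a line concrete, here is a minimal sketch of a generic K-regression-lines procedure. It alternates between assigning each point to the line with the smallest squared residual and refitting each line by ordinary least squares, much like k-means with lines in place of centroids. This is not the authors' pipeline (their Methods use RANSAC and spectral clustering). The function names `fit_line` and `k_regression_lines`, the two-point initialization, and the restart count are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of regression clustering; not the authors' implementation.


def fit_line(x, y):
    """Ordinary least-squares line fit; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def k_regression_lines(x, y, k, n_restarts=20, n_iter=100, seed=0):
    """Group points by which of k regression lines fits them best.

    Each restart seeds every line through two randomly chosen points, then
    alternates (1) assigning each point to the line with the smallest squared
    vertical residual and (2) refitting each line by OLS on its points.
    The restart with the lowest total squared residual is kept.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    best_sse, best_labels, best_coefs = np.inf, None, None
    for _ in range(n_restarts):
        coefs = np.empty((k, 2))
        for j in range(k):
            idx = rng.choice(len(x), size=2, replace=False)
            coefs[j] = fit_line(x[idx], y[idx])
        for _ in range(n_iter):
            labels = ((y[:, None] - X @ coefs.T) ** 2).argmin(axis=1)
            new_coefs = coefs.copy()
            for j in range(k):
                members = labels == j
                if members.sum() >= 2:  # keep the previous line if a group empties
                    new_coefs[j] = fit_line(x[members], y[members])
            converged = np.allclose(new_coefs, coefs)
            coefs = new_coefs
            if converged:
                break
        sq_resid = (y[:, None] - X @ coefs.T) ** 2  # shape (n_points, k)
        labels = sq_resid.argmin(axis=1)
        sse = sq_resid.min(axis=1).sum()
        if sse < best_sse:
            best_sse, best_labels, best_coefs = sse, labels, coefs
    return best_labels, best_coefs
```

On synthetic data drawn from two noisy lines such as `y = 2x + 5` and `y = -x`, the returned slopes should be close to 2 and -1. Like k-means, the alternating procedure can settle in a poor local optimum, which is why several restarts are run and the one with the lowest residual sum of squares is kept.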

Objectives: Our objective was to discover multiple regression lines that explain brain-phenotype associations in ASD. Specifically, we examined associations between cortical thickness and social communication function, quantified using the Social Communication Questionnaire (SCQ).

Methods: Data from a sample of 121 participants with a diagnosis of ASD were obtained from the POND Network (age: 11.9 (3.6) years; 98 male). Brain data included cortical thickness measurements from 76 brain regions, obtained using the CIVET pipeline and corrected for total gray matter volume, sex, age, and scanner. Behavioural data were scores on the social communication domain of the SCQ. Analyses were performed using a machine learning pipeline that employs regression clustering to cluster the sample data into K groups characterized by different regression functions. To ensure stability of the identified patterns, the analyses were run on 100,000 random partitions of the data, each including 5% of the participants. The RANSAC algorithm was used to fit linear models to each subset of the data, and similarity matrices were built based on whether or not data points were on the same regression line. Spectral clustering was applied to the similarity matrices. The number of clusters was chosen to maximize the within-to-between scatter ratio. Clusters were validated by confirming that their scatter ratios differed significantly from those obtained for randomly generated data.
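The stability step (random subsets, a RANSAC line per subset, a co-occurrence similarity matrix, then spectral clustering) can be sketched as follows, under several assumptions the abstract does not settle: each subset receives a single RANSAC line fitted from two-point samples with a fixed residual threshold; the similarity between two participants is the fraction of subsets containing both in which both were RANSAC inliers; and spectral clustering follows the normalized Ng-Jordan-Weiss recipe with a simple k-means step. All function names and defaults (`n_trials`, `threshold`, `n_partitions`, which is far below the 100,000 used in the study) are hypothetical. Choosing K by the scatter ratio and the comparison against random data are omitted.

```python
import numpy as np

# Hypothetical sketch of the stability step; names and defaults are illustrative.


def ransac_inliers(x, y, rng, n_trials=30, threshold=1.0):
    """Minimal RANSAC: fit a line through two random points per trial and
    return the boolean inlier mask of the line with the largest consensus set."""
    best_mask = np.zeros(len(x), dtype=bool)
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # slope undefined for this pair
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        mask = np.abs(y - (intercept + slope * x)) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask


def co_regression_similarity(x, y, n_partitions=3000, frac=0.05, seed=0):
    """Estimate, for every pair, how often both points were inliers of the
    same RANSAC line among the random subsets that contained both."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m = max(3, int(round(frac * n)))  # subset size
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_partitions):
        idx = rng.choice(n, size=m, replace=False)
        inliers = idx[ransac_inliers(x[idx], y[idx], rng)]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(inliers, inliers)] += 1
    sim = np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
    np.fill_diagonal(sim, 0.0)
    return sim


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clusters(sim, k):
    """Normalized spectral clustering of a precomputed, symmetric affinity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(sim.sum(axis=1), 1e-12))
    affinity = sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(affinity)  # eigenvalues in ascending order
    emb = vecs[:, -k:]  # k leading eigenvectors
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return kmeans(emb, k)
```

For a small synthetic sample, something like `sim = co_regression_similarity(x, y, frac=0.15)` followed by `labels = spectral_clusters(sim, k=2)` recovers the two lines; very small subsets give RANSAC little to separate. Dividing by the number of subsets in which a pair was drawn together turns raw co-occurrence counts into an estimated probability of lying on the same line, so pairs that were sampled together less often are not penalized.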

Results: Our results support the notion that the association between SCQ scores and cortical thickness can be characterized by multiple regression lines in several cortical regions previously implicated in ASD. These included the orbital part of the right superior frontal gyrus (2 clusters; regression slopes -3.5 (0.2) and 10.7 (0.4); p<0.00001) and the right posterior cingulate gyrus (2 clusters; regression slopes -4.6 (0.2) and -3.7 (0.3); p<0.00001). These results are shown in Figure 1. As seen in this figure, two clusters are identified for each region, with the points in each cluster lying on a common regression line.
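Once cluster labels are available, per-cluster slopes like those reported above can be obtained by fitting a separate OLS line to each cluster. The sketch below assumes the values in parentheses are standard errors of the slope; the abstract does not say whether they are standard errors, standard deviations across partitions, or another quantity. The function name `per_cluster_slopes` is hypothetical.

```python
import numpy as np


def per_cluster_slopes(x, y, labels):
    """Return {label: (slope, standard_error)} from a separate OLS fit per cluster.

    Hypothetical helper. Assumes each cluster has at least 3 points so that
    n - 2 residual degrees of freedom remain. Reporting the standard error is
    an assumption about what the parenthesized values in the abstract mean.
    """
    results = {}
    for lab in np.unique(labels):
        xs, ys = x[labels == lab], y[labels == lab]
        X = np.column_stack([np.ones_like(xs), xs])
        coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ coef
        sigma2 = resid @ resid / (len(xs) - 2)  # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
        results[int(lab)] = (float(coef[1]), float(np.sqrt(cov[1, 1])))
    return results
```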

Conclusions: Our results demonstrate the feasibility of using data-driven approaches to model heterogeneity in brain-behaviour associations.