Investigating ASD-Specific Salient Visual Features Using Discriminatory Convolutional Neural Networks: Results from the ABC-CT Interim Analysis

Poster Presentation
Thursday, May 2, 2019: 5:30 PM-7:00 PM
Room: 710 (Palais des congrès de Montréal)
C. Carlos1, A. Naples1, K. Chawarska1,2, R. Bernier3, S. Jeste4, C. A. Nelson5, G. Dawson6, S. J. Webb3, M. Murias7, F. Shic8,9, C. Sugar4 and J. McPartland1, (1)Child Study Center, Yale University School of Medicine, New Haven, CT, (2)Child Study Center, Yale School of Medicine, New Haven, CT, (3)Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, (4)University of California, Los Angeles, Los Angeles, CA, (5)Boston Children's Hospital, Boston, MA, (6)Department of Psychiatry and Behavioral Sciences, Duke Center for Autism and Brain Development, Durham, NC, (7)Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, (8)Center for Child Health, Behavior and Development, Seattle Children's Research Institute, Seattle, WA, (9)Pediatrics, University of Washington School of Medicine, Seattle, WA
Background: Autism spectrum disorder (ASD) is marked by atypical visual attention to socially meaningful visual stimuli. While reduced attention to social features is commonly studied, less research has been directed toward characterizing the visual features that do draw the attention of individuals with ASD. By training convolutional neural networks (CNNs) to discriminate among saliency maps derived from eye-tracking (ET) data of individuals with ASD, individuals with typical development (TD), and randomly generated maps, visual attention patterns associated with ASD can be characterized. Because CNNs are biophysically inspired models, insight into the underlying mechanisms of visual attention in ASD may inform clinical treatment and intervention strategies.

Objectives: This study uses CNNs to model image characteristics that are most salient to ASD individuals relative to TD peers.

Methods: ET data were collected from 225 participants between the ages of 6 and 11 (ASD: N = 161, 131 male; TD: N = 64, 42 male) using an SR Research EyeLink 1000 Plus while participants viewed static images of social scenes. ET data were used to generate saliency maps, which were overlaid onto the corresponding trial's stimulus image; these masked images served as the dataset for the CNN models. A set of random saliency maps was also generated. Three identical CNNs were trained independently to discriminate between the ASD and TD datasets (ASDvTD), the ASD and random datasets (ASDvRAND), and the TD and random datasets (TDvRAND). A consistency-of-gaze (CoG) metric was calculated to index a participant's tendency to fixate on the most salient region of a stimulus image, computed as the number of gaze samples falling in the most salient location divided by the total number of gaze samples.
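For illustration, below is a minimal sketch of how a fixation-density saliency map and the CoG metric could be computed. The Gaussian smoothing, the window used to approximate the "most salient location," and all parameter values are assumptions for this sketch, not details taken from the study's pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def saliency_map(gaze_xy, image_shape, sigma=30):
    """Build a fixation-density (saliency) map from gaze samples.

    gaze_xy: (N, 2) array of (x, y) gaze positions in image pixel coordinates.
    image_shape: (height, width) of the stimulus image.
    """
    density = np.zeros(image_shape)
    for x, y in gaze_xy:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < image_shape[0] and 0 <= col < image_shape[1]:
            density[row, col] += 1
    # Gaussian smoothing turns discrete samples into a continuous map.
    return gaussian_filter(density, sigma=sigma)


def consistency_of_gaze(gaze_xy, salience, window=25):
    """CoG: fraction of gaze samples falling in the most salient location.

    The "location" is approximated here as a square window around the
    saliency peak; the window size is an assumption of this sketch.
    """
    peak_row, peak_col = np.unravel_index(np.argmax(salience), salience.shape)
    in_peak = sum(
        abs(round(x) - peak_col) <= window and abs(round(y) - peak_row) <= window
        for x, y in gaze_xy
    )
    return in_peak / len(gaze_xy)
```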

Results: The CoG metric differed significantly between the ASD and TD groups, indicating less consistency of gaze in ASD (t(1333.7) = -3.76, p = 1.8e-4). Validation-set accuracy and mean squared error (MSE) assessed the ability of the CNNs to discriminate between datasets. ASDvTD achieved accuracy = 0.70, MSE = 0.30; ASDvRAND achieved accuracy = 0.59, MSE = 0.41; TDvRAND achieved accuracy = 0.86, MSE = 0.11.
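As a reference point, the following sketch shows how the group comparison and the validation metrics could be computed. Welch's unequal-variance t-test is assumed from the fractional degrees of freedom reported above, and the thresholded accuracy / probability-based MSE are assumptions about the evaluation, not the authors' exact code.

```python
import numpy as np
from scipy import stats


def compare_cog(cog_asd, cog_td):
    """Welch's two-sample t-test (unequal variances) on CoG values."""
    return stats.ttest_ind(cog_asd, cog_td, equal_var=False)


def validation_metrics(y_true, y_prob):
    """Accuracy on thresholded predictions and MSE on predicted probabilities
    for a binary discriminator evaluated on a held-out validation set."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= 0.5).astype(float)
    accuracy = float(np.mean(y_pred == y_true))
    mse = float(np.mean((y_prob - y_true) ** 2))
    return accuracy, mse
```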

Conclusions: These data show that CNNs are capable of discriminating between groups based on saliency-weighted image characteristics. CoG differed significantly between diagnostic groups; this may reflect the previously reported tendency of TD participants in this sample, relative to ASD participants, to attend to faces in the stimuli (Shic et al., 2018). This would also explain the high accuracy of TDvRAND, as that model could readily learn head-specific low-level visual patterns. The low accuracy of ASDvRAND suggests that, in contrast to the shared low-level features underlying TD salience, low-level ASD-specific features were not strongly distinct from random low-level visual features. Future research could add more convolutional layers to the model to increase the size of the CNN's receptive field. Such a deeper model could test whether ASD-specific salient features are noisy and non-homogeneous or whether gaze in individuals with ASD is mediated by a set of common visual features.
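To make the proposed extension concrete, here is a hedged sketch of a deeper binary discriminator in which each additional convolution-plus-pooling block enlarges the effective receptive field. The Keras API, layer counts, filter sizes, and input shape are all assumptions for illustration; the abstract does not specify the original architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_discriminator(input_shape=(224, 224, 3), n_conv_blocks=5):
    """Binary discriminator over saliency-masked images; each added
    conv + pooling block roughly doubles the effective receptive field."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    filters = 16
    for _ in range(n_conv_blocks):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
        filters = min(filters * 2, 128)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    # MSE loss mirrors the MSE metric reported for the original models.
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model
```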