Investigating ASD-Specific Salient Visual Features Using Discriminatory Convolutional Neural Networks: Results from the ABC-CT Interim Analysis
Objectives: This study uses convolutional neural networks (CNNs) to model the image characteristics that are most salient to individuals with autism spectrum disorder (ASD) relative to typically developing (TD) peers.
Methods: Eye-tracking (ET) data were collected from 225 participants aged 6 to 11 years (ASD: N = 161, 131 male; TD: N = 64, 42 male) using an SR Research EyeLink 1000 Plus while participants viewed static images of social scenes. ET data were used to generate saliency maps, which were overlaid onto each trial's stimulus image; these masked images served as the dataset for the CNN models. A set of random saliency maps was also generated. Three identical CNNs were trained independently to discriminate between the ASD and TD datasets (ASDvTD), the ASD and random datasets (ASDvRAND), and the TD and random datasets (TDvRAND). A consistency-of-gaze (CoG) metric indexed a participant's tendency to fixate on the region of a stimulus image most salient to them, computed by dividing the number of gaze samples falling in the highest-saliency location by the total number of gaze samples.
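The CoG computation described above amounts to a simple ratio; the following is a minimal sketch, assuming the saliency map is a 2D array and gaze samples are pixel coordinates. The function name, window size, and peak-window definition are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def compute_cog(saliency_map: np.ndarray,
                gaze_xy: np.ndarray,
                region_size: int = 32) -> float:
    """Consistency-of-gaze: fraction of gaze samples landing in the
    highest-saliency region of the image.

    saliency_map : (H, W) array, e.g. a fixation-density map.
    gaze_xy      : (N, 2) array of sampled gaze positions (x, y) in pixels.
    region_size  : side of the square window treated as the "highest
                   salient location" (an assumed parameter; the abstract
                   does not specify how the region is delimited).
    """
    # Locate the peak of the saliency map.
    peak_y, peak_x = np.unravel_index(np.argmax(saliency_map),
                                      saliency_map.shape)
    half = region_size // 2

    # Flag gaze samples falling inside the window around the peak.
    x, y = gaze_xy[:, 0], gaze_xy[:, 1]
    in_region = (np.abs(x - peak_x) <= half) & (np.abs(y - peak_y) <= half)

    # CoG = samples in most-salient region / total samples.
    return float(in_region.mean())
```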
Results: A significant group difference was found in the CoG metric, indicating less consistent gaze in the ASD group: t(1333.7) = -3.76, p = 1.8e-4. Validation-set accuracy and mean squared error (MSE) assessed the CNNs' ability to discriminate between datasets. ASDvTD obtained accuracy = 0.70, MSE = 0.30; ASDvRAND obtained accuracy = 0.59, MSE = 0.41; TDvRAND obtained accuracy = 0.86, MSE = 0.11.
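The fractional degrees of freedom (1333.7) are consistent with a Welch unequal-variance t-test computed over per-trial CoG values. A sketch of that comparison follows, using synthetic stand-in data rather than the study's actual values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-trial CoG values (NOT the study's data):
cog_asd = rng.beta(2, 5, size=700)   # lower consistency of gaze
cog_td = rng.beta(3, 5, size=650)    # higher consistency of gaze

# Welch's t-test (unequal variances), matching the fractional df reported.
t_stat, p_value = stats.ttest_ind(cog_asd, cog_td, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```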
Conclusions: These data show that CNNs are capable of discriminating between groups based on saliency-weighted image characteristics. CoG differed significantly between diagnostic groups; this may reflect the previously reported tendency of TD participants in this sample, relative to ASD participants, to attend to faces in the stimuli (Shic et al., 2018). This would also explain the high accuracy of TDvRAND, as the model could readily learn head-specific low-level visual patterns. The low accuracy of ASDvRAND suggests that, in contrast to the shared low-level features underlying TD salience, ASD-specific salient features were not strongly distinct from random low-level visual features. Future research could add more convolutional layers to increase the size of the CNN's receptive field. Such a deeper model could test whether ASD-specific salient features are noisy and non-homogeneous or whether gaze in individuals with ASD is mediated by a set of common visual features.
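As a sketch of the proposed direction, a binary discriminator with additional convolution-and-pooling stages enlarges the effective receptive field. The abstract does not describe the actual architecture, so every layer choice below is an assumption (PyTorch is used here only for concreteness):

```python
import torch.nn as nn

class DeeperSaliencyCNN(nn.Module):
    """Illustrative ASD-vs-comparison discriminator with extra
    convolutional stages. Each added 3x3 conv + 2x2 max-pool stage
    roughly doubles the effective receptive field at the input."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Extra stage of the kind the abstract proposes adding:
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 1),  # single logit for the two-dataset decision
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```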