Distinct but Effective Neural Networks for Facial Emotion Recognition in Individuals with Autism: A Deep Learning Approach

Mayor Torres, Juan Manuel

Background: Individuals with Autism Spectrum Disorder (ASD) show deficits in facial emotion recognition (FER; Lozier et al., 2014). However, it is unclear whether these deficits result from a failure to encode emotional faces or from a failure to deploy correctly encoded information when making FER judgements (Yang et al., 2018). Deep-learning methodologies, such as Deep Convolutional Neural Networks (Deep ConvNets), use a data-driven approach to isolate the neural networks involved in encoding emotional faces through single-trial electroencephalography (EEG) classification in typically-developing (TD) individuals (Schirrmeister et al., 2017). However, the extent to which Deep ConvNets can determine whether individuals with ASD correctly encode FER using neural networks similar to those of their TD peers remains unclear.

Objectives: (1) Examine the accuracy of a Deep ConvNet classifier in classifying neural responses to four different emotions (happy, sad, angry, and fearful), and (2) detect differences in the neural networks involved in FER in individuals with and without ASD.

Methods: Thirty-six TD and twenty-nine ASD individuals (M_age=13.48, SD_age=1.894, 44 male; ADOS-2-confirmed; Lord et al., 2012) completed an EEG FER task (DANVA-2; Nowicki, 2004). Each participant viewed 48 faces, yielding EEG data of 48 trials x 752 samples x 30 channels. Trials were segmented 0-1500 ms post-stimulus and pre-processed using the PREP (Bigdely-Shamlo et al., 2015) and ADJUST (Mognon et al., 2011) pipelines. ZCA whitening normalization (Coates & Ng, 2012) was applied, constructing one 752x30 image per trial. The Deep ConvNet received these images and was composed of 2 conv-pool layers: (1) a convolutional layer (kernel size: 100x10; filters: 32) and a pooling layer (size: 5x2; stride: 2), and (2) a convolutional layer (kernel size: 20x5; filters: 64) and a pooling layer (size: 2x2; stride: 2). The final fully-connected layer (1024 units) was reduced to 4 softmax-derived probabilities, one per class. Accuracy of classifying FER neural responses using the Deep ConvNet was calculated by summing the true positives from the confusion matrix and dividing by the 48 faces. To determine which channels are needed to accurately classify FER, occlusion analyses were used (Zeiler & Fergus, 2014), resulting in 19 possible spatial combinations spanning 5 time ranges.
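
The accuracy calculation described above can be illustrated with a minimal sketch. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration (they are not study data); the sketch also assumes 12 faces per emotion, which the abstract does not state.

```python
EMOTIONS = ["happy", "sad", "angry", "fear"]

def accuracy_from_confusion(confusion):
    """Sum the diagonal (true positives) and divide by the total trial count."""
    n_trials = sum(sum(row) for row in confusion)
    true_positives = sum(confusion[i][i] for i in range(len(confusion)))
    return true_positives / n_trials

# Hypothetical counts for one participant (rows: true class, columns: predicted).
# Assumption: 48 faces split evenly across the 4 emotions.
example_confusion = [
    [11, 1, 0, 0],
    [1, 10, 1, 0],
    [0, 1, 10, 1],
    [0, 0, 1, 11],
]

accuracy = accuracy_from_confusion(example_confusion)
print(f"accuracy = {accuracy:.3f}")  # 42 true positives / 48 trials = 0.875
```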

Results: Individuals with ASD performed worse on the FER task (ps<0.05). The Deep ConvNet classifier distinguished FER neural responses with high accuracy (>0.85; Figure 1A, 1B) in both groups, indicating that successful encoding occurred. Encoding of FER, as measured by Deep ConvNet performance, was higher than behavioral FER performance in ASD (ps<0.001; Figure 1C). Occlusion analyses indicated the importance of fronto-temporal regions for TD (Figure 1D) but of the left-temporal region for ASD (Figure 1E), suggesting distinct network engagement during FER in ASD.
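
The general idea behind an occlusion analysis (Zeiler & Fergus, 2014) can be sketched as follows: zero out a group of channels within a time window, re-classify, and record the drop in accuracy. This is a toy illustration, not the pipeline used in the study; the helper names, channel groups, time windows, and the stand-in classifier are all hypothetical.

```python
def occlude(trial, channels, t_start, t_stop):
    """Copy a trial (list of samples, each a list of channel values),
    zeroing the given channels for samples in [t_start, t_stop)."""
    return [
        [0.0 if (t_start <= t < t_stop and c in channels) else v
         for c, v in enumerate(sample)]
        for t, sample in enumerate(trial)
    ]

def occlusion_importance(classify, trials, labels, channel_groups, time_windows):
    """Accuracy drop for every (channel group, time window) occlusion."""
    def acc(data):
        return sum(classify(x) == y for x, y in zip(data, labels)) / len(labels)

    baseline = acc(trials)
    drops = {}
    for name, chans in channel_groups.items():
        for window in time_windows:
            occluded = [occlude(x, chans, *window) for x in trials]
            drops[(name, window)] = baseline - acc(occluded)
    return drops

# Toy stand-in for a trained classifier: it only looks at channel 0.
def toy_classify(trial):
    return 1 if sum(sample[0] for sample in trial) > 0 else 0

# Two toy trials of 4 samples x 3 channels (hypothetical values).
toy_trials = [[[1.0, 0.3, -0.2]] * 4, [[-1.0, 0.3, -0.2]] * 4]
toy_labels = [1, 0]
toy_groups = {"ch0": {0}, "ch1-2": {1, 2}}   # hypothetical spatial groups
toy_windows = [(0, 2), (0, 4)]               # hypothetical sample ranges

drops = occlusion_importance(toy_classify, toy_trials, toy_labels,
                             toy_groups, toy_windows)
for key, drop in drops.items():
    print(key, drop)
```

In this toy setup, only occluding channel 0 across the whole trial reduces accuracy, which mirrors how a large accuracy drop flags a region and time range as important for classification.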

Conclusions: The Deep ConvNet was a robust single-trial EEG classifier, obtaining high accuracy in classifying correctly encoded neural responses to FER in ASD. Results suggest that individuals with ASD successfully encode FER using left-temporal networks, which are distinct from the networks used by their TD peers. However, despite accurate encoding, individuals with ASD experience difficulties deploying the correctly encoded information needed to perform FER tasks accurately. These findings emphasize that individuals with ASD can indeed encode FER but do not reliably use this information to perform FER tasks.
