Cross-Modal Coordination of Face-Directed Gaze and Emotional Speech Production in Adolescents with ASD
Objectives: In an emotional-speech mimicry task, we predict that adolescents with autism spectrum disorder (ASD) will primarily make lower-face movements that support speech production, without concurrent upper-face movements that express emotion. We also expect that adolescents who spend more time gazing at the face of a video model will produce more upper-face movements for emotional expression.
Methods: Participants watched and mimicked videos of adolescents producing two-sentence sequences (a neutral sentence followed by an emotional one). We recorded acoustic measures of speech (intensity, F0, etc.), facial motion capture (32 markers across the face), and eye-tracking data (dwell time on the face) from 13 adolescents with ASD and 19 neurotypical (NT) adolescents. We used Granger causality to quantify the coordination between facial movements and acoustic measures: strong Granger causality indicates rigid speech-face dependence with few effective degrees of freedom, whereas weak Granger causality indicates independent control of face and voice. We obtained Autism Quotient (AQ) scores as a continuous measure of autistic features and used linear mixed effects models to relate AQ to (1) speech-face Granger causality and (2) face-directed gaze.
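To make the Granger-causality measure concrete, the following is a minimal sketch, not the authors' pipeline. It assumes one facial-marker trajectory and one acoustic feature resampled to a common frame rate, a hypothetical lag order of 2, and the acoustic-to-facial direction; it scores strength as the log ratio of restricted to full residual sums of squares (Geweke's measure). The abstract does not specify the lag order, direction, or statistic, and the names `granger_strength` and `_ols_rss` are hypothetical.

```python
import math
import random


def _ols_rss(rows, target):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    # Build X'X and X'y from the design rows.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    # Solve X'X b = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((t - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_strength(cause, effect, lags=2):
    """Geweke-style strength ln(RSS_restricted / RSS_full) for cause -> effect.

    Restricted model: effect[t] ~ 1 + effect[t-1 .. t-lags]
    Full model:       effect[t] ~ 1 + effect[t-1 .. t-lags] + cause[t-1 .. t-lags]
    Values near 0 mean the cause's past adds little (independent control);
    large values mean the effect is tightly locked to the cause (rigid dependence).
    """
    target, restricted, full = [], [], []
    for t in range(lags, len(effect)):
        own = [effect[t - k] for k in range(1, lags + 1)]
        other = [cause[t - k] for k in range(1, lags + 1)]
        target.append(effect[t])
        restricted.append([1.0] + own)
        full.append([1.0] + own + other)
    return math.log(_ols_rss(restricted, target) / _ols_rss(full, target))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    # Hypothetical acoustic feature (e.g. frame-wise intensity), simulated as noise.
    voice = [rng.gauss(0, 1) for _ in range(n)]
    # A "coupled" marker that follows the voice one frame later, plus noise.
    coupled = [0.0] + [0.8 * voice[t - 1] + 0.3 * rng.gauss(0, 1) for t in range(1, n)]
    # An "independent" marker unrelated to the voice.
    independent = [rng.gauss(0, 1) for _ in range(n)]
    print("coupled:", granger_strength(voice, coupled))
    print("independent:", granger_strength(voice, independent))
```

Because the full model nests the restricted one, the measure is never negative; the simulated coupled marker scores well above the independent one. A real analysis would typically also detrend the signals and choose the lag order by an information criterion.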
Results: Participants with more autistic features (i.e., higher AQ scores) show greater cross-modal dependence than low-AQ adolescents (χ²(7) = 1541.9, p < 0.05). A linear mixed effects model with random slopes for AQ by motion-capture marker shows that the effect of AQ is stronger for the lower face (lower cheek, mouth, chin) than for the upper face (eyes, eyebrows, forehead) (U = 595, p < 0.05).
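A χ² statistic reported for mixed models usually comes from a likelihood-ratio test between nested models; assuming that is the case here (the abstract does not say which 7 parameters distinguish the models), the sketch below shows the arithmetic. The statistic is twice the log-likelihood difference, referred to a chi-square distribution whose degrees of freedom equal the difference in parameter count; both models should be fit by maximum likelihood rather than REML when they differ in fixed effects. The hand-rolled `chi2_sf` (a hypothetical helper) avoids a SciPy dependency; `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (chi-square statistic, p-value) for nested models fit by ML."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # p-value for the statistic reported in the abstract, chi2(7) = 1541.9.
    print(chi2_sf(1541.9, 7))
```

For a statistic this large the p-value is far below 0.05 and underflows double precision, which is why such results are usually reported as an inequality.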
AQ scores also interact with face-directed gaze. Specifically, the slope for net dwell time on any part of the face is steeper for adolescents with high AQ (linear mixed effects model; all p < 0.05): greater visual attention to the face is associated with greater speech-face dependence in high-AQ participants, but with less dependence in low-AQ participants.
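The interaction can be written out to show how one dwell-time slope can be positive for high-AQ and negative for low-AQ participants. This is a simplified sketch of the fixed-effects part only, assuming Granger causality (GC) for participant i and observation j is modeled with an AQ × dwell-time (D) product term; the study's full random-effects structure and covariates are not given in the abstract and are collapsed into a single placeholder term.

```latex
\begin{aligned}
\mathrm{GC}_{ij} &= \beta_0 + \beta_1\,\mathrm{AQ}_i + \beta_2\,D_{ij}
  + \beta_3\,\mathrm{AQ}_i\,D_{ij} + (\text{random effects}) + \varepsilon_{ij} \\
\text{slope}_D(\mathrm{AQ}) &= \frac{\partial\,\mathbb{E}[\mathrm{GC}_{ij}]}{\partial D_{ij}}
  = \beta_2 + \beta_3\,\mathrm{AQ}_i \\
\text{slope}_D(\mathrm{AQ}^{*}) &= 0 \quad\Longrightarrow\quad \mathrm{AQ}^{*} = -\beta_2/\beta_3
\end{aligned}
```

Under this parameterization, a positive β₃ makes the dwell-time slope steeper as AQ rises, and the reported pattern (more speech-face dependence with more face-looking at high AQ, less at low AQ) corresponds to the slope changing sign at AQ* somewhere inside the sample's AQ range.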
Conclusions: High-AQ adolescents show greater cross-modal dependence, suggesting that in an emotional-speech mimicry task they move all facial regions primarily for speech production rather than for additional emotional expression. In contrast, low-AQ participants produce more facial movements not directly tied to speech, particularly in the upper face (e.g., eyebrow raises), which can convey emotion independently of the rhythm of lower-face speech movements. Contrary to our prediction, the difference between high- and low-AQ participants is amplified by greater visual attention to an emotional face.