First Impressions: Facial Expressions and Prosody Signal ASD Status to Naïve Observers

Friday, May 18, 2012
Sheraton Hall (Sheraton Centre Toronto)
9:00 AM
A. Schmid (1), N. Pitre (2), K. Hasty (1) and R. B. Grossman (1,2); (1) Psychiatry, UMMS Shriver Center, Waltham, MA; (2) Emerson College, Boston, MA
Background:  Data suggest that differences in the voice quality and prosody of individuals with ASD quickly signal the disorder to typically developing (TD) peers (Diehl et al., 2009; Lord & Paul, 1997) and that their facial expressions are rated as awkward by TD coders (Grossman et al., 2008).

Objectives:  The purpose of this study was to determine whether the facial expressions and/or prosody of adolescents with ASD convey their diagnosis or general social awkwardness to naïve observers even in very short stimuli.  We hypothesized that dynamic information, particularly the combination of audio and video, would allow observers to identify individuals with ASD, but that still photographs would not.

Methods:  We used videos of adolescents aged 8-16 (TD=15, ASD=25), recorded during a story-retelling task (Grossman et al., 2008), and extracted still images, one-second clips and three-second clips.  All video clips were saved in three different versions: Silent Video (SV), Audio Only (Wave), and Audio with Video (AV).  Naïve adult participants (17-29 per stimulus type) watched and/or listened to the stimuli and decided through button presses whether the person they had just seen or heard was “socially typical” or “awkward.”  Midway through the task the instructions changed, asking participants to determine whether the person in the preceding clip might have autism.  We included both prompts to assess whether participants changed their perceptions of the adolescents in the stimuli when asked to determine a specific diagnosis rather than more generic social awkwardness.

Results:  We calculated accuracy for determining ASD or TD status of the adolescents in the clips for each stimulus type.  Performance for still images was at chance throughout, and accuracy for three-second clips was significantly higher than for one-second clips.  A multivariate ANOVA with stimulus type (AV, SV, Wave) as the dependent variable showed no difference in accuracy rates for three-second clips produced by adolescents with ASD vs. those produced by TD adolescents if the prompt was to detect social awkwardness.  When the prompt was to determine ASD status, AV stimuli produced by adolescents with ASD were categorized significantly more accurately than SV or Wave stimuli, for which accuracy was at chance level.
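
As an illustration of what "accuracy at chance" means for a two-choice button-press task, the following is a minimal sketch, not the analysis used in this study. It assumes each trial yields one binary judgment that can be compared with the adolescent's true group, and it tests the resulting count of correct trials against a chance level of 0.5 with an exact two-sided binomial test. The function names and the example labels are hypothetical and are not data from the study.

```python
from math import comb


def accuracy(responses, truths):
    """Proportion of trials where the observer's label matches the true group."""
    correct = sum(r == t for r, t in zip(responses, truths))
    return correct / len(truths)


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k correct out of n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the chance level p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical example (made-up labels, NOT study data):
# one observer's judgments for four clips versus the true group labels.
responses = ["ASD", "TD", "ASD", "TD"]
truths = ["ASD", "TD", "TD", "TD"]
acc = accuracy(responses, truths)
p_value = binom_two_sided_p(round(acc * len(truths)), len(truths))
```

A p-value well above the chosen significance threshold would be consistent with chance-level performance for that stimulus type; comparing accuracy across stimulus types or prompts, as reported above, requires a separate analysis such as the ANOVA.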

Conclusions:  Our data show that naïve adult observers were able to accurately differentiate between adolescents with ASD and their TD peers based on only three seconds of visual and/or auditory information.  Still photographs did not provide sufficient information for naïve observers to make that determination.  Participants were more accurate in trials prompting them to diagnose ASD than in those prompting them to detect social awkwardness, suggesting that naïve observers use indicators beyond general social awkwardness to form their perception of ASD status.  These indicators appear to be most definitive when auditory (prosody) and visual (facial expression) features are presented together rather than in isolation.
