Computer Vision Analyses of Social Coordination and Social Communication Deficits in Autism
Objectives: To develop robust quantitative methods for precisely assessing coordinated facial movements during social interaction; to test whether such measurement can distinguish individuals with autism spectrum disorder (ASD) from typically developing (TD) controls; and to test whether it can capture individual differences in social communication skill within the ASD group, as distinct from restricted and repetitive behaviors (RRB).
Methods: Our primary sample consisted of 44 young adults, 17 with ASD and 27 TD. We tested the generalizability of the results in a replication sample of 30 adolescents, 17 with ASD and 13 TD. Both samples were matched on age, verbal IQ (normative range), and gender. Participants engaged in an unstructured, 3-minute “get to know you” conversation with an unfamiliar study team confederate. Confederates were instructed not to initiate topics and not to speak more than 50% of the time. Dyadic interactions were captured with a specially designed “TreeCam” with two synchronized HD video cameras pointing in opposite directions. Dyadic facial coordination was automatically quantified with a computer vision and machine learning analytic pipeline. Facial movements were captured as a set of 180 independent, regional “bases”, where each basis represented a time series of facial movement (e.g., corner of the mouth) for each person. Dyadic coordination between conversational partners was quantified with windowed cross-correlation between the partners’ time series. A machine learning framework (with nested, leave-one-out cross-validation; LOOCV) was designed to predict group membership (ASD vs. TD) and individual differences in ADOS-2 overall calibrated severity scores (CSS), Social Affect (SA) scores, and RRB scores. Only the dyadic features that predicted adult group membership were used in the replication sample.
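The windowed cross-correlation step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `windowed_cross_correlation`, the window, step, and lag sizes, and the synthetic signals are all assumptions made for illustration, and the abstract does not report the parameters actually used.

```python
import numpy as np

def windowed_cross_correlation(x, y, window, step, max_lag):
    """Pearson correlation between x and lagged y within sliding windows.

    Returns the lags tested and a (n_windows, n_lags) array of correlations.
    A positive lag means y follows x by that many frames.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    lags = np.arange(-max_lag, max_lag + 1)
    rows = []
    # Window starts are chosen so every lagged window stays inside the series.
    for start in range(max_lag, n - window - max_lag + 1, step):
        wx = x[start:start + window]
        row = []
        for lag in lags:
            wy = y[start + lag:start + lag + window]
            if wx.std() == 0 or wy.std() == 0:
                row.append(0.0)  # correlation is undefined for a flat segment
            else:
                row.append(np.corrcoef(wx, wy)[0, 1])
        rows.append(row)
    return lags, np.array(rows)

# Synthetic stand-ins for one facial-movement basis per partner (hypothetical data):
# partner B repeats partner A's movement 5 frames later.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = np.roll(x, 5)

lags, wcc = windowed_cross_correlation(x, y, window=60, step=30, max_lag=10)
peak_lags = lags[np.argmax(wcc, axis=1)]  # lag of strongest coordination per window
```

In this toy case every window peaks at a lag of 5 frames. Summaries of such peak correlations and lags per basis pair are one plausible kind of dyadic feature that could feed a classifier.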
Results: Classification (ASD vs. TD) accuracy was 88.6% (p<.0001; PPV=.93; NPV=.87) for the primary sample, and 86.7% (p<.0005; PPV=.88; NPV=.85) for the replication sample. Automated computer prediction in the primary sample was more accurate than that of expert (n=9; 87% vs. 82%) and non-expert (n=11; 87% vs. 77%) study staff who made diagnostic judgments from the same dyadic videos (ps<.001). Using the feature groups selected for classification, support vector regression with LOOCV predicted the ADOS-2 CSS in the primary sample (r=.57, p=.02) and the replication sample (r=.53, p=.03). As hypothesized, correlations were higher for SA scores than for RRB scores (SA: r=.58 and .20 in the primary and replication samples, respectively; RRB: r=.00 and .06).
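For readers less familiar with the reported metrics, the sketch below shows how accuracy, positive predictive value (PPV), and negative predictive value (NPV) follow from a confusion matrix. The abstract does not report cell counts. The counts used here are a back-calculation that happens to reproduce the reported figures given the stated group sizes, so they should be read as an assumption rather than as the authors' data. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, PPV, and NPV from confusion-matrix counts (ASD = positive class)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all participants classified correctly
        "ppv": tp / (tp + fp),          # share of predicted-ASD cases that are ASD
        "npv": tn / (tn + fn),          # share of predicted-TD cases that are TD
    }

# Assumed counts, inferred to match the reported values; not taken from the study.
# Primary sample: 17 ASD (tp + fn), 27 TD (fp + tn).
primary = diagnostic_metrics(tp=13, fp=1, fn=4, tn=26)
# Replication sample: 17 ASD (tp + fn), 13 TD (fp + tn).
replication = diagnostic_metrics(tp=15, fp=2, fn=2, tn=11)

for name, m in (("primary", primary), ("replication", replication)):
    print(f"{name}: accuracy={m['accuracy']:.1%}, PPV={m['ppv']:.2f}, NPV={m['npv']:.2f}")
```

Under these assumed counts, the primary sample would have 39 of 44 participants classified correctly and the replication sample 26 of 30.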
Conclusions: Automatic assessment of social coordination from brief videos of natural conversations promises to be an important new tool for autism research, adding granularity and scalability to diagnostic and social communication assessment.