Automatic Computer Vision Analysis of Emotional and Behavioral Responses of Children with Autism
Objectives: To develop and deploy, on a mobile device, multiple video stimuli designed to elicit behaviors relevant to ASD screening, and to validate them with computer vision methods that automatically code children's affective and attentive responses.
Methods: Multiple short video stimuli (<1 minute each) designed to elicit specific social and emotional responses were displayed on an iPad while the front-facing camera recorded the child's face at 1080p resolution and 30 frames per second. Computer vision algorithms were developed to assess head position, head turns, and affect. In addition, human raters trained on the Facial Action Coding System for Infants and Young Children coded multiple videos to provide ground-truth affect and head-turn labels. A subset of the responses coded by both the human raters and the computer vision methods was analyzed for agreement. All responses coded by the computer vision methods were analyzed to determine whether the stimuli differentially elicited child behavior.
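The abstract does not detail the head-turn detection algorithm. As an illustration only, the minimal sketch below assumes per-frame head-yaw estimates (in degrees) are already available from a face-pose estimator at the 30 fps recording rate, and flags a turn when yaw changes by more than a threshold within a short window; the function name detect_head_turns and both threshold values are hypothetical placeholders, not values from the study.

```python
import numpy as np

FPS = 30  # recording rate described in the Methods

def detect_head_turns(yaw_deg, min_delta_deg=20.0, window_s=0.5):
    """Flag head turns in a per-frame yaw trace (degrees).

    A turn is reported wherever yaw changes by at least `min_delta_deg`
    within a sliding window of `window_s` seconds. Both thresholds are
    hypothetical placeholders, not values from the study.
    """
    win = max(1, int(window_s * FPS))
    turns, i = [], 0
    while i + win < len(yaw_deg):
        delta = yaw_deg[i + win] - yaw_deg[i]
        if abs(delta) >= min_delta_deg:
            turns.append((i, i + win, "right" if delta > 0 else "left"))
            i += win  # jump past the detected turn to avoid duplicates
        else:
            i += 1
    return turns

# Synthetic trace: ~2 s facing forward, then a 35-degree turn to the left.
yaw = np.concatenate([np.zeros(60), np.linspace(0, -35, 15), -35 * np.ones(45)])
print(detect_head_turns(yaw))  # [(53, 68, 'left')]
```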
Results: 104 toddlers (22 with ASD and 82 without) aged 16-31 months participated in the study with informed consent and IRB oversight. All participants' facial and head behaviors were coded with the automatic computer vision methods, and a subset (15 with ASD and 18 without) was also coded by human raters. Of the 97 videos coded by human raters for affect, the automatic methods agreed with the raters on 82% of the frames. Of the 87 head turns coded by human raters, the automatic method correctly identified 81%. Compared with toddlers without ASD, toddlers with ASD displayed reduced positive emotion throughout many of the stimuli. The automatic coding also showed that toddlers with ASD displayed a longer latency between their name being called and the responding head turn (social referencing).
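For concreteness, the two agreement figures and the name-call latency above reduce to simple computations over frame-aligned codes. The sketch below, with hypothetical labels, frame numbers, and helper names (frame_agreement, name_call_latency), shows one way such quantities could be computed; it is not the study's analysis code.

```python
import numpy as np

def frame_agreement(human_codes, auto_codes):
    """Fraction of frames on which human and automatic affect codes agree."""
    human, auto = np.asarray(human_codes), np.asarray(auto_codes)
    return float(np.mean(human == auto))

def name_call_latency(name_call_frame, turn_onset_frame, fps=30):
    """Latency in seconds from the name call to the responding head turn."""
    return (turn_onset_frame - name_call_frame) / fps

# Hypothetical per-frame affect codes: 0 = neutral, 1 = positive, 2 = negative.
human = [0, 0, 1, 1, 1, 2, 0]
auto  = [0, 0, 1, 1, 2, 2, 0]
print(f"frame agreement: {frame_agreement(human, auto):.0%}")  # 86%
print(f"latency: {name_call_latency(450, 510):.1f} s")         # 2.0 s
```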
Conclusions: We developed and deployed multiple video stimuli on a mobile device to elicit ASD-specific behavioral responses. From the recorded video data, our computer vision methods automatically coded affective and attentive responses. We validated the automatic methods against trained human raters and demonstrated that the video stimuli differentially elicit child behaviors. Applications such as the one presented could lead to new or refined behavioral risk-marker assessments.
Technology demonstration: We will present a demo in which audience members can watch a video stimulus on a mobile device while the front-facing camera records their face. After the video, we will present facial statistics (affect, head movement) computed automatically by our methods.