A ResearchKit App with Automatic Detection of Facial Affect and Social Behaviors from Videos of Children with Autism
Objectives: To create and test a self-contained mobile application that allows for remote collection of ASD-specific questionnaire data and observational video of a child reacting to various video stimuli in his/her natural environment, together with automatic coding of that video data.
Methods: The ResearchKit app Autism&Beyond was developed and launched as a study with informed consent and IRB oversight. The app incorporates demographic questionnaires and a digital version of the Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F), including the follow-up questions. In addition, three short video stimuli (bubbles, rhymes and toys, and a bunny) as well as a mirror stimulus are presented to elicit affective and social responses. Using the front-facing camera on the mobile device, video of the child is recorded during stimulus presentation and saved at 640×480 resolution and 15 frames per second. Custom computer vision algorithms automatically detect and track multiple facial landmarks, including points around the eyes, nose, and mouth. From the tracked landmark locations, several characteristics are derived, including head position, head orientation, facial affect classification, and blink rate. Responses to each stimulus were analyzed separately to determine whether the stimuli differentially elicit child behavior.
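As an illustration of how landmark-derived features of this kind can be computed, the sketch below estimates a crude head-yaw signal from three tracked 2D landmark positions and flags head turns as a proxy for social referencing. This is a minimal, hypothetical example: the landmark names, geometry heuristic, and threshold are assumptions for illustration, not the study's actual algorithms.

```python
import numpy as np

def head_yaw_ratio(left_eye, right_eye, nose_tip):
    """Approximate head yaw from the asymmetry of nose-to-eye distances.

    Returns roughly 0 for a frontal face; the magnitude grows as the
    head turns and the nose tip shifts toward one eye. Inputs are
    (x, y) landmark coordinates in image pixels (hypothetical names).
    """
    left_eye, right_eye, nose_tip = map(np.asarray, (left_eye, right_eye, nose_tip))
    d_left = np.linalg.norm(nose_tip - left_eye)    # nose-to-left-eye distance
    d_right = np.linalg.norm(nose_tip - right_eye)  # nose-to-right-eye distance
    return (d_left - d_right) / (d_left + d_right)

def looks_away(left_eye, right_eye, nose_tip, threshold=0.2):
    """Flag a frame as a head turn (e.g., toward the parent) when the
    yaw asymmetry exceeds a threshold -- a crude social-referencing proxy."""
    return abs(head_yaw_ratio(left_eye, right_eye, nose_tip)) > threshold

# A frontal face yields a near-zero ratio; a turned head exceeds the threshold.
frontal = head_yaw_ratio((100, 100), (140, 100), (120, 120))
turned = looks_away((100, 100), (140, 100), (135, 120))
```

In practice, a per-frame signal like this would be smoothed over time and combined with the other tracked characteristics (head position, affect, blinks) before scoring a response.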
Results: Over six months of data collection, 878 subjects met the inclusion criteria (at the date of abstract submission we had registered over 2,000 consent forms, over 5,000 video recordings, and over 8,000 completed surveys and questionnaires). Of these, 152 children aged 1-6 years with a parent-reported autism diagnosis and/or high-risk M-CHAT score uploaded video data through the mobile application. The child's face was detected during 84-92% of the video stimuli on average, with the highest detection rate during the rhymes stimulus. Computer vision algorithms classified positive affect during a mean of 22-31% of the total time, with the most positive affect detected during the mirror stimulus. Based on head position, social referencing occurred in 12-42% of subjects across the various stimuli, with the greatest probability of social referencing observed during the mirror stimulus.
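The per-stimulus percentages above can be thought of as simple frame-level aggregates. The sketch below shows one hypothetical way such summary metrics could be computed from per-frame analysis results; the record layout and field names are assumptions for illustration, not the study's data format.

```python
def summarize_stimulus(frames):
    """Aggregate hypothetical per-frame results into per-stimulus metrics.

    frames: list of dicts, one per video frame, e.g.
        {"face": True, "affect": "positive"}   # face detected, smiling
        {"face": False, "affect": None}        # no face detected
    Returns the percentage of frames with a detected face and the
    percentage classified as positive affect.
    """
    n = len(frames)
    if n == 0:
        return {"face_detected_pct": 0.0, "positive_affect_pct": 0.0}
    face_pct = 100.0 * sum(f["face"] for f in frames) / n
    pos_pct = 100.0 * sum(f.get("affect") == "positive" for f in frames) / n
    return {"face_detected_pct": face_pct, "positive_affect_pct": pos_pct}

# Example: 4 frames, face found in 3, positive affect in 2.
example = summarize_stimulus([
    {"face": True, "affect": "positive"},
    {"face": True, "affect": "neutral"},
    {"face": False, "affect": None},
    {"face": True, "affect": "positive"},
])
# example == {"face_detected_pct": 75.0, "positive_affect_pct": 50.0}
```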
Conclusions: We developed and deployed a mobile application to gather ASD-specific questionnaire and video data in naturalistic settings. From the video data we were able to automatically score a child's affective and social responses to various video stimuli. Applying computer vision to facial expression analysis could lead to new behavioral imaging methods for detecting subtle neurologic processes related to attention and emotional expression, and may be useful for early autism screening and monitoring.