Background: A recent feasibility study involving children with autism spectrum disorders (ASD) interacting with a socially assistive robot showed that some children react positively to robots while others react negatively, and that these reactions are readily observable in the child's distances from the robot, the parent, and the walls of the room. A positive reaction included time spent standing in front of the robot at a close distance; a negative reaction was indicated by moving away from the robot and staying close to a wall or to a parent. Since it is unlikely that children with ASD will enjoy any robot all of the time, it is important to develop methods for detecting negative child behaviors in order to minimize distress and facilitate effective human-robot interaction.
Objectives: The goals of this work are 1) to model the interaction state of children with ASD as they interact with a robot and a parent, using distance-based features obtained automatically from an overhead camera, and 2) to determine whether such automatically coded distance-based states are comparable to a human rating of the interaction state.
Methods: We recorded 8 children with ASD as they interacted with a parent and a robot in the experimental space over multiple 5-minute sessions. A human rater annotated each session for the child's response to the robot, the overall positive or negative character of the interaction, and the child's current behavior, coded as: interacting with the robot; huddling near the parent; huddling near a wall; or avoiding the robot.
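As a concrete illustration, the human coding schema could be stored as a small set of per-session labels; the names below (ChildBehavior, SessionAnnotation) are hypothetical and only sketch one way to represent the annotations described above, not the rating tool actually used.

```python
from dataclasses import dataclass
from enum import Enum


class ChildBehavior(Enum):
    # The four behavior categories annotated by the human rater.
    ROBOT_INTERACTION = "interacting with robot"
    PARENT_HUDDLE = "huddling near parent"
    WALL_HUDDLE = "huddling near wall"
    ROBOT_AVOIDANCE = "avoiding robot"


@dataclass
class SessionAnnotation:
    # One rated 5-minute session: overall valence plus a time-stamped
    # sequence of (time_sec, behavior) codes.
    child_id: str
    overall_positive: bool
    behavior_codes: list[tuple[float, ChildBehavior]]
```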
We equipped the experiment space with an overhead camera. Using an in-house overhead vision system, the positions of the child, robot, and parent are determined automatically. From these overhead data we computed a spatio-temporal model of social behavior based on an 8-dimensional feature vector of distances and velocities between the child and the robot, the parent, and the walls. The feature vectors were clustered into 50 clusters by fitting a Gaussian Mixture Model (GMM) with expectation-maximization. The clusters were then mapped to the behavior categories described above using a naïve Bayes classifier trained on the human ratings. The model was trained on 20% of the recorded data and tested on the remaining 80%.
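A minimal sketch of this pipeline is shown below, assuming scikit-learn and assuming the overhead tracker yields per-frame (x, y) positions for the child, robot, and parent. The frame rate, the exact composition of the 8 feature dimensions, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

FPS = 15  # assumed frame rate of the overhead camera


def spatial_features(child, robot, parent, room_size):
    """Build a per-frame 8-D feature matrix from tracked (x, y) positions.

    child, robot, parent: arrays of shape (T, 2); room_size: (width, height).
    The 8 dimensions assumed here: child-robot, child-parent, and
    child-nearest-wall distances, their rates of change, and the child's
    velocity components.
    """
    d_robot = np.linalg.norm(child - robot, axis=1)
    d_parent = np.linalg.norm(child - parent, axis=1)
    w, h = room_size
    d_wall = np.minimum.reduce([child[:, 0], w - child[:, 0],
                                child[:, 1], h - child[:, 1]])
    dists = np.column_stack([d_robot, d_parent, d_wall])
    rates = np.gradient(dists, 1.0 / FPS, axis=0)   # d/dt of each distance
    vel = np.gradient(child, 1.0 / FPS, axis=0)     # child velocity (vx, vy)
    return np.column_stack([dists, rates, vel])


def train_state_model(features, frame_labels, n_clusters=50,
                      train_frac=0.2, seed=0):
    """Cluster frames with a GMM fit by EM, then label the cluster
    assignments with naive Bayes trained on the human-rated subset
    (20% train / 80% test, as in the study)."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    clusters = gmm.fit_predict(features).reshape(-1, 1)  # cluster index per frame
    X_tr, X_te, y_tr, y_te = train_test_split(
        clusters, frame_labels, train_size=train_frac, random_state=seed)
    clf = CategoricalNB(min_categories=n_clusters).fit(X_tr, y_tr)
    return gmm, clf, accuracy_score(y_te, clf.predict(X_te))
```

With per-frame human behavior codes supplied as frame_labels, the returned accuracy corresponds to evaluation on the 80% of data held out from training, mirroring the protocol described above.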
Results: The approach achieves 91.4% accuracy in classifying the robot-interaction, parent-interaction, avoidance, and hiding-against-the-wall behaviors, and demonstrates that these classes are sufficient for distinguishing between positive and negative reactions of the child to the robot.
Conclusions: The overhead camera system described in this work was able to extract the relevant features from the recorded data. The GMM-based method efficiently and effectively clusters the 8-dimensional feature space, and the resulting states are easily labeled using annotated training data and could be used for partial behavior transcription. Potential concerns include the over-generalization that can occur with human labeling and over-specialization given the heterogeneity of the participant population.