Automatic Retrieval of Videos of Stereotyped and Repetitive Movements

Friday, May 18, 2012
Sheraton Hall (Sheraton Centre Toronto)
3:00 PM
A. Ciptadi, A. Rozga, G. D. Abowd and J. Rehg, Georgia Institute of Technology, Atlanta, GA

Collecting large corpora of video data has become common practice among researchers and clinicians studying autism (e.g., Watt, 2008). One of the difficulties in analyzing video data stems from the need for human coders to browse all of the content in order to manually annotate the occurrence of specific behaviors of interest, a time-intensive and laborious process. We demonstrate a collaboration between computer vision research and developmental psychology aimed at developing automated tools to speed up the annotation process. The specific context of this initial work was to assist in the automatic retrieval of gross motor physical stereotypies from video, based on a single example identified in the video by a human coder.


Develop a computer vision algorithm that, given a single example of a behavior of interest occurring in a video, automatically retrieves instances of similar behaviors from the video database. 


One of the ways humans perceive action is by observing local movement patterns and then abstracting a coherent structure from the relations between these patterns (Johansson, 1973). For example, when two children play a ball game with their feet, we can characterize that activity by the movement pattern of the ball, the movement pattern of the feet, and the way these local movements interact with each other. We devised a computer vision algorithm that parses a video into a set of signals, each corresponding to the timing pattern of a characteristic local movement. We then compare any two given snippets of the video by measuring the similarity between their two sets of extracted signals. Using this as a basis, we rank all snippets in the video against one example snippet containing a particular behavior: first we measure the similarity between every snippet in our database and the target (example) snippet, and then we rank the snippets by their similarity scores.
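The ranking procedure described above can be sketched in code. The following is an illustrative approximation only, not the authors' implementation: it assumes grayscale snippets of equal length, summarizes local movement as frame-difference energy on a spatial grid, and compares snippets by the mean correlation of corresponding per-cell time series. The grid size, the motion signal, and the correlation-based similarity are all assumptions made for the sketch.

```python
import numpy as np

def snippet_signals(frames, grid=4):
    """Summarize a snippet as a set of local movement signals:
    one time series of frame-difference energy per spatial cell.
    `frames` is a (T, H, W) array of grayscale frames, with H and W
    divisible by `grid`."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))  # (T-1, H, W)
    t, h, w = diffs.shape
    cells = diffs.reshape(t, grid, h // grid, grid, w // grid)
    return cells.sum(axis=(2, 4))  # (T-1, grid, grid) motion energy per cell

def similarity(sig_a, sig_b):
    """Similarity between two sets of signals: mean Pearson correlation
    of corresponding per-cell time series (assumes equal lengths)."""
    a = sig_a.reshape(sig_a.shape[0], -1)
    b = sig_b.reshape(sig_b.shape[0], -1)
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-8)
    b = (b - b.mean(axis=0)) / (b.std(axis=0) + 1e-8)
    return float(np.mean(np.sum(a * b, axis=0) / a.shape[0]))

def rank_snippets(example, database):
    """Rank database snippet indices by similarity to the example."""
    example_sig = snippet_signals(example)
    scores = [similarity(example_sig, snippet_signals(s)) for s in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])
```

In this sketch, a snippet whose local motion oscillates in the same spatial cells and with the same timing as the example (e.g., periodic hand movement in one region of the frame) scores higher than a snippet with unrelated motion, so relevant snippets rise toward the top of the ranked list.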


We applied our method to two sets of videos: a 30-minute session of a child with autism engaged in a structured teaching activity at a table with a therapist, and a 30-minute free play session between a child with autism and a familiar adult. In both videos, the children exhibited stereotyped and repetitive movements that were annotated by a developmental psychologist with expertise in autism. These behaviors included hand flapping, clapping, jumping, and close visual inspection of objects. Our preliminary results indicate that, on average, our method ranks 90% of the video snippets containing behaviors with gross body movement (hand flapping, jumping) in the top 20% of the retrieval results. This means an expert need only review 20% of the video to see 90% of the relevant behaviors, representing a 5-fold saving in time.
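The retrieval quality reported above can be expressed as recall at a fixed review depth. The following sketch (the function name and inputs are illustrative, not from the original work) computes the fraction of annotated behavior snippets that fall within the top portion of a ranked retrieval list; with a recall of 0.9 at a depth of 0.2, reviewing one fifth of the ranked snippets surfaces nine tenths of the relevant behaviors.

```python
def recall_at_fraction(ranked_ids, relevant_ids, fraction=0.2):
    """Fraction of relevant snippets that appear within the top
    `fraction` of a ranked retrieval list.

    ranked_ids:   snippet ids ordered from most to least similar.
    relevant_ids: ids of snippets a human coder marked as containing
                  the behavior of interest.
    """
    k = max(1, int(len(ranked_ids) * fraction))
    top = set(ranked_ids[:k])
    hits = sum(1 for r in relevant_ids if r in top)
    return hits / len(relevant_ids)
```

For example, if 9 of 10 annotated snippets land in the first 20% of a 100-snippet ranking, the function returns 0.9, matching the trade-off described in the results.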


Our preliminary results show great promise in automatically retrieving exemplars of gross motor movement from video recordings, and have relevance for research on stereotyped and repetitive behaviors in autism. In future work, we aim to improve the accuracy and extend the range of behaviors that can be retrieved.
