International Meeting for Autism Research (May 7 - 9, 2009)

Automatic Retrieval of Mother-Infant Social Games from Unstructured Videos

Friday, May 8, 2009
Boulevard (Chicago Hilton)
P. Wang, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA
J. M. Rehg, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA
G. D. Abowd, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA
R. I. Arriaga, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA
Background: Social games are an important source of diagnostic information about social deficits in autism. Social interaction gestures (most of which arise in the hand movements of social games) are analyzed either in recorded videos (retrospective studies) or through in-situ observation. The current approach to video editing and behavior coding for retrospective studies involves manual scene tagging and selection, and manual behavior scoring by trained professionals. This leads to highly inefficient use of both human time and video, and to subjectivity and inconsistency in coding results. Our long-term goal is to facilitate this process by developing techniques to automate video filtering and behavior coding.

Objectives: Develop methods for automatically retrieving social games from unstructured videos.

Methods: Our computational model characterizes social games as quasi-periodic spatio-temporal patterns based on four attributes: they are dyadic, interactive, multiply instantiated, and repetitive. Social games are fundamentally dyadic interactions, and the corresponding motion patterns are temporally interacting and spatially distinct. The game itself is loosely defined by an abstract game rule, which allows multiple instantiations of the same game. Finally, the repetition (within a permissible range of variation) generates approximately the same categorical motion patterns throughout a game instantiation. From the retrieval perspective, this quasi-periodic spatio-temporal pattern should distinguish social games from other, non-game footage. Our strategy takes three steps. First, we process the footage to identify segments recorded with a relatively static camera, on the assumption that the videographer tries to hold the camcorder still when recording a social game. Second, we use a temporal data mining technique called InfoMiner (Yang et al., 2001) to find quasi-periodic patterns in the filtered video. The videos are divided into short segments (400 to 600 frames long on average), and a sequence of motion symbols is extracted from each segment. For each sequence, InfoMiner finds the sequential patterns that exhibit a repetitive structure within that sequence. Finally, we will use a validation method that integrates the spatial and temporal relationships to examine the hypothesized patterns.
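The mining step above can be sketched in simplified form. The following is a minimal illustration, not the InfoMiner algorithm itself (which ranks patterns by an information-gain measure); it merely shows the idea of scanning a motion-symbol sequence for subsequences that recur at roughly regular intervals. All names and thresholds are illustrative.

```python
from collections import defaultdict

def find_repetitive_patterns(symbols, min_len=2, max_len=4, min_count=3):
    """Find symbol subsequences that recur at roughly regular intervals.

    Toy stand-in for the mining step: keep every n-gram that occurs
    at least `min_count` times with low variance in its spacing.
    """
    # Record the start positions of every n-gram in the sequence.
    positions = defaultdict(list)
    for n in range(min_len, max_len + 1):
        for i in range(len(symbols) - n + 1):
            positions[tuple(symbols[i:i + n])].append(i)

    patterns = []
    for pat, pos in positions.items():
        if len(pos) < min_count:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        mean_gap = sum(gaps) / len(gaps)
        # Quasi-periodic: every gap stays within 50% of the mean gap.
        if all(abs(g - mean_gap) <= 0.5 * mean_gap for g in gaps):
            patterns.append((pat, pos))
    return patterns
```

For example, in the symbol sequence `abcxabcyabcz`, the subsequence `abc` recurs at positions 0, 4, and 8 with a constant gap of 4 and would be reported as quasi-periodic, while a sequence with no repeated n-grams yields no patterns.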

Results: Our test set contains 208 video segments, with 53 games (14 patty-cake and 39 ball games) and 155 non-game activities (such as single-person play with a toy and two-person conversations). Retrieval performance is evaluated by recall and precision. Exhaustive manual search has 100% recall (all the games are found) and 25.5% precision (53/208, meaning 74.5% of the time is spent navigating non-game video). Our method achieves 92.45% recall and 31% precision. It requires no human preprocessing once the video is ripped from DV or VHS tapes, and once the motion symbols are extracted it takes only seconds to decide which segments contain social games. The reported results are obtained from the first two steps; the validation stage of our retrieval system is still under development. We believe that validation will eliminate many false positives and substantially improve retrieval precision.
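The recall and precision figures above follow the standard set-based definitions, which can be stated concisely. The sketch below reproduces the exhaustive-search baseline from the reported counts (53 relevant games out of 208 segments); the segment identifiers are illustrative.

```python
def retrieval_metrics(retrieved, relevant):
    """Recall and precision for a set-based retrieval task.

    recall    = |retrieved ∩ relevant| / |relevant|
    precision = |retrieved ∩ relevant| / |retrieved|
    """
    true_positives = len(retrieved & relevant)
    recall = true_positives / len(relevant)
    precision = true_positives / len(retrieved)
    return recall, precision

# Exhaustive manual search: every one of the 208 segments is "retrieved",
# and 53 of them are actual games.
all_segments = set(range(208))
games = set(range(53))
recall, precision = retrieval_metrics(all_segments, games)
# recall = 1.0 (100%), precision = 53/208 ≈ 0.255 (25.5%)
```

The same function applied to the system's output would use the segments it returns in place of `all_segments`; the abstract reports the resulting 92.45% recall and 31% precision without listing the raw counts.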

Conclusions: The social game retrieval system has shown promising performance compared with manual retrieval and is a first step toward automatic retrieval of general social interactions.