Fully Automated Measurement of Imitation and Motor Learning Differences in Children with ASD

Poster Presentation
Thursday, May 2, 2019: 5:30 PM-7:00 PM
Room: 710 (Palais des congrès de Montréal)
C. J. Zampella1, K. Bartley1, E. Sariyanidi1, B. Tunc1, M. Cola2, S. Plate2, L. Bateman3, A. de Marchena4, E. S. Kim2, J. D. Herrington5, J. Parish-Morris1, J. Pandey2 and R. T. Schultz2, (1)Center for Autism Research, The Children's Hospital of Philadelphia, Philadelphia, PA, (2)Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, (3)The Center for Autism Research/CHOP, Philadelphia, PA, (4)University of the Sciences, Philadelphia, PA, (5)Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
Background: Meta-analysis indicates that imitation impairments are strongly and specifically associated with ASD. While impairments are robust across tasks, how imitation is operationalized within a study moderates whether impairments are detected: measuring the form of an action distinguishes ASD from non-ASD better than simply measuring its end state. Accurately measuring the form of actions as they unfold requires spatially and temporally granular tools, a level of granularity now achievable via computer vision. We used computer vision to quantify gross motor imitation in children during a brief task.

Objectives: Apply automated computer vision approaches to measure imitation accuracy and change over time; compare a scalable, open-source motion-tracking program against an established but more resource-intensive system.

Methods: Participants were 21 children with ASD and 18 matched typically developing children (TDC; see Table). Children imitated, in real time, a 2.5-minute video of a man performing a sequence of body movements. The task was completed twice, separated by another brief task. Kinect V2 units collected front-facing, whole-body video at 30 frames/second. Joint movements were digitally tracked in coordinate space using two platforms: (1) 3D tracking with iPi Motion Capture; (2) 2D tracking with OpenPose (open-source software). Imitation performance was quantified through windowed cross-correlations (4-second sliding windows) between child joint coordinates and joint coordinates from the stimulus video. Results herein are for a subset of joints: the child's left wrist relative to the stimulus right wrist, and vice versa. Mean peak cross-correlations were analyzed with a 2 (group: ASD/TDC) × 2 (timepoint: Time 1/Time 2) mixed ANOVA.
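To make the pipeline concrete, below is a minimal Python sketch (not the authors' code) of the windowed cross-correlation analysis: wrist trajectories are read from per-frame OpenPose JSON output, and the mean of per-window peak cross-correlations is computed over 4-second sliding windows at 30 frames/second. The file paths, the choice of the x-coordinate, and the one-second window step are illustrative assumptions; the abstract does not specify these details.

import json
from pathlib import Path

import numpy as np

FPS = 30
WINDOW = 4 * FPS  # 4-second sliding window = 120 frames at 30 fps

# OpenPose BODY_25 keypoint indices for the wrists
R_WRIST, L_WRIST = 4, 7

def openpose_wrist_xy(json_dir, joint):
    """Read one (x, y) wrist trajectory from per-frame OpenPose JSON files."""
    frames = []
    for f in sorted(Path(json_dir).glob("*_keypoints.json")):
        people = json.loads(f.read_text())["people"]
        if not people:
            frames.append([np.nan, np.nan])  # no detection; interpolate in real use
            continue
        kp = people[0]["pose_keypoints_2d"]  # flat list [x0, y0, c0, x1, y1, c1, ...]
        frames.append(kp[3 * joint : 3 * joint + 2])
    return np.asarray(frames, dtype=float)

def peak_xcorr(a, b):
    """Peak normalized cross-correlation between two equal-length 1-D signals."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return np.correlate(a, b, mode="full").max() / len(a)

def mean_peak_xcorr(child, stimulus, step=FPS):
    """Mean of per-window peak cross-correlations over sliding windows."""
    n = min(len(child), len(stimulus))
    peaks = [peak_xcorr(child[i:i + WINDOW], stimulus[i:i + WINDOW])
             for i in range(0, n - WINDOW + 1, step)]
    return float(np.mean(peaks))

# Mirror pairing per the Methods: child's left wrist vs. the model's right wrist.
# Directory names are hypothetical.
child_lwrist_x = openpose_wrist_xy("child_json/", L_WRIST)[:, 0]
model_rwrist_x = openpose_wrist_xy("model_json/", R_WRIST)[:, 0]
print(mean_peak_xcorr(child_lwrist_x, model_rwrist_x))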

Results: iPi (3D): There were significant group-by-timepoint interactions for both wrists, with large effect sizes [left: p=.02, ηp²=.15; right: p=.01, ηp²=.16]. TDC significantly outperformed ASD for both wrists at Time 2 [left: p=.002, d=1.07; right: p=.003, d=1.03], but not at Time 1 [left: p=.11, d=.53; right: p=.17, d=.46]. TDC performance was significantly higher at Time 2 than at Time 1 [left: p=.03, d=.54; right: p=.03, d=.54], whereas the ASD group did not differ significantly across timepoints [left: p=.15, d=-.34; right: p=.11, d=-.40], suggesting a lack of improvement with practice in ASD. OpenPose (2D): The pattern of results was highly similar to iPi (see Figure). There was a significant group effect for both wrists, with medium and large effect sizes, respectively [left: p=.046, ηp²=.10; right: p=.01, ηp²=.16]. Neither the interaction term nor the timepoint effect reached significance for either wrist. While the iPi and OpenPose patterns were consistent, mean cross-correlations from OpenPose were lower and their standard deviations higher.
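As an illustration of the statistical model behind these results, a hedged sketch of the 2 (group) × 2 (timepoint) mixed ANOVA follows, using the open-source pingouin package; the abstract does not name its statistics software, and the long-format data layout, file name, and column names below are assumptions.

import pandas as pd
import pingouin as pg

# Long format: one row per child per timepoint, with columns
# subject, group (ASD/TDC), time (T1/T2), xcorr (mean peak cross-correlation).
df = pd.read_csv("peak_xcorrs_long.csv")  # hypothetical file

# Mixed ANOVA: between-subjects factor = group, within-subjects factor = time.
aov = pg.mixed_anova(data=df, dv="xcorr", within="time",
                     subject="subject", between="group")
print(aov[["Source", "p-unc", "np2"]])  # p-values and partial eta-squared

# Follow-up pairwise comparisons (e.g., ASD vs. TDC at each timepoint),
# with Cohen's d effect sizes as in the abstract.
posthoc = pg.pairwise_tests(data=df, dv="xcorr", within="time",
                            subject="subject", between="group",
                            effsize="cohen")
print(posthoc)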

Conclusions: Results are consistent with the literature documenting imitation differences in ASD and are specifically suggestive of impaired motor learning. The novelty of our approach is the direct acquisition of raw movement data rather than reliance on human raters. Such granular measurement should improve imitation assessment, particularly of change over time (e.g., treatment outcomes). 3D motion tracking outperformed 2D tracking, with the latter yielding somewhat noisier movement representations. However, the freely available, fully automated 2D method produced the same pattern of results and can be used with standard video (rather than requiring a Kinect), which holds promise for large-scale deployment.