International Meeting for Autism Research: Managing Missing Data In Autism Research: The Use of Multiple Imputation

Managing Missing Data In Autism Research: The Use of Multiple Imputation

Saturday, May 14, 2011
Elizabeth Ballroom E-F and Lirenta Foyer Level 2 (Manchester Grand Hyatt)
9:00 AM
J. F. Strang1,2, D. Luckenbaugh3, L. Kenworthy2, G. L. Wallace4, J. L. Sokoloff2,5 and D. O. Black6, (1)Suite 350, Children's National Medical Center, Rockville, MD, (2)Center for Autism Spectrum Disorders, Children's National Medical Center, Rockville, MD, (3)Experimental Therapeutics and Pathophysiology Branch, National Institute of Mental Health, Bethesda, MD, (4)NIMH, Bethesda, MD, United States, (5)Children's National Medical Center, Rockville, MD, (6)Pediatrics and Developmental Neuroscience Branch/ NIMH, NIMH, Bethesda, MD, United States
Background: Missing data is a challenge for much clinical research and is often managed by listwise deletion (dropping incomplete cases).  Listwise deletion can result in reduced power and, when data is not missing completely at random, in biased results.  Previous studies have shown that dropping cases with incomplete data can in fact lead to overestimates of the relationships between variables (Janseen et al., 2009).  Multiple imputation is a statistical technique that generates plausible values for missing data based on observed data (Rubin, 1987).  This technique has been shown to produce less biased predictions in medical research than listwise deletion.  Very few studies have used multiple imputation in autism-related datasets, and no studies have investigated the impact of the method in autism research, where variability is a hallmark.

Objectives: Compare multiple imputation with listwise deletion in a sample of typically developing children and children with autism spectrum disorders and no intellectual disability.  

Methods: Gender, Adaptive Behavior (Vineland Adaptive Composite), and Executive Function (BRIEF GEC) data from 120 children without intellectual disability (85 ASD; 35 Typically Developing controls) were used to predict group membership (ASD vs. Typically Developing) under the following conditions:  (1) complete data; (2) multiple imputation after removal of 33% and 66% of the data for one variable; and (3) listwise deletion after removal of 33% and 66% of the data for one variable.  Multiple imputation was performed using the default settings of SPSS 17, which replaced each missing value with 5 values drawn from an estimated distribution, resulting in 5 imputed datasets.  Multinomial logistic regression was used to predict group membership.  Nagelkerke’s r-square values/ranges are reported for each method.  Removal of data was intentionally biased using a covariate to create a missing at random (MAR) condition.
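The data-removal and missing-data handling steps described in Methods can be illustrated with a minimal Python sketch.  It is not the SPSS 17 procedure used in the study: the imputation here is a simplified stochastic regression imputation from a single covariate (a full multiple imputation would also draw the regression parameters from their estimated distribution), the variable names `adaptive` and `exec_fn` are hypothetical, the values are synthetic, and the choice of covariate and weighting scheme is an assumption, since the abstract does not specify them.

```python
import random
import statistics

def remove_mar(rows, target, covariate, fraction, seed=0):
    """Blank out `fraction` of the `target` values, favoring rows with a high `covariate`.

    Missingness depends only on the fully observed covariate, never on the
    target value itself, which is what makes the mechanism MAR.
    """
    rng = random.Random(seed)
    n_remove = round(fraction * len(rows))
    # Weight each row by the rank of its covariate (assumed scheme).
    order = sorted(range(len(rows)), key=lambda i: rows[i][covariate])
    weight = {i: rank + 1 for rank, i in enumerate(order)}
    # Weighted sampling without replacement: keep the rows with the largest keys.
    keys = {i: rng.random() ** (1.0 / weight[i]) for i in range(len(rows))}
    chosen = set(sorted(keys, key=keys.get, reverse=True)[:n_remove])
    return [{**row, target: None} if i in chosen else dict(row)
            for i, row in enumerate(rows)]

def listwise_delete(rows):
    """Keep only complete cases (rows with no missing value)."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute(rows, target, covariate, m=5, seed=0):
    """Return m completed copies of `rows` (simplified stochastic regression imputation)."""
    rng = random.Random(seed)
    seen = [r for r in rows if r[target] is not None]
    xs = [r[covariate] for r in seen]
    ys = [r[target] for r in seen]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    resid_sd = statistics.stdev([y - (intercept + slope * x) for x, y in zip(xs, ys)])
    datasets = []
    for _ in range(m):
        # Each copy gets its own random draws, so imputed values vary across copies.
        datasets.append([
            dict(r) if r[target] is not None
            else {**r, target: intercept + slope * r[covariate] + rng.gauss(0, resid_sd)}
            for r in rows
        ])
    return datasets

# Synthetic stand-in for the 120-child sample (values are not study data).
data_rng = random.Random(1)
rows = [{"adaptive": data_rng.gauss(85, 15), "exec_fn": data_rng.gauss(60, 12)}
        for _ in range(120)]

holed = remove_mar(rows, target="exec_fn", covariate="adaptive", fraction=0.33)
complete = listwise_delete(holed)                      # analysis on fewer cases
imputed = impute(holed, target="exec_fn", covariate="adaptive", m=5)
print(len(rows), len(complete), len(imputed))          # 120 80 5
```

In this sketch, listwise deletion shrinks the sample to the complete cases, whereas each of the 5 imputed datasets retains all 120 rows and would be analyzed separately, which is why a range of r-square values can be reported for the imputed conditions.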

Results: In the model with complete data, a Nagelkerke’s r-square of .78 was obtained.  In the imputed data with 33% of one variable removed, the r-squares ranged from .74 to .78; a similar pattern was observed with the imputed datasets with 66% of data removed (r-squares ranged from .72 to .77).  By contrast, listwise deletion at 33% and 66% resulted in models that overestimated the relationship among the variables (r-square = .86 for both models).
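For readers unfamiliar with the statistic reported above, Nagelkerke's r-square rescales the Cox and Snell r-square so that its maximum is 1.  The following sketch computes it from model log-likelihoods.  The intercept-only log-likelihood is derived from the group sizes given in Methods (85 and 35); the fitted-model log-likelihood of -20.0 is a hypothetical value chosen purely for illustration and is not a result from the study.

```python
import math

def null_log_likelihood(n_a, n_b):
    """Log-likelihood of an intercept-only model for a two-group outcome."""
    n = n_a + n_b
    return n_a * math.log(n_a / n) + n_b * math.log(n_b / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's pseudo r-square.

    ll_null  : log-likelihood of the intercept-only model
    ll_model : log-likelihood of the fitted model
    n        : number of cases used to fit both models
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox and Snell
    return cox_snell / max_cox_snell

ll_null = null_log_likelihood(85, 35)  # group sizes reported in Methods
ll_model = -20.0                       # hypothetical value, for illustration only
r2 = nagelkerke_r2(ll_null, ll_model, n=120)
print(round(r2, 2))
```

Under listwise deletion, `n` and both log-likelihoods are computed on the complete cases only, so the statistic describes a smaller and possibly unrepresentative subsample.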

Conclusions: Results in this clinical autism dataset are consistent with previous findings in the statistical literature that multiple imputation yields estimates that more accurately reflect the complete data than those from listwise deletion, which may overestimate relationships among variables.  This suggests the potential value of multiple imputation procedures for managing missing data in clinical autism research.
