Do Predictors in Machine Learning Classification of ASD Differ for Children Vs. Adolescents?
Objectives: We applied machine learning to language features extracted from transcripts of naturalistic conversations, with the goals of (1) classifying participants as ASD or typically developing, and (2) comparing classification accuracy and predictive features across a child sample, an adolescent sample, and a collapsed sample that includes all participants.
Methods: Eighty-five matched participants (Table 1) completed two 3-minute semi-structured “get to know you” conversations with two previously unfamiliar confederates who were not autism experts (Ratto et al., 2011). In the first conversation, the confederate was trained to act interested in the conversation; in the second, bored. Transcripts were analyzed using LIWC software (Tausczik & Pennebaker, 2010) and R’s ‘qdap’ package (Rinker, 2017), yielding 121 features for participants and confederates in each condition, as well as difference scores between conditions. Our machine learning pipeline included a logistic regression classifier trained on participant and/or confederate features within a leave-one-out cross-validation (LOOCV) loop; a minimal sketch is given below. Cross-validated classification accuracy was measured within the child and adolescent samples separately, as well as across the entire age range, and accuracy was compared using McNemar’s test. Conversational features with non-zero coefficients in the classifier were identified as top predictors of diagnostic status.
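A minimal sketch of such a pipeline (not the authors’ code) follows. It assumes an L1-penalized (lasso) logistic regression, a specific choice consistent with the reported non-zero coefficients though the abstract does not name the penalty, and a hypothetical data frame ‘feats’ with one row per participant, the 121 feature columns, and a two-level diagnosis factor ‘dx’:

```r
## Minimal sketch (not the authors' code): LOOCV diagnostic classification
## from conversational features using lasso logistic regression.
library(glmnet)

loocv_classify <- function(feats) {
  x <- as.matrix(feats[, setdiff(names(feats), "dx")])
  y <- factor(feats$dx)                 # ASD vs. typically developing
  preds <- character(nrow(feats))
  for (i in seq_len(nrow(feats))) {
    # Hold out one participant; tune the penalty by inner cross-validation
    # on the remaining participants only, then predict the held-out case.
    fit <- cv.glmnet(x[-i, ], y[-i], family = "binomial", alpha = 1)
    preds[i] <- predict(fit, x[i, , drop = FALSE],
                        s = "lambda.min", type = "class")
  }
  mean(preds == y)                      # cross-validated accuracy
}
```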
Results: Diagnostic classification accuracy was high in both age groups: 89% in adolescents and 76% in younger children (Table 2). Accuracy dropped significantly, to 66% (p < .015), when the entire age range was classified within a single model, suggesting that optimal classification models may differ by age group. The most accurate classification models were driven by participant-level features for children and by confederate-level features for adolescents. For children, top predictive features included participant pronoun use, intra-turn pause duration, and “friend”-category words. For adolescents, top predictive features in the most parsimonious model included confederate word-level “authenticity” and negations.
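The two analyses reported above, identifying non-zero-coefficient features and comparing accuracies across models, might be computed as in the following sketch. All names are hypothetical: ‘fit’ is a fitted model from the pipeline above, ‘y’ holds the diagnosis labels, and ‘preds_age_specific’ / ‘preds_collapsed’ hold per-participant LOOCV predictions from the age-specific and collapsed models.

```r
## Top predictors: features the sparse classifier retains with non-zero
## coefficients at the selected penalty.
cm <- as.matrix(coef(fit, s = "lambda.min"))
top_predictors <- setdiff(rownames(cm)[cm[, 1] != 0], "(Intercept)")

## McNemar's test on paired per-participant correctness: age-specific
## models vs. the single collapsed all-ages model.
correct_split    <- preds_age_specific == y
correct_combined <- preds_collapsed == y
mcnemar.test(table(correct_split, correct_combined))
```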
Conclusions: This study showed that (1) features derived from naturalistic conversations with non-expert interlocutors can be used for diagnostic classification, and (2) top classification features may change over the course of development. Using machine learning to extract clinically relevant dimensions from short, naturalistic conversation samples with naïve confederates could provide a new path toward rapid improvements in remote screening, characterization, and measurement of treatment response.