Computational Semantic Analysis of Restrictive and Repetitive Behavior in Language Samples of Children with Autism

Friday, May 15, 2015: 11:30 AM-1:30 PM
Imperial Ballroom (Grand America Hotel)
M. Rouhizadeh1, R. Sproat2 and J. van Santen1, (1)Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR, (2)Google, Inc., New York, NY

Restrictive and repetitive behavior (RRB) is a core symptom of autism spectrum disorder (ASD). We investigate whether RRB is also present in language behavior, specifically whether children with ASD talk about fewer topics, and return to them more often, during their conversations. We hypothesize a higher semantic overlap ratio (SOR) between dialogue turns in children with ASD than in those with typical development (TD). Few studies have tested this hypothesis because manual analysis is exceedingly labor-intensive. However, computational text analysis tools can be adapted for quantitative characterization of RRB in ASD at the semantic level.


(1) To develop computational text analysis tools that automatically assess SOR between dialogue turns. (2) To apply these tools to transcripts of ADOS conversations involving children with ASD or TD.


Participants. Participants were children ages 4-8, 44 with TD and 25 with ASD without language impairment (CELF Core Language Score above 85). Age, VIQ and NVIQ were matched between ASD and TD groups. ASD diagnosis utilized the ADOS revised algorithm, the Social Communication Questionnaire, and DSM-IV-TR-based clinical consensus.

Context. To calculate semantic similarity at different turn intervals, for each child we compare every turn pair i and j in the following distance windows: a) W≤3: j is up to 3 turns after i, b) 3<W≤9: j is more than 3 and up to 9 turns after i, c) 9<W≤27, d) 27<W≤81.
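The windowing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pairs_by_window` and the `WINDOWS` table are hypothetical, and the reading of each window as "more than low, up to high turns apart" is an assumption taken from the window labels.

```python
from itertools import combinations

# Assumed window bounds: a pair (i, j) belongs to a window when low < j - i <= high.
WINDOWS = {
    "W<=3": (0, 3),
    "3<W<=9": (3, 9),
    "9<W<=27": (9, 27),
    "27<W<=81": (27, 81),
}

def pairs_by_window(n_turns):
    """Group every turn pair (i, j) with i < j by the distance window it falls in."""
    grouped = {name: [] for name in WINDOWS}
    for i, j in combinations(range(n_turns), 2):
        distance = j - i
        for name, (low, high) in WINDOWS.items():
            if low < distance <= high:
                grouped[name].append((i, j))
                break  # windows do not overlap, so at most one can match
    return grouped

# Usage: a child with 5 turns has 10 pairs; only (0, 4) is more than 3 turns apart.
example = pairs_by_window(5)
```

Pairs more than 81 turns apart fall outside every window and are dropped in this sketch.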

Measure. To calculate the similarity between a pair of turns, we use a word overlap measure based on the Jaccard similarity coefficient, defined as the number of words common to both turns, relative to the sum, over all words, of the maximum count of each word in either turn. To assign a higher weight to words specific to a particular child, and a lower weight to words used frequently by a large number of children (such as function words), we use an inverse document frequency (IDF) term weight. Finally, we compute the SOR for each child by averaging the similarity of every turn pair within each of the four distance windows.
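One possible reading of this measure is sketched below. Several details are assumptions rather than facts from the abstract: turns are already tokenized into word lists, each child's full set of turns is treated as one "document" for IDF, the IDF form is log(N / df), and the IDF weight multiplies both the minimum and maximum counts. All function names are hypothetical.

```python
import math
from collections import Counter

def idf_weights(children_turns):
    """children_turns holds one list of tokenized turns per child.

    Assumption: a word's document frequency is the number of children who use it.
    """
    n_children = len(children_turns)
    doc_freq = Counter()
    for turns in children_turns:
        doc_freq.update({word for turn in turns for word in turn})
    return {word: math.log(n_children / df) for word, df in doc_freq.items()}

def weighted_jaccard(turn_a, turn_b, idf):
    """IDF-weighted overlap: sum of minimum counts over sum of maximum counts."""
    count_a, count_b = Counter(turn_a), Counter(turn_b)
    shared = count_a.keys() & count_b.keys()
    union = count_a.keys() | count_b.keys()
    num = sum(idf.get(w, 0.0) * min(count_a[w], count_b[w]) for w in shared)
    den = sum(idf.get(w, 0.0) * max(count_a[w], count_b[w]) for w in union)
    return num / den if den > 0 else 0.0

def mean_similarity(turns, pairs, idf):
    """Average similarity over the given (i, j) turn pairs, e.g. one distance window."""
    sims = [weighted_jaccard(turns[i], turns[j], idf) for i, j in pairs]
    return sum(sims) / len(sims) if sims else 0.0
```

Under the log(N / df) assumption, a word used by every child gets weight zero, so function words shared by all children contribute nothing to either sum.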


The ASD group had a significantly higher SOR than the TD group in all four windows (one-sided Mann-Whitney U test):

a) W≤3: Mdn(ASD) = 0.166, Mdn(TD) = 0.142, U = 301, p = 0.013, R = 0.27;
b) 3<W≤9: Mdn(ASD) = 0.073, Mdn(TD) = 0.068, U = 397, p = 0.028, R = 0.23;
c) 9<W≤27: Mdn(ASD) = 0.056, Mdn(TD) = 0.048, U = 323, p = 0.002, R = 0.34;
d) 27<W≤81: Mdn(ASD) = 0.044, Mdn(TD) = 0.041, U = 417, p = 0.049, R = 0.20.
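For readers who want to run a comparable group comparison on their own per-child scores, a minimal pure-Python sketch of a one-sided Mann-Whitney U test follows. It uses the normal approximation without tie or continuity correction, and it computes the effect size as r = z / sqrt(N); treating the reported R as this quantity is an assumption, as is the function name. The authors' actual software and settings are not stated in the abstract.

```python
import math

def mann_whitney_one_sided(x, y):
    """Test whether values in x tend to be larger than values in y.

    Returns (u, z, p, r): u counts pairs where x > y (ties count 0.5),
    z is the normal-approximation score, p is the one-sided p-value,
    and r = z / sqrt(N) is an effect size (assumed definition).
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability P(Z >= z)
    r = z / math.sqrt(n1 + n2)
    return u, z, p, r

# Usage with made-up scores (hypothetical values, not data from the study):
group_a = [0.20, 0.30, 0.40]
group_b = [0.10, 0.15, 0.25]
u, z, p, r = mann_whitney_one_sided(group_a, group_b)
```

For small samples or heavily tied data, an exact test or a tie-corrected variance would be more appropriate than this approximation.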


The ASD group has significantly higher inter-turn semantic similarity than the TD group across all four turn-distance windows. These results support our hypothesis and could provide a convenient and robust ASD-specific behavioral marker.