Objectives: We focus on childhood autism risk estimation. The new effort extends previously developed child-vocalization features with interactive and environmental features, demonstrating the richness of the information that the audio signal can provide. The presentation interactively demonstrates the hardware-software framework, the obtained stable macro-statistical features, the related developmental trends and patterns, and the final results for children with typical development, language delay, and autism.
Methods: A lightweight digital recorder is worn by a child for a whole day to collect the child's own sounds and the surrounding environmental sounds. Pattern recognition is used to automatically detect different sound segments, including key-child, other-child, adult-male, adult-female, overlapped-sounds, noise, TV, and silence, producing a sequence of segment labels. Key-child segments are further decomposed to generate vocalization-composition features. Overlapped segments, in which the key child's sounds collide with other sounds, can reflect certain interactive behaviors and are characterized by the same decomposition scheme. Markov-chain-type statistics of the segment sequence around key-child segments also characterize the way the child interacts with the environment. It was also found that parents and children with autism tend to engage in more "near-distance talk" than "far-distance talk", so the distance-correlated loudness (dB level) can also be utilized. All these features are modeled with a proposed AdaBoost-based method for autism risk estimation. Leave-one-child-out cross-validation is used to validate the obtained results.
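The Markov-chain-type statistics described above can be illustrated with a minimal sketch: first-order transition counts over the segment-label sequence, normalized and restricted to transitions involving the key child. The label names and the choice to keep only key-child transitions are assumptions for illustration, not the authors' exact feature definition.

```python
from collections import Counter

# Segment labels as described in the abstract (exact names assumed).
LABELS = ["key_child", "other_child", "adult_male", "adult_female",
          "overlap", "noise", "tv", "silence"]

def transition_features(sequence):
    """Markov-chain-type statistics: normalized first-order transition
    counts, kept only for transitions into or out of key-child segments
    (a simplifying assumption for this sketch)."""
    pairs = Counter(zip(sequence, sequence[1:]))
    total = sum(pairs.values()) or 1
    feats = {}
    for a in LABELS:
        for b in LABELS:
            if a == "key_child" or b == "key_child":
                feats[f"{a}->{b}"] = pairs.get((a, b), 0) / total
    return feats

# Toy label sequence standing in for one day-long recording.
seq = ["silence", "adult_female", "key_child", "adult_female",
       "key_child", "overlap", "key_child", "silence"]
f = transition_features(seq)
```

Each recording then contributes one fixed-length feature vector (the sorted `feats` values), which could be concatenated with the vocalization-composition and loudness features before boosting.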
Results: Data set: typical development: 802 recordings, 106 children; language delay: 333 recordings, 49 children; autism: 228 recordings, 71 children. Age range: 8-48 months. No recording is under 9 hours, and none contains therapy sessions, which may interfere with the interactive-environmental features. Equal sensitivity and specificity (ESS) is used as the performance measure for three tasks, at the recording level or the child level (by combining recordings from the same child): autism versus others (ASD-OTH); autism versus typical development (ASD-TD); autism versus language delay (ASD-LD). The previous vocalization features yield ESSs of: recording level: ASD-OTH 88.2%, ASD-TD 89.7%, ASD-LD 80.8%; child level: ASD-OTH 89.1%, ASD-TD 90.6%, ASD-LD 81.7%. By incorporating the new features, the ESSs become: recording level: ASD-OTH 91.3%, ASD-TD 93.9%, ASD-LD 86.5%; child level: ASD-OTH 93.0%, ASD-TD 94.4%, ASD-LD 88.8%.
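The ESS measure used above can be sketched as a threshold sweep over classifier scores: find the operating point where sensitivity (true-positive rate) and specificity (true-negative rate) are as close as possible, and report their value there. This is a generic reconstruction of the metric, not the authors' exact implementation; averaging the two rates at the closest point is an assumption.

```python
def equal_sensitivity_specificity(scores, labels):
    """Return the sensitivity/specificity value at the threshold where
    the two rates are closest (their mean, as a tie-breaking choice).
    Assumes both a positive (1) and a negative (0) class are present."""
    best_gap, best_ess = None, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn)   # true-positive rate
        spec = tn / (tn + fp)   # true-negative rate
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_ess = gap, (sens + spec) / 2
    return best_ess

# Toy scores for a perfectly separable case.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
ess = equal_sensitivity_specificity(scores, labels)
```

Child-level ESS would be computed the same way after pooling each child's recording-level scores (e.g., by averaging) into one score per child.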
Conclusions: The results demonstrate the effectiveness of this innovative approach to monitoring child behavior and environment, the power of macro-statistics computed over large samples, and their robustness to variation, noise, and machine error. The developed framework shows strong potential as an effective and efficient autism screening tool, and can also be used for treatment and effect monitoring.