The Development of an Intelligent Virtual Reality Intervention Application
Objectives: This work describes the development and preliminary validation of a multimodal VR interface and a dynamic fusion strategy for incorporating these modes into within-system decision making. The system was designed to measure and respond in real time to user speech, gaze patterns, and physiological responses.
Methods: Three interfaces were integrated into a VR interaction platform: 1) speech-based turn-taking dialog management, 2) multi-channel peripheral physiological signal detection (Liu et al., 2008), and 3) an eye gaze-sensitive module (Bekele et al., 2013). While the physiological signal detection and eye gaze modules were developed in previous work, the present research focused on developing speech-based recognition and integrating all three systems into one interactive environment. For the speech interface, we developed domain-dependent conversation threads for more reliable speech-based recognition, along with a dialog management engine that parses these threads and performs a lexical comparison between each option and the user utterance captured by the speech interface module within a specified time interval (a sketch of this option-selection step is given below). Initial validity of these interfaces was tested individually across user studies. Further, the outputs of the physiological detection algorithm and the gaze-sensitive module were assessed during user performance via multimodal input fusion (Dumas et al., 2009; Jaimes et al., 2007) and tested against clinician ratings as ground truth for decision making.
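The option-selection step could, in principle, look like the following minimal sketch, which assumes a simple token-overlap (Jaccard) similarity between the recognized utterance and each scripted option in the active thread. The function names, threshold, and similarity measure here are illustrative assumptions, not the system's actual implementation.

```python
# Hypothetical sketch: score each scripted option in the active conversation
# thread against the recognized utterance with token-level Jaccard similarity,
# and select the best match above a threshold (names and values are illustrative).

def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def select_option(utterance: str, options: list[str], threshold: float = 0.3):
    """Return the option most lexically similar to the utterance, or None."""
    scored = [(jaccard_similarity(utterance, opt), opt) for opt in options]
    best_score, best_option = max(scored)
    return best_option if best_score >= threshold else None

# Example with placeholder options from one conversation thread.
options = ["I would like to order a pizza",
           "Can you tell me about the menu",
           "I am ready to pay"]
print(select_option("um I'd like to order pizza please", options))
```

In practice, utterances falling below the threshold within the allotted time window could be treated as unrecognized and trigger a clarification prompt from the dialog manager.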
Results: Validation results for the dialog management system, in terms of speech recognition performance, lexical similarity, and overall option selection, will be presented and available for real-time demonstration. For physiological-based affect recognition, we randomly divided the data set into training, validation, and testing sets in proportions of 70%, 15%, and 15%, respectively, for all classifiers, and used 10-fold cross-validation to fit the best model in each case (see the evaluation sketch below). Classification accuracy on the four-channel physiological data, using varied machine learning algorithms, exceeded 80%.
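A minimal sketch of this evaluation protocol, assuming scikit-learn and synthetic stand-in data, is shown below: a random 70/15/15 train/validation/test split with 10-fold cross-validation on the training portion. The feature dimensions, labels, and candidate classifiers are placeholders, since the abstract does not specify them.

```python
# Sketch of the 70/15/15 split plus 10-fold cross-validation for model selection.
# The feature matrix and labels are synthetic stand-ins for the four-channel
# physiological features and affective state labels (assumptions, not real data).

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 16))      # placeholder physiological feature vectors
y = rng.integers(0, 2, size=600)    # placeholder affective state labels

# 70% training; the remaining 30% split evenly into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

# 10-fold cross-validation on the training set to compare candidate classifiers.
for name, clf in [("SVM", SVC()), ("kNN", KNeighborsClassifier())]:
    scores = cross_val_score(clf, X_train, y_train, cv=10)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")

# The best-performing model would then be checked on the held-out validation
# and test sets before being used for real-time affect recognition.
```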
Conclusions: The current work provides a preliminary demonstration of the ability to develop VR environments and paradigms sensitive not only to performance within the system, but also to gaze patterns, physiological responses, and naturalistic speech. The ability to harness such behaviors within future intelligent systems may dramatically enhance VR environments as social intervention tools.