Architecting an Open-Source Research Exchange Database (RexDB): How to Design a System That Supports Integrated Data Management for Multidisciplinary Autism Research

Friday, 3 May 2013: 09:00-13:00
Banquet Hall (Kursaal Centre)
L. Rozenblit, F. Farach, D. Voccola and C. Tirrell, Prometheus Research, LLC, New Haven, CT
Background: The need for effective, centralized management of biomedical and behavioral research data has expanded dramatically in the past decade, especially in interdisciplinary areas like autism research, where longitudinal and multi-center projects require secure, flexible data management and sharing platforms. However, current solutions are either too expensive or insufficiently flexible. Our team has developed an integrated data management platform that has been used at over a dozen leading autism research centers over the last 7 years. In late 2011, we began to package this platform as an open-source project to make it freely available to all autism research centers and programs. Defining a suitable high-level architecture for this platform was a key early challenge.

Objectives: Design a high-level architecture for the open-source RexDB platform that will: (1) communicate key functionality to stakeholders; (2) define clear boundaries and interfaces between components; (3) support all the key functions of integrated data management, including data acquisition, integration, curation, and use; (4) support configuration of studies and instruments by non-programmers; (5) support extensibility by third-party developers; and (6) encourage adoption of the open-source platform by the researcher, data manager, and developer communities.

Methods: We reviewed publicly available architectures for several relevant projects and extracted key insights. We then created several mock architecture diagrams and reviewed each for technical feasibility, communication effectiveness, and compliance with the architectural objectives. After an internal review, we conducted a focus group to gather input from relevant stakeholders.

Results: Internal and external feedback suggested revisions to the architecture, which were implemented as a consensus solution. The resulting architecture separates the data acquisition component into a stand-alone system, RexAcquire. This system has minimal dependencies, does not require a database, and can be scaled quickly across many servers to support very large simultaneous data collection efforts. RexAcquire consumes configuration files that specify the instruments to be administered and their display logic; it emits raw measure result files. The largest component, RexCollect, encompasses the major data integration and curation functions, including instrument and study configuration, data transform rules, data quality checks, and user management. It consumes the raw measure result files emitted by RexAcquire and transforms them into appropriate relational data structures; it emits the instrument configuration files that RexAcquire consumes. RexCollect also allows users to select large subsets of data across studies, data types, and sites, and to generate new temporary relational databases for exploring the extracted subsets. Another relatively stand-alone component, RexPlore, supports the exploration of the extracted subsets and the generation of tabular data sets for statistical analysis.
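
The file-based hand-offs between the three components can be illustrated with a minimal Python sketch. It is not the RexDB implementation: every function name, the JSON file layout, and the field names are hypothetical, chosen only to show that the acquisition step needs nothing but a configuration file, while integration, subsetting, and tabulation happen downstream.

```python
import json

# Hypothetical illustration only: none of these names come from RexDB.

def emit_instrument_config(instrument_id, fields):
    """Collect-like side: write an instrument definition as a config file body."""
    return json.dumps({"instrument": instrument_id, "fields": fields})

def acquire(config_text, subject_id, answers):
    """Acquire-like side: needs only the config (no database).

    Returns the body of a raw measure result file.
    """
    config = json.loads(config_text)
    missing = [f for f in config["fields"] if f not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    # Keep only the fields the instrument defines, in config order.
    values = {f: answers[f] for f in config["fields"]}
    return json.dumps(
        {"instrument": config["instrument"], "subject": subject_id, "values": values}
    )

def integrate(raw_results):
    """Collect-like side: turn raw result files into relational-style rows."""
    rows = []
    for text in raw_results:
        result = json.loads(text)
        for name, value in result["values"].items():
            rows.append((result["subject"], result["instrument"], name, value))
    return rows

def extract_subset(rows, instrument_id):
    """Collect-like side: select the rows for one instrument for exploration."""
    return [row for row in rows if row[1] == instrument_id]

def tabulate(rows):
    """Explore-like side: pivot rows into one record per subject for analysis."""
    table = {}
    for subject, _instrument, name, value in rows:
        table.setdefault(subject, {})[name] = value
    return table

if __name__ == "__main__":
    config = emit_instrument_config("demo_instrument", ["age_months", "score"])
    raw_files = [
        acquire(config, "S001", {"age_months": 36, "score": 12}),
        acquire(config, "S002", {"age_months": 48, "score": 15}),
    ]
    subset = extract_subset(integrate(raw_files), "demo_instrument")
    print(tabulate(subset))
```

In this sketch the acquisition function depends only on the configuration text it is handed, which mirrors the stated reason the acquisition component can be replicated across many servers without a shared database.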

Conclusions: A platform architecture must balance many competing forces. Functionality, technical feasibility, and communication effectiveness rarely pull a design in the same direction. We developed and applied a process that attempts to balance these competing demands in an open-source project by involving various stakeholders early in the design process. The result is an architecture that we believe strikes a better compromise among these demands, and that will contribute to the success of RexDB as a common integrated data management platform for autism research.