Objectives: To design a high-level architecture for the open-source RexDB platform that will (1) communicate key functionality to stakeholders; (2) define clear boundaries and interfaces between components; (3) support the key functions of integrated data management, including data acquisition, integration, curation, and use; (4) support configuration of studies and instruments by non-programmers; (5) support extensibility by third-party developers; and (6) encourage adoption of the open-source platform by researchers, data managers, and developers.
Methods: We reviewed publicly available architectures for several relevant projects and extracted key insights. We then created several mock architecture diagrams and reviewed each for technical feasibility, communication effectiveness, and compliance with the architectural objectives. After an internal review, we conducted a focus group to gather input from relevant stakeholders.
Results: Internal and external feedback suggested revisions to the architecture, which we incorporated into a consensus design. The resulting architecture separated the data acquisition component into a stand-alone system, RexAcquire. This system has minimal dependencies, does not require a database, and can be scaled quickly across many servers to support very large simultaneous data collection efforts. RexAcquire consumes configuration files that specify the instruments to be administered and their display logic, and it emits raw measure data files. The largest component, RexCollect, encompasses the major data integration and curation functions, including instrument and study configuration, data transformation rules, data quality checks, and user management. It consumes the raw measure data files emitted by RexAcquire and transforms them into appropriate relational data structures; it also emits the instrument configuration files that RexAcquire consumes. In addition, RexCollect allows users to select large subsets of data across studies, data types, and sites and to generate new temporary relational databases for exploring those subsets. A third, relatively stand-alone component, RexPlore, supports exploration of the extracted subsets and generation of tabular data sets for statistical analysis.
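The key design choice here is that RexAcquire and RexCollect are coupled only through files: configuration flows one way and raw measure data flows back. The Python sketch below illustrates that boundary under stated assumptions. The abstract does not specify file formats, so the JSON layout, the `show_if` display-logic rule, the `measure_value` table, and all function names are hypothetical, and an in-memory SQLite database stands in for RexCollect's relational store.

```python
import json
import sqlite3


def rexcollect_emit_config(instrument_id, questions):
    """Hypothetical: RexCollect serializes an instrument configuration file."""
    return json.dumps({"instrument": instrument_id, "questions": questions})


def rexacquire_administer(config_json, answers):
    """Hypothetical stand-in for RexAcquire: config in, raw results out, no database."""
    config = json.loads(config_json)
    values = {}
    for question in config["questions"]:
        # Assumed display logic: show a question only if an earlier answer matches.
        condition = question.get("show_if")
        if condition and values.get(condition["field"]) != condition["equals"]:
            continue
        values[question["id"]] = answers.get(question["id"])
    return json.dumps({"instrument": config["instrument"], "values": values})


def create_store():
    """In-memory SQLite database standing in for RexCollect's relational store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measure_value (subject TEXT, instrument TEXT, field TEXT, value TEXT)"
    )
    return conn


def rexcollect_ingest(conn, subject_id, raw_json):
    """Hypothetical: transform one raw measure data file into relational rows."""
    raw = json.loads(raw_json)
    rows = [
        (subject_id, raw["instrument"], field, json.dumps(value))
        for field, value in raw["values"].items()
    ]
    conn.executemany("INSERT INTO measure_value VALUES (?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    store = create_store()
    config = rexcollect_emit_config("demo_screen", [
        {"id": "has_sibling"},
        {"id": "sibling_age", "show_if": {"field": "has_sibling", "equals": "yes"}},
    ])
    raw_file = rexacquire_administer(config, {"has_sibling": "no", "sibling_age": 7})
    rexcollect_ingest(store, "S001", raw_file)
    print(store.execute("SELECT * FROM measure_value").fetchall())
```

Because RexAcquire in this sketch only parses a configuration string and returns a results string, it needs no database connection, which mirrors the stated goal of scaling acquisition across many servers independently of RexCollect.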
Conclusions: A platform architecture must balance many competing forces; functionality, technical feasibility, and communication effectiveness rarely pull a design in the same direction. We developed and applied a process that addresses these competing demands in an open-source project by involving stakeholders early in the design process. We believe the resulting architecture strikes a better balance among these demands and that it will contribute to the success of RexDB as a common integrated data management platform for autism research.