Scalable Sequencing Pipeline on Cloud

Thursday, May 14, 2015: 5:30 PM-7:00 PM
Imperial Ballroom (Grand America Hotel)
J. Y. Jung1,2, A. Lancaster1,2, Y. Souilmi1, P. J. Tonellato1 and D. Wall2, (1)Center for Biomedical Informatics, Harvard Medical School, Boston, MA, (2)Stanford University, Palo Alto, CA
Background: Although sequencing methods have been approved as clinical diagnostic tools, computation time and cost remain substantial barriers to the use of next-generation sequencing (NGS) in clinical settings. For clinicians to adopt this technology, the turnaround time of genomic data analysis must be within hours, and the cost of turning sequence data into clinically actionable information must be reduced to the level of a typical lab test.

Objectives: Here we introduce our cloud-based NGS analysis pipeline and present benchmark results on all publicly available whole-exome and whole-genome data from autism studies.

Methods: We built a generic workflow management system that runs on cloud platforms, then implemented a Genome Analysis Toolkit (GATK) workflow as our NGS pipeline. We tested this system on the Amazon Web Services (AWS) platform with all autism whole-exome and whole-genome data sets available to us, in order to examine the scalability and cost-effectiveness of the pipeline.
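The abstract does not describe the internals of the workflow management system. As a rough illustration only, here is a minimal sketch assuming that each sample's workflow is a dependency graph of steps run in order, and that independent samples are processed in parallel so a batch scales out by adding workers. The step names (`align`, `mark_duplicates`, `base_recalibration`, `call_variants`) and the functions `run_step`, `run_sample` and `run_batch` are hypothetical placeholders, not actual GATK commands or the authors' implementation; a local thread pool stands in for cloud worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical per-sample workflow: step name -> set of prerequisite steps.
# Names loosely follow a typical GATK-style flow but are placeholders only.
WORKFLOW = {
    "align": set(),
    "mark_duplicates": {"align"},
    "base_recalibration": {"mark_duplicates"},
    "call_variants": {"base_recalibration"},
}


def run_step(sample: str, step: str) -> str:
    # Placeholder: a real system would submit a job to a cloud worker here.
    return f"{sample}:{step}"


def run_sample(sample: str) -> list[str]:
    """Run one sample's steps in an order that respects their dependencies."""
    order = TopologicalSorter(WORKFLOW).static_order()
    return [run_step(sample, step) for step in order]


def run_batch(samples: list[str], max_workers: int = 8) -> dict[str, list[str]]:
    """Process independent samples concurrently (threads stand in for cloud nodes)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_sample, samples))
    return dict(zip(samples, results))


if __name__ == "__main__":
    # A batch of 100 hypothetical exomes, echoing the batch size in the abstract.
    batch = run_batch([f"sample{i:03d}" for i in range(100)])
    print(len(batch), batch["sample000"])
```

Because samples share no state in this sketch, throughput grows with `max_workers`. A real cloud deployment would also need to handle data staging, job retries, and joint variant calling across samples, none of which is modeled here.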

Results: Test results showed that the pipeline scales to batches of up to a hundred exomes or genomes, a typical batch size in sequencing. We will also discuss our findings on the autism-specific data from joint variant calling and the characteristics of rare, de novo, and knockout variants.

Conclusions:  N/A

See more of: Genetics