The Autism Speaks MSSNG Whole Genome Sequencing Resource
Autism Spectrum Disorder (ASD) is a highly heterogeneous disorder, both in clinical presentation and genetic architecture. Many loci, perhaps hundreds, are associated with ASD, and multiple forms of genetic variation contribute to risk.
We are performing whole genome sequencing (WGS) of families with ASD to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved.
We have created a cloud database containing WGS data and clinical information, which is accessible through an internet portal with controlled access. We have recently released new data to bring the total number of genomes to 7,235, including subjects who are part of the POND-network and Baby Siblings Research Consortium. Data are available for single nucleotide variants (SNVs), small insertions/deletions (indels) and copy number variants.
From the first 5,205 genomes, we detected on average 73.8 de novo SNVs and 12.6 de novo indels per ASD subject and identified 18 new candidate ASD-risk genes, such as MED13 and PHF3. In total, by including de novo SNVs and indels together with large copy number variants (CNVs), a molecular diagnosis could be determined for 11.2% of ASD cases (Nature Neuroscience, 2017). Analysis of the CNV data from these subjects found an average of 22.7 rare (<1% frequency) CNVs >1 kb in size per individual sequenced on Illumina platforms, and 7.87 rare CNVs >2 kb per individual sequenced by Complete Genomics. Of these, an average of 9.89 and 4.78, respectively, impacted protein-coding regions of genes. We are now analysing structural variant calls using multiple tools: CREST, LUMPY, Manta and DELLY. The MSSNG phenotype database is also being expanded, and dozens of families are being added with multigenerational pedigrees, multiple affected siblings, and participants from clinical trials. Moreover, epigenetic analysis of DNA samples using methylation microarrays is adding functional data to the genomic information. All of this is made available to the research community through a simple MSSNG user interface (portal).
The Autism Speaks MSSNG project combines high-quality WGS data with phenotype information to support researchers of all backgrounds in studying the genetic architecture of ASD.