The Autism Speaks MSSNG Whole Genome Sequencing Precision Medicine Resource
Autism Spectrum Disorder (ASD) is a highly heterogeneous disorder, both in clinical presentation and in genetic architecture. Hundreds of loci are associated with ASD, and multiple types of rare and common genome-wide variation contribute to risk.
We are performing whole-genome sequencing (WGS) of families with ASD to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved.
We have created a cloud database containing WGS data and clinical information, accessible through a controlled-access internet portal. An upcoming data release will bring the total number of genomes to over 10,000, including new subjects from the Ontario POND Network, the Quebec Transforming Autism Care Consortium (TACC), the British Columbia iTARGET project, the Autism Phenome Project, the Baby Siblings Research Consortium and Autism Speaks AGRE samples (including The Autism Simplex Collection (TASC)). Data are available for single nucleotide variants (SNVs), small insertions/deletions (indels) and copy number variants; structural variants (SVs; including short repeats) and mitochondrial variants are forthcoming.
In our first analysis, 61 genes and ~35 copy number variation (CNV) loci were implicated as contributing to ASD risk based on de novo or X-chromosome SNVs and small indels, with 18 of these genes identified for the first time. Subsequently, we have focused on the identification and characterization of smaller CNVs and SVs in 7,231 genomes. We have detected CNVs >1 kb in size using our established pipeline combining ERDS and CNVnator, or from Complete Genomics data, and we are now adding SVs detected using the Manta, LUMPY and DELLY algorithms. Of the first 3,427 affected subjects analyzed, 6.6% carry a rare ASD-risk CNV falling into one of four categories: chromosome abnormalities (0.8%), large CNVs >3 Mb (0.9%), CNVs corresponding to known genomic disorders (3%) and deletions at known ASD-susceptibility genes or loci (1.9%). A further 1.7% of subjects have rare duplications impacting ASD-risk genes. All other public exome and WGS data are being incorporated into the MSSNG framework through comparative meta-analysis of genome-wide data. To enable functional studies of candidate variants, we have generated 63 iPS cell-derived neuronal lines from individuals with ASD and their familial controls, and another 25 lines using CRISPR modeling in an isogenic line. We have also consented 308 families (with more to come) to link participants to local health and medical record databases, with the aim of searching for environmental influences and potential medical trends in the data.
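As a quick consistency check on the CNV breakdown above, the following minimal Python sketch sums the four reported category percentages and converts them to approximate subject counts. The percentages and the cohort size of 3,427 come from the text; the variable and function names are illustrative, and the counts are only approximations because the reported percentages are rounded.

```python
# Cohort size and percentages as reported in the text above.
N_SUBJECTS = 3427

cnv_categories = {
    "chromosome abnormalities": 0.8,
    "large CNVs >3 Mb": 0.9,
    "CNVs at known genomic disorder loci": 3.0,
    "deletions at known ASD-susceptibility genes or loci": 1.9,
}

# Reported separately in the text ("a further 1.7%"); not added to the total
# here, since the text does not state how it relates to the four categories.
RARE_DUPLICATIONS_PCT = 1.7


def approx_count(pct, n=N_SUBJECTS):
    """Approximate number of subjects implied by a rounded percentage."""
    return round(n * pct / 100)


# Round to one decimal to avoid floating-point noise in the sum.
total_pct = round(sum(cnv_categories.values()), 1)

for name, pct in cnv_categories.items():
    print(f"{name}: {pct}% (~{approx_count(pct)} subjects)")
print(f"total rare ASD-risk CNV: {total_pct}% (~{approx_count(total_pct)} subjects)")
print(f"rare duplications: {RARE_DUPLICATIONS_PCT}% "
      f"(~{approx_count(RARE_DUPLICATIONS_PCT)} subjects)")
```

The four categories sum to the reported 6.6%, which corresponds to roughly 226 of the 3,427 subjects.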
The Autism Speaks MSSNG project combines high-quality WGS data with phenotype information to support researchers of all backgrounds in studying the genetic architecture of ASD. Planned updates to the MSSNG resource include improved portal functionality, phenotype querying capabilities, re-analysis of the data with hg38, and inclusion of epigenetic data. Additionally, a subset of the genomes will be included in worldwide data sharing via the Beacon Network.
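To illustrate what Beacon-style sharing involves, the sketch below builds an allele-presence query URL using parameter names from the GA4GH Beacon v1 specification (`referenceName`, `start`, `referenceBases`, `alternateBases`, `assemblyId`). The base URL, the example variant and the choice of assembly are hypothetical placeholders, not details of the MSSNG deployment, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BEACON_BASE_URL = "https://beacon.example.org"


def build_beacon_query(chrom, start, ref, alt, assembly):
    """Build a Beacon v1-style query URL asking whether an allele is present.

    `start` is a 0-based position, as in the Beacon v1 specification.
    """
    params = {
        "referenceName": chrom,
        "start": start,
        "referenceBases": ref,
        "alternateBases": alt,
        "assemblyId": assembly,
    }
    return f"{BEACON_BASE_URL}/query?{urlencode(params)}"


# Placeholder variant; not taken from MSSNG data.
url = build_beacon_query("1", 1000000, "A", "G", assembly="GRCh38")
print(url)
```

A Beacon responds only with whether the queried allele has been observed, which is why a subset of genomes can be shared this way without exposing individual-level records.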