26864
Whole-Genome Sequencing to Detect Rare Noncoding Variants in Autism Spectrum Disorder
Objectives: To assess whether existing hypotheses regarding noncoding risk in ASD replicate in new datasets and to perform an unbiased analysis corrected for multiple comparisons across the combined dataset.
Methods: Whole-genome sequencing (WGS) data from an additional 1,024 individuals in 256 quartets (256 cases, unaffected sibling controls, and both parents) were analyzed alongside the existing WGS data for 2,076 individuals in 519 families. De novo SNVs, indels, and structural variants (SVs) were identified using 12 variant discovery algorithms; cross-site validation exceeded 93% for all variant classes. Variants were annotated using an extensive series of noncoding functional annotations at the level of nucleotides, genes, and regulatory regions, resulting in 51,801 combinations of annotation categories. ASD association within each category was assessed in a Category-Wide Association Study (CWAS), using a binomial test to compare variant counts in cases and controls. To account for multiple testing, correlations between the p-values of the 51,801 categories were assessed across 20,000 sets of simulated variants. Eigenvalue decomposition of these correlations estimated that 4,123 effective tests explained 99% of the variation.
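The per-category comparison can be illustrated with a minimal sketch of an exact two-sided binomial test. It assumes a null proportion of 0.5 (equal numbers of cases and sibling controls, so a variant is equally likely to fall in either group), which the abstract does not state explicitly. The function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_two_sided_p(case_count, control_count):
    """Exact two-sided binomial p-value under an assumed null proportion of 0.5.

    Under the null, each variant in a category is equally likely to be
    observed in a case or in a control (an assumption of this sketch).
    """
    n = case_count + control_count
    if n == 0:
        return 1.0  # no variants in the category, so no evidence either way
    k = min(case_count, control_count)
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts for one annotation category (illustration only).
p_value = binomial_two_sided_p(case_count=7, control_count=3)
```

In a CWAS-style analysis, a test of this kind would be repeated for every annotation category, which is why the multiple-testing correction described above matters.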
Results: We did not observe replication of prior hypotheses for noncoding variation in the new samples. Combining these samples with existing data, no annotation category was significant after correcting for 4,123 tests in the CWAS. As before, the lowest p-values were observed in coding regions, including missense variation and SVs not detected by previous technologies. Similarly, no category of rare inherited variants demonstrated parental transmission bias or ASD association.
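To make the stringency of correcting for 4,123 effective tests concrete, the sketch below computes a Bonferroni-style per-category threshold. The family-wise error rate of 0.05 and the use of a simple Bonferroni division are assumptions made for illustration; the abstract states only the number of effective tests.

```python
def corrected_threshold(alpha, n_effective_tests):
    """Per-test significance threshold after a Bonferroni-style correction."""
    return alpha / n_effective_tests


def is_significant(p_value, alpha=0.05, n_effective_tests=4123):
    """True if a category's p-value survives the correction.

    alpha = 0.05 is an assumed family-wise error rate; 4123 is the number of
    effective tests reported in the Methods.
    """
    return p_value < corrected_threshold(alpha, n_effective_tests)


# Roughly 1.2e-5 under the assumed alpha of 0.05.
threshold = corrected_threshold(0.05, 4123)
```

Under these assumptions, a category would need a p-value on the order of 1e-5 or lower to be declared significant, far below a nominal 0.05.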
Conclusions: Our results suggest that no single category of rare noncoding variation has an impact on ASD risk equivalent to that of large SVs or protein-disrupting mutations. Furthermore, given the lack of replication of previous hypotheses, we conclude that identifying noncoding disease associations and quantifying this risk will require a statistically rigorous approach that includes stringent multiple testing correction for this multitude of plausible hypotheses. Analogous to genome-wide association studies of common variation, this approach is likely to identify sound and replicable noncoding associations, but will require substantially larger sample sizes, likely in excess of 5,000 cases.