Improving Genetic Association Approaches for Risk Gene Discovery in ASD

Oral Presentation
Thursday, May 10, 2018: 10:55 AM
Jurriaanse Zaal (de Doelen ICC Rotterdam)
B. K. Sheppard1, J. Wang2, M. Peng2, J. Y. An1, M. State1, B. Devlin3, K. Roeder2 and S. Sanders1, (1)Psychiatry, University of California San Francisco, San Francisco, CA, (2)Carnegie Mellon University, Pittsburgh, PA, (3)Univ of Pittsburgh School of Medicine, Pittsburgh, PA
Background: Sequencing studies of autism spectrum disorder (ASD) have successfully identified dozens of risk genes, particularly through loss-of-function (LoF) de novo mutations (DNMs). However, it has been estimated that hundreds of genes contribute to the genetic architecture of ASD. Recent work by the Daly lab has shown that Exome Aggregation Consortium (ExAC) allele frequencies can help distinguish risk from non-risk variants (Kosmicki et al. 2017). Here we assess the impact of this ExAC-based filtering on gene discovery.

Objectives: Our study aims to adapt existing gene discovery methods to incorporate ExAC allele frequency data. Additionally, we will present a new ASD gene list based on this methodology after incorporating additional samples.

Methods: We used the Transmission and De Novo Association (TADA) method to identify ASD risk genes using LoF and probably damaging missense (Mis3) DNMs in whole exome sequencing (WES) data from 4,109 ASD probands. TADA is a hierarchical Bayesian model that incorporates information from various functional categories, parameterizing each category by its average relative risk, which is derived from estimates of genome-wide burden and the hypothesized number of risk genes. Burden estimates were obtained using a subset of 1,911 probands and their unaffected siblings. TADA was used to identify gene lists at a false discovery rate (FDR) threshold of 10% for: 1) the full set of high-confidence LoF and Mis3 DNMs and 2) the subset of DNMs remaining after removing mutations present in the ExAC database. These two gene lists were then compared using the GeNets Metanetwork protein-protein interaction algorithm. We will also incorporate data from targeted sequencing of 250 putative ASD risk genes in 14,208 samples from 5,357 ASD families, adding priors based on allele frequency observed in ExAC and gene-based conservation scores.
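The FDR step can be illustrated with a minimal sketch of Bayesian q-values computed from per-gene Bayes factors. This is not the TADA implementation: the function name, the example Bayes factors, and the prior risk-gene fraction of 0.05 are all hypothetical values chosen for illustration.

```python
def bayesian_q_values(bayes_factors, pi):
    """Return one q-value per gene, in the input order.

    bayes_factors: per-gene evidence for "risk gene" vs. "non-risk gene".
    pi: assumed prior fraction of genes that are risk genes (hypothetical).
    """
    # Posterior probability that each gene is NOT a risk gene.
    post_null = [(1 - pi) / ((1 - pi) + pi * bf) for bf in bayes_factors]
    # Rank genes from strongest to weakest evidence.
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    q = [0.0] * len(post_null)
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += post_null[i]
        # q-value: mean posterior null probability over genes ranked this high or higher.
        q[i] = running / rank
    return q


# Hypothetical Bayes factors for four genes; keep genes with q < 0.10.
example_bf = [1000.0, 100.0, 1.0, 0.5]
example_q = bayesian_q_values(example_bf, pi=0.05)
passing = [i for i, qv in enumerate(example_q) if qv < 0.10]
print(passing)
```

Under these assumed inputs, only the genes with the two largest Bayes factors fall below the 10% threshold.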

Results: Using the full set of mutations, genome-wide burden estimates for LoF and Mis3 DNMs were 1.99 (average relative risk [RR] = 20.74) and 1.08 (average RR = 2.64), respectively. Multiple testing correction yielded 57 genes with FDR q-value < 0.10. Filtering DNMs present in the ExAC database removed approximately 9% of LoF and 29% of Mis3 mutations, increasing the LoF burden estimate to 3.16 (average RR = 44.18) and the Mis3 burden estimate to 1.11 (average RR = 3.11). This resulted in 55 genes with FDR q-value < 0.10. Fourteen genes identified in the first analysis fell below the detection threshold, while 12 new genes rose above it. Despite minimal change in the number of associated genes, network analysis revealed improved overall functional connectivity for the new gene list (network p-value = 0.002) compared to the list identified in the first analysis (network p-value = 0.09).
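The link between the burden estimates and the average RR values can be checked with a short sketch. It assumes the relation RR = 1 + (burden - 1) / pi, where pi is the fraction of genes that are risk genes; the value pi = 0.05 is an assumption not stated in this abstract, so the output only approximates the reported figures, which were presumably computed from unrounded inputs.

```python
def average_relative_risk(burden, risk_gene_fraction):
    """Average RR per category, assuming RR = 1 + (burden - 1) / pi."""
    return 1 + (burden - 1) / risk_gene_fraction


ASSUMED_PI = 0.05  # hypothetical fraction of risk genes; not given in the abstract

# Burden estimates reported above, before and after ExAC filtering.
reported_burden = {
    "LoF, all DNMs": 1.99,
    "Mis3, all DNMs": 1.08,
    "LoF, ExAC-filtered": 3.16,
    "Mis3, ExAC-filtered": 1.11,
}
for label, burden in reported_burden.items():
    rr = average_relative_risk(burden, ASSUMED_PI)
    print(f"{label}: burden {burden:.2f}, approximate average RR {rr:.1f}")

# Gene-list turnover: 57 genes, minus 14 dropped, plus 12 gained.
filtered_list_size = 57 - 14 + 12
print(filtered_list_size)
```

The same arithmetic shows why the two lists are similar in size even though 26 genes differ between them.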

Conclusions: As hypothesized, filtering ExAC variants from our DNM list resulted in higher estimates of genome-wide burden relative to unaffected siblings. This yielded a refined list of ASD risk genes that was similar in size but more functionally connected overall, based on network analysis. Incorporating additional priors into the model and adding new samples is expected to further improve the quality of ASD gene lists.