26865
Improving Genetic Association Approaches for Risk Gene Discovery in ASD
Objectives: We aim to adapt existing gene discovery methods to incorporate allele frequency data from the Exome Aggregation Consortium (ExAC). We will also present a new autism spectrum disorder (ASD) gene list based on this methodology after incorporating additional samples.
Methods: We used the Transmission and De Novo Association (TADA) method to identify ASD risk genes from loss-of-function (LoF) and probably damaging missense (Mis3) de novo mutations (DNMs) in whole exome sequencing (WES) data from 4,109 ASD probands. TADA is a hierarchical Bayesian model that integrates information across functional categories of mutation, parameterizing each category by its average relative risk (RR), which is derived from the estimated genome-wide burden and the hypothesized number of risk genes. Burden estimates were obtained from a subset of 1,911 probands and their unaffected siblings. TADA was used to identify genes passing a false discovery rate (FDR) threshold of 10% for: 1) the full set of high-confidence LoF and Mis3 DNMs and 2) the subset of DNMs remaining after removing mutations present in the ExAC database. The two gene lists were then compared using the GeNets Metanetwork protein-protein interaction algorithm. We will also incorporate data from targeted sequencing of 250 putative ASD risk genes in 14,208 samples from 5,357 ASD families, adding priors based on allele frequencies observed in ExAC and on gene-based conservation scores.
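The link between genome-wide burden and average relative risk can be illustrated with a short sketch. It assumes the relation commonly used with TADA, average RR = 1 + (burden - 1) / pi, where pi is the hypothesized fraction of genes that are risk genes. The value pi = 0.05 and the function name are illustrative assumptions, not figures or code from this study.

```python
def average_relative_risk(burden: float, risk_gene_fraction: float) -> float:
    """Convert a genome-wide burden estimate into an average relative risk.

    Assumes the excess burden (burden - 1) is concentrated entirely in the
    hypothesized fraction of genes that are true risk genes.
    """
    if not 0.0 < risk_gene_fraction <= 1.0:
        raise ValueError("risk_gene_fraction must be in (0, 1]")
    return 1.0 + (burden - 1.0) / risk_gene_fraction


if __name__ == "__main__":
    # Assumed for illustration only: 5% of genes are risk genes.
    ASSUMED_RISK_GENE_FRACTION = 0.05
    for category, burden in [("LoF", 1.99), ("Mis3", 1.08)]:
        rr = average_relative_risk(burden, ASSUMED_RISK_GENE_FRACTION)
        print(f"{category}: burden = {burden:.2f}, average RR = {rr:.2f}")
```

Under this assumption, a modest burden translates into a much larger per-gene relative risk, because the excess is attributed to a small fraction of genes.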
Results: Using the full set of mutations, genome-wide burden estimates for LoF and Mis3 DNMs were 1.99 (average RR = 20.74) and 1.08 (average RR = 2.64), respectively. Multiple testing correction yielded 57 genes with FDR q-value < 0.10. Filtering DNMs present in the ExAC database removed approximately 9% of LoF and 29% of Mis3 mutations, increasing the LoF burden estimate to 3.16 (average RR = 44.18) and the Mis3 burden estimate to 1.11 (average RR = 3.11). This resulted in 55 genes with FDR q-value < 0.10: 14 genes identified in the first analysis fell below the detection threshold, while 12 new genes rose above it. Despite the minimal change in the number of associated genes, network analysis showed improved overall functional connectivity for the filtered gene list (network p-value = 0.002) compared with the list from the first analysis (network p-value = 0.09).
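As an illustration of how a q-value cutoff of 0.10 selects genes in a Bayesian setting, the sketch below converts per-gene Bayes factors into q-values by ranking genes on their posterior probability of being null and averaging those probabilities cumulatively. This is one common Bayesian FDR procedure and is shown as an assumption; the abstract does not state how its q-values were computed. The gene labels, Bayes factors, and prior fraction are hypothetical.

```python
from typing import Dict


def bayesian_fdr_qvalues(
    bayes_factors: Dict[str, float], prior_risk_fraction: float
) -> Dict[str, float]:
    """Return a q-value per gene from Bayes factors (risk model vs. null)."""
    prior_null = 1.0 - prior_risk_fraction
    # Posterior probability that each gene is NOT a risk gene.
    posterior_null = {
        gene: prior_null / (prior_null + prior_risk_fraction * bf)
        for gene, bf in bayes_factors.items()
    }
    # Rank genes from strongest to weakest evidence for association.
    ranked = sorted(posterior_null, key=posterior_null.get)
    qvalues: Dict[str, float] = {}
    running_sum = 0.0
    for rank, gene in enumerate(ranked, start=1):
        running_sum += posterior_null[gene]
        # q-value: expected fraction of false discoveries among the top `rank` genes.
        qvalues[gene] = running_sum / rank
    return qvalues


if __name__ == "__main__":
    # Hypothetical Bayes factors for three placeholder genes.
    example = {"GENE_A": 1000.0, "GENE_B": 19.0, "GENE_C": 1.0}
    q = bayesian_fdr_qvalues(example, prior_risk_fraction=0.05)
    selected = [gene for gene, value in q.items() if value < 0.10]
    print(q)
    print("Genes with q < 0.10:", selected)
```

Because the q-value at each rank depends on every gene ranked above it, removing or reweighting mutations can move genes across the threshold in both directions even when the total count barely changes.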
Conclusions: As hypothesized, filtering ExAC variants from our DNM list resulted in higher estimates of genome-wide burden in probands relative to unaffected siblings. This yielded a refined list of ASD risk genes that was similar in size but more functionally connected overall, based on network analysis. Incorporating additional priors into the model and adding new samples is expected to further improve the quality of ASD gene lists.