20669
Gene Expression, Regulatory Elements and Rare Sequence Variation Identify Genes and Subnetworks Underlying Autism Risk

Friday, May 15, 2015: 4:30 PM
Grand Ballroom A (Grand America Hotel)
A. E. Cicek1, L. Liu2, S. J. Sanders3, A. J. Willsey3, J. Cotney4, R. A. Muhle5, N. Sestan6, J. Noonan4, M. W. State3, B. Devlin7 and K. Roeder2, (1)Ray and Stephanie Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA, (2)Statistics, Carnegie Mellon University, Pittsburgh, PA, (3)Psychiatry, UCSF, San Francisco, CA, (4)Genetics, Yale University School of Medicine, New Haven, CT, (5)Yale Child Study Center, New Haven, CT, (6)Neurobiology, Yale University School of Medicine, New Haven, CT, (7)Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA
Background: Whole-exome sequencing (WES) studies have uncovered risk-conferring variation by enumerating de novo variation, which is sufficiently rare that recurrent mutations in a gene provide strong causal evidence. Analysis of rare coding variation in more than 20,000 people has led to the discovery of dozens of risk genes. Yet the genetic architecture suggests that autism spectrum disorder (ASD) involves nearly 1000 genes. Using BrainSpan gene expression data, ChIP-seq data from chromatin modifiers, and exome- and genome-wide sequencing data, we aim to accelerate the search for ASD risk genes. DAWN provides a statistical framework for attaining this goal.

De novo loss-of-function (dnLoF) mutations occur substantially more often in ASD probands than in their unaffected siblings. Multiple independent dnLoF mutations in the same gene implicate that gene in risk and hence provide a systematic, albeit arduous, path forward for ASD genetics. Willsey et al. (2013) identified brain gene coexpression networks as meaningful for the organization and inter-relationships of ASD genes, and identified the mid-fetal prefrontal and motor-somatosensory cortex as the developmental period and regions in which these genes tend to coalesce to confer risk for ASD.

Objectives: Co-expression networks will be estimated from spatially and temporally rich mRNA expression data from the developing human brain. Using these co-expression networks along with targets of chromatin modifiers determined by ChIP-seq, we aim to identify genes and subnetworks of genes that affect risk for autism.
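
As a rough illustration of what estimating a co-expression network can look like, the sketch below connects two genes when the absolute Pearson correlation of their expression profiles exceeds a cutoff. This is a deliberately simplified stand-in, not the estimator used in this work: the gene names, the toy expression values, the function names, and the 0.7 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expression, threshold=0.7):
    """Return gene pairs whose |correlation| meets the (assumed) threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression profiles across four samples (not real data).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(toy_expression)
print(edges)
```

In this toy input only GENE_A and GENE_B rise together across samples, so they form the single edge; a real analysis would need far more samples and a more careful choice of estimator and cutoff.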

Methods: We build on the DAWN algorithm to model three kinds of data: rare variants from whole-exome and whole-genome sequencing; gene co-expression; and regulatory elements and their impact on gene expression. The algorithm casts the ensemble data as a hidden Markov random field in which the graph structure is determined by gene co-expression and co-regulation. To identify risk genes, the algorithm combines these interrelationships with node-specific observations, namely each gene's identity, expression, genetic data, and estimated effect on risk.
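
To make the hidden Markov random field idea concrete, the following sketch labels each gene as risk (1) or non-risk (0) by combining a per-gene association score with the labels of its network neighbors. It is a minimal illustration under stated assumptions, not the DAWN implementation: it uses an Ising-style prior with fixed parameters `b` and `c`, unit-variance Gaussian scores with an assumed mean shift `mu` for risk genes, and iterated conditional modes for inference. All gene names, scores, and parameter values are hypothetical.

```python
def icm_risk_labels(z, edges, mu=2.0, b=-2.0, c=1.0, max_sweeps=50):
    """Iterated conditional modes on a binary hidden Markov random field.

    z     : dict mapping gene -> association z-score (node observation)
    edges : iterable of (gene, gene) pairs defining the graph
    mu    : assumed mean z-score of risk genes (non-risk genes have mean 0)
    b, c  : assumed Ising parameters (baseline log-odds, neighbor coupling)
    """
    neighbors = {g: set() for g in z}
    for g1, g2 in edges:
        neighbors[g1].add(g2)
        neighbors[g2].add(g1)

    def data_score(g):
        # Log-likelihood ratio of N(mu, 1) versus N(0, 1), plus baseline b.
        return mu * z[g] - mu ** 2 / 2 + b

    # Initialise from the node observations alone, ignoring the network.
    state = {g: int(data_score(g) > 0) for g in z}

    for _ in range(max_sweeps):
        changed = False
        for g in sorted(z):
            # Each neighbor currently labeled as risk adds c to the log-odds.
            score = data_score(g) + c * sum(state[n] for n in neighbors[g])
            new_label = int(score > 0)
            if new_label != state[g]:
                state[g] = new_label
                changed = True
        if not changed:
            break  # labels are stable: a local optimum has been reached
    return state


# Hypothetical scores: G2 and G3 have identical, borderline evidence.
toy_z = {"G1": 3.5, "G2": 1.8, "G3": 1.8, "G4": 0.1}
toy_edges = [("G1", "G2"), ("G3", "G4")]
labels = icm_risk_labels(toy_z, toy_edges)
print(labels)
```

G2 and G3 carry the same borderline score, but only G2 is promoted to a risk label because it neighbors the strongly supported G1. That is the intended effect of letting the graph structure inform node-level evidence; a full treatment would also estimate `mu`, `b`, and `c` from the data and report posterior probabilities rather than hard labels.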

Results: DAWN identifies novel genes and gene networks that plausibly affect ASD risk. A significant fraction of these genes are validated using recent sequencing data. In the validation experiment, DAWN distinguished genes that would go on to accumulate new dnLoF mutations better than any existing method. Moreover, incorporating binding events of ASD-associated chromatin modifiers, identified by ChIP-seq, significantly increased the detection of ASD risk genes.

Conclusions: Focusing on human mid-fetal prefrontal and motor-somatosensory cortex, we obtain a predicted list of risk genes that is enriched for genes under evolutionary constraint. A subnetwork obtained by seeding these genes within a high-confidence protein-protein interactome confirms that the putative genes are enriched for neuronal functions. Enrichment analysis reveals genes involved in synaptic transmission, cell-cell communication, and neurodegenerative disorders, as well as histone-modifying enzymes and DNA-binding proteins. Finally, the power of the DAWN prediction model is enhanced by incorporating the targets of histone modifier genes such as CHD8 into the model.