Utilizing Extended Families to Prioritize Autism Risk and Protective Variants from Whole Exome Sequencing

Cuccaro, Michael L.

Background: Massively parallel sequencing in autism (AUT) has focused primarily on trio cohorts to identify de novo loss-of-function protein-coding variants. Extended multiplex families with at least one cousin pair with AUT offer a unique and powerful tool for identifying potential new AUT genetic risk loci through identical-by-descent (IBD) filtering. These pedigrees are likely to carry AUT susceptibility loci of moderate to high effect that may not be identified through de novo strategies. Furthermore, by using typically developing siblings from these same families, we can refine the risk candidate genes to only those with strong effects and identify potentially protective genes segregating in the families.

Objectives: Our study applies whole exome sequencing (WES) to extended multiplex families likely to carry rare, partially penetrant inherited alterations. We hypothesize that separate IBD analyses among AUT individuals and among typically developing individuals in these pedigrees will define genomic regions of shared AUT risk or protection, allowing identification of shared risk or protective variants.

Methods: We performed WES on at least two ASD individuals and two typically developing siblings across 14 extended families. Sequencing was performed on the Illumina HiSeq 2000, and data were analyzed through current best-practice pipelines, including BWA-MEM alignment, quality recalibration with GATK, and variant calling with the GATK HaplotypeCaller. Annotations were applied with ANNOVAR. We determined IBD regions using existing whole-genome genotyping data and the MERLIN package, first using regions shared among ASD individuals in each family to identify shared risk variants, and then again using non-ASD individuals to identify protective regions. Variants were selected by heterozygous IBD sharing in all ASD (risk) or all non-ASD (protective) individuals per family, protein-coding effect (non-synonymous), and population frequency (<5% in the ExAC database). Variants were further prioritized by removing risk candidates also carried by non-ASD siblings and protective candidates also carried by ASD siblings.
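The per-family selection logic described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `Variant` record, its field names, and the `prioritize` function are hypothetical, and the sketch assumes that IBD membership, ANNOVAR effect labels, and ExAC frequencies have already been attached to each variant upstream. It also assumes that a candidate is dropped if any individual in the contrast group carries it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-family variant record (field names are assumptions)."""
    gene: str
    effect: str               # e.g. "nonsynonymous" or "synonymous"
    exac_freq: float          # population allele frequency from ExAC
    in_ibd_region: bool       # lies in a region shared IBD by the target group
    het_carriers: frozenset   # IDs of heterozygous carriers in this family


def prioritize(variants, target_ids, contrast_ids, max_freq=0.05):
    """Keep rare non-synonymous IBD variants carried by every target
    individual and by no contrast individual.

    Risk analysis:       target = ASD,     contrast = non-ASD siblings.
    Protective analysis: target = non-ASD, contrast = ASD individuals.
    """
    target = frozenset(target_ids)
    contrast = frozenset(contrast_ids)
    kept = []
    for v in variants:
        if not v.in_ibd_region:
            continue                      # outside the shared IBD regions
        if v.effect != "nonsynonymous":
            continue                      # protein-coding effect filter
        if v.exac_freq >= max_freq:
            continue                      # population frequency filter (<5%)
        if not target <= v.het_carriers:
            continue                      # must be heterozygous in all targets
        if v.het_carriers & contrast:
            continue                      # assumed: any contrast carrier excludes
        kept.append(v)
    return kept
```

Running the same function twice with the two groups swapped gives the risk and protective candidate lists for one family.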

Results: Filtering for shared variants in IBD regions identified 4-123 risk variants and 0-96 protective variants per family, depending on pedigree structure and the resulting IBD region size. After removing potential risk variants also carried by non-AUT individuals, 0-59 risk variants remained per family. Similarly, removing potential protective variants also carried by AUT individuals left 0-30 protective variants per family. For risk gene identification, only one gene, FAAP100 (C17orf70), passed all filtering criteria in more than one family. It is highly expressed in the cerebellum, though its neuronal function is unknown. Individual families carried risk variants in several high-evidence AUT candidate genes (as defined by the SFARI Gene database), including ANK2, CIC, NF1, SCN7A, and WDFY3. For protective gene identification, again only a single gene, SCN9A, was shared exclusively among non-AUT individuals in more than one family. It is expressed primarily in the hypothalamus and testes and, interestingly, has been associated with protection against pain and neurodegeneration.
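The cross-family step, finding genes that pass all filters in more than one family, amounts to a simple tally. The sketch below is illustrative only: the function name `recurrent_genes` and the input layout (a mapping from a family ID to the set of genes that survived filtering in that family) are assumptions, and the family and gene labels in the usage example are placeholders rather than study data.

```python
from collections import defaultdict


def recurrent_genes(genes_by_family, min_families=2):
    """Return genes that passed filtering in at least `min_families` families.

    `genes_by_family` maps a family ID to the set of genes surviving all
    filters in that family (hypothetical input layout).
    """
    families_per_gene = defaultdict(set)
    for family, genes in genes_by_family.items():
        for gene in genes:
            families_per_gene[gene].add(family)
    return {
        gene: sorted(families)
        for gene, families in families_per_gene.items()
        if len(families) >= min_families
    }


# Placeholder labels, not results from the study.
example = {
    "FAM1": {"GENE_X", "GENE_Y"},
    "FAM2": {"GENE_X"},
    "FAM3": {"GENE_Z"},
}
print(recurrent_genes(example))
```

Applying the tally separately to the risk and protective gene sets keeps the two analyses independent, as in the abstract.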

Conclusions: By studying these unique pedigrees, we have identified novel DNA variations related to AUT and demonstrated that exome sequencing in extended families is a powerful tool for ASD risk and protective gene discovery.

32251 Utilizing Extended Families to Prioritize Autism Risk and Protective Variants from Whole Exome Sequencing
