Confronting the issue of heritability in large-scale genetic studies
By Eddy Ball
In the first of two symposia in May on large-scale genetic studies, world-renowned experts tackled the genome-wide association studies (GWAS) dilemma: a proliferation of data that has frustrated scientists because it has so far failed to realize its full potential for improving understanding of disease risk and host response.
GWAS examine many common genetic variants in different individuals to see whether any variant is associated with a trait. They typically focus on associations between single nucleotide polymorphisms (SNPs) and traits such as major diseases. Sample sizes have grown progressively larger, and high-throughput genotyping now lets recent studies survey millions of SNPs.
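A per-SNP association scan can be made concrete with a minimal Python sketch. It assumes a simple case-control design in which each SNP is summarized as counts of its two alleles among cases and among controls. The function names, toy SNP IDs, and counts are hypothetical. The test shown is a basic allelic chi-square test; real GWAS pipelines typically use regression models with covariates, plus quality control that this sketch omits. The 5 × 10⁻⁸ cutoff is the conventional genome-wide significance threshold.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is a (count of allele A, count of allele a) pair.
    Assumes no row or column of the table sums to zero.
    Returns (statistic, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def scan(snps, alpha=5e-8):
    """Test every SNP; return (id, statistic, p) for those below alpha, smallest p first."""
    hits = []
    for snp_id, (cases, controls) in snps.items():
        stat, p = allelic_chi_square(cases, controls)
        if p < alpha:
            hits.append((snp_id, stat, p))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical toy data: allele counts in 500 cases and 500 controls (1,000 alleles each).
toy_snps = {
    "snp_null": ((500, 500), (500, 500)),
    "snp_assoc": ((700, 300), (500, 500)),
}
print(scan(toy_snps))  # only "snp_assoc" passes the threshold
```

The tiny threshold reflects the multiple-testing burden of surveying on the order of a million SNPs. It is roughly a Bonferroni correction of 0.05 for one million independent tests.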
Organized by NIEHS Biostatistics Branch lead researcher Dmitri Zaykin, Ph.D., and research fellow Chia-Ling Kuo, Ph.D., the May 24-25 symposium on “Emerging issues in analysis and design of large scale genetic studies” featured eight presentations on large-scale association studies, moderated by NIEHS epidemiologists Jack Taylor, M.D., Ph.D., and Stephanie London, M.D., Dr.P.H. Topics included large-scale aspects of modern genetic studies, advances in identifying genuine signals, the problem of missing heritability, the design of discovery and replication stages, the distribution of risk magnitudes across the genome, pathway analyses, and approaches for analyzing rare variants.
“It’s absolutely clear that this issue of differential host response to the environment is just pivotal to any of the goals associated with our strategic plan,” said NIEHS Deputy Director Rick Woychik, Ph.D., in welcoming remarks. “Why is it that, although we are all exposed to the same environment, there are different health consequences?”
In his keynote address, University of Washington biostatistician and geneticist Bruce Weir, Ph.D., (http://www.gs.washington.edu/faculty/weir.htm) discussed classic studies of the heritability of human height, beginning with the data and findings Francis Galton published in 1886. Although family studies have shown that 80 percent of the variation in height is heritable, searches for genetic variants associated with height could account for no more than 10 percent. Weir presented recent advances in statistical methodology that raised that figure to 45 percent and discussed ways to account for the remaining 35 percent. His fellow presenter Daniel Stram, Ph.D., (http://www.usc.edu/uscnews/experts/1042.html) of the University of Southern California, and other speakers on the program are working to account for the complexity of heritability by better capturing the strong polygenic signal created by the additive effects of many common variants.
A common theme among the statisticians at the meeting was the need for even larger sample sizes and for statistical models that are more sensitive to hidden stratification, in order to capture the hidden heritability associated with common SNPs and unravel the underlying genetic architecture. Speakers proposed several statistical refinements, such as variance components methods, ensemble or set testing that aggregates individual features, and a prototype similarity collapsing approach for more effectively capturing additive and non-additive effects among markers.
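Variance components methods, one of the refinements mentioned above, are commonly written as a linear mixed model. The sketch below uses standard textbook notation rather than any speaker's specific formulation: y is the vector of trait values, X holds fixed covariates with effects β, g is a random genetic effect whose covariance is proportional to a genetic relationship matrix A estimated from the SNPs, and ε is residual noise.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad \mathbf{g} \sim N\!\left(\mathbf{0},\, \sigma_g^2 A\right),
\qquad \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\, \sigma_e^2 I\right)

\operatorname{Var}(\mathbf{y}) = \sigma_g^2 A + \sigma_e^2 I,
\qquad h^2_{\text{SNP}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
```

Under these assumptions, σ_g² and σ_e² are typically estimated by restricted maximum likelihood (REML). The resulting h²_SNP estimates the share of trait variance captured jointly by all genotyped SNPs, rather than by the few that individually reach significance.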
In a symposium filled with statistical discussion, University of Chicago geneticist Nancy Cox, Ph.D., (http://genes.uchicago.edu/contents/faculty/cox-nancy.html) spoke on tying analyses to biological function, and North Carolina State University geneticist Trudy Mackay, Ph.D., (http://cals.ncsu.edu/genetics/index.php/people/trudy-mackay) addressed the genetic and environmental factors affecting variation in quantitative traits, using Drosophila as a model system.
In her talk, Cox focused on the role of transcriptional function in the effects of SNPs on disease and risk. “For me,” she said, “it’s more about function.” Referring to results from her bipolar disorder study, she explained that the effects of SNPs may vary from tissue to tissue, depending on whether there is significant enrichment for cis-acting elements (DNA sequences near the structural portion of a gene that are required for its expression) or for trans-acting factors (which bind to cis-acting sequences to control gene expression).
For Mackay, recapitulating known biological pathways in model organisms, where those pathways retain the same function they have in humans, offers insights as a test of general strategy for large-scale association studies. With NIEHS funding, Mackay randomly mated Drosophila for more than 70 generations to create the diversity needed to study differences in genetic architecture among gene networks and to expand on the findings of a genome scan.
The challenges of large-scale association studies
As the symposium at NIEHS demonstrated, there are two major approaches to teasing more translatable information from the volumes of data generated by large-scale association studies. The first, deductive approach refines analysis methodology to give tests of association between data and endpoints more statistical power. The second, more inductive approach grounds the statistical analysis of large-scale association studies in biology itself, through analysis of function and of orthologous patterns in model species.
Both approaches seek to illuminate what has been described as the dark matter of the genome: missing heritability. Current statistical approaches are limited in that they capture only the additive part of heritability, the variation transmitted from parents to offspring. Joint effects of allelic variants, however, while genetic, are not transmitted in the same way, because specific combinations of variants are broken apart by recombination.
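The distinction between additive and non-additive heritability can be written out in standard quantitative-genetics notation. This is a textbook simplification that assumes no gene-environment interaction or covariance, not a formula presented at the symposium: phenotypic variance V_P splits into additive (V_A), dominance (V_D), epistatic or interaction (V_I), and environmental (V_E) components.

```latex
V_P = V_A + V_D + V_I + V_E

h^2 = \frac{V_A}{V_P}
\qquad\qquad
H^2 = \frac{V_A + V_D + V_I}{V_P}
```

Narrow-sense heritability h² counts only the additive part that current statistical approaches target. Broad-sense heritability H² also includes the non-additive terms V_D and V_I, which reflect the joint effects of allelic variants within and between loci.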
As Zaykin explained, the non-additive part can be substantial. There is also the problem of predicting individual risk. Explaining genetic variation within a sample is one question; predicting risk for an individual from his or her sequence data is another, and it remains unsolved. A technological limitation is that coverage of all the variants in an individual genome is still incomplete.