Emerging Issues in Analysis and Design of Large Scale Genetic Studies


May 24-25, 2012
NIEHS, Rodbell Auditorium




In eight talks, world-renowned leaders in the field will discuss large-scale aspects of modern genetic studies, advances in the identification of genuine signals, the problem of missing heritability, the design of discovery and replication stages of studies, the distribution of risk magnitude across the genome, pathway analyses, and approaches for the analysis of rare variants.





Downloadable PDF Poster

Poster (9MB)


Agenda - Schedule

May 24 - Topic: Finding Heritability

13:00-13:10 Welcome by Dmitri Zaykin and Richard Woychik
13:10-14:00 "The heritability of human height"
Bruce Weir (keynote speaker), University of Washington
14:00-14:45 "Estimating and interpreting heritability from genome wide association studies"
Daniel Stram, University of Southern California
15:10-15:55 "The return of the common allele: heritability in complex traits"
Nancy Cox, University of Chicago
15:55-16:40 "Hidden heritability in genome-wide association studies and risk prediction"
Nilanjan Chatterjee, National Cancer Institute
16:40-17:00 Closing remarks and discussion by Dmitri Zaykin


May 25 - Topic: Advances in Statistical Genetics

09:00-09:10 Welcome by Dmitri Zaykin
09:10-09:55 "When is a large sample not-so-large? Problems with inference in high-throughput studies, with some solutions"
Kenneth Rice, University of Washington
09:55-10:40 "Similarity collapsing approach for gene-level analysis on common and rare variants with general traits"
Jung-Ying Tzeng, North Carolina State University
11:10-11:55 "The genetic architecture of quantitative traits: lessons from Drosophila"
Trudy Mackay, North Carolina State University
11:55-12:40 "Pathway analysis and ensemble testing with correlated features"
Fred Wright, University of North Carolina at Chapel Hill
12:40-13:00 Closing remarks and discussion by Dmitri Zaykin


Agenda - Details

Details for May 24, 13:00-17:00

Topic: Finding Heritability.


13:10-14:00: Bruce Weir, Professor and Chair, Departments of Biostatistics & Genome Sciences, University of Washington.
Title: The heritability of human height.
Abstract: In 1886 Francis Galton published data on heights for people and their parents. He showed that people’s heights tended to be closer to the population mean height than was the average of their parents' heights, introducing the concept of "regression to the mean." He went on to show that the relationship between the heights of pairs of people depends on the degree of relatedness between the pair. His work was replicated by Karl Pearson in 1903, three years after the rediscovery of Mendel's Laws and "in the present controversial phase of the theory of heredity." With the introduction of quantitative genetic models (and the analysis of variance) by R.A. Fisher in 1918 we now express the correlation in heights for pairs of people in terms of their relatedness and the heritability of height. Heritability of a trait is the portion of variance in trait values that has an (additive) genetic component. By measuring heights on pairs of people of known family relatedness, geneticists have estimated the heritability of human height to be about 0.80. The recent flurry of genome-wide association studies has revealed many genetic markers, SNPs, associated with height -- a 2010 publication listed 135 from a meta-analysis of 133,653 heights. However, these SNPs collectively accounted for only 10% of the variation in height and the search began for the "missing heritability." Using data from the GENEVA project that have been processed in our department, P.M. Visscher has extended the early work of Galton, Pearson and Fisher by using all the SNPs scored in a genome-wide scan, and by using measures of relatedness estimated from these SNPs instead of being inferred from family history. He could account for 45% of the variation. I will explain his approach (Yang et al., Nature Genetics 43:519-525, 2011) and suggest ways to account for the remaining 35%.
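As a rough illustration of the parent-offspring reasoning in the abstract above, the sketch below simulates standardized heights under a purely additive model and recovers heritability as the slope of offspring height on midparent height. The model, the function name `simulate_midparent_slope`, the sample size, and the seed are all assumptions made for illustration; this is not the analysis of Galton, Fisher, or Yang et al.

```python
import random


def simulate_midparent_slope(h2=0.80, n_families=20000, seed=1):
    """Simulate standardized heights under a purely additive model (an
    assumption) and return the regression slope of offspring on midparent."""
    rng = random.Random(seed)
    sd_a = h2 ** 0.5              # SD of additive genetic values
    sd_e = (1.0 - h2) ** 0.5      # SD of environmental deviations
    sd_seg = (h2 / 2.0) ** 0.5    # SD of Mendelian segregation noise
    midparents, offspring = [], []
    for _ in range(n_families):
        # Additive values and phenotypes of the two unrelated parents.
        a_f, a_m = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_f = a_f + rng.gauss(0, sd_e)
        p_m = a_m + rng.gauss(0, sd_e)
        # Offspring additive value: parental average plus segregation noise.
        a_o = (a_f + a_m) / 2.0 + rng.gauss(0, sd_seg)
        offspring.append(a_o + rng.gauss(0, sd_e))
        midparents.append((p_f + p_m) / 2.0)
    # Ordinary least-squares slope: cov(x, y) / var(x).
    mean_x = sum(midparents) / n_families
    mean_y = sum(offspring) / n_families
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(midparents, offspring))
    var = sum((x - mean_x) ** 2 for x in midparents)
    return cov / var


slope = simulate_midparent_slope()
print(f"estimated heritability (slope): {slope:.2f}")
```

Under these assumptions the slope estimates the heritability, and because it is below 1, offspring tend to lie closer to the population mean than their midparent value does, which is the regression to the mean that Galton described.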


14:00-14:45: Daniel Stram, Professor, Division of Biostatistics and Genetic Epidemiology, Department of Preventive Medicine, University of Southern California.
Title: Estimating and interpreting heritability from genome wide association studies.
Abstract: Many studies have investigated the heritability of complex traits in genome wide association studies (GWAS). To date the top associations (i.e., those judged globally significant after correction for multiple comparisons) for most complex traits seem to explain only a relatively small fraction of observed trait heritability, as estimated in family studies. Various interpretations of these findings are possible: there is intense current interest, for example, in determining the role that rare variation may play in the genetic architecture of these traits. Others, for example Zuk et al (PNAS 2012), argue that narrow-sense additive heritability has been over-estimated for many traits and that simple polygenic approaches (summing the effects of many variants with no epistatic contribution) cannot be expected to explain either individual phenotype variation or trait resemblance between close relatives. However, other analyses of complex traits, e.g. those of Yang et al (Nat Genet, 2010, 2011), find a signal of strong additive heritability in GWAS data that involve common variants, even though identification of the particular common variants that contribute to this strong polygenic signal is not yet possible (presumably because the signal of each is very small). An extreme version of this is embodied in the analyses of Purcell et al (Nature 2009), in which polygenic scores involving half of all markers examined appeared to be predictive of schizophrenia and bipolar disorder. In this talk I will review these arguments, discuss the estimation of additive heritability in GWAS data using variance components methods and the sensitivity of these methods to low levels of hidden stratification, and describe ongoing analyses of the heritability of three phenotypes (height, prostate cancer, and breast cancer) using GWAS studies taking place within a multiethnic cohort.


15:10-15:55: Nancy Cox, Professor and Section Chief, Section of Genetic Medicine, Department of Medicine. Professor, Department of Human Genetics, University of Chicago
Title: The Return of the Common Allele: Heritability in Complex Traits.
Abstract: Although the heritability attributable to highly significant and reproducible associations discovered through GWAS is low for most common diseases and complex traits, it appears that the total heritability attributable to all variants interrogated by GWAS can be much higher, accounting in some cases for virtually all of the heritability estimated in family studies. It should now be possible to utilize such studies to home in on classes of genetic variation contributing disproportionately to the risk of complex diseases, as well as to systematically investigate the shared genetic architecture of potentially related phenotypes. We illustrate these ideas with examples from a variety of complex disorders.


15:55-16:40: Nilanjan Chatterjee, Chief of the Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute.
Title: Hidden heritability in genome-wide association studies and risk prediction.

Abstract: Common SNPs discovered through modern genome-wide association studies (GWAS) have limited ability to predict individual traits, but recent estimates of "hidden heritability" indicate that the predictive power of polygenic models can potentially be enhanced in the future. We provide a new statistical characterization of the predictive performance of a polygenic model based on the number and distribution of effect sizes for the underlying susceptibility SNPs, the sample size of the training dataset, and the balance of true and false positives associated with the underlying SNP selection criterion. Using this framework and an effect-size distribution that is consistent with results reported from a large GWAS study, we project that while 45% of the total variance of adult height has been attributed to common variants, a predictive model built on as many as one million people may explain only 33.4% of the variance of the trait in an independent sample. Emerging effect-size distributions across eight additional complex traits, including five chronic diseases, also indicate that in general very large samples are needed to utilize the hidden heritability associated with common SNPs. The analysis suggests that the ultimate utility of polygenic predictive models will depend on achievable sample sizes, heritability, information on other risk factors including family history, and the existence of intervention strategies for the relevant traits.


Details for May 25, 9:00-13:00

Topic: Advances in Statistical Genetics.


9:10-9:55: Kenneth Rice, Associate Professor, Department of Biostatistics, University of Washington.
Title: When is a large sample not-so-large? Problems with inference in high-throughput studies, with some solutions.
Abstract: Many analyses of high-throughput data require extreme levels of significance, much further "out in the tails" of reference distributions than usual. Unfortunately, standard epidemiological intuition and rules of thumb developed to determine results near p = 0.05 levels of significance need not apply in the region of, say, p = 5 × 10^-8. In this talk, we illustrate high-throughput settings where doing "the usual thing" with regression-based tools gives poor statistical results. The examples are drawn from work with Genome-Wide Association Studies (GWAS), but the consequences are generic, and could apply to any high-throughput data analysis featuring confounding, interaction, or prediction. The immediate goal of the talk is to illustrate the scale of the problems in the analysis of modern high-throughput data, and to motivate better understanding of why "the usual thing" does not work well. Several improvements to standard practice will also be suggested.
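To give a sense of how far "out in the tails" such thresholds sit, the sketch below compares the two-sided normal critical value at p = 0.05 with the one at a Bonferroni-style threshold. It assumes the common convention of dividing 0.05 by roughly one million independent tests, which the abstract does not state; the function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance level under a Bonferroni correction.
    The default of one million tests is an assumed convention."""
    return alpha / n_tests


def two_sided_z(p):
    """Absolute z-score whose two-sided normal tail probability equals p."""
    # Use the lower-tail quantile and negate it, which avoids the
    # precision loss of evaluating the quantile function near 1.
    return -NormalDist().inv_cdf(p / 2.0)


threshold = bonferroni_threshold()
z_usual = two_sided_z(0.05)
z_gwas = two_sided_z(threshold)
print(f"threshold = {threshold:.1e}")
print(f"|z| at p = 0.05: {z_usual:.2f}")
print(f"|z| at the genome-wide threshold: {z_gwas:.2f}")
```

The critical value moves from about two standard errors to more than five, a region where large-sample normal approximations that are adequate near p = 0.05 may be considerably less accurate.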


9:55-10:40: Jung-Ying Tzeng, Associate Professor, Department of Statistics, North Carolina State University.
Title: Similarity collapsing approach for gene-level analysis on common and rare variants with general traits.
Abstract: We introduce a gene-trait similarity model to aggregate information from loci that are in the same gene or exonic region to study genetic effects. The method uses genetic similarity to aggregate information from multiple polymorphic sites, with adaptive weights dependent on allele frequencies and functionality scores to signify rare and common functional variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals with opposite etiological effects, is applicable to any class of genetic variants without having to dichotomize the allele types, and can capture non-additive effects among markers. To assess gene-trait associations, trait similarities for pairs of individuals are regressed on their genetic similarities, with a score test whose limiting distribution is derived. We show how this framework can be applied to various trait types such as continuous, binary, and survival traits.


11:10-11:55: Trudy Mackay, Professor, Department of Genetics, North Carolina State University.
Title: The genetic architecture of quantitative traits: lessons from Drosophila.
Abstract: The Drosophila melanogaster Genetic Reference Panel (DGRP) consists of 192 sequenced inbred strains derived from the Raleigh, NC population. The DGRP is a community resource for genome wide association (GWA) analyses for quantitative traits in a scenario where all single nucleotide polymorphisms are genotyped. The large amount of quantitative genetic variation, lack of population structure and rapid local decay of linkage disequilibrium in the DGRP present a favorable scenario for identifying candidate causal genes and even polymorphisms affecting complex traits. Further, we have derived a large outbred, advanced intercross population from a subset of 40 DGRP lines (Flyland) to test multi-locus predictions derived from the DGRP. We performed bulk segregant sequencing for pools of individuals from the Flyland population with extreme phenotypes, and mapped variants associated with the traits with significant allele frequency differences between the high and low pools. I will discuss inferences about the contribution of rare and common alleles, and of additive and epistatic gene action, obtained from analyses of genetic architecture in the DGRP and Flyland populations, and the implications of the Drosophila data for studies in other species, including humans.


11:55-12:40: Fred Wright, Professor, Department of Biostatistics, University of North Carolina at Chapel Hill.
Title: Pathway analysis and ensemble testing with correlated features.
Abstract: In the gene expression and association analysis literature, the term "pathway analysis" has accrued a generic meaning to describe various sorts of ensemble testing, in which tests of individual features, such as genes or SNPs, are aggregated. Simple methods that assume independence of the features are still popular, despite clear evidence of strong and pervasive correlation. For pathway test statistics based on sums of score statistics, the empirical moments under a permutation null can be solved, and provide insight into the effects of the correlation patterns. We describe current work on analytic approximations to pathway permutation testing, with attention to accuracy for small p-values. We show applications to gene expression pathway analysis and SNP-set testing, including special cases of burden testing of rare variants and calculation of gene-based p-values in genome-wide association scans.
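A minimal sketch of why ignoring correlation matters for a sum statistic: if standardized feature statistics have correlation matrix R, the variance of their sum is the sum of all entries of R rather than the number of features. The example below assumes R is known and uses a hypothetical equicorrelated matrix; it is a textbook illustration, not the permutation-moment approximation described in the talk.

```python
def variance_of_sum(corr):
    """Variance of a sum of unit-variance statistics with correlation
    matrix `corr` (a list of lists): the sum of all matrix entries."""
    return sum(sum(row) for row in corr)


def equicorrelated(m, rho):
    """Hypothetical m x m correlation matrix with common correlation rho."""
    return [[1.0 if i == j else rho for j in range(m)] for i in range(m)]


def pathway_z(z_scores, corr):
    """Return (naive, adjusted) standardized sums of feature z-scores.
    The naive version assumes independence; the adjusted version divides
    by the standard deviation implied by `corr`."""
    total = sum(z_scores)
    naive = total / len(z_scores) ** 0.5
    adjusted = total / variance_of_sum(corr) ** 0.5
    return naive, adjusted


corr = equicorrelated(4, 0.5)
naive, adjusted = pathway_z([1.5, 1.5, 1.5, 1.5], corr)
print(f"variance of sum: {variance_of_sum(corr):.1f}")
print(f"naive z = {naive:.2f}, adjusted z = {adjusted:.2f}")
```

With positive correlation the naive statistic is too large, so an independence-based test would overstate the evidence for the pathway.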



Open to the public. Visitors, please come early and bring a photo ID and a copy of the announcement to go through security.


The NIEHS has moved to a higher level of security awareness. More stringent requirements for access to NIEHS' campus have been implemented. Any individual seeking access to the NIEHS campus to attend a conference/seminar will need to be prepared to show two forms of identification, i.e., driver's license plus one of the following: company ID, government ID or university ID and to bring a copy of the announcement of this symposium.



The symposium will take place in Rodbell Auditorium at the NIEHS Main Campus.
For directions, please visit



See the following PDF file for various hotels in the nearby area:

Hotels in the Research Triangle Park area (30KB)






Chia-Ling Kuo, Ph.D.
Research Fellow
Tel (919) 541-0754
Fax (919) 541-4311

Dmitri Zaykin, Ph.D.
Principal Investigator
Tel (919) 541-0096
Fax (919) 541-4311
