Genotype-Phenotype-Environment representation of PEGS

The Personalized Environment and Genes Study (PEGS) is a long-term project to collect health, exposure, medical, and genetic data from a diverse group of people in North Carolina. Initiated in 2002 and formerly called the Environmental Polymorphisms Registry (EPR), PEGS is sponsored by the National Institute of Environmental Health Sciences (NIEHS), part of the National Institutes of Health (NIH). A total of 19,445 participants are enrolled, approximately 11,000 of whom remain actively engaged through annual contact. Enrollment is ongoing.

From 2013 to 2020, three surveys were used to collect phenotype and exposure data in the cohort. The PEGS Health and Exposure Survey (N = 9,449) collected data on general demographics, family medical history, lifestyle factors such as smoking and alcohol use, and occupational exposures. Beginning in 2017, the two-part NIEHS PEGS Exposome Survey was administered to collect comprehensive information about endogenous and exogenous exposures throughout life. Part A focuses on external exposures, including chemical and environmental exposures at work and at home from childhood to the present. Part B covers internal exposures, including medications, and lifestyle factors such as physical activity, stress, sleep, and diet.

In addition to the survey-based exposure data, PEGS collects address histories, including each participant's longest-lived childhood address. These addresses have been used to link participants to a growing list of geospatial exposure estimates, including air pollution and distance to toxic release and agricultural operation sites.
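The source does not describe how the distance-based exposure estimates are computed. As a minimal sketch, assuming each address has already been geocoded to latitude/longitude and that straight-line (great-circle) distance is the metric of interest, a distance-to-nearest-site measure and a count of sites within a buffer could be derived with the haversine formula. The function names, the 6,371 km mean Earth radius, and the buffer radius are illustrative assumptions, not the PEGS pipeline.

```python
import math
from typing import Iterable, Tuple

# Hypothetical sketch: not the PEGS geospatial linkage method.
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site_km(address: LatLon, sites: Iterable[LatLon]) -> float:
    """Distance from a geocoded address to the closest site (hypothetical helper)."""
    lat, lon = address
    distances = [haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in sites]
    if not distances:
        raise ValueError("no sites supplied")
    return min(distances)


def sites_within_km(address: LatLon, sites: Iterable[LatLon], radius_km: float) -> int:
    """Number of sites inside a circular buffer around the address (hypothetical helper)."""
    lat, lon = address
    return sum(1 for s_lat, s_lon in sites if haversine_km(lat, lon, s_lat, s_lon) <= radius_km)
```

For example, `nearest_site_km((35.7796, -78.6382), [(35.9940, -78.8986)])` returns roughly 33 km for a Raleigh address and a site in Durham. Great-circle distance is a simple choice. A production pipeline might instead use projected coordinates, road-network distance, or emission-weighted proximity.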

Whole genome sequencing (WGS) has also been conducted for the 4,737 participants with the most complete exposure data. Beyond single nucleotide variants, the WGS data support studies of other genomic features that WGS assays comprehensively, including copy number and structural variants, telomere length, and high-resolution human leukocyte antigen (HLA) complex variation.

Efforts are ongoing with both the University of North Carolina at Chapel Hill (UNC) and Duke University to integrate electronic health records for PEGS participants. Together, these efforts represent an unprecedented integration of medical information from UNC and Duke electronic medical records (EMRs) with exposure information and WGS data. International efforts such as the Tohoku Medical Megabank in Japan, multiple registries in Denmark, and CanPath in Canada have assembled extensive data within their cohorts. In the United States, however, PEGS is unique, and its participants are an active and engaged group.

In summary, PEGS has registered 19,445 North Carolina residents and includes rapidly expanding sets of high-dimensional data that comprise:

  • Responses to health and exposure surveys, internal exposome surveys and external exposome surveys.
  • Electronic Health Records (EHRs) and Electronic Medical Records (EMRs), including International Classification of Diseases (ICD) information.
  • Whole genome sequencing data.
  • Geographic Information Systems (GIS) data.


While most studies focus on a single disease or environmental exposure, PEGS collects data on multiple diseases and environmental exposures, along with diet, lifestyle, and genetic data. The goal of PEGS is to integrate these large-scale, multi-dimensional data so that researchers can dissect the etiology of diseases and identify the collective effects of environment, diet, lifestyle, and genetic factors on human health.

PEGS participants can be called back for follow-up studies, enabling validation efforts, multi-omics data collection, and add-on studies. NIEHS investigators have used this ability to recall participants to the Clinical Research Unit to study phenotypic and cellular responses in participants with and without variants of specific interest (phenotype-by-genotype studies). Recall also has great potential for following up on new findings by collecting additional data.

The PEGS cohort is diverse and includes participants of varying age, race, education, and socioeconomic status. This diversity enables researchers to investigate disease risk in multiple populations and to uncover health disparities that arise from disproportionate exposure to environmental factors. Because of this diversity, findings from research using PEGS data are likely to be broadly applicable.

The nearly unprecedented assembly of data on the PEGS cohort presents unique and important opportunities. Taking advantage of the richness and diversity of PEGS data, scientists can conduct studies on a multitude of topics, including but not limited to:

  • Identifying novel genetic and environmental factors that alter the risk of several common diseases, including diabetes, heart disease, stroke, multiple sclerosis, psoriasis, rheumatoid arthritis, allergies, asthma and cancer.
  • Identifying how multiple genetic and environmental factors jointly increase disease risk.
  • Identifying interactions between genetic and environmental factors that affect disease risk.
  • Developing risk scores using comprehensive genetic data and varied environmental exposures for potential use in conjunction with clinical data to improve the prediction of disease risk.
  • Identifying differences in risk factors among people of different ages, races, or ethnicities.
  • Investigating disease co-occurrence and identifying environmental exposures associated with multiple diseases.
  • Developing novel analytical methods for analyzing gene-by-environment interactions.
  • Ascertaining the functional and phenotypic effects of genetic variants in specific genes of interest.
  • Improving the understanding of the causes and mechanisms of various diseases.
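To illustrate the risk-score idea in the list above, here is a minimal sketch. It assumes an additive model on the logistic scale: genotype dosages (0, 1, or 2 copies of an effect allele) and continuous exposure values are each multiplied by a weight and summed, and the sum is then converted to a probability. The weights, variant IDs, and exposure names are hypothetical placeholders. In practice, the weights would be estimated from data, for example by regression in a training sample. The source does not specify any particular model, and no PEGS results are implied.

```python
import math
from typing import Mapping

# Hypothetical sketch of a combined genetic + environmental risk score.


def risk_score(
    dosages: Mapping[str, float],          # variant ID -> effect-allele dosage (0..2)
    exposures: Mapping[str, float],        # exposure name -> measured/estimated value
    snp_weights: Mapping[str, float],      # variant ID -> per-allele weight (log-odds)
    exposure_weights: Mapping[str, float], # exposure name -> per-unit weight (log-odds)
) -> float:
    """Additive linear predictor summed over genetic and environmental terms."""
    genetic = sum(w * dosages[snp] for snp, w in snp_weights.items())
    environmental = sum(w * exposures[name] for name, w in exposure_weights.items())
    return genetic + environmental


def predicted_probability(score: float, intercept: float) -> float:
    """Logistic transform of intercept + score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(intercept + score)))


# Usage with made-up inputs and weights:
score = risk_score(
    dosages={"rs_hypothetical_1": 2, "rs_hypothetical_2": 0},
    exposures={"pm25": 10.0},
    snp_weights={"rs_hypothetical_1": 0.1, "rs_hypothetical_2": 0.2},
    exposure_weights={"pm25": 0.02},
)
prob = predicted_probability(score, intercept=-2.0)
```

With these placeholder values, `score` is 0.4. Keeping the genetic and environmental sums separate makes it easy to report each component on its own, or to add product terms (dosage × exposure) when modeling gene-by-environment interaction.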