PEGS Data Freezes
The PEGS data are stored securely in a single centralized, shared repository to ensure consistent, reproducible, and comparable analyses. PEGS comprises a compatible, multi-dimensional collection of datasets in consistent and programmatically extractable formats, as shown in the figure on the right and in the Data Components section below. PEGS data are updated quarterly with additional participants, new variables, participant updates, and any additional data components. We are continually building analysis pipelines and workflows to enable efficient, reproducible, insightful, and collaborative research using the PEGS data.
Data Components
Data components available to researchers from the PEGS cohort are listed with their description and sample size (the number of participants). The latest versions of the administered participant surveys are also provided.
Category | Component | Description | Documents | Number of Participants |
---|---|---|---|---|
Survey Data | Demographic and Administrative Data | Demographics, consent, address and administrative data for all participants | | 19,445 |
Survey Data | Health & Exposure Survey | Demographics, health, family history of disease, environmental exposures, socioeconomic status and lifestyle | Health & Exposure Survey (338KB) | 9,449 |
Survey Data | External Exposome Survey (Exposome A) | Residential and occupational environmental exposures | External Exposome Survey (27MB) | 3,618 |
Survey Data | Internal Exposome Survey (Exposome B) | Medication use, physical activity, stress, sleep, diet, genetics and reproductive history | Internal Exposome Survey (13MB) | 3,071 |
Survey Data | Diabetes Screener Survey | Diabetes screener administered to participants with self-reported diabetes | Diabetes Screener Survey (69KB) | 227 |
Survey Data | Eczema Screener Survey | Eczema screener administered to participants with self-reported eczema | Eczema Screener Survey (92KB) | 329 |
Survey Data | Right-not-to-know Main Survey | Right-not-to-know Survey administered for incidental findings reports | | 231 |
Survey Data | Right-not-to-know Cognitive Interview Survey | Right-not-to-know Cognitive Interview administered to assess awareness of incidental findings reports | Right-not-to-know Cognitive Interview Survey (1MB) | 12 |
Medication Data | Anatomical Therapeutic Chemical (ATC) Codes | ATC codes for self-reported free-text medication names from the Internal Exposome Survey (Exposome B) as per the World Health Organization's (WHO's) ATC classification system | | 2,263 |
Geospatial Data | Geocodes (GIS) | Geocoded participant addresses from five study events with mapping coordinates | | 18,462 |
Geospatial Data | Hazards Data | Exposure estimates and proximity measures calculated using geospatial linkages from the following databases - Atmospheric Composition Analysis Group (ACAG), Toxics Release Inventory (TRI), Center for Air, Climate, and Energy Solutions (CACES), North Carolina Department of Environmental Quality (NCDEQ), Department of Transportation (DOT), Federal Aviation Administration (FAA), Federal Communications Commission (FCC) and the Nuclear Regulatory Commission (NRC) | | 18,462 |
Geospatial Data | MERRA-2 Data (Earthdata) | Geospatial data linkages from the Modern Era Retrospective analysis for Research and Applications (MERRA-2) project containing consistent estimates of climate and environmental metrics from a range of satellite-based environmental observations | | 17,273 |
Geospatial Data | Social Vulnerability Index (SVI) Data | Geospatial data linkages for CDC/ATSDR Social Vulnerability Index containing summaries of social determinants of health at the census tract level | | 17,273 |
Geospatial Data | Environmental Justice Index (EJI) Data | Geospatial data linkage for CDC/ATSDR Environmental Justice Index containing summaries of environmental, social, and health factors at the census tract level | | 17,273 |
Genomic Data | Candidate Gene/SNP Data | Candidate SNP data for a subset of participants for specific research goals | | 12,316 |
Genomic Data | Single Nucleotide Variants (SNVs) | SNV and small indel genotypes derived from the whole-genome sequencing (WGS) data in plink's .bed/.bim/.fam format | | 4,737 |
Genomic Data | Structural Variants | Structural variant calls generated from the WGS data in .vcf format consisting of large deletions, duplications, and inversions | | 4,737 |
Genomic Data | Human Leukocyte Antigens (HLA) Genotypes | HLA genotypes identified from the WGS data for 20 HLA genes with up to six digits of specificity | | 4,737 |
Genomic Data | Telomeric Content | Aggregate telomeric content estimated from WGS reads reported as telomeric reads per GC content-matched million reads | | 4,737 |
Genomic Data | Local and Global Ancestry Estimations | Inferred local ancestry per chromosome after haplotype phasing and global estimates of percent ancestry for each participant | | 4,730 |
Genomic Data | Methylation Data | Genome-wide methylation profiling data using the Infinium MethylationEPIC v1.0 BeadChip Kit targeting 866,297 CpG sites | | 4,724 |
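
The SNV component above is described as being in PLINK's .bed/.bim/.fam format. As a rough illustration of what "programmatically extractable" can mean for that format, the sketch below parses the two plain-text companion files (.bim for variants, .fam for samples) using the standard PLINK column order. The file contents, variant IDs, and sample IDs are made up for illustration and are not PEGS data; the function name `read_plink_table` is hypothetical, and the binary .bed genotype file is not handled here.

```python
import io

# Standard PLINK 1 column order for the text companion files.
BIM_COLUMNS = ["chrom", "variant_id", "cm_position", "bp_position", "allele1", "allele2"]
FAM_COLUMNS = ["family_id", "sample_id", "father_id", "mother_id", "sex", "phenotype"]


def read_plink_table(handle, columns):
    """Parse a whitespace-delimited PLINK .bim or .fam file into a list of dicts."""
    rows = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical file contents (NOT PEGS data), standing in for real files on disk.
example_bim = io.StringIO("1\trs0001\t0\t10583\tA\tG\n1\trs0002\t0\t10611\tG\tC\n")
example_fam = io.StringIO("FAM1 S1 0 0 1 -9\nFAM2 S2 0 0 2 -9\n")

variants = read_plink_table(example_bim, BIM_COLUMNS)
samples = read_plink_table(example_fam, FAM_COLUMNS)
print(len(variants), "variants;", len(samples), "samples")
```

With real files, the same function could be called on `open(path)` handles. Reading the genotypes themselves requires a tool that understands the binary .bed layout, such as PLINK itself.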
Survey Summary
Categories of survey questions administered to the participants in the Health & Exposure Survey are provided.
Health & Exposure Survey | | |
---|---|---|
About Your Family's Health | Diabetes and Endocrine | Neurologic |
About Your General Health | Digestive | Occupation |
About Your Home Life | Exposures | Renal |
About Your Mood | Fatigue | Reproductive (Females Only) |
Bones, Joints, and Muscles | Hematological | Reproductive (Males Only) |
Cancer | Immune | Respiratory |
Cardiovascular | Lifestyle | Skin, Eyes, and Hair |
Categories of survey questions administered to the participants in the External Exposome Survey (Exposome Survey - Part A) are provided.
External Exposome (Exposome A) | |
---|---|
Characteristics of Current and Past Residences: Agricultural Property Use; Garage and Basement; Heating and Cooling; Pesticides and Insecticides; Pets; Surrounding Area; Walls and Flooring; Water and Dampness | |
Chemical and Metal Exposures at Work | |
Hobby Exposures | |
Ultraviolet Light Exposures | |
Workplace Characteristics | |
Categories of survey questions administered to the participants in the Internal Exposome Survey (Exposome Survey - Part B) are provided.
Internal Exposome (Exposome B) | |
---|---|
Chemotherapy/Radiation Therapy | Physical Activity |
Dietary Behavior | Reproductive History (Females Only) |
Dietary Intake | Sleep |
Genetic History | Stress |
Infectious Disease | Vitamins, Minerals, and Other Supplement Use |
Medications | Twin/Triplet Siblings and Birth Order |
Other | |
Geospatial Data Summary
Source | Description | Examples |
---|---|---|
Geocodes (GIS) | Geocoded participant-provided addresses from five study events: initial enrollment, completion of the Health & Exposure Survey, completion of the External Exposome Survey, and the longest-lived childhood address and longest-lived adult address reported in the External Exposome Survey. | Geographic coordinates (latitude and longitude) for each participant-provided address. |
Hazards | Exposure estimates computed from Department of Transportation (DOT) data. | Information from train tracks, rail depots and roadways, such as total major roadway length, distance to nearest rail depot, etc. |
Hazards | Exposure estimates computed from Federal Aviation Administration (FAA) data. | Information from aircraft departure and arrival sites - e.g., distance to nearest airport. |
Hazards | Exposure estimates computed from Federal Communications Commission (FCC) data. | Information from cellular network towers - e.g., nearest cell tower. |
Hazards | Exposure estimates computed from North Carolina Department of Environmental Quality (NCDEQ) data. | Distance to multi-pollutant point sources such as swine CAFOs, hazardous waste sites, hazardous spill sites, EPA Superfund sites, wastewater treatment plant release sites, etc. |
Hazards | Exposure estimates computed from Nuclear Regulatory Commission (NRC) data. | Distance to nuclear power station. |
Hazards | Exposure estimates computed from Atmospheric Composition Analysis Group (ACAG) data. | Particulate matter concentrations - PM2.5 total, PM2.5 sulfate, PM2.5 black carbon, etc. |
Hazards | Exposure estimates computed from Center for Air, Climate, and Energy Solutions (CACES) data. | Concentrations for multiple pollutants such as carbon monoxide, nitrogen dioxide, ozone concentration, etc. |
Hazards | Exposure estimates computed from Toxics Release Inventory (TRI) data. | Emissions for chemicals of interest such as benzene, ethylbenzene, xylene, toluene, etc. |
MERRA-2 data (Earthdata) | Geospatial data linkages from the Modern Era Retrospective analysis for Research and Applications (MERRA-2) project, which assimilates a range of satellite-based environmental observations into consistent estimates of climate and environmental metrics. | Particulate, gas, meteorological, and health-relevant exposure indicators such as dust sedimentation, organic carbon emission bin, SO2 biomass burning emissions, sea-level pressure, etc. |
Social Vulnerability Index (SVI) | Geospatial data linkages for CDC/ATSDR Social Vulnerability Index, designed to consistently quantify multiple social determinants of health across the United States over time. | Consists of summaries of social determinants of health at the census tract level including an overall index, four component indexes (socioeconomic status, household characteristics, racial and ethnic minority status, and housing type/transportation), and source variables used to compute each index component (e.g., poverty, education, overcrowding, access to vehicle, etc.) |
Environmental Justice Index (EJI) | Geospatial data linkages for CDC/ATSDR Environmental Justice Index, containing summaries and ranks of the cumulative impacts of environmental injustice on health at the census tract level. | Consists of ranks for each census tract on 36 environmental, social, and health factors grouped into ten domains and three overarching modules - the environmental burden, social vulnerability and health vulnerability modules. |
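
Several Hazards rows above report proximity measures such as "distance to nearest airport" derived from geocoded addresses. The sketch below shows one common way such a measure could be computed: the great-circle (haversine) distance from an address to each candidate site, keeping the minimum. This is only an illustration under stated assumptions; the source does not say which distance method PEGS uses, the coordinates and site names are invented, and `haversine_km` and `nearest_site` are hypothetical helper names.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site(lat, lon, sites):
    """Return (name, distance_km) for the site closest to the given point.

    `sites` is an iterable of (name, latitude, longitude) tuples.
    """
    distances = (
        (name, haversine_km(lat, lon, site_lat, site_lon))
        for name, site_lat, site_lon in sites
    )
    return min(distances, key=lambda pair: pair[1])


# Invented coordinates for illustration only (not participant or facility data).
address = (35.90, -78.86)
candidate_sites = [
    ("site_a", 35.88, -78.79),
    ("site_b", 36.10, -79.94),
]

name, distance_km = nearest_site(address[0], address[1], candidate_sites)
print(f"nearest: {name} at {distance_km:.1f} km")
```

A production pipeline would more likely use a GIS library with projected coordinates and spatial indexing; the haversine form is shown here because it needs only the latitude and longitude values that the Geocodes component provides.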
All data on this website are reported from PEGS Data Freeze 3.1 created on 6/27/2023.