
Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology Studies

July 13-14, 2015

Day One

Day Two


In September 2011, the NIEHS workshop entitled "Advancing Research on Mixtures: New Perspectives and Approaches for Predicting Adverse Human Health Effects" brought together experts from epidemiology, toxicology, exposure science, risk assessment, and statistics to identify key challenges in environmental chemical mixtures research and suggest approaches for addressing those challenges. This cross-disciplinary collaboration represents a necessary step in understanding the health effects associated with exposure to real-world mixtures. One important theme that emerged from the workshop was the need to develop novel statistical approaches, and to use available statistical methods appropriately, in the analysis of combined exposure data from epidemiological studies.

Analysis of exposure to environmental chemical mixtures is a well-known issue in environmental epidemiology, but the best methods for doing such analyses are not known. For example, environmental epidemiologists are typically faced with challenges such as: numerous potential exposures of interest; high degrees of correlation between some of these exposures; non-uniform data distributions; and a paucity of toxicologic data to use as guidance when constructing regression models of epidemiological data of exposure to mixtures. New methods for epidemiologic analysis of mixtures are being developed, but it is not known how well they perform or how they compare with conventional epidemiologic approaches such as analysis of effect measure modification or creation of simple exposure indices.

The current workshop will bring together mixtures experts (focusing on statisticians, epidemiologists, and toxicologists) with experience in developing and comparing experimental and statistical approaches to assess the human health effects of mixtures. Furthermore, the health effects of combined exposures are an important part of the NIEHS 2012-2017 Strategic Plan (i.e., Goal #4, Combined Exposures). The goals of this workshop are also critical to the NIEHS Superfund Research Program, which provides support for research that must address risk from the multiple contaminants found at contaminated sites.

The proposed workshop will be developed based on two simultaneous efforts that will begin 1 year prior to the workshop: (1) development of simulated mixtures databases to test statistical approaches; and (2) the identification of a "real world" data set(s) containing human health data and relevant mixtures to apply these statistical approaches. Both datasets will be provided to workshop participants; they will have 6 months to analyze the data sets using their specific approach and to write an abstract of their approach for each of the datasets to be presented at the workshop. Participants should consider working in multidisciplinary teams including epidemiologists, statisticians, and toxicologists.

This workshop will be highly interactive, with most of the time spent discussing the pros and cons of each statistical approach, and will address the following questions:

  1. Which methods are currently available for investigating/disentangling the combined biological effects of exposure to mixtures in epidemiology?
  2. Which statistical methods perform best across multiple datasets?
  3. What are the advantages/disadvantages of the available methods?
  4. Is the lack of good exposure data or the size of study populations a rate-limiting step?
  5. How do the statistical methods in combined exposure epidemiological studies compare to approaches used in toxicological studies of mixtures?
  6. How do methods developed with/applied to a simulated dataset translate to a "real-world" dataset?

Goals and Expected Outcomes of Workshop

This workshop will bring together experts in the fields of epidemiology and biostatistics to identify, develop, refine, and disseminate methods for quantifying the health effects of environmental chemical mixtures. Each expert participant will provide an abstract of their specific approaches for the simulated data sets and these will be provided to the other expert participants approximately 6 weeks before the workshop. Participants will come to the workshop and present their results. The workshop will result in a comprehensive document for publication that summarizes the findings from the workshop and outlines the best approaches and computational/conceptual/statistical models for determining or predicting health effects of mixtures. The workshop committee also welcomes analyses from other groups to be presented in a poster session.

Simulated Data Sets

Data Set #1:
Chemical Mixture Simulated Data

These synthetic data can be considered as the results of a prospective cohort epidemiologic study. The outcome cannot cause the exposures (as might occur in a cross-sectional study). Correlations between exposure variables can be thought of as caused by common sources or modes of exposure. The nuisance variable Z can be assumed to be a potential confounder and not a collider.

Structure of data file:

Name: DataSet1.xls (121KB)
Format: Excel file; the first row is a header, and each subsequent row represents a subject
Number of records: 500
Data per subject: Y, X1, X2, X3, X4, X5, X6, X7, Z
Y = outcome data, 1 continuous variable
X1 - X7 = exposure data, 7 continuous variables
Z: potential confounder, binary

Additional information:
For purposes of Data Set #1, there is no loss to follow-up, no missing or censored data, and no mismeasurement of the variables (Y, Xi, Z); many of the other potential biases are likewise absent. One may also assume that the seven exposure variables X1 through X7, and Z, are neither intermediate variables nor colliders. There are no other confounders or effect measure modifiers. Random noise has been added to the outcome variable.
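
As a minimal sketch of how the structure described above might be checked before analysis, the following assumes the Excel file has first been exported to CSV text and that the binary variable Z is coded 0/1. Both are assumptions, as is the function name `validate_dataset1`; none of this is part of the workshop materials.

```python
import csv
import io

# Column layout documented for Data Set #1: Y, X1..X7, Z
EXPECTED_COLUMNS = ["Y"] + [f"X{i}" for i in range(1, 8)] + ["Z"]


def validate_dataset1(csv_text, expected_records=500):
    """Check header, record count, numeric values, and 0/1 coding of Z.

    Returns a list of problem descriptions; an empty list means the text
    matches the documented structure (under the assumptions stated above).
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    header = [name.strip() for name in header]
    if header != EXPECTED_COLUMNS:
        return [f"unexpected header: {header}"]

    z_index = header.index("Z")
    n_records = 0
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        n_records += 1
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields")
            continue
        try:
            values = [float(cell) for cell in row]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if values[z_index] not in (0.0, 1.0):
            problems.append(f"line {line_no}: Z is not coded 0/1")

    if n_records != expected_records:
        problems.append(
            f"expected {expected_records} records, found {n_records}"
        )
    return problems
```

Calling `validate_dataset1(text)` on the exported file contents should return an empty list if the file has the 500 records and nine columns listed above.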

Data Set #2:
Mixture Simulated Data with an Environmentally Relevant Complex Correlation Pattern

This data file is intended to represent data from a cross-sectional study of 14 biomarkers (e.g., PCBs, dioxins, furans) from biomonitoring data potentially associated with a biomarker of effect (e.g., ALT as a marker of liver toxicity). Three covariates are also included, one binary and two continuous.

Structure of data file:

Name: DataSet2.xls (198KB)
Format: tab-delimited Excel file; the first row is a header, and each subsequent row represents a subject
Number of records: 500
Variables: Y, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, Z1, Z2, Z3
Y = The outcome, a continuous variable.
X1 - X14 = Fourteen exposure biomarkers. Each is a continuous variable.
Z1-Z2: Potential confounders that are continuous.
Z3: A potential confounder that is binary.

Additional information:
Data are complete (no missing data).
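
Because Data Set #2 is described as having a complex correlation pattern among the 14 exposure biomarkers, a first exploratory step might be to list strongly correlated pairs. The sketch below is one hedged way to do that with Pearson correlation. The helper names and the 0.8 cutoff are illustrative assumptions, and the input is assumed to have already been read into a dictionary of columns.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_i, name_j, r) for each pair with |r| >= threshold.

    `columns` maps a variable name (e.g. "X1") to its list of values.
    The threshold is an arbitrary illustrative cutoff, not a recommendation.
    """
    names = list(columns)
    pairs = []
    for i, name_i in enumerate(names):
        for name_j in names[i + 1:]:
            r = pearson(columns[name_i], columns[name_j])
            if abs(r) >= threshold:
                pairs.append((name_i, name_j, r))
    return pairs
```

Passing a dictionary with keys X1 through X14 would give a quick view of which biomarkers move together before any regression modeling is attempted.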

For both data sets, please answer as many of the following questions as possible in your analysis:

  1. Which exposures contribute to the outcome? Are there any that do not? (Qualitative)
  2. Which exposures contribute to the outcome and by how much? (Quantitative)
  3. Is there evidence of "interaction" or not? Be explicit with your definition of interaction (toxicologists, epidemiologists and biostatisticians tend to think about this quite differently).
  4. What is the effect of joint exposure to the mixture? (Qualitative)
  5. What is the joint dose-response function? For example, if you can describe Y as a function of the exposures, what is your estimate of the function Y=f(X1,…,Xp)? (Quantitative)
  6. Provide metrics for your answer. For example, consider adjusted R-squared, root mean square error, etc.
  7. Analysts may also provide a description of the joint distribution of the exposure data.
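
The two metrics named in question 6 can be computed from observed and fitted outcome values, whatever model produced the fit. The following is a minimal sketch using their standard definitions. The function names are illustrative, and the RMSE shown divides by n (some analysts divide by the residual degrees of freedom instead).

```python
import math


def rmse(observed, predicted):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(squared_error / n)


def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    `n_predictors` (p) counts model terms excluding the intercept.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("too few observations for this many predictors")
    mean_y = sum(observed) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the outcome is constant")
    r_squared = 1 - ss_res / ss_tot
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)
```

For Data Set #1, for instance, a model with main effects for X1 through X7 and Z would use `n_predictors=8`; adding interaction terms increases that count and therefore the penalty.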

Simulated Dataset Answers

Answers to Simulated Dataset #1 (148KB) 
Answers to Simulated Dataset #2 (62KB)

Statistical Code

Statistical Code for Workshop Participants: NIEHS Epi Stats Workshop Codes (3MB)


Danielle J. Carlin, Ph.D.
Health Scientist Administrator
Tel 984-287-3244
530 Davis Drive (Keystone Bldg)
Durham, NC 27713