Sharing Environmental Microbiome Data

By combining data on bacteria and other microbes, researchers revealed how complex populations of microorganisms interact within an environment, providing information that can improve strategies for removing hazardous substances. For definitions of common data science and sharing terms, see the glossary on the landing page. For more information about these use cases, please refer to the White Paper.

Deciphering Intra- and Cross-Kingdom Microbial Interactions for Bioremediation of Superfund Pollutants

Collaborating Institutions:

Duke University and University of Iowa (U of I) SRP Centers

Abstract:

Studies involving microorganisms that break down pollutants are often conducted with single microbial cultures or simplified bacterial communities in the laboratory. While these experiments are useful to uncover mechanistic insights, they do not capture the complex microbial interactions that exist in nature. Researchers at the Duke University and University of Iowa SRP Centers collaborated to establish a common computational framework and infrastructure, and to standardize sampling approaches to better share, integrate, and analyze large microbiome sequencing datasets collected from natural sediments across SRP centers. Both Duke and the University of Iowa are working to engineer microbes to clean up contaminants in the environment. The team hoped that by sharing data, they could shed light on how complex populations of microbes interact within an environment to provide useful information to improve bioremediation strategies.

The Use Case first sought to establish a reproducible and sharable microbiome bioinformatics pipeline, focusing on the centers' existing 16S rRNA high-throughput sequencing data from PAH-contaminated (Duke) and PCB-contaminated (University of Iowa) sediments. The pipeline could then be applied to other data types.

Research Question:

Can we establish a common computational framework, infrastructure, and standardized sampling approaches to enable data sharing, integration, and analysis of environmental microbiome data to facilitate precise bioremediation?

Data Sources and Data Sets:

Collaborator	Dataset Description	Publication (PMCID)
U of I	16S high-throughput sequencing data from several sediment sampling locations within a PCB-contaminated wastewater lagoon. Available in NCBI’s Sequence Read Archive (SRA) under BioProject number PRJNA382682.	Mattes et al. (2018)
Duke	16S/18S/fungal ITS Ion Torrent and Illumina MiSeq amplicon metagenomic data sets from several sediment sampling locations along the Elizabeth River, a former wood treatment facility in Yadkinville, NC, and bench-scale reactor work, in addition to a single 16S amplicon metagenomic/metabolomic dataset from Fundulus (fish gut and gills). Although not all data have been published to date, raw and processed data are available.	Ikuma and Gunsch (2012); Czaplicki and Gunsch (2016); Chang et al. (2019)

Data Repositories:

NCBI’s Sequence Read Archive (SRA). Metadata accompanying the environmental microbiome raw sequence data submitted to SRA will be improved in the near future.

Metadata:

Vocabularies Utilized:

The team did not use any ontologies. They intended to use the Earth Microbiome Project Ontology (EMPO); however, it was limited with respect to describing contaminated environments.

Approach:

The team sought to establish a reproducible and sharable microbiome bioinformatics pipeline focusing on their existing 16S rRNA high-throughput sequencing data. All of the team’s experimental protocols and analysis code were uploaded to a GitHub repository under version control to ensure methods stayed up to date as techniques and tools evolved. They evaluated computational procedures, looking for ways to maximize reproducibility and the FAIRness of data management practices.

The team also developed a software container, which is a standalone, transferable, and executable assembly of software, to run their microbiome analyses. Their computing environment is in a container on the Singularity Hub and includes all the software packages and source code needed to run analyses, with more than 60 R packages pre-installed, which greatly reduces the time required for someone to set up and install the analysis pipeline.
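To make the container idea concrete, the following is a minimal Python sketch of how an analysis script might be launched inside a Singularity image. It only assembles the command line; it does not reproduce the team's actual instructions. The image name `microbiome.sif`, the script name `analysis.R`, the host directory, and the `/data` mount point are all hypothetical placeholders.

```python
from typing import List, Optional


def build_exec_command(image: str, script: str, data_dir: Optional[str] = None) -> List[str]:
    """Assemble a `singularity exec` call that runs an R script inside an image.

    All arguments are illustrative; the real image and script names would come
    from the team's own GitHub instructions.
    """
    cmd = ["singularity", "exec"]
    if data_dir is not None:
        # --bind makes a host directory visible inside the container,
        # here at the (assumed) mount point /data.
        cmd += ["--bind", f"{data_dir}:/data"]
    # The image comes first, then the command to run inside it.
    cmd += [image, "Rscript", script]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call shape.
    print(" ".join(build_exec_command("microbiome.sif", "analysis.R", "/home/user/seqs")))
```

The returned list can be passed to `subprocess.run` on a machine where Singularity is installed. Because the R packages live inside the image, the host needs no R installation of its own.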

Outcomes:

The team developed a Docker container (i.e., a standalone, transferable, and executable assembly of software) to house their microbiome analysis pipeline (specifically for 16S rRNA) so that anyone on the EUC team could run their analyses reproducibly from any location. They also developed protocols for standardizing environmental sampling from soil, sequencing (e.g., DNA extraction and library preparation), and the analysis and visualization of environmental microbiome data. The protocols are stored on GitHub, which provides version control so that the original methodology remains available even as protocols improve and change over time. The container, protocols, and guidance documents will be made available in the future.

Integrated Datasets, Portals/Dashboards, Tools, Code:

Their Singularity containers are available online, instructions for running the containers are on GitHub, and their data are stored in the Sequence Read Archive (SRA; e.g., BioProject number PRJNA382682).

Two EUCs: Integrating and Creating Broad Access for Transcriptomic, Proteomic, Microbiome and Physicochemical Datasets of Phytoremediator and Phytostabilizer Plants; Data Interoperability for Investigating Biogeochemical Controls on Metal Mixture Toxicity Using Stable Isotopes and Gene Expression

Collaborating Institutions:

Integrating and Creating Broad Access for Transcriptomic, Proteomic, Microbiome and Physicochemical Datasets of Phytoremediator and Phytostabilizer Plants: University of Arizona (UA) and University of California (UC) San Diego SRP Centers

Data Interoperability for Investigating Biogeochemical Controls on Metal Mixture Toxicity Using Stable Isotopes and Gene Expression: UA SRP Center and researchers at the Colorado School of Mines (CSM)

Abstract:

Integrating and Creating Broad Access for Transcriptomic, Proteomic, Microbiome and Physicochemical Datasets of Phytoremediator and Phytostabilizer Plants: Soil and water with high levels of toxic metals, including cadmium, lead, mercury, and arsenic, can be harmful to human and environmental health. Traditional approaches to decontaminating heavy metals include excavating and removing soils, which can be costly and impractical. Using plants to take up metals and stabilize them is a cost-effective alternative, but the many interactions, genes, and pathways involved are not well characterized. The University of Arizona (UA) SRP Center team worked with researchers from the University of California (UC) San Diego SRP Center to understand interactions between metals, microbes, and plants that help some plants tolerate contaminants and stabilize metals in contaminated soils. Specifically, they sought to investigate genomic, transcriptomic, microbiome, and physicochemical properties to identify genes and pathways that enable plants to grow and stabilize metals in semi-arid environments.

Data Interoperability for Investigating Biogeochemical Controls on Metal Mixture Toxicity Using Stable Isotopes and Gene Expression: In a closely related Use Case, the UA SRP Center team collaborated with researchers from an SRP-funded individual research project at the Colorado School of Mines (CSM) to explore how remediation of mining waste affects the biodiversity of terrestrial and aquatic systems. Researchers at CSM examined the stream impacts of mining and the processes involved in recovery following cleanup; however, complex interactions among metal mixtures are not well captured by the predictive toxicity models currently used by regulatory agencies. The group conducted toxicology experiments with aquatic organisms to uncover the molecular mechanisms involved in interactions between metals and so improve model predictions. The UA team has tackled the issue of mining waste on the terrestrial side, looking at plants that can take up and stabilize metals. They have tested a phytoremediation strategy called compost-assisted phytostabilization, which promotes plant and root growth that locks metals underground.

Research Questions:

Can we combine data to better understand interactions between metals, microbes, and plants that help some plants tolerate contaminants and stabilize metals in contaminated soils? Can combining toxicity data in terrestrial and aquatic systems reveal new insights in predicting recovery after cleanup? Can we create a data analysis portal to enable robust analyses that shed light on the complex relationships between environmental factors, microbial communities, and remediation success?

Data Sources and Data Sets:

EUC	Data Type	Notes
UA and UC San Diego	Transcriptomic	Data sets from 24 phytostabilizer plant samples from shoots and roots grown in compost-amended mine tailings. Transcriptome data are the sum of an organism’s RNA transcripts and provide a picture of gene expression in specific cells or tissue under specific conditions.
	Microbiome	Generated from the rhizosphere-influenced and bulk soil samples collected from the pots grown with quailbush exposed to compost-amended mine tailings and potting soil. These data provide microbiome community composition metrics.
	Ionomic	Metal and elemental content (As, Cd, Cu, Fe, K, Mn, Na, Pb, Zn) of plant leaf, shoot and root samples from plants. These data sets provide information on the accumulation of toxicants and elemental nutrients in each sample.
	Physicochemical	Characteristics of mine tailings, compost, and potting soil, such as pH, total organic carbon, and metal/elemental content of the mine tailings, compost, and potting soil samples. These data indicate the state of the growth medium prior to planting and after plant growth and establishment.
UA and CSM	Water Chemistry	Data from North Fork Clear Creek, Colorado, before and after acid mine drainage treatment, including total and dissolved concentrations of major and trace elements, as well as water chemistry variables. Discharge data allow for computation of metal loads. These data identify spatiotemporal trends in metal loading and remediation effectiveness.
	Biological	Data from North Fork Clear Creek obtained from benthic sampling performed over the past 3 years, including the total abundance of benthic organisms, taxonomic benthic diversity, and algal biomass. These data indicate the biological response of stream communities to improved water quality associated with acid mine drainage remediation (Kotalik et al., 2021).
	Modeling Toxicity	Model-computed toxicity for field data using the measured water composition, including dissolved organic carbon and water hardness. Toxicity of copper and zinc over the study period was calculated to determine the effectiveness of the acid mine drainage treatment to aquatic health.
	Mixture Toxicity Assays	Mortality for exposures to Cu, Cd, Ni, and Zn in mixtures-based toxicity studies in Daphnia magna. RNA-seq data from Daphnia magna from metal mixture toxicity studies. Quantitative reverse transcription polymerase chain reaction (RT-qPCR) data for selected biomarkers from metal mixture toxicity studies.
	Chemical Analyses	Performed on samples from the Iron King Mine and Humboldt Smelter Superfund site (IKMHSS) mine tailings in Arizona that were collected prior to and during phytostabilization (Root et al., 2015). Total elemental composition and speciation was also performed to determine the impact of phytostabilization on metal mobility and bioavailability (Hammond et al., 2020).
	XAS and X-ray Diffraction	XAS and X-ray diffraction of biogeochemical processes affecting metal(loid) molecular stabilization and mobility in the root zone of plants during phytoremediation (Hammond et al., 2018).
	Microbial analysis	Analysis of bulk and rhizosphere samples from IKMHSS during compost-assisted phytoremediation (Valentin-Vargas et al., 2018; Honeker et al., 2019; Hottenstein et al., 2019).
	rRNA	16S rRNA and gene abundance and activity during IKMHSS phytoremediation, quantified using quantitative PCR and quantitative reverse transcription PCR (qRT-PCR) of DNA and cDNA extracts, respectively (Nelson et al., 2015; Honeker et al., 2019).
	Metagenomics	Metagenomic sequencing analysis of bulk and rhizosphere samples from IKMHSS during compost-assisted phytoremediation (unpublished).
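The Water Chemistry row notes that discharge data allow metal loads to be computed. As a worked illustration, a load is the product of concentration and discharge, with a unit conversion. The sketch below assumes concentration in mg/L and discharge in m³/s. The function name and the example values are illustrative and are not taken from the North Fork Clear Creek data.

```python
SECONDS_PER_DAY = 86_400


def metal_load_kg_per_day(concentration_mg_per_l: float, discharge_m3_per_s: float) -> float:
    """Convert a metal concentration and a stream discharge into a daily load.

    1 mg/L equals 1 g/m^3, so concentration * discharge gives grams per second;
    multiplying by seconds per day and dividing by 1000 gives kilograms per day.
    """
    if concentration_mg_per_l < 0 or discharge_m3_per_s < 0:
        raise ValueError("concentration and discharge must be non-negative")
    grams_per_second = concentration_mg_per_l * discharge_m3_per_s
    return grams_per_second * SECONDS_PER_DAY / 1000.0


if __name__ == "__main__":
    # Illustrative values only: 0.5 mg/L dissolved zinc at a flow of 0.2 m^3/s.
    print(round(metal_load_kg_per_day(0.5, 0.2), 2))
```

With these illustrative inputs the load is about 8.64 kg/day. Comparing loads rather than concentrations separates a real reduction in metal input from simple dilution at high flow.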

Data Repositories:

Sequence data for both EUCs are being deposited in NCBI’s SRA using minimum information standards (i.e., Minimum Information about any (x) Sequence, MIxS), while environmental data will be deposited in the Knowledge Network for Biocomplexity (KNB) data repository.
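A minimum information standard is, in practice, a checklist of fields that each sample record should carry. The sketch below shows one way a record could be checked before submission. The field names are a small subset of MIxS core terms chosen for illustration, and the sample record is invented; neither reflects the teams' actual submission templates.

```python
from typing import Dict, List

# A subset of MIxS core field names, chosen here for illustration only.
REQUIRED_FIELDS = (
    "collection_date",
    "geo_loc_name",
    "lat_lon",
    "env_broad_scale",
    "env_local_scale",
    "env_medium",
)


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in a sample record."""
    return [name for name in REQUIRED_FIELDS if not str(record.get(name, "")).strip()]


if __name__ == "__main__":
    # Hypothetical sample record with two fields left out.
    sample = {
        "collection_date": "2019-06-01",
        "geo_loc_name": "USA: Arizona",
        "lat_lon": "",
        "env_broad_scale": "mine tailings",
        "env_medium": "soil",
    }
    print(missing_fields(sample))
```

Running a check like this before deposit catches incomplete records early, when the sampling notes are still at hand.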

Metadata:

Vocabularies Utilized:

Environment Ontology (EnvO) for metal concentrations, Plant Ontology for plant structures, Biological Collections Ontology for observations and measurements, and Population and Community Ontology for biodiversity metrics.

Approach:

The research team focused on making data more standardized and interoperable, mapping terms to existing ontologies and contributing to ontologies by adding more terms and details as needed. The team is still in the process of analyzing and annotating transcriptomic datasets to then integrate that data into a single graph database hosted on the UC San Diego Superfund Portal.
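The mapping step described above can be sketched as a lookup from free-text labels to ontology term identifiers, with unmatched labels set aside as candidates for new terms. This is a minimal sketch under stated assumptions: the identifiers below are placeholders rather than real Plant Ontology IDs, and the table and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Placeholder identifiers for illustration; real IDs would come from the ontology itself.
TERM_MAP: Dict[str, str] = {
    "root": "PO:EXAMPLE_ROOT",
    "shoot": "PO:EXAMPLE_SHOOT",
}


def annotate(labels: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map free-text labels to ontology terms and collect the ones with no match."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for label in labels:
        key = label.strip().lower()  # normalize case and whitespace before lookup
        if key in TERM_MAP:
            mapped[label] = TERM_MAP[key]
        else:
            unmapped.append(label)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = annotate(["Root", "shoot ", "rhizosphere"])
    print(mapped)
    print(unmapped)
```

Labels that land in the unmapped list are the ones for which a team would either pick a broader existing term or propose an addition to the ontology.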

Outcomes:

The team enhanced existing repositories and made progress in making data harmonized and interoperable.

Publications:

Ramírez-Andreotta MD, Walls R, Youens-Clark K, Blumberg K, Isaacs KE, Kaufmann D, Maier RM. 2021. Alleviating Environmental Health Disparities Through Community Science and Data Integration. Front. Sustain. Food Syst 5.

Integrated Datasets, Portals/Dashboards, Tools, Code:

CSM data on water chemistry and benthic invertebrate diversity along with preprocessing scripts are available on GitHub.

National Institute of Environmental Health Sciences
