Researchers linked geospatial and other diverse datasets to create tools to visualize potential threats to human health. For definitions of common data science and sharing terms, see the glossary on the landing page. For more information about these use cases, please refer to the White Paper (1MB).
Linking Data From Laboratory and Field Investigations of Mercury Transformation, Bioaccumulation, and Remediation
Collaborating Institutions:
Dartmouth College SRP Center and SRP-funded individual research projects at Duke University and University of Maryland-Baltimore County (UMBC)
Abstract:
Mercury is one of the top chemicals of concern with regard to human health. Exposure increases the risk of diabetes, respiratory disease, and reproductive and developmental disorders. For fetuses, infants, and children, exposure can harm the developing nervous system and interfere with cognition and memory.
Mercury contamination can persist in aquatic ecosystems worldwide and is the most frequent cause of fish consumption advisories across the U.S. Methyl mercury (MeHg), the form that biomagnifies in the aquatic food web, is controlled by a range of geochemical, microbiological, transport, and ecological processes. Collaborators from the Dartmouth College SRP Center and SRP-funded individual research projects at Duke University and University of Maryland-Baltimore County (UMBC) set out to better understand the range of factors controlling mercury movement and transport in aquatic environments to evaluate the effectiveness of remediation strategies.
By compiling data from both controlled laboratory and microcosm experiments and larger scale mesocosm and field observations, the team also hoped to improve the context and scalability of lab data. Their aim was to create a centralized data platform to compare controlled experiments and field observations to improve insights into the environmental relevance of experimental findings.
Research Question:
What are the relative roles of sediment and water column processes in determining bioavailability of inorganic mercury to methylating bacteria and MeHg to aquatic food webs? Are processes studied in lab experiments truly scalable to field-scale processes? What factors in sediments and the water column control the flux of MeHg between these phases? Are there specific water quality and redox conditions where sediment amendments are most effective?
Data Sources and Data Sets:
Collaborator | Data Type | Data Source |
Dartmouth and collaborators at the University of Connecticut | Field sampling: data collected across several years at sites in Maine, New Hampshire, Connecticut, New Jersey, New York; included dissolved and particulate Hg and MeHg, temperature, salinity, pH, dissolved oxygen, dissolved organic carbon, etc. in water, sediment, and biota, as appropriate. | Buckman et al. (2021); Taylor et al. (2019) |
UMBC and collaborators at the Smithsonian Environmental Research Center | Field survey of marsh and adjacent mud flats across the Chesapeake Bay (Virginia and Maryland), and Maine. Field trial of the impact of activated carbon on Hg bioavailability in a marsh in Maine. Both data sets include detailed sediment and porewater geochemistry, and Hg methylation rates. | Gilmour et al. (2018) |
Duke | Outdoor field mesocosm experiments: Hg methylation and MeHg bioaccumulation factors in water, sediment, biofilms, and animals. Also compared passive sampling technologies for predicting Hg methylation and MeHg bioaccumulation potentials and evaluated activated carbon amendments in altering bioavailability and methylation. | (Neal-Walthall N) |
Data Repositories:
Metadata:
Vocabularies Utilized:
EnvO, ChEBI, and Units of Measurement Ontology (UO)
Approach:
The team’s initial plan was to integrate their data using an unstructured database, but they encountered challenges in combining data from individual labs. Even though all collaborators were mercury scientists, each lab named and stored measurements in whatever way best suited its specific research goal. To address this challenge, they pivoted to loading their data into a structured database (PostgreSQL). They created a consistent naming convention across labs, leveraging existing ontologies, including some nomenclature from EnvO, ChEBI, and UO. Their terms included parameter names, definitions, and unit conversions (e.g., molar to weight) so that data from different labs could be combined and compared directly.
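As a minimal sketch of what a shared naming convention plus a molar-to-weight conversion might look like, the Python below renames lab-specific parameter names to one standard term and converts a mercury concentration from pmol/L to ng/L. The lab column names, the standard term `total_hg_dissolved`, and the record layout are hypothetical and are not taken from the team's data dictionary; only the molar mass of mercury (200.59 g/mol) is a physical constant.

```python
# Molar mass of elemental mercury in g/mol (a standard physical constant).
HG_MOLAR_MASS_G_PER_MOL = 200.59

# Hypothetical mapping from lab-specific parameter names to one shared term.
STANDARD_NAMES = {
    "THg_water": "total_hg_dissolved",
    "Hg_diss": "total_hg_dissolved",
    "dissolved total Hg": "total_hg_dissolved",
}


def pmol_per_l_to_ng_per_l(value_pmol_per_l: float) -> float:
    """Convert a mercury concentration from pmol/L to ng/L.

    1 pmol of Hg weighs 200.59 pg, and 1000 pg = 1 ng.
    """
    return value_pmol_per_l * HG_MOLAR_MASS_G_PER_MOL / 1000.0


def harmonize(record: dict) -> dict:
    """Rename a lab's parameter to the shared term and report the value in ng/L."""
    name = STANDARD_NAMES[record["name"]]
    value = record["value"]
    if record["unit"] == "pmol/L":
        value = pmol_per_l_to_ng_per_l(value)
    elif record["unit"] != "ng/L":
        raise ValueError(f"unsupported unit: {record['unit']}")
    return {"name": name, "value": value, "unit": "ng/L"}
```

Calling `harmonize({"name": "Hg_diss", "value": 5.0, "unit": "pmol/L"})` would return the record under the shared name with a value of about 1.003 ng/L, so two labs reporting the same quantity in different units end up directly comparable.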
Next, the team developed a sharable data analysis pipeline for data cleanup and transformation in OpenRefine, an open-source application for exploring, cleaning, and transforming large quantities of data at once. They wrote code to tell the application how to translate data values from each collaborator’s separate Excel files into one normalized database using their standard terms. This process generated a new file that reproduces the same automated cleaning steps, so researchers at other organizations can replicate this EUC’s pipeline.
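The idea of a file that replays the same cleaning steps can be illustrated with a short, hypothetical sketch: an ordered list of operations is stored as JSON and applied to any lab's rows. The operation names and the JSON layout below are invented for illustration and are not OpenRefine's actual export format; the column names are also assumptions.

```python
import json

# Hypothetical operation list; OpenRefine's real export format differs.
OPERATIONS_JSON = """
[
  {"op": "rename_column", "old": "Hg_diss", "new": "total_hg_dissolved"},
  {"op": "strip_whitespace", "column": "site"},
  {"op": "to_number", "column": "total_hg_dissolved"}
]
"""


def apply_operations(rows, operations):
    """Apply each recorded cleaning step, in order, to a list of row dicts."""
    cleaned = [dict(row) for row in rows]  # copy so the input stays untouched
    for step in operations:
        for row in cleaned:
            if step["op"] == "rename_column":
                if step["old"] in row:
                    row[step["new"]] = row.pop(step["old"])
            elif step["op"] == "strip_whitespace":
                row[step["column"]] = row[step["column"]].strip()
            elif step["op"] == "to_number":
                row[step["column"]] = float(row[step["column"]])
            else:
                raise ValueError(f"unknown operation: {step['op']}")
    return cleaned


operations = json.loads(OPERATIONS_JSON)
lab_rows = [{"site": " Marsh A ", "Hg_diss": "1.25"}]  # made-up example row
print(apply_operations(lab_rows, operations))
```

Because the steps live in a data file rather than in someone's memory of manual edits, another group could rerun the identical sequence on its own spreadsheets.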
Outcomes:
In total, the EUC has compiled and harmonized five lab and field datasets, produced a consistent naming convention between labs, and completed a data dictionary describing the contents, format, and structure of the database and its elements. They are in the process of mapping from individual datasets to a unified public data repository, and plan to store their code and infrastructure documentation on GitHub.
Integrated Datasets, Portals/Dashboards, Tools, Code:
Code and documentation can be accessed via GitHub.
Making Fish Contaminant Data FAIR to Improve Fish Consumption Advisories
Collaborating Institutions:
Dartmouth College and Boston University SRP Centers
Abstract:
Fish consumption advisories are meant to help people make informed choices about consuming fish caught from local waters. However, guidelines vary significantly from location to location, or may be inconsistent within the same water body if recommendations are made by more than one jurisdiction. Researchers at the Boston University (BU) and Dartmouth College SRP Centers worked to create a searchable data platform containing fish tissue and environmental contaminant data from their centers as well as publicly available data.
By comparing similar types of environmental science data across sources, they sought to evaluate fish consumption advisories and ultimately determine whether they are protective of sensitive populations. Since there are no comprehensive U.S. databases of chemical contaminants in fish tissue, they also wanted to create a searchable database that supports temporal and spatial evaluations.
Research Question:
Do fish contaminant data support protective fish consumption advisories?
Data Sources and Data Sets:
Dataset | Years | Details |
Great Lakes Fish Monitoring and Surveillance Program (Great Lakes Environmental Database) | 1999-2018 | The team conducted a preliminary mixtures analysis using these data and reported that fish contaminant levels tended to increase with fish size. The data also showed trends: PCB levels have decreased over time, mercury and pesticide levels have remained relatively stable over the past two decades, and, where data exist, PFOS levels are increasing. |
National Rivers and Streams Assessment | 2008-2009, 2013-2014 | The team conducted a preliminary analysis revealing stronger correlations among PCB chemicals at urban sites than at non-urban sites. |
National Coastal Condition Assessment | 2000-2006, 2010 | Assesses four indices of condition (water quality, sediment quality, benthic community condition, and fish tissue contaminants) to evaluate the ecological condition and recreational potential of coastal waters. |
National Lake Fish Tissue Study | 1999-2003 | Data analyzed for the study included tissue concentrations for each target chemical (e.g., mercury) or chemical group (e.g., PCBs) and fish composite type (i.e., predator and bottom-dweller composites). |
Boston University and Dartmouth College SRP Centers | | The composition and coverage of the above datasets were compared with center data collected at Superfund and contaminated sites, which include diverse marine and freshwater fish species, sample types (e.g., fillet, whole body), and concentrations of contaminants, including PCBs and mercury. The datasets helped to illuminate the potential and highlight the gaps that are seen across Superfund-generated data. |
Data Repositories:
Metadata:
Vocabularies Utilized:
Approach:
The team gathered EPA fish contaminant datasets including the National Rivers and Streams Assessment (NRSA; 2008-2009, 2013-2014), the National Coastal Condition Assessment (2000-2006, 2010), the National Lake Fish Tissue Study (1999-2003), and the Great Lakes Environmental Database (1999-2018). The composition and coverage of these datasets were compared with center data collected at Superfund and contaminated sites, which include diverse marine and freshwater fish species, sample types (e.g., fillet, whole body), and concentrations of contaminants, including PCBs and mercury. The U.S. EPA datasets helped to illuminate the potential and highlight the gaps that are seen across Superfund-generated data.
To integrate data into a single centralized repository, the team mapped metadata between data sources and normalized inputs using a customized ontology they developed. Their ontology aggregated and extended existing ontologies, including ecological, physiological, and environmental terms from EnvO and contaminant terms from the Chemical Entities of Biological Interest (ChEBI).
The team is building a relational database to combine their SRP center and external data so users can query all datasets at once. They began with a defined organization for column mapping and data types. The team recorded this schema and stored the utility code, which facilitates migration, duplication, and sharing, on GitHub.
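A relational layout that lets users query SRP center and external data together could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table names, column names, data types, and inserted values are all hypothetical placeholders; the source does not describe the team's actual schema or database engine.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    dataset_id     INTEGER NOT NULL REFERENCES dataset(dataset_id),
    species        TEXT NOT NULL,
    contaminant    TEXT NOT NULL,
    sample_type    TEXT NOT NULL,
    sample_year    INTEGER NOT NULL,
    concentration  REAL NOT NULL,
    unit           TEXT NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database and load made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dataset (dataset_id, name) VALUES (?, ?)",
        [(1, "external_example"), (2, "srp_center_example")],
    )
    # Placeholder values, not real measurements.
    conn.executemany(
        "INSERT INTO measurement (dataset_id, species, contaminant, sample_type,"
        " sample_year, concentration, unit) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "lake trout", "mercury", "fillet", 2010, 0.30, "mg/kg"),
            (2, "striped bass", "mercury", "fillet", 2012, 0.20, "mg/kg"),
            (2, "striped bass", "PCBs", "whole body", 2012, 0.05, "mg/kg"),
        ],
    )
    return conn


def mean_by_dataset(conn, contaminant):
    """Query every dataset at once for one contaminant."""
    return conn.execute(
        "SELECT d.name, AVG(m.concentration) FROM measurement AS m"
        " JOIN dataset AS d USING (dataset_id)"
        " WHERE m.contaminant = ? GROUP BY d.name ORDER BY d.name",
        (contaminant,),
    ).fetchall()
```

Keeping the dataset of origin as a foreign key, rather than as separate tables per source, is one way to make a single query span all sources; other designs are equally plausible.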
Outcomes:
The team developed a new ontology and is working on a repository that underpins an interactive map visualization to provide a broad view of contamination nationwide (e.g., PCBs, mercury, other organic and inorganic pollutants). This tool will allow users to select specific PCB species, or other organic or inorganic pollutants, and run queries based on their selections. This tool should help make data FAIR while creating opportunities for scientific collaboration among SRP environmental health researchers, local researchers, citizen science and community groups, Native American tribes, and federal and state government.
Integrated Datasets, Portals/Dashboards, Tools, Code:
Validate and Develop Visualization and Reproducibility Documentation for Source-Receptor Relationships for Toxicants
Collaborating Institutions:
University of Rhode Island (URI) and Massachusetts Institute of Technology (MIT) SRP Centers
Abstract:
Collaborators from the University of Rhode Island and Massachusetts Institute of Technology (MIT) SRP Centers set out to understand the link between sources of chemical emissions and their concentrations in the environment. More than 600 sites across the U.S. are contaminated by per- and polyfluoroalkyl substances (PFAS), but the extent of transport away from these sites to potential human exposure pathways, such as inhalation, is virtually unknown. These source-receptor relationships link pollution emissions to their migration and deposition, as well as to human exposure and finally to resulting health effects. Such information is critical for accurate risk assessment and the development of effective remediation policies. By quantifying and visualizing source-receptor relationships and potential health and environmental impacts, the team sought to provide more detailed data to inform decision making.
They planned to investigate the commonalities in atmospheric deposition pathways of two classes of chemicals, PFAS and polycyclic aromatic hydrocarbons (PAHs), by integrating modeling data and measurement data from both centers. By combining and comparing their data, they hoped to develop robust source-receptor relationships for PFAS and PAHs, information that is essential for attributing exposures and health effects to Superfund sites or other sources of pollutants.
Research Question:
What are the source-receptor relationships for PFAS and PAHs in the Northeastern U.S.?
Data Sources and Data Sets:
Their existing datasets included PFAS and PAH modeling data generated using the GEOS-Chem atmospheric transport model, a global 3-D chemical transport model for atmospheric composition driven by meteorological input from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling and Assimilation Office.
Metadata:
Vocabularies Utilized:
The Semantic Web for Earth and Environmental Terminology (SWEET)
Approach:
The team developed an ontology for metadata specific to atmospheric toxicant modeling that can be extended to help integrate information from additional models. They developed and implemented a consistent procedure for harmonizing the spatial and temporal resolution of the data to enable geographic comparison. This involved setting up a common data formatting, or metadata, hierarchy that worked for these two datasets, but that could also be generally applicable to other chemical transport model outputs generated with GEOS-Chem. Using the SWEET ontology, their hierarchy started with source type, such as specific emitter or class of emitters. Each of these sources could emit a suite of chemical species which then potentially reach human receptors through different exposure pathways.
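One simple way to bring two gridded model outputs to a common spatial resolution is to average blocks of fine-grid cells into coarser cells. The sketch below shows that idea in plain Python; it is not the team's documented procedure. It assumes a regular grid whose dimensions divide evenly by the coarsening factor, and it treats all cells as equal in area, which real latitude-longitude grids are not, so an area-weighted mean would be needed in practice. The function name `coarsen` is hypothetical.

```python
def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D lat/lon grid.

    Assumes equal-area cells; a real regridding step would weight by cell area.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            # Collect every fine-grid value that falls inside this coarse cell.
            block = [
                grid[a][b]
                for a in range(i, i + factor)
                for b in range(j, j + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse
```

For example, a 4 x 4 grid coarsened with `factor=2` yields a 2 x 2 grid in which each value is the mean of four original cells, after which it can be compared cell by cell with a dataset that was produced at the coarser resolution.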
The team used NetCDF (Network Common Data Form), a file format for storing multidimensional variables such as temperature, humidity, pressure, wind speed, and wind direction. Each of these variables can be displayed along a dimension, such as time, in geographic modeling software by making a layer or table view from the NetCDF file. The group used Python tools to transform the NetCDF data and visualize them in their PFAS- and PAH-specific GEOS-Chem source-receptor model.
The researchers wrote an application programming interface (API) wrapper into this process, making their actions reproducible and accessible, and broadly applicable to different toxicants. An API wrapper is a language-specific (e.g., Python) package or kit that encapsulates multiple processes to make complicated functions easy to use.
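To make the wrapper idea concrete, here is a hypothetical sketch of a small Python class that hides several steps (selecting a chemical, picking one time layer, flattening the grid, summarizing it) behind a single method call. The class name, method names, and in-memory data layout are all assumptions for illustration; the source does not describe the team's actual interface, and a real wrapper would read the fields from model output files rather than from nested lists.

```python
from statistics import mean


class SourceReceptorModel:
    """Hypothetical wrapper that hides several processing steps behind one call.

    `fields` maps a chemical name to a list of 2-D grids, one per time step,
    standing in for variables that would be read from model output files.
    """

    def __init__(self, fields):
        self._fields = fields

    def _select(self, chemical, time_index):
        """Pick the 2-D layer for one chemical at one time step."""
        return self._fields[chemical][time_index]

    @staticmethod
    def _flatten(grid):
        """Turn a 2-D grid into a flat list of cell values."""
        return [value for row in grid for value in row]

    def summary(self, chemical, time_index):
        """Select one time layer and return its mean and maximum."""
        values = self._flatten(self._select(chemical, time_index))
        return {"chemical": chemical, "mean": mean(values), "max": max(values)}
```

A user would only need `SourceReceptorModel(fields).summary("PFAS", 0)`; swapping in a different toxicant means passing a different key, not rewriting the processing steps.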
Outcomes:
Results from these projects included visualizations and interactive maps of source-receptor relationships for the Northeast U.S. region which are available online. A manuscript exploring contrasting source-receptor relationships for PAHs and PFAS is still in progress, and they hope to continue work to link exposure pathways to health endpoints. In the future, they are considering containerizing the code they created to allow non-data scientists to host and apply the process on a local server to manipulate the data. This functionality would be of interest to those at state and local levels who want to know what sources and contaminant levels mean.
Integrated Datasets, Portals/Dashboards, Tools, Code:
All metadata, visualization tools, and model code are archived on GitHub and have been deposited to the Open Science Framework as part of this project.
Developing a Spatial Approach for Toxic Transferal From Industrial and Vacant Land Uses to Green Infrastructure
Collaborating Institutions:
Texas A&M University (TAMU), Brown University, and University of California (UC) San Diego SRP Centers
Abstract:
Disaster events such as flooding can spread harmful contaminants from current and former industrial sites into neighboring communities. Implementing new green infrastructure, such as rain gardens or food forests, has been linked to improved public health outcomes, decreased flood damage, and decreased concentrations of toxics in stormwater runoff. However, little is known about whether these systems are vulnerable to toxics transfer during extreme weather and other disasters. Researchers worked together to understand how land use, such as vacant or industrial land and green space, affects people’s exposure to harmful chemicals and impacts community resilience and health.
They aimed to integrate diverse city, local, and federal data with SRP center datasets on spatial land use, including green infrastructure, flood plains, vacant lot uses, public health outcomes, industrial land uses, and sociodemographic conditions. The goal was to create interactive maps showing how different factors contribute to an area’s vulnerability to, for example, toxicant transfer or flooding.
Research Question:
How does land use, such as vacant or industrial land and green space, affect people’s exposure to harmful chemicals and community resilience and health?
Data Sources and Data Sets:
Dataset | Year | Source | Reference | Scale | Registered in |
Green Infrastructure Quantity and Quality | 2016 | Texas Natural Resources Information System DataHub | USGS | U.S. Census Tract | ArcGIS Online (AGOL), Data Discovery Studio (DDS); with respective GUIDs |
Vacant Addresses | 2016 | U.S. Postal Service | HUD | U.S. Census Tract | AGOL, DDS; with respective GUIDs |
Vacant Lands | | San Diego Association of Governments | | | AGOL, DDS; with respective GUIDs |
Public Health Outcomes (14 factors) | 2016 | | CDC | U.S. Census Tract | AGOL, DDS; with respective GUIDs |
Social Vulnerability/Sociodemographic Conditions | 2016 | CDC/ATSDR SVI Data and Documentation | CDC | U.S. Census Tract | AGOL, DDS; with respective GUIDs |
Industrial Land Uses | 2016 | Multiple (created from land use data); includes Brown’s Historical Industrial Site database that contains information on 6655 manufacturing sites in Rhode Island. | Local | U.S. Census Tract | |
Flood Damage (Flood Plain) | 2016 | Spatial Hazard Events and Losses Database for the US (SHELDUS) | FEMA | U.S. Census Tract | |
San Diego Historical Business Locations | | City of San Diego | City of San Diego | | AGOL, DDS, Brown, Dataverse; DOIs, GUIDs |
Data Repositories:
Data are shared in several repositories, including:
- Dataverse
- ESRI Dashboard, ToxPi, and SuAVE platforms for analysis and visualization, plus analysis of the datasets in Jupyter Notebooks. Publishing the data in SuAVE lets users annotate data views (maps, statistical distributions) and share them with others.
- All registered datasets can be found through ArcGIS Online based on their metadata, via the online user interface or via Python code in a Jupyter Notebook.
- They are also available in the Brown Digital Repository and Data Discovery Studio, which includes a platform and tools for analyzing novel data layers and creating novel datasets.
Metadata:
Metadata Standards Utilized:
Vocabularies Utilized:
CINERGI Ontology (automated metadata enhancement); domain vocabularies: North American Industry Classification System (NAICS), ChEBI, PubChem, MeSH
Approach:
The team integrated six geospatial datasets, ran analytics, and visualized the data using the Toxicological Prioritization Index (ToxPi) software. ToxPi normalizes the data, weights the factors contributing to risk, and uses pie charts to visualize threats to a community in the form of a vulnerability score.
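The normalize-then-weight idea behind such a vulnerability score can be sketched in a few lines of Python. This is a simplified illustration, not the ToxPi software's exact algorithm: it assumes min-max scaling of each factor to the 0-1 range across tracts and a weighted average using weights that sum to any positive total. The tract names, factor names, values, and weights are made up.

```python
def min_max(values):
    """Rescale values to the 0-1 range; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def vulnerability_scores(tracts, weights):
    """Return a weighted 0-1 score per tract from normalized factor values."""
    names = list(tracts)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for factor, weight in weights.items():
        # Normalize this factor across all tracts before weighting it.
        normalized = min_max([tracts[name][factor] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weight * value / total_weight
    return scores


# Made-up tract values for illustration only.
tracts = {
    "tract_a": {"vacant_percent": 10, "flood_damage": 0, "industrial_sites": 2},
    "tract_b": {"vacant_percent": 30, "flood_damage": 25, "industrial_sites": 6},
    "tract_c": {"vacant_percent": 20, "flood_damage": 100, "industrial_sites": 4},
}
weights = {"vacant_percent": 1.0, "flood_damage": 2.0, "industrial_sites": 1.0}
scores = vulnerability_scores(tracts, weights)
```

With these placeholder inputs, the tract with the highest flood damage ends up with the highest score because that factor carries twice the weight of the others; in a pie-chart view, each factor's weighted contribution would correspond to one slice.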
Outcomes:
Publications:
- Newman G, Malecha M, Atoba K. 2021. Integrating ToxPi outputs with ArcGIS Dashboards to identify neighborhood threat levels of contaminant transferal during flood events. J Spatial Science.
- Malecha ML, Kirsch KR, Karaye IM, Horney JA, Newman G. 2020. Advancing the Toxic Mobility Inventory: development of a Toxics Mobility Vulnerability Index and application to Harris County, TX. Sustainability (New Rochelle). 13(6): 282-291.
Integrated Datasets, Portals/Dashboards, Tools, Code:
The team developed an online interactive dashboard, the Toxics Mobility Vulnerability Index, to visualize risks for communities and decision makers.
Integrated datasets are shared in the locations described above, under “Data Repositories.”