Data Harmonization Use Case Environmental Health Language Collaborative


Use Case Champion

Jeanette Stingone, Columbia University Mailman School of Public Health

Status

  • Active use case workgroup that meets monthly
  • Currently testing various data-cleaning and harmonization processes on example datasets
  • Currently developing a publication that encompasses Data Harmonization Use Case findings and recommendations

Purpose

The purpose of the Data Harmonization Use Case (DHUC) is to address the feasibility of using harmonized language for combining data across independent environmental health science (EHS) research studies. We are doing this by applying data and metadata standards to example datasets. Our goal is to develop a set of strategies and resources to facilitate and encourage data sharing and harmonization in current and future research. The expected impacts of DHUC are:

  • Reduced barriers to using data templates or other approaches that support data interoperability
  • Increased use and adoption of existing datasets across the environmental health research community
  • Increased interoperability between datasets across disparate studies and research initiatives
  • Increased awareness and use of machine-learning and artificial intelligence-enabled technologies to analyze existing data and generate new knowledge

Progress to Date

  • Selected an example topic for finding datasets: exposure to pollution and development of asthma
  • Developed list of existing metadata tools and templates for potential interoperability and comparison of datasets across different studies
  • Hosted a data harmonization workshop in January and February 2023 to obtain stakeholder feedback and report progress to date
  • Developed a rubric for identifying ontologies to use for describing data and findings
  • Developed a new resource, “Example Ontologies for EHS Domains,” that lists existing ontology and semantic resources categorized by domain and sub-domain, selected using the rubric for identifying ontologies
  • Conducted data mapping exercises on epidemiological asthma-related datasets from the Human Health Exposure Analysis Resource (HHEAR) to better identify challenges, gaps, and opportunities
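
To illustrate what a data mapping exercise can involve, the sketch below maps columns from two made-up studies onto shared variable names and units. It is a minimal illustration only and not the workgroup's actual method: the column names, the harmonized names (pm25_ug_m3, asthma), the unit conversion, and the harmonize helper are all hypothetical and do not come from HHEAR or any DHUC resource.

```python
# Hypothetical variable maps: study-specific column -> (harmonized name, converter).
# All names and units here are invented for illustration.
STUDY_A_MAP = {
    "pm25_ugm3": ("pm25_ug_m3", float),
    "asthma_dx": ("asthma", lambda v: v == "yes"),
}
STUDY_B_MAP = {
    # Study B is assumed to report PM2.5 in mg/m3, so convert to ug/m3.
    "PM2.5 (mg/m3)": ("pm25_ug_m3", lambda v: float(v) * 1000.0),
    "AsthmaEver": ("asthma", lambda v: v == "1"),
}


def harmonize(record, variable_map):
    """Return (harmonized record, list of columns that had no mapping)."""
    harmonized = {}
    unmapped = []
    for column, value in record.items():
        if column in variable_map:
            name, convert = variable_map[column]
            harmonized[name] = convert(value)
        else:
            # Unmapped columns are reported rather than silently dropped,
            # since they often point to gaps in the shared vocabulary.
            unmapped.append(column)
    return harmonized, unmapped


row_a = {"pm25_ugm3": "12.5", "asthma_dx": "yes", "site": "NYC"}
row_b = {"PM2.5 (mg/m3)": "0.008", "AsthmaEver": "0"}

result_a, unmapped_a = harmonize(row_a, STUDY_A_MAP)
result_b, unmapped_b = harmonize(row_b, STUDY_B_MAP)
```

Once both records use the same variable names and units, they can be combined or compared directly. The list of unmapped columns is one simple way to surface the kinds of gaps such an exercise is meant to identify.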

Next Steps

  • Finalize data mapping exercises with HHEAR datasets
  • Develop and publish an EHS-relevant set of recommended practices and tools using the results of the mapping exercises and other group activities
  • Identify whether unique strategies and resources should be developed for users with different levels of data harmonization expertise
  • Add an ontology selection rubric and points of contact for researchers into the “Example Ontologies for EHS Domains” resource and plan communication strategies for distribution
  • Explore how DHUC efforts could be expanded to data streams beyond epidemiology studies, including social determinants of health

Expected Final Products

  • Publication that encompasses DHUC findings and recommendations
  • Template-based approach and set of tools for informing data collection practices to integrate or harmonize data across studies

How to Get Involved

DHUC is seeking feedback on criteria for including an existing resource, how to identify and organize ontology resources, and the overall approach to data harmonization. Send an email to [email protected] to be added to the Data Harmonization Use Case roster. You will receive meeting invitations, email updates, and access to the group’s collaboration platform on MS Teams.