Integrating Omics Data Across Model Organisms

Leveraging large omics datasets, such as genomics and metabolomics, researchers sought to shed new light on the underlying molecular mechanisms by which hazardous substances affect health. For definitions of common data science and sharing terms, see the glossary on the landing page. For more information about these use cases, please refer to the White Paper (-1B).

Two EUCS: Integration and Analysis of SRC-Generated Cardiometabolic Syndrome Data Streams From Animal Models, AND Refining Species-Conserved Adverse Outcome Pathways (AOPs) of AhR-mediated Adverse Effects

Collaborating Institutions:

Integration and Analysis of SRC-Generated Cardiometabolic Syndrome Data Streams from Animal Models: Michigan State University (MSU), University of Louisville (U of L), University of Kentucky (UK) SRP Centers

Refining Species-Conserved Adverse Outcome Pathways (AOPs) of AhR-mediated Adverse Effects: MSU and University of Iowa (U of I) SRP Centers

Abstracts:

Integration and Analysis of SRC-Generated Cardiometabolic Syndrome Data Streams from Animal Models: Collaborators at the MSU, U of L, and UK SRP Centers sought to combine data from laboratory-controlled animal studies to identify mechanisms by which exposure to Superfund contaminants promotes cardiometabolic disease development and progression. Their objective was to combine data to identify better preventative and therapeutic intervention strategies and to improve the development of adverse outcome pathways (AOPs). AOPs are structured representations of the biological events leading to adverse health effects, designed to support greater use of mechanistic data in risk assessment and decision making.

Refining Species-Conserved Adverse Outcome Pathways (AOPs) of AhR-mediated Adverse Effects: In closely related work, a second Use Case for the MSU and the University of Iowa SRP Centers focused on characterizing the molecular mechanisms of toxicity mediated by AhR, a signaling pathway that regulates the biological response to chemicals, and on refining AOPs through data integration and reuse. Given the common goals and models used, the teams combined forces to tackle the two Use Cases together.

The teams aimed to integrate data on mouse toxicology experiments with different study designs, including transcriptomics (RNAseq), proteomics, metabolomics, and clinical chemistry data. While there were many existing repositories for individual types of data, such as the Gene Expression Omnibus (GEO) and the Metabolomics Workbench, none were designed for all the data types they needed to combine to capture the complexity of animal toxicology experiments.

Research Question:

Can we demonstrate that data from laboratory-controlled animal studies can be used to investigate cardiometabolic disease mechanisms and improve identification of responses elicited by exposure to Superfund contaminants? Can we further characterize the molecular mechanisms of AhR-mediated toxicity and refine adverse outcome pathways (AOPs) through data integration and reuse?

Modified Research Question: Update and strengthen the data sharing framework with larger long-term benefits.

Data Sources and Data Sets:

Data Types	MSU Tetrachlorodibenzodioxin	UK Polychlorinated biphenyls (PCBs)	U of L Volatile organic compounds (VOCs), PCBs	U of Iowa PCBs
RNA-Seq/ Transcriptomics	Nault et al. (2015)* Nault et al. (2016b) Nault et al. (2016a)* Fader et al. (2017)*	PCB126/LDLr-/- mouse/liver RNAseq unpublished	Unpublished liver data from Wahlang et al. (2014)	Gadupudi et al. (2018) Gadupudi et al. (2016a) Gadupudi et al. (2016b) Wu et al. (2016)
(Phospho)Proteomics			Hardesty et al. (2019a)* Hardesty et al. (2019b)* Unpublished liver proteomics from male and female mice treated according to Lang et al. (2018)	Included above
Metabolomics/ Lipidomics	Nault et al. (2016b) Nault et al. (2017)	Deng et al. (2019) Petriello et al. (2018c) Petriello et al. (2018a)
Metagenomics (microbiome)	Stedtfeld et al. (2017)	Petriello et al. (2018c)	Unpublished data from Wahlang et al. (2016b) Unpublished data from Lang et al. (2018)
Metabolic phenotyping	Included above	Included above Wahlang et al. (2017b) Wahlang et al. (2016a) Wahlang et al. (2017a)	Included above
Clinical Chemistry	Included above	Included above Petriello et al. (2018b)	Included above	Included above
Histopathology	Included above	Included above	Included above
Eicosanoids				Included above
*Available in GEO (transcriptomics), Metabolomics Workbench/MetaboLights (metabolomics), or UC San Diego MassIVE (proteomics)

Data Repositories:

DataVerse, Gene Expression Omnibus (GEO), Metabolomics Workbench

Metadata:

Metadata Standards Utilized:

Minimum Information about Animal Toxicology Experiments (MIATE); Investigation, Study, Assay (ISA)

Approach:

The team evaluated the FAIRness of existing repositories and implemented a Python library and package to validate data for parsability and consistency. They explored ways to improve consistency in metadata collection and assessed which metadata would be needed to standardize and reuse in vivo animal toxicology experiment data, a set they called Minimum Information about Animal Toxicology Experiments (MIATE). They also used the Investigation, Study, Assay (ISA) framework to develop a tool for collecting metadata.
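The kind of parsability and consistency check described above can be sketched in a few lines. This is a minimal illustration only, not the team's library: the function name check_records, the tab-delimited layout, and the field names in the usage note are all assumptions made for the example.

```python
import csv
import io


def check_records(text, required_fields, delimiter="\t"):
    """Return a list of human-readable problems found in a delimited table.

    Hypothetical helper for illustration; it is not part of any real package.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]

    # Consistency: each required field name must appear exactly as expected.
    for field in required_fields:
        if field not in header:
            problems.append(f"missing field: {field}")

    # Parsability: every data row must have as many cells as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
    return problems
```

Calling check_records on a file whose header spells a field as "Species" when "species" is required would report the missing field, which is one simple way inconsistent field names can surface before data are integrated.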

Outcomes:

The research team evaluated how repositories support FAIR data and identified gaps in data sharing. Their internal systems identified non-parsability as a major error, affecting more than 5.5% of analyses, and flagged consistency errors between data formats, missing raw data, and inconsistent field names as significant barriers to making data more FAIR. The team pinpointed key pieces of metadata needed to reuse animal toxicology experiment data and created data and metadata capture infrastructure. These findings helped in the development of a web application for finding, accessing, integrating, and reusing datasets. This infrastructure expanded upon the Tox Bio Checklist, and the team also made their MIATE publicly available on fairsharing.org and GitHub.

Publications:

Smelter A, Moseley HNB. 2018. A Python library for FAIRer access and deposition to the Metabolomics Workbench Data Repository. Metabolomics 14(5):64.

Integrated Datasets, Portals/Dashboards, Tools, Code:

MIATEv2:
mwtab Python package
Gene Expression Omnibus (GEO): GSE148339, GSE167328, GSE171941, GSE171942, GSE178168
Developed a web-application for finding, accessing, integrating, and reusing datasets

Integrating Population Genomic Data to Understand Mechanisms of Chemical Susceptibility and Resistance

Collaborating Institutions:

Boston University (BU) and Duke University SRP Centers

Abstract:

Researchers collaborated to better understand the underlying mechanisms controlling susceptibility versus resistance to hazardous chemicals by integrating population genomic data from multiple populations of killifish that differ in their chemical sensitivity. By comparing their data to similar data on genetic variation in rodent models and in humans, they hoped to enhance the use of wildlife as environmental sentinels and models for human health.

The team leveraged two parallel projects exploring the genetic mechanisms underlying the evolved resistance to polycyclic aromatic hydrocarbons (PAHs), based on studies from Duke, and polychlorinated biphenyls (PCBs), from studies out of BU, in multiple populations of Atlantic killifish inhabiting sites contaminated with high levels of these chemicals. Initial analysis of genomic data from both projects identified variation in the aryl hydrocarbon receptor (AHR) signaling pathway as one common feature associated with the differential sensitivity. The AHR signaling pathway regulates the biological response of animals, including humans, to some PAHs and PCBs.

Research Question:

How does genetic variation influence sensitivity and resistance to Superfund chemicals? Can integrating large genome-wide sequencing datasets facilitate comparisons to similar data on genetic variation associated with chemical susceptibility and disease susceptibility in rodent models and humans?

Data Sources and Data Sets:

Data Type	BU Repository/ Database	BU Dataset Identifier	Associated Publications	Duke Repository	Duke Dataset	Associated Publications
Killifish genome data	NCBI BioProject	PRJNA269290 PRJNA323589	Reid et al. (2017) (Killifish genome) Reid et al. (2016) (whole genome re-sequencing for 384 individual killifish in 8 populations)	NCBI BioProject	PRJNA450424
	Dryad	dryad.t2888 dryad.68n87
RAD-seq data (genome sequence samples)						Osterberg et al. (2018) (from 270 individual fish from nine populations)
RNA-seq data			Reid et al. (2016) Oleksiak et al. (2011)		Unpublished data
Human (and other) SNP data:	NCBI SNP Human Genome Variation Society NCBI ClinVar Ensembl

Data Repositories:

NCBI's GEO and SRA, and Dryad

Metadata:

Used existing tools, including FastQC for quality checks, STAR to map RNAseq data, and Samblaster for identifying duplicates in WGS data.

Vocabularies Utilized:

Used existing vocabularies (not specified).

Approach:

The team created a harmonized bioinformatics analysis pipeline using standard genomics data formats and established methods and tools, such as FastQC, STAR, and Samblaster, and then remapped the new data to a new killifish reference genome assembly.
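As a rough sketch of how such a pipeline might be assembled, the example below only builds command lines for the three tools named above and does not run them. The function names, file paths, and thread count are hypothetical, and the source does not state which options the team actually used; the flags shown are commonly used ones for FastQC, STAR, and Samblaster.

```python
from pathlib import Path


def build_rnaseq_commands(fastq, genome_dir, out_dir, threads=4):
    """Build (but do not run) a FastQC check and a STAR alignment command."""
    fastq, out_dir = Path(fastq), Path(out_dir)
    # Output prefix derived from the sample name, e.g. out/sampleA_
    prefix = str(out_dir / fastq.name.split(".")[0]) + "_"

    fastqc = ["fastqc", "--outdir", str(out_dir), str(fastq)]

    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(fastq),
        "--outFileNamePrefix", prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    # STAR needs a decompression command when reads are gzipped.
    if fastq.suffix == ".gz":
        star += ["--readFilesCommand", "zcat"]
    return [fastqc, star]


def build_markdup_command(sam_in, sam_out):
    """Build a Samblaster command that marks duplicates in a WGS SAM file."""
    return ["samblaster", "-i", str(sam_in), "-o", str(sam_out)]
```

Each returned list could be passed to subprocess.run in a real workflow. Keeping command construction separate from execution makes it easier to apply identical settings to every dataset, which is the point of a harmonized pipeline.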

They loaded the data and associated metadata into the open-source genome browser JBrowse, which allows the data to be visualized and queried. Their resulting platform, SuperFunBase, is freely available along with all underlying data.

Outcomes:

The team was successful in identifying the best reference genome assembly and annotations, integrating data, and deploying a new platform, SuperFunBase. SuperFunBase allows users to look at a portion of the killifish genome and see human genes that perform the same function while summarizing the 'omics data in different ways.

According to the team, this tool has already been useful in identifying variants that may have important functions, such as those related to resistance or susceptibility to harmful exposures. It can also be used to predict specific changes in proteins that would result from these variants.

Integrated Datasets, Portals/Dashboards, Tools, Code:

All the underlying data is available on their SuperFunBase platform. The team updated and made publicly available their killifish genome and gene annotation. They also constructed an Open Science Framework (OSF.io) project page to disseminate their work more broadly while promoting collaboration. All code written to produce the genome browser from source data is publicly available in a git repository on BitBucket.

Integration and Sharing of Xenobiotics-Associated Assays Across Species, Phenotypes, and Sites

Collaborating Institutions:

BU and Oregon State University (OSU) SRP Centers

Abstract:

Researchers collaborated to understand how to integrate xenobiotic assay data and make it accessible and interactive. Xenobiotics are chemicals, usually man-made, that originate outside of the body. The researchers sought to combine existing mammalian gene expression assay data and chemical annotations related to adverse effects generated at the BU SRP Center with dose-response behavior and morphology data in zebrafish and Superfund site chemical composition data from the OSU SRP Center. Specifically, the collaborators sought to combine data across species to better understand the underlying mechanisms by which exposure to chemicals harms health. Their goal was to establish a data-driven taxonomy of compound classes based on changes to RNA that may represent shared modes of action.

Research Question:

Can we support exploratory data analysis and hypothesis generation, and shed light on the modes of action and adverse effects of chemicals, by integrating and sharing assays across species? How can we integrate and make exposure data from mammalian in-vitro-based and zebrafish-based assays accessible?

Data Sources and Data Sets:

BU: Expression profile data for approximately 500 chemicals in human cell lines from the carcinogenome project and for 78 chemicals in mouse cells from the adipogenome project, and other publicly available data sets and repositories (e.g., DrugMatrix, MSigDB, PubChem). Collectively, this dataset includes information on chemical carcinogenicity, genotoxicity, adipogenicity, connectivity to drugs (CMap), and expression and activity levels of genes and pathways in response to each chemical exposure.

OSU: Zebrafish-based morphological and behavioral screens for over 1,200 chemicals and expression profiling data for a subset of these chemicals; chemical concentration data and Superfund site chemical composition. OSU data is currently housed internally at the Pacific Northwest National Laboratory in a secure, firewalled data repository, called the Experimental Data Management System. OSU data can be downloaded from: http://datahub.pnnl.gov.

More than 200 screened chemicals overlap between BU and OSU datasets.
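Finding the chemicals shared by two screening datasets is essentially a set intersection over normalized identifiers. The sketch below is illustrative only: the function names and the example chemical lists are made up, and the source does not say how the teams matched chemicals. In practice a stable identifier (for example a registry number) would likely be more reliable than a name.

```python
def normalize(name):
    """Lowercase a chemical name and collapse runs of whitespace."""
    return " ".join(name.strip().lower().split())


def shared_chemicals(bu_chemicals, osu_chemicals):
    """Return the sorted names that appear in both lists after normalization."""
    bu = {normalize(c) for c in bu_chemicals}
    osu = {normalize(c) for c in osu_chemicals}
    return sorted(bu & osu)


# Illustrative inputs only; these are not the actual BU or OSU chemical lists.
example_bu = ["Benzo[a]pyrene", " PCB 126 ", "TCDD"]
example_osu = ["pcb  126", "benzo[a]pyrene", "Chlorpyrifos"]
overlap = shared_chemicals(example_bu, example_osu)
```

Here overlap contains the two names that match once case and spacing differences are removed.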

Data Repositories:

Metadata:

The team leveraged gene and pathway annotations from GeneCards, MSigDB, and Reactome for annotation of genes in their datasets.

Approach:

The research team used the R/Shiny interface and commands to develop the front end for two open-source software portals that serve as a repository of gene expression datasets and used APIs to allow the portals to talk to each other. They also included a security and privacy system that gives all users read access and requires login credentials to add or edit data.
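The access rule described above (anyone can read, only logged-in users can add or edit) can be expressed as a small permission check. This is a sketch in Python rather than the portals' own R/Shiny code, and the User class, action names, and is_allowed function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    name: str
    authenticated: bool = False


# Hypothetical action names chosen for this example.
READ_ACTIONS = {"view", "download"}
WRITE_ACTIONS = {"add", "edit"}


def is_allowed(action: str, user: Optional[User] = None) -> bool:
    """Allow reads for everyone; allow writes only for authenticated users."""
    if action in READ_ACTIONS:
        return True
    if action in WRITE_ACTIONS:
        return user is not None and user.authenticated
    # Unknown actions are denied by default.
    return False
```

Denying unknown actions by default is a design choice made for this sketch so that a newly added operation is not accidentally left open.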

For example, the Xposome Portal is an interactive R/Shiny interface that facilitates chemical screening using the compiled high-throughput transcriptomic assay data and the data from the zebrafish assays when available. Users can drill down to see which genes are affected by a particular chemical, whether their expression is increased or decreased, and more detailed information about the effects on particular genes. They can also interact with the data and perform analyses with a built-in tool the team developed, called K2Taxonomer, which allows the user to look at changes in gene expression across a group of related chemicals and to visualize their similarities to or differences from other groups of chemicals in a heatmap.

Outcomes:

The Xposome Portal, hosted at BU, and the SRP Data Analytics Portal hosted at OSU, ensure data is accessible to outside users.

Documentation for the Xposome Portal is available online, and all data from OSU can be accessed on their website.

Integrated Datasets, Portals/Dashboards, Tools, Code:

Xposome Portal; SRP Data Analytics Portal
