Integrating Data to Help Elucidate Causes of Environmentally Influenced Disease
DERT Success Story
Carolyn Mattingly, Ph.D.
Environmental exposures are thought to play a role in the development of many common diseases, such as asthma and cancer, but how these exposures lead to disease is often unclear. Traditionally, the critical data needed to understand these environment-disease associations better have been scattered throughout the scientific literature, requiring researchers to comb through numerous publications to gain insight on how a chemical exposure may contribute to disease. To facilitate and expedite this process for the environmental health research community, Carolyn Mattingly, Ph.D., and her team have been working for over a decade to develop and expand the Comparative Toxicogenomics Database (CTD), a centralized, publicly available resource that systematically integrates the data needed to make connections between chemical mechanisms of action and potential impacts on human health.
“A lot of emerging evidence suggests that environmental exposures lead to disease, but in many cases the research community doesn’t really understand how those chemicals are causing disease,” said Mattingly, who initially developed CTD at the Mount Desert Island Biological Laboratory before moving to North Carolina State University in 2012. “By integrating relevant data on chemicals, genes, proteins, and diseases into a centralized resource we can help researchers make connections about how the environment affects health that would otherwise be very time consuming to do on their own.”
To populate CTD, Ph.D.-level scientists, called biocurators, use standardized vocabularies and structured notation to curate and annotate information from the scientific literature on chemical-gene, chemical-disease, and gene-disease interactions. From these curated data, CTD generates predicted associations, called inferences. For example, if a chemical in CTD has a curated interaction with a gene, and a separate publication provides a curated association between that same gene and a disease, then an inferred relationship is established between the chemical and the disease (see figure). These inferences provide potential molecular links between otherwise disconnected data, which can help researchers generate testable hypotheses concerning the origin of environmentally influenced diseases.
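The inference step described above can be illustrated with a minimal Python sketch. This is not CTD's actual implementation: the function name `infer_chemical_disease` and all chemical, gene, and disease names are hypothetical placeholders, and the sketch assumes that curated interactions are available as simple pairs.

```python
from collections import defaultdict


def infer_chemical_disease(chemical_gene, gene_disease):
    """Infer chemical-disease links through shared genes.

    chemical_gene: iterable of (chemical, gene) curated pairs
    gene_disease:  iterable of (gene, disease) curated pairs
    Returns a dict mapping (chemical, disease) to the set of linking genes.
    """
    # Index the curated gene-disease associations by gene.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # A chemical is linked to a disease whenever a gene it interacts
    # with also has a curated association with that disease.
    inferences = defaultdict(set)
    for chemical, gene in chemical_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferences[(chemical, disease)].add(gene)
    return dict(inferences)


# Hypothetical placeholder records, not real CTD content.
chemical_gene = [
    ("chemical_A", "gene_1"),
    ("chemical_A", "gene_2"),
    ("chemical_B", "gene_3"),
]
gene_disease = [
    ("gene_1", "disease_X"),
    ("gene_2", "disease_X"),
    ("gene_2", "disease_Y"),
]

inferred = infer_chemical_disease(chemical_gene, gene_disease)
for (chemical, disease), genes in sorted(inferred.items()):
    print(chemical, "->", disease, "via", sorted(genes))
```

In this toy example, chemical_B yields no inference because its only gene has no curated disease association. Keeping the set of linking genes alongside each inferred pair preserves the evidence behind the inference; how CTD itself stores or ranks inferences is not specified here.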
As of September 2015, CTD included more than 27 million chemical-gene-disease connections. Additionally, over 660 publications now cite CTD data, and 60 other databases incorporate and promote CTD content, greatly enhancing access to the information and increasing the global reach of the resource. “The success of this project is really a reflection of an amazing and dedicated group of biocurators and software developers,” noted Mattingly.
Integrating Exposure Data to Provide Real-World Context
In 2014, CTD marked its 10-year anniversary. Moving forward, Mattingly and her team are expanding the scope of CTD content. One area of expansion is the curation and integration of exposure data into CTD.
“One of the major challenges with integrating exposure data into CTD was the lack of a standardized vocabulary to describe the diverse range and study designs for exposure data,” explained Mattingly. “You can’t start curating the data without some kind of structure.” To bridge this gap, Mattingly and colleagues worked with exposure scientists to develop a standardized framework, or Exposure Ontology (ExO), which has since allowed them to begin incorporating exposure data into CTD.
By providing researchers with information from population-based exposure studies – such as dose, life stage at exposure, and population demographics – the exposure data add real-world context to the more experimentally based core chemical-gene-disease data already in CTD. Furthermore, because exposure data are integrated into CTD, the exposure science community now has a resource that not only centralizes its data but also connects them to a broader biological and mechanistic framework.
“Typically, exposure epidemiology studies examine exposure-disease relationships but don’t have associated mechanism-based data,” explained Mattingly. “The overlap between the exposure data we are curating now and the core chemical-gene-disease information we have been curating all along will provide new opportunities for researchers to explore potential mechanisms of environmentally-influenced health outcomes.” For example, a researcher interested in an epidemiology study linking organophosphate pesticide exposure with reduced IQ can now access all core CTD data associated with that pesticide, including associated genes, proteins, and molecular pathways. These data may point researchers to a potential mechanistic link explaining how the pesticide exposure leads to reduced IQ.
The curation and integration of exposure data into CTD is an ongoing process. To date, Mattingly and her team of biocurators have reviewed almost 2,000 publications and have curated data from approximately 1,300 publications into the database. Moving forward, they will be developing new tools within CTD to facilitate novel analyses and visualization of data within and across large population-based studies. Recently, the team curated the 111 individual exposure publications related to the Agricultural Health Study. The CTD team will continue to seek feedback from the environmental health research community to ensure that the database remains a current and valuable research resource.
- Davis AP, Grondin CJ, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, King BL, Wiegers TC, Mattingly CJ. 2015. The Comparative Toxicogenomics Database’s 10th year anniversary: update 2015. Nucleic Acids Res 43(Database issue):D914-D920.
- Davis AP, Wiegers TC, Roberts PM, King BL, Lay JM, Lennon-Hopkins K, Sciaky D, Johnson R, Keating H, Greene N, Hernandez R, McConnell KJ, Enayetallah AE, Mattingly CJ. 2013. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions. Database (Oxford) 2013:bat080.
- Mattingly CJ, McKone TE, Callahan MA, Blake JA, Cohen Hubal EA. 2012. Providing the missing link: the Exposure Science Ontology ExO. Environ Sci Technol 46(6):3046-3053.
- King BL, Davis AP, Rosenstein MC, Wiegers TC, Mattingly CJ. 2012. Ranking transitive chemical-disease inferences using local network topology in the comparative toxicogenomics database. PLoS One 7(11):e46524.