Text Mining Improves Chemical-Gene-Disease Curation

Carolyn J. Mattingly, Ph.D.
North Carolina State University
NIEHS Grants R01ES014065, R01ES019604

NIEHS grantees report that text mining can help rank scientific articles by relevance for inclusion in the Comparative Toxicogenomics Database (CTD). The CTD is a public resource that provides information on chemical-gene, chemical-disease, and gene-disease interactions, all manually curated from scientific articles.

The researchers used a text-mining approach that assigns each article a document relevancy score, with a higher score indicating that the article is more likely to be relevant for the CTD. They tested this approach on 14,904 articles covering seven heavy metals and found that integrating text mining with their current system of manual curation helped prioritize more relevant articles, increasing productivity by 27 percent and novel data content by 100 percent.
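The ranking step described above can be illustrated with a minimal Python sketch. It assumes each article already has a document relevancy score and does not attempt to reproduce how the researchers computed that score. The `Article` class, the `rank_for_curation` function, the article identifiers, and the score values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    article_id: str  # hypothetical identifier, not a real PubMed ID
    score: float     # document relevancy score (made-up value)


def rank_for_curation(articles: List[Article],
                      top_n: Optional[int] = None) -> List[Article]:
    """Return articles sorted from highest to lowest relevancy score.

    If top_n is given, only the top_n highest-scoring articles are kept.
    """
    ranked = sorted(articles, key=lambda a: a.score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Illustrative input: three articles with invented scores.
articles = [
    Article("A", 12.0),
    Article("B", 87.5),
    Article("C", 45.0),
]

# Curators would review the highest-scoring articles first.
queue = rank_for_curation(articles, top_n=2)
print([a.article_id for a in queue])  # ['B', 'C']
```

Sorting by score puts the articles most likely to be relevant at the front of the curation queue, which is the sense in which scoring helps prioritize manual review.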


Citation: Davis AP, Wiegers TC, Johnson RJ, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, Murphy CG, Mattingly CJ. 2013. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLoS ONE 8(4): e58201.
