Text Mining Improves Chemical-Gene-Disease Curation
Carolyn J. Mattingly, Ph.D.
North Carolina State University
NIEHS Grants R01ES014065, R01ES019604
NIEHS grantees report that text mining can help rank scientific articles by relevance, so that the most useful ones are prioritized for inclusion in the Comparative Toxicogenomics Database (CTD). The CTD is a public resource that provides information on chemical-gene, chemical-disease, and gene-disease interactions, which are manually curated from scientific articles.
The researchers used a text-mining approach that assigns each article a document relevancy score, with a high score indicating that the article is more likely to be relevant to the CTD. They tested this approach on 14,904 articles covering seven heavy metals and found that integrating text mining with their current system of manual curation helped prioritize more relevant articles, increasing productivity by 27 percent and novel data content by 100 percent.
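As a rough illustration of score-based prioritization, the sketch below sorts articles so that the highest-scoring ones reach curators first. It is not the researchers' implementation: the `Article` type, the `rank_for_curation` function, the identifiers, and the score values are all hypothetical, and the scores are assumed to have been computed already by a separate text-mining step.

```python
from typing import List, NamedTuple


class Article(NamedTuple):
    article_id: str   # hypothetical identifier, for illustration only
    score: float      # document relevancy score, assumed precomputed


def rank_for_curation(articles: List[Article]) -> List[Article]:
    """Return articles ordered from highest to lowest relevancy score."""
    return sorted(articles, key=lambda a: a.score, reverse=True)


# Made-up example scores; curators would work from the front of the queue.
queue = rank_for_curation([
    Article("A", 12.0),
    Article("B", 87.5),
    Article("C", 40.0),
])
print([a.article_id for a in queue])
```

Working from the top of such a queue is one simple way a ranked list could be combined with manual curation; the actual workflow is described in the cited paper.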
Citation: Davis AP, Wiegers TC, Johnson RJ, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, Sciaky D, Murphy CG, Mattingly CJ. 2013. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLoS ONE 8(4): e58201.