Researchers Remove Noise from Metabolomics Data

Gary J. Patti
Washington University
R01ES022181

When performing untargeted metabolomics studies, which profile all metabolites in a sample, scientists often detect tens of thousands of signals. These signals were traditionally thought to indicate distinct metabolites. Using a new approach, NIEHS-funded researchers revealed that the actual number of unique metabolites in a typical metabolomics analysis may be close to one-tenth as large as previously thought.

Examining the metabolites in E. coli, the research team looked for signals arising from contamination, artifacts, and what they called “degenerate features,” in which one metabolite shows up as many different signals. They found thousands of previously unreported degenerate features, with some metabolites showing up as more than 150 signals. Removing these features reduced the number of unique analytes from approximately 25,000 to fewer than 3,000. After removing additional contaminants and other poorly resolved components from the data, they further reduced the number of unique analytes to fewer than 1,000.
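To illustrate how one metabolite can appear as several signals, the short Python sketch below groups features that elute at the same time and differ in mass by a known amount, such as an isotope or adduct spacing. It is only a toy illustration and not the authors' annotation method: the function name, the tolerances, the short list of mass differences, and the example feature values are all assumptions chosen for demonstration.

```python
from itertools import combinations

# Assumed mass differences (Da) between signals that can arise from one metabolite.
# This short list is illustrative; real annotation considers many more relationships.
KNOWN_DELTAS = {
    "13C isotope": 1.003355,
    "Na vs H adduct": 21.981944,
    "NH4 vs H adduct": 17.026549,
}


def group_degenerate(features, rt_tol=0.05, mz_tol=0.005):
    """Group (mz, rt) features that co-elute and differ by a known mass delta.

    Returns a list of groups, each a list of indices into `features`.
    The tolerances are hypothetical defaults, not values from the study.
    """
    parent = list(range(len(features)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        (mz_i, rt_i), (mz_j, rt_j) = features[i], features[j]
        if abs(rt_i - rt_j) > rt_tol:
            continue  # signals that do not co-elute are treated as unrelated
        delta = abs(mz_i - mz_j)
        if any(abs(delta - d) <= mz_tol for d in KNOWN_DELTAS.values()):
            parent[find(i)] = find(j)  # merge the two signals into one group

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical example: three signals from one compound plus one unrelated signal.
features = [
    (181.0707, 5.20),  # protonated ion
    (182.0741, 5.21),  # its 13C isotope peak
    (203.0526, 5.20),  # sodium adduct of the same compound
    (148.0604, 7.80),  # a different compound eluting later
]
groups = group_degenerate(features)
print(len(features), "features ->", len(groups), "putative analytes")
```

In this toy case, four detected signals collapse to two putative analytes. The study describes the same kind of collapse at much larger scale, across many more types of degenerate relationship.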

This substantial reduction in data was more than five-fold greater than that reported in previously published studies. Based on these results, the authors suggested an alternative approach to untargeted metabolomics that relies on thoroughly annotated reference data sets to help identify the noise. To aid in this effort, they created the creDBle database to provide scientists conducting metabolomics studies with access to annotated reference data sets.

Citation: Mahieu NG, Patti GJ. 2017. Systems-level annotation of a metabolomics data set reduces 25,000 features to fewer than 1,000 unique metabolites. Anal Chem 89(19):10397-10406.