Environmental Factor, October 2011, National Institute of Environmental Health Sciences
Conference highlights toxicogenomics, bioinformatics and computational biology
By Eddy Ball
NIEHS scientists participating at the meeting included members of the TIES organizing committee, Pierre Bushel, Ph.D., shown above, and Rick Paules, Ph.D.; Deputy Director Rick Woychik, Ph.D., who gave opening remarks; and Trevor Archer, Ph.D., who spoke on the critical roles of chromatin in transcription and development. (Photo courtesy of Steve McCaw)
Quackenbush raised many more questions than he answered. He said his audience might feel the way Enrico Fermi said he felt following an especially difficult talk. “Before I came here I was confused about this subject,” Fermi remarked. “Having listened to your lecture, I am still confused, but on a higher level.” (Photo courtesy of Steve McCaw)
Wright, above, argued that linking toxicology with omics and genetics is very important, but also analytically daunting, since each experiment may generate a terabyte or more of data and integrating a series of studies presents a formidable computational challenge. (Photo courtesy of Steve McCaw)
The third international Toxicogenomics Integrated with Environmental Sciences (TIES) conference was held Sept. 15-16 in Chapel Hill, N.C. The conference was webcast in two-way transmission to researchers at the U.S. Environmental Protection Agency's National Center for Environmental Assessment in Washington, D.C., and Health Canada, and in one-way video transmission to other sites worldwide.
Nearly 200 specialists in biology, toxicology, statistics, and bioinformatics gathered at the William and Ida Friday Center for Continuing Education. They explored issues surrounding the use of increasingly complicated and promising platforms that generate rapidly expanding volumes of data on gene, protein, and metabolic patterns of expression as well as emerging technologies such as cellular imaging, epigenetics and predictive modeling. The goal of this kind of research is to characterize and predict molecular responses to environmental exposures on a global scale, to advance both biomedical research and regulatory science.
Sponsors of the conference included NIEHS, the University of North Carolina at Chapel Hill (UNC-CH), the Society of Toxicology, Health Canada, the U.S. Food and Drug Administration (FDA), and the SAS Institute.
One important theme of the meeting might be expressed this way - be careful what you wish for when asking for more data, because you may end up facing more difficulty than you ever imagined in managing and interpreting all that new information.
Shaking the pillars of the paradigm of average
As the first speaker in session one, “Bioinformatics - Revealing pathways and biological systems underlying biological conditions,” Harvard University computational biologist John Quackenbush, Ph.D. (http://18.104.22.168/faculty/john-quackenbush/), set the tone for his talk and, arguably, the entire conference by quoting mathematician Samuel Karlin. “The purpose of models,” Quackenbush told the audience, “is not to fit the data but to sharpen the questions.”
As biomedical research segues from a laboratory science to an informational science, Quackenbush argued, it becomes important to pay attention to the phenomenology of variance, as well as to the averages typically imposed upon biological data. A holistic approach using rank-order-based analysis of gene expression outliers, he said, may offer scientists insight into how the degree of variance influences the phenotype and progression of disease through epigenomic alterations, and provides the spark for evolutionary development.
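The flavor of a rank-based look at expression variance can be sketched with a toy example. Everything below - the matrix, the scoring rule, and the function name - is invented for illustration and is not Quackenbush's actual method: genes are ranked within each sample, and genes whose ranks swing widely across samples stand out as variance outliers, even when their average expression looks unremarkable.

```python
import numpy as np

def rank_variance_outliers(expr, top_n=1):
    """Rank genes within each sample, then score each gene by the
    variance of its rank across samples. Genes whose rank jumps
    around between samples are flagged as expression-variance
    outliers. expr: genes x samples matrix."""
    # within-sample ranks (0 = lowest expression in that sample)
    ranks = expr.argsort(axis=0).argsort(axis=0)
    rank_var = ranks.var(axis=1)
    # indices of the top_n most rank-variable genes
    return list(np.argsort(rank_var)[::-1][:top_n])

# toy matrix: 4 genes x 5 samples
expr = np.array([
    [1.0, 1.1,  0.9, 1.0,  1.2],   # stably low
    [9.0, 8.8,  9.2, 9.1,  8.9],   # stably high
    [2.0, 12.0, 1.5, 12.5, 2.5],   # rank flips between samples
    [5.0, 5.1,  4.9, 5.0,  5.2],   # stably middle
])
outliers = rank_variance_outliers(expr, top_n=1)  # flags gene 2
```

A mean-centered view would also notice gene 2 here, but the point of the rank-based framing is that it asks about the stability of a gene's position in the distribution rather than its average level.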
The data speak, but we must invent their language
One of a host of biostatisticians speaking at the meeting was Fred Wright, Ph.D. (http://genomics.unc.edu/faculty/webpages/wright.html), of UNC-CH, who spoke on expression quantitative trait locus (eQTL) analysis and variation in RNA expression.
As Wright explained, the new FastMAP eQTL analysis is several orders of magnitude faster than previous methods. However, he cautioned, it is still up to biostatisticians to develop the model for identifying the most informative single nucleotide polymorphisms (SNPs) and significant combinations for statistical analysis.
In addition to determining which candidates and combinations of SNPs are causal, Wright explained, researchers have to consider several other issues, such as the most informative tissue types for eQTL analysis, whether DNA and RNA are from the same patient, and the possibility that specific DNA sequences used as probes themselves contain SNPs that may affect outcomes.
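The core computation behind an eQTL scan can be illustrated with a deliberately naive sketch - a brute-force toy, not FastMAP's optimized algorithm, with all data and names invented: each SNP's genotype dosage is scored against a gene's expression level by squared correlation across individuals.

```python
import numpy as np

def eqtl_scan(genotypes, expression):
    """Score each SNP by the squared correlation (r^2) between its
    genotype dosage (0/1/2 minor-allele copies) and one gene's
    expression across individuals. Brute-force stand-in for the
    fast scans that tools such as FastMAP perform at scale.
    genotypes: snps x individuals; expression: one value per individual."""
    g = genotypes - genotypes.mean(axis=1, keepdims=True)
    e = expression - expression.mean()
    num = (g * e).sum(axis=1) ** 2
    den = (g ** 2).sum(axis=1) * (e ** 2).sum()
    return num / den  # r^2 per SNP

geno = np.array([
    [0, 1, 2, 0, 1, 2],   # SNP tracking expression perfectly
    [1, 1, 0, 2, 0, 1],   # unrelated SNP
])
expr = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
r2 = eqtl_scan(geno, expr)
best = int(r2.argmax())  # SNP 0
```

The hard part Wright described is not this arithmetic but everything around it: which SNPs and combinations to test, which tissues to measure, and how to control the enormous multiple-testing burden.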
Grounding toxicogenomics in the realm of public health
Two talks, by NIEHS grantees Rebecca Fry, Ph.D. (http://www.sph.unc.edu/?option=com_profiles&Itemid=1891&profileAction=ProfDetail&pid=714233563), of UNC-CH, and David Threadgill, Ph.D. (http://cals.ncsu.edu/genetics/index.php/people/david-threadgill/), of North Carolina State University, brought the potential of integrated toxicogenomics home for listeners - both in terms of human health and in terms of environmental exposures in North Carolina.
Fry reported on her work using gene-expression analysis in human subjects to explore the two faces of arsenic - as a chemical that triggers gene expression pathways promoting oncogenesis and tumor progression, and as a chemotherapeutic agent in the form of arsenic trioxide, which can target some forms of cancer in patients with certain gene expression patterns (see related story(http://www.niehs.nih.gov/news/newsletter/2011/april/science-fry/index.cfm)).
Threadgill reported on research inspired by epidemiological studies of exposure to trichloroethylene (TCE) in the water supply at Camp Lejeune, N.C. (https://clnr.hqi.usmc.mil/clwater/), and the presence of arsenic in the slate belt of North Carolina. He set up experiments using ten groups of genetically diverse mice exposed to various combinations of TCE and arsenic, to examine the pathology of exposed animals, gene expression patterns, and the potential synergy of the chemicals in mixture.
MAQC - Exploring the future of expression data platforms and analysis
A recurring question throughout the two days of talks was addressed directly in a special session on Microarray Quality Control (MAQC) that concluded the meeting. It confronted head-on the issue of whether the time is right to move from current microarray platforms to emerging next-generation sequencing platforms, which promise even more data about even more aspects of DNA sequence, gene expression, and epigenetic variation.
Moderated by Weida Tong, Ph.D., of FDA's National Center for Toxicological Research (NCTR), speakers surveyed the progress of MAQC through its first two phases, beginning in 2005, and looked forward to the next steps in MAQC III, Sequencing Quality Control (SEQC).
Tong reviewed the rigorous process of validating microarray data to answer the same questions about the reliability of platforms and biomarkers that will face participants in SEQC. As with microarray, RNA-Seq platforms will generate massive datasets that require multi-site validation to determine repeatability, reproducibility, and the accuracy of base calling, mapping to the transcriptome, quantification, and differential expression. Tong emphasized that transparency will be just as important in SEQC as it was in the two earlier phases of MAQC, in order to gain the confidence of the research and regulatory communities.
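One of the simplest reproducibility checks such multi-site validation rests on can be sketched as a correlation of log-scaled transcript counts measured for the same samples at two sites. The counts below are invented for illustration, and real SEQC analyses use many complementary metrics beyond a single correlation.

```python
import math

def site_concordance(counts_a, counts_b):
    """Pearson correlation of log2(count + 1) values for the same
    transcripts measured at two sites - one toy reproducibility
    metric of the kind multi-site studies compare across labs."""
    la = [math.log2(c + 1) for c in counts_a]
    lb = [math.log2(c + 1) for c in counts_b]
    ma = sum(la) / len(la)
    mb = sum(lb) / len(lb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(la, lb))
    var_a = sum((x - ma) ** 2 for x in la)
    var_b = sum((y - mb) ** 2 for y in lb)
    return cov / math.sqrt(var_a * var_b)

# invented counts for five transcripts at two sequencing sites
site1 = [120, 0, 85, 4000, 33]
site2 = [118, 1, 90, 3900, 30]
r = site_concordance(site1, site2)  # close to 1 for concordant sites
```

The log transform matters: raw counts span orders of magnitude, so without it the most abundant transcript would dominate the correlation.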
Talks about lessons learned by MAQC consultants Wendell Jones, Ph.D., of Expression Analysis, and Russell Wolfinger, Ph.D., of SAS, concluded the special session and the conference. Both challenged the assumption that more is necessarily better, arguing that researchers need to consider just how many reads they really need and how deeply they really need to probe, as they decide whether to move from microarray to RNA-Seq, with its expanded capacity for analysis.
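The “how many reads do we really need?” question Jones and Wolfinger raised is often explored with saturation (rarefaction) curves: as read depth grows, the number of newly detected transcripts flattens out. A minimal sketch, with entirely invented data:

```python
import random

def genes_detected(read_assignments, depth):
    """Number of distinct genes seen among the first `depth` reads -
    one point on a toy rarefaction curve."""
    return len(set(read_assignments[:depth]))

random.seed(1)
# toy library: reads drawn from 50 genes with skewed abundance
# (gene g appears with weight proportional to g + 1)
pool = [g for g in range(50) for _ in range(g + 1)]
reads = [random.choice(pool) for _ in range(1000)]

# detection counts at increasing depths; gains shrink as depth grows
curve = [genes_detected(reads, d) for d in (50, 200, 1000)]
```

When the curve has flattened, additional reads mostly resequence what has already been seen - the quantitative version of the speakers' point that more is not necessarily better.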
Wolfinger reinforced Tong's comments about transparency, calling it the only ethical way to proceed. Referring to recent controversies over data integrity, he reminded the audience, “There are a lot of gotchas when it comes to data quality.” Wolfinger concluded with an allusion to Occam's razor and the principle of parsimony as a reasonable consideration when deciding whether to switch to RNA-Seq. “If two models seem to do about as well,” he said, “go for the simpler one.”
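Wolfinger's parsimony principle has a standard quantitative form in model selection criteria such as the Akaike information criterion (AIC), which trades goodness of fit against the number of parameters. The sketch below uses invented data and is not anything presented at the meeting:

```python
import numpy as np

def aic(n, rss, k):
    """Akaike information criterion for a least-squares fit:
    n data points, residual sum of squares rss, k fitted parameters.
    Lower is better; the 2*k term penalizes model complexity, so
    when two models fit about equally well, the simpler one wins."""
    return n * np.log(rss / n) + 2 * k

# toy data: a straight line plus a small, fixed wiggle
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + 0.05 * np.sin(37 * x)

scores = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    rss = float(((np.polyval(coeffs, x) - y) ** 2).sum())
    scores[degree] = aic(len(x), rss, degree + 1)
```

With identical residuals, the lower-order model always scores better - a direct encoding of “if two models seem to do about as well, go for the simpler one.”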