Pierre R. Bushel, Ph.D.
Pierre R. Bushel, Ph.D. is involved in or leads the following research projects within the Biostatistics and Computational Biology Branch:
Mito-nuclear Interactions Mediate Hyperoxia Induced Lung Injury
In collaboration with Steve Kleeberger, Ph.D., P.I. and Head of the Environmental Genetics Group
Mitochondria evolved over time to perform critical biological functions such as cellular energy production, oxidation-reduction processes and modulation of apoptosis. The human and rodent mitochondrial genomes are circular, double-stranded DNA molecules approximately 16.5 kb in length, and each contains 37 genes coding for rRNAs, tRNAs and polypeptides. The nuclear genome encodes several mitochondrial proteins and, conversely, the mitochondrial genome has been shown to impact epigenetic marks in the nuclear genome. Mutations in one or both of the genomes may at times lead to biological conditions and diseases. Bronchopulmonary dysplasia (BPD) is a chronic lung condition affecting neonates that stems from extended exposure to oxygen from mechanical ventilation following preterm birth. There is evidence that single nucleotide polymorphisms (SNPs) in the nuclear and mitochondrial genomes may lead to a predisposition to BPD. However, little is known of the system-wide genetic mechanism of susceptibility to BPD contributed by the interaction of the nuclear and mitochondrial genomes. My collaborator, associate and I are investigating the epistatic relationship between nuclear SNPs (nuSNPs) in or flanking genes that encode mitochondrial proteins and mitochondrial SNPs (mtSNPs) in as many as 29 neonate mice that were exposed to hyperoxia (> 95% O2) or normoxia (20% O2) and that exhibit lung phenotypes representing hyperoxia-induced lung injury characteristic of BPD.
Chemically-induced Hepatocellular Carcinoma Mutational Signatures
In collaboration with Arun Pandiri, Ph.D., M.S., D.A.C.V.P., D.A.B.T., P.I., Lead of the Molecular Pathology Group
The Division of the National Toxicology Program evaluates chemicals for carcinogenicity hazard using two-year rodent bioassays. Hepatocellular carcinomas (HCCs) arising spontaneously or due to chronic exposure to chemicals in B6C3F1/N mice are evaluated by whole exome sequencing (WES) to identify mutational signatures from somatic single nucleotide variants (SNVs). Chemically induced HCCs have different modes of action and mutation burden rates. We are assessing chemical-specific mutational signatures in mouse HCCs and comparing these mouse HCC signatures to the human signatures in the COSMIC database.
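One common way to compare a tumor's mutation spectrum with reference signatures is cosine similarity. The sketch below is illustrative only and is not the group's pipeline: the six-channel spectra, the reference names `ref_A` and `ref_B`, and all counts are invented for the example, whereas real analyses typically use 96 trinucleotide-context channels.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_reference_signatures(spectrum, references):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(spectrum, ref))
              for name, ref in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical 6-channel spectra (C>A, C>G, C>T, T>A, T>C, T>G).
tumor_spectrum = [40, 5, 30, 10, 10, 5]           # raw SNV counts (invented)
reference_signatures = {
    "ref_A": [0.40, 0.05, 0.30, 0.10, 0.10, 0.05],  # invented proportions
    "ref_B": [0.05, 0.05, 0.70, 0.05, 0.10, 0.05],
}
ranking = rank_reference_signatures(tumor_spectrum, reference_signatures)
```

Because cosine similarity ignores vector length, raw counts can be compared directly with signatures expressed as proportions.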
Genomic Reference for Assessing Performance of Cancer Screening Panels
As part of the MAQC\SEQC
The Sequencing Quality Control phase 2 (SEQC2) arm of the Massive Analysis and Quality Control (MAQC) consortium, led by the U.S. Food and Drug Administration (FDA), is a continuation of successful prior efforts in transcriptomics validation and reproducibility. The consortium was recently chartered with examining best practices in DNA testing across a wide spectrum of methods (bioinformatic methods, reference material, FFPE impacts, somatic vs. germline detection, issues with detection of structural variants, etc.). A working group component of SEQC2 was challenged with examining the reproducibility, sensitivity and accuracy of commercially available pan-cancer tumor panels, current or in development, for solid tumors and liquid biopsies. This working group is investigating 1) the construction of a “ground truth” of additional DNA reference material and 2) the use of the crowdsourced effort of SEQC to undertake the massive data generation, management, analysis and compilation. Ten cancer cell lines used to generate the Agilent Universal Human Reference RNA (UHRR) material are examined to develop a truth set (positives and negatives) for use with cancer panels. The pooled specimen is being explored for panel development, panel validation, and quality control.
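As a minimal sketch of how a truth set of positives and negatives can be used to score a panel, the example below computes sensitivity, specificity and precision for one set of variant calls. The function name `benchmark_calls`, the variant tuples and all coordinates are hypothetical and do not come from SEQC2 data.

```python
def benchmark_calls(called, truth_positives, truth_negatives):
    """Score variant calls against a truth set.

    Variants are (chrom, pos, ref, alt) tuples. Calls that fall outside
    both truth sets are ignored, since their status is unknown.
    """
    called = set(called)
    tp = len(called & truth_positives)   # known variants that were called
    fn = len(truth_positives - called)   # known variants that were missed
    fp = len(called & truth_negatives)   # known non-variants that were called
    tn = len(truth_negatives - called)   # known non-variants left uncalled
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }

# Invented truth set and panel calls, for illustration only.
truth_positives = {
    ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "T"), ("chr4", 400, "T", "G"),
}
truth_negatives = {
    ("chr1", 150, "A", "G"), ("chr2", 250, "G", "A"), ("chr3", 350, "C", "A"),
}
panel_calls = [
    ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T"),
    ("chr1", 150, "A", "G"),   # false positive
    ("chr9", 900, "A", "C"),   # outside the truth set, so not scored
]
metrics = benchmark_calls(panel_calls, truth_positives, truth_negatives)
```

Restricting the score to positions with known status is one possible design choice; a real benchmark would also need to define the genomic regions each panel covers.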
Anticancer Therapeutics Gene Expression for Discrimination of Drug Combination Toxicity
It is believed that unique patterns of gene expression changes will be associated with drug combinations that have a higher risk of synergistic or additive toxicity as compared with either agent used alone. If this is true, the expression patterns can then be used to provide some initial degree of discrimination between drug combinations with a higher risk of combination toxicity. The drug combinations we are interested in were chosen because they share common target organ toxicities (that can be replicated in a preclinical model) and have the potential for actual clinical use as a combination. Temsirolimus is an intravenous drug for treatment of renal cell carcinoma. It is a specific inhibitor of mTOR and interferes with synthesis of proteins that regulate proliferation, growth, and survival of tumor cells. Oxaliplatin, a platinum compound, is used to treat colorectal cancer. Its cytotoxicity is thought to result from inhibition of DNA synthesis. Oxaliplatin forms both inter- and intra-strand crosslinks in DNA, which prevent DNA replication and transcription and hence cause cell death. Gemcitabine is used to treat a variety of cancers, particularly breast cancer, ovarian cancer, non-small cell lung cancer, pancreatic cancer, and bladder cancer. It exerts its effect by blocking DNA synthesis, resulting in cell death. Sunitinib, sorafenib and erlotinib are inhibitors of several tyrosine kinases. Sunitinib is used for the treatment of primary kidney cancer or imatinib-resistant gastrointestinal stromal tumor, and erlotinib is used for the treatment of non-small cell lung cancer, pancreatic cancer and several other types of cancer. We are evaluating whether the combination therapy can be used to assess the effect of a single drug on gene expression and alternative polyadenylation switching, or the synergistic effect and order of administration of the drugs.
Single Cell Sequencing Data Clustering via Extracting Patterns and Identification of Co-expressed Genes
We developed Extracting Patterns and Identifying co-expressed Genes for microarray gene expression data (EPIG) and bulk RNA-Seq data (EPIG-Seq). The methods rely on:
- extracting patterns in the data based on correlation, magnitude of change and signal-to-noise or dispersion and
- categorizing the expression profiles to the patterns.
We are investigating the utilization of the EPIG framework for clustering samples based on single-cell sequencing data to discover rare cell populations, to uncover regulatory relationships between genes, and to track the trajectories of distinct cell lineages during cell development or differentiation.
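The categorization step can be illustrated with a simplified correlation-plus-magnitude assignment. This is a sketch under stated assumptions, not the EPIG implementation: the thresholds `min_r` and `min_magnitude`, the pattern names and the toy profiles are all hypothetical, and the signal-to-noise (or dispersion) criterion is omitted because it requires replicate data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def assign_to_patterns(profiles, patterns, min_r=0.8, min_magnitude=0.5):
    """Assign each profile to its best-correlated pattern, or None.

    A profile is left unassigned if its largest absolute change is below
    min_magnitude or if no pattern reaches a correlation of min_r.
    """
    assignments = {}
    for gene, profile in profiles.items():
        if max(abs(v) for v in profile) < min_magnitude:
            assignments[gene] = None   # change too small to categorize
            continue
        best_name, best_r = None, min_r
        for name, pattern in patterns.items():
            r = pearson(profile, pattern)
            if r >= best_r:
                best_name, best_r = name, r
        assignments[gene] = best_name
    return assignments

# Hypothetical log-ratio profiles across four conditions.
patterns = {
    "up_late": [0.0, 0.0, 1.0, 2.0],
    "transient": [0.0, 2.0, 2.0, 0.0],
}
profiles = {
    "gene_a": [0.1, 0.0, 1.1, 1.9],    # follows up_late
    "gene_b": [0.0, 1.8, 2.1, 0.2],    # follows transient
    "gene_c": [0.1, -0.1, 0.2, 0.3],   # below the magnitude threshold
    "gene_d": [1.0, -1.0, 1.0, -1.0],  # large change but matches no pattern
}
assignments = assign_to_patterns(profiles, patterns)
```

For single-cell data the same idea could be applied with cells or cell clusters in place of conditions, though sparsity and dropout would likely call for a different similarity measure.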
Deep Learning of Integrated Genomics Big Data for Elucidation of Regulatory Pathways Associated with Chemical Effects on Biological Systems
Exposure to environmental stressors or toxic chemicals alters regulatory systems. Toxicogenomics offers a useful paradigm to address environmental health concerns related to adverse effects from the exposures. Rather than a traditional (inductive) evaluation of toxicology from exposure → phenotype → target genes → perturbed pathways, a reverse toxicologic (deductive) approach from exposure → whole genome assessment → phenotypic signatures → perturbed pathways is more practical given advances in genomics, genetics, epigenetics and bioinformatics. Integration of massive data sets is one key to understanding the effects of chemicals and environmental stressors on biological systems. The other key is incorporating machine learning into the analysis so that we are able to “know what we don’t already know”. We are investigating deep learning of integrated toxicogenomics/genetics/epigenetics data sets and knowledgebase systems to elucidate perturbed regulatory pathways directly associated with chemical and environmental exposures that elicit toxic responses.
Extraction of Biological Themes from Genomic Data
Gene expression and epigenetics analyses provide ways to simultaneously monitor the regulation of thousands of transcripts and to evaluate heritable genome changes system-wide. Genes that are co-expressed are assumed to be co-regulated, in that they may be involved in related biological functions. They often have epigenetic marks that control gene expression. Many algorithms have been applied to identify clusters of co-expressed genes and epigenetic control modules. While these algorithms are able to extract coherent gene expression patterns and definitive control regions, they fall short of providing biological interpretation. Typically, the interpretation relies on expert knowledge and information from published literature. However, some of these clusters or control modules may contain hundreds or thousands of genes or loci, so online searching to link information between genes and literature, or manual interpretation of each individual gene, becomes impractical. One way to address the biological interpretation problem is to use the Gene Ontology (GO) data resource, which includes annotations of genes and gene products that have been previously curated from literature. However, GO-guided methods may bias the outcome of the clusters or control modules, such that the functional relationships revealed by the gene expression profiles cannot be identified. In addition, the hierarchical structure of GO by its very nature generalizes the annotation of biological processes and molecular functions. We are investigating novel ways to extract biological themes from gene expression clusters and epigenetic control regions based on GO enrichment coupled with PubMed literature mining semantics.
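GO enrichment of a gene cluster is often assessed with a one-sided hypergeometric test. The sketch below shows only that generic calculation, not the group's method or its literature-mining component; the function name `hypergeom_enrichment_p` and the toy numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background
    K: background genes annotated to the GO term
    n: genes in the cluster
    k: cluster genes annotated to the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 20 background genes, 5 annotated to the term,
# a cluster of 4 genes of which 3 carry the annotation.
p_value = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
```

In practice one p-value is computed per GO term and the results are corrected for multiple testing, which this sketch leaves out.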
Massive Genomic Informatics
The Massive Genomic Informatics (MGI) Team provides bioinformatics, gene expression, epigenetics, computational biology, machine learning and statistical genetics/genomics support as well as software, database and application development expertise to all institute scientists. The MGI Team includes Pierre R. Bushel, Ph.D., M.S., Head of the MGI; Jianying Li, M.S. and Brian Bennett, Ph.D., Bioinformaticians from the Integrative Bioinformatics Group; and Liwen Liu, M.S., Computational Biologist from the Molecular Genomics Core.