Leping Li, Ph.D.

Biostatistics & Computational Biology Branch

Leping Li, Ph.D.
Deputy Chief, Biostatistics & Computational Biology Branch and Principal Investigator
Tel 984-287-3836
Fax 919-541-4311
li3@niehs.nih.gov
P.O. Box 12233
Mail Drop A3-03
Durham, N.C. 27709

Research Summary

Leping Li, Ph.D., is Deputy Chief and a senior investigator in the Biostatistics and Computational Biology Branch. His research program has both computational and experimental components, and his staff is a multidisciplinary team. The early focus of the group was the development and implementation of computational/statistical methods to mine high-dimensional genomic data. Group members contributed to many areas of bioinformatics including:

  • Identification of transcription factors and their co-regulatory factor motifs
  • De novo motif discovery
  • Identification of enriched genomic loci in ChIP-seq and mRNA-seq data
  • Accurate anchoring alignment of divergent sequences
  • Gene set enrichment analysis for continuous non-monotone relationships
  • Sample classification and feature selection

Two years ago, Li established a wet lab so that computational discoveries from the group’s data mining efforts could be pursued further in biological experiments.

Li’s team includes staff scientist Yuanyuan Li, Ph.D.; research fellow Kai Kang, Ph.D., in methods development; and two wet lab postdoctoral fellows, Melissa Li, Ph.D., and Wenling Li, Ph.D. The fellows in the wet lab work closely with Xiaoling Li’s group in the NIEHS Signal Transduction Laboratory.

While Li’s group continues to develop methods for mining high-dimensional genomic data, it is currently focused on two major areas of research: 1) tumor stroma in cancer progression and 2) classification and regression tree models.

Tumor Stroma in Cancer Progression

Gene expression profiling by RNA-sequencing of bulk tissue samples is widely employed to study tumor biology. Most tissues comprise multiple cell types. Biologists recognize that the value of such bulk expression profiles would be enhanced if they could be mined to uncover the proportion of each constituent cell type and their individual expression profiles. The group has developed a new complete deconvolution method, CDSeq, to estimate both sample-specific cell-type proportions and cell-type-specific expression profiles simultaneously using only bulk RNA-seq data.

CDSeq was benchmarked using several synthetic and experimental datasets with known cell-type composition. Its performance was compared to that of two state-of-the-art partial deconvolution methods, Cibersort and csSAM. Cibersort estimates sample-specific proportions of cell types given cell-type-specific expression profiles, and csSAM estimates cell-type-specific expression profiles given sample-specific proportions of cell types. Work to improve CDSeq is ongoing, for example on the optimal choice of hyperparameters. The group is also applying CDSeq to bulk RNA-seq data from tumor samples to gain insight into the roles of stroma in tumor biology.
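
The partial deconvolution setting described above, where cell-type-specific expression profiles are known and only the proportions are estimated, can be illustrated with a small sketch. It assumes a linear mixing model (bulk expression is a proportion-weighted sum of cell-type profiles) and fits the proportions by projected gradient descent. This is an illustrative stand-in, not the algorithm used by CDSeq, Cibersort, or csSAM; the function names, the toy profile matrix, and the iteration count are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the shift from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(profiles, bulk, n_iter=500):
    """Estimate cell-type proportions for one bulk sample.

    profiles[g][k] is the expression of gene g in cell type k (assumed known);
    bulk[g] is the observed bulk expression of gene g.
    """
    n_genes, n_types = len(profiles), len(profiles[0])
    # Conservative step size: 1 / (sum of squared entries) never overshoots.
    step = 1.0 / sum(x * x for row in profiles for x in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(profiles[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(profiles[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        p = project_to_simplex([p[k] - step * grad[k] for k in range(n_types)])
    return p


# Toy data: 4 genes, 3 cell types, each cell type with one marker-like gene.
toy_profiles = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [2.0, 2.0, 2.0],
]
true_proportions = [0.5, 0.3, 0.2]
toy_bulk = [
    sum(row[k] * true_proportions[k] for k in range(3)) for row in toy_profiles
]
estimated = estimate_proportions(toy_profiles, toy_bulk)
```

On this noise-free toy mixture, the estimate should recover the proportions used to build the bulk sample. Complete deconvolution, as described for CDSeq, is harder because the profiles are unknown as well.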

Currently, the wet lab is studying two genes and their roles in cancer progression using both in vitro cell lines and in vivo mouse models. Both genes were differentially expressed between tumor and normal samples in nearly all TCGA (The Cancer Genome Atlas) tumor types and were expressed in tumor stroma.

Classification and Regression Models

Li’s group continues to develop and implement various classification methods, especially classification and regression tree (CART)-based algorithms. The group is particularly interested in XGBoost (eXtreme Gradient Boosting), a supervised machine learning algorithm based on an ensemble of decision trees that uses an optimized and distributed gradient boosting algorithm. The group has successfully applied XGBoost to genomic data and gene expression data for pan-cancer classification and for tumor-purity prediction. The group is currently collaborating with investigators in the Epidemiology Branch, the National Toxicology Program (NTP), and the NIEHS Clinical Research Unit on the analysis of transcriptomic and metabolomic data using tree-based classification and regression methods.
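
The idea of boosting an ensemble of trees can be sketched in a few lines. The example below is a deliberately minimal gradient-boosting loop for squared-error regression on a single feature, using depth-one trees (stumps). It is not XGBoost, which handles many features and, among other things, adds regularization and a far more optimized implementation. All names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if xi <= threshold else right_value)
    return total


# Toy data with a step between x = 3 and x = 4.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(toy_x, toy_y)
```

Each round fits a stump to what the current ensemble still gets wrong, and the learning rate shrinks its contribution so that many small corrections accumulate.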

Software

  • coMotif
    A three-component mixture framework to model the joint distribution of two motifs as well as the situation where some sequences contain only one or none of the motifs.
  • EpiCenter
    EpiCenter is an analysis tool for genome-wide mRNA-seq or ChIP-seq data, used to detect differentially expressed genes or identify changes in epigenetic modifications.
  • fdrMotif
    Determines the number of binding sites in each sequence of a probability model by performing statistical tests.
  • GA/KNN
    Selects the most discriminative variables for sample classification and may be used for analysis of microarray gene expression data, proteomic data or other high-dimensional data.
  • GADEM
    An unbiased de novo motif discovery tool implementing an expectation-maximization (EM) algorithm.
  • Genetic Algorithm Method for Optimizing a Position Weight Matrix
    Implements a simple method to improve a poorly estimated position weight matrix using chromatin immunoprecipitation data.
  • T-KDE
    T-KDE identifies the locations of constitutive binding sites. It combines a binary range tree with a kernel density estimator and is applied to ChIP-seq data from multiple cell lines.
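
As a rough illustration of the kernel density estimation component mentioned for T-KDE, the sketch below places a Gaussian kernel on each binding-site position pooled across cell lines and reports the grid position with the highest density. It does not include the binary range tree and is not the T-KDE implementation; the function names, bandwidth, and positions are hypothetical.

```python
import math


def gaussian_kde(positions, bandwidth):
    """Return a function that evaluates the Gaussian kernel density at x."""
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions
        )

    return density


def densest_position(positions, bandwidth, grid):
    """Return the grid point where the estimated density is highest."""
    density = gaussian_kde(positions, bandwidth)
    return max(grid, key=density)


# Hypothetical peak positions (in bp) pooled from several cell lines:
# four cluster near position 100, one lies far away at 500.
peaks = [98, 100, 101, 102, 500]
mode = densest_position(peaks, bandwidth=5.0, grid=range(90, 511))
```

Positions that recur across cell lines reinforce one another and produce a sharp density mode, whereas an isolated peak contributes only a low bump.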

Selected Publications

  1. Li Y, Krahn JM, Flake GP, Umbach DM, Li L. Toward predicting metastatic progression of melanoma based on gene expression data. Pigment cell & melanoma research 2015 28(4):453-463.
  2. Wells, M.L., Washington, O.L., Hicks, S.N., Nobile, C.J., Hartooni, N., Wilson, G.M., Zucconi, B.E., Huang, W., Li, L., Fargo, D.C., Blackshear, P.J. Post-transcriptional regulation of transcript abundance by a conserved member of the tristetraprolin family in Candida albicans. Mol. Microbiol., 2015, 95(6):1036-1053.
  3. Choi, Y.-J., Lai, W.S., Fedic, R., Stumpo, D.J, Huang, W., Li, L., Perera, L., Brewer, B.Y., Wilson, G.M., Mason, J.M., Blackshear, P.J. The Drosophila Tis11 protein and its effects on mRNA expression in flies. J. Biol. Chem., 2014, 289(51):35042-60.
  4. Niu L, Huang W, Umbach DM, Li L. IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data. BMC genomics, 2014, 15:862.
  5. Zhang, X., Li, B., Ma, L., Li, L., Zheng, D., Li W., Chu, M., Mailman, R.B., Archer, T.K., Wang, Y. Transcriptional repression by specific SWI/SNF components affects pluripotency of human embryonic stem cells. Stem Cell Reports, 2014, 3(3):460-474.
  6. Hewitt, S.C., Li, L., Grimm, S.A., Winuthayanon, W., Hamilton, K.J., Pockette, B., Rubel, CA., Pedersen, L.C., Fargo, D., Lanz, R.B., DeMayo, F.J., Schutz, G., Korach, K.S. Novel DNA motif binding activity observed in vivo with an estrogen receptor alpha mutant mouse. Mol. Endocrinol. 2014, 28(6):899-911.
  7. Li, Y., Umbach, D.M., Li, L. T-KDE: A method for analyzing genome-wide protein binding patterns from ChIP-seq data. BMC Genomics, 2014, 15:27.
  8. Li, Y., Hamilton, K.J., Lai, A.Y., Burns, K.A., Li, L., Wade, P.A., Korach, K.S. Diethylstilbestrol (DES)-stimulated hormonal toxicity is mediated by ERalpha alteration of target gene methylation patterns and epigenetic modifiers (DNMT3A, MBD2, and HDAC2) in the mouse seminal vesicle. Environ. Health Perspect., 2014, 122(3):262-8.
  9. Madenspacher, J., Azzam, K., Gowdy, K., Malcolm, K., Nick, J., Aloor, D. J., Draper, D., Guardiola, J., Shatz, M., Menendez, D., Lowe, J., Lu, J., Bushel, P., Li, Leping, Merrick, A., Resnick, M.A. and Fessler, M. p53 Integrates host defense and cell fate during bacterial pneumonia. J. Experimental Medicine: 891-904, 2013.
  10. Tennant, B., Robertson, A.G., Kramer, M., Li, L., Zhang, X., Beach, M., Thiessen, N., Chiu, R., Mungall, K., Whiting, C., Sabatini, P., Kim, A., Gottardo, R., Marra, M., Lynn, F., Jones, S.J.M., Hoodless, P.A., Hoffman, B.G. Identification and analysis of pancreatic islet enhancers. Diabetologia, 2013, 56(3):542-552.
  11. Li Y, Huang W, Niu L, Umbach DM, Covo S, Li L. Characterization of constitutive CTCF/cohesin loci: a possible role in establishing topological domains in mammalian genomes, BMC Genomics, 2013, 14:553.
  12. Huang W, Loganantharaj R, Schroeder B, Fargo D, Li L. PAVIS: a tool for Peak Annotation and Visualization, Bioinformatics, 2013, 29(23):3097-9.