Leping Li, Ph.D.

Biostatistics & Computational Biology Branch

Leping Li, Ph.D.
Principal Investigator
Tel (919) 541-5168
Fax (919) 541-4311
P.O. Box 12233
Mail Drop A3-03
Research Triangle Park, NC 27709

Research Summary

Leping Li, Ph.D., and his staff develop and implement methods for detecting and discovering functional elements, such as cis-regulatory motifs, in a set of sequences using Markov models and Expectation Maximization (EM) methods. Specifically, they developed an efficient sequence alignment algorithm for identifying conserved segments between two divergent sequences, e.g., promoter sequences. Li's group has also worked on methods that improve the quality of motif models and on a motif identification tool that controls the false discovery rate.

Li recently developed the GADEM software, which can be applied to large-scale sequence data for unbiased motif discovery. Currently, his group is developing a method that identifies the motifs of a transcription factor and its co-regulators in ChIP-seq datasets, as well as computational and statistical methods for identifying genomic loci that are differentially enriched in sequence read counts in ChIP-seq and mRNA-seq data.

  • Methods for identifying the motifs of a transcription factor and its co-regulatory factors
  • De novo motif discovery and identification
  • Methods for identifying enriched genomic loci in ChIP-seq and mRNA-seq data
  • Accurate anchoring alignment of divergent sequences
  • A method for gene set enrichment analysis for continuous non-monotone relationships
  • A genetic algorithm/k-nearest neighbor (GA/KNN) method for microarray and proteomics data analysis

The source code and documentation for GA/KNN and GAPWM may be downloaded from the Biostatistics and Computational Biology Branch Resources page; more information on GA/KNN appears on Li’s Studies page.
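For readers unfamiliar with the idea behind GA/KNN, the k-nearest-neighbor half can be illustrated with a short sketch. It scores one candidate subset of variables by leave-one-out classification accuracy, the kind of score a genetic algorithm could use as fitness when searching over subsets. This is an illustrative assumption, not the distributed GA/KNN code: the function name `knn_loo_accuracy`, the Euclidean distance, the majority vote, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_loo_accuracy(samples, labels, subset, k=3):
    """Leave-one-out k-NN accuracy using only the variables listed in `subset`.

    Hypothetical helper for illustration; not part of the GA/KNN distribution.
    """
    correct = 0
    for i, x in enumerate(samples):
        # Distance from sample i to every other sample, restricted to `subset`.
        dists = []
        for j, y in enumerate(samples):
            if j == i:
                continue  # leave sample i out of its own neighborhood
            d = math.sqrt(sum((x[f] - y[f]) ** 2 for f in subset))
            dists.append((d, labels[j]))
        dists.sort(key=lambda t: t[0])
        # Majority vote among the k nearest neighbors.
        votes = Counter(label for _, label in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(samples)


# Toy data (made up): variable 0 separates the classes, variable 1 is noise.
samples = [(0.1, 5.0), (0.2, 1.0), (0.3, 9.0), (2.1, 4.0), (2.2, 8.0), (2.3, 0.5)]
labels = ["A", "A", "A", "B", "B", "B"]

acc_informative = knn_loo_accuracy(samples, labels, subset=[0])
acc_noise = knn_loo_accuracy(samples, labels, subset=[1])
print(acc_informative, acc_noise)
```

On this toy data the subset containing the informative variable scores higher than the noise-only subset, which is the signal a subset search would exploit.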


  • coMotif
    A three-component mixture framework to model the joint distribution of two motifs as well as the situation where some sequences contain only one or none of the motifs.
  • EpiCenter
    EpiCenter is a powerful analysis tool of genome-wide mRNA-seq or ChIP-seq data for detecting differentially expressed genes or identifying changes in epigenetic modifications.
  • fdrMotif
    Determines the number of binding sites in each sequence of a probability model by performing statistical tests.
  • GA/KNN
    Selects the most discriminative variables for sample classification and may be used for analysis of microarray gene expression data, proteomic data or other high-dimensional data.
  • GADEM
    An unbiased de novo motif discovery tool implementing an expectation-maximization (EM) algorithm.
  • Genetic Algorithm Method for Optimizing a Position Weight Matrix
    Implements a simple method to improve a poorly estimated position weight matrix using chromatin immunoprecipitation data.
  • T-KDE
    T-KDE identifies the locations of constitutive binding sites. It combines a binary range tree with a kernel density estimator and is applied to ChIP-seq data from multiple cell lines.
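The kernel density estimator mentioned for T-KDE can be illustrated in isolation. The sketch below is a plain one-dimensional Gaussian KDE over pooled peak positions; it is not the T-KDE implementation and omits the binary range tree entirely. The function name `gaussian_kde`, the bandwidth, the grid, and the coordinates are hypothetical choices made for the example.

```python
import math


def gaussian_kde(positions, bandwidth):
    """Return a function that estimates the density of 1-D positions.

    Hypothetical helper for illustration; uses a Gaussian kernel.
    """
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observed position.
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions
        )

    return density


# Made-up peak-center coordinates (bp), imagined as pooled across cell lines.
peak_centers = [1000, 1010, 995, 1005, 5000, 1002, 9000]
density = gaussian_kde(peak_centers, bandwidth=20.0)

# Evaluate on a coarse grid and pick the position with the highest density.
grid = range(0, 10001, 50)
mode = max(grid, key=density)
print(mode)
```

Positions where peaks from many data sets pile up produce a high density, while isolated peaks contribute only a small bump; a site shared across cell lines would therefore stand out as a density mode.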


Selected Publications

  1. SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nature Biotechnology 2014 32(9):903-914.[Abstract]
  2. Niu L, Huang W, Umbach DM, Li L. IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data. BMC genomics 2014 15:862.[Abstract]
  3. Liu J, Liu X, Wang W, McCauley L, Pinto-Martin J, Wang Y, Li L, Yan C, Rogan WJ. Blood Lead Concentrations and Children's Behavioral and Emotional Problems: A Cohort Study. JAMA pediatrics 2014 168(8):737-745.[Abstract]
  4. Li Y, Umbach DM, Li L. T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets. BMC genomics 2014 15:27[Abstract]
  5. Madenspacher JH, Azzam KM, Gowdy KM, Malcolm KC, Nick JA, Dixon D, Aloor JJ, Draper DW, Guardiola JJ, Shatz M, Menendez D, Lowe J, Lu J, Bushel P, Li L, Merrick BA, Resnick MA, Fessler MB. p53 Integrates host defense and cell fate during bacterial pneumonia.  Journal of Experimental Medicine 2013 210(5):891-904.[Abstract]
  6. Huang W, Loganantharaj R, Schroeder B, Fargo D, Li L. PAVIS: a tool for Peak Annotation and Visualization, Bioinformatics, 2013, 29(23):3097-9.[Abstract]
  7. Li Y, Huang W, Niu L, Umbach DM, Covo S, Li L. Characterization of constitutive CTCF/cohesin loci: a possible role in establishing topological domains in mammalian genomes, BMC Genomics, 2013, 14:553.[Abstract]
  8. Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics, 2012, 28(4):593-594.[Abstract]
  9. Huang, W., Li, L., Myers, J.R., and Marth, G.T. ART: a next-generation sequencing simulator. Bioinformatics, 2011, doi: 10.1093/bioinformatics/btr708.[Abstract]  
  10. Xu M, Weinberg CR, Umbach DM, Li L. coMOTIF: A method for Identifying Transcription Co-regulator Binding Sites in ChIP-seq Data. Bioinformatics (Oxford, England), 2011, 27(19):2625-2632.[Abstract]
  11. Xu, M., Weinberg, C.R., Umbach, D.M. and Li, L. coMOTIF: A Mixture Framework for Identifying Transcription Factor and a Co-regulator Motif in ChIP-seq Data. Bioinformatics, 2011, Epub ahead of print  
  12. Huang W, Umbach DM, Vincent Jordan N, Abell AN, Johnson GL, Li L. Efficiently identifying genome-wide changes with next-generation sequencing data. Nucleic Acids Research, 2011, 39(19):e130.[Abstract]
  13. Abell, A.N., Jordan, N.V., Huang, W., Prat, A., Midland, A.A., Johnson, N.L., Granger, D.A., Mieczkowski, P.A., Perou, C.M., Gomez, S.M., Li, L., Johnson, G.L. MAP3K4/CBP-regulated H2B Acetylation Controls Epithelial-Mesenchymal Transition in Trophoblast Stem Cells. Cell Stem Cell, 2011, 8(5):525-537.[Abstract]  
  14. Mercier, E., Droit, A., Li, L., Robertson, G., Zhang, X., Gottardo, R. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PloS one, 2011, 6(2):e16432.[Abstract]  
  15. Gilchrist, D.A., Santos, G.D., Gao, Y., Fargo, D.C., Li, L., and Adelman, K. Pausing of RNA polymerase II disrupts DNA-encoded nucleosome organization to facilitate precise gene regulation. Cell, 2010, 143, 540-545.[Abstract]  
  16. Lai, A.Y., Fatemi, M., Dhasarathy, A., Malone, C., Sobol, S.E., Geigerman, C., Jaye, D.L., Mav, D., Shah, R., Li, L., and Wade, P.A. DNA methylation prevents CTCF-mediated silencing of the oncogene BCL6 in B cell lymphomas. J. Exp. Med., 2010, 207, 1939-50.[Abstract]  
  17. Hoffman, B.G., Robertson, G., Zavaglia, B., Beach, M., Cullum, R., Lee, S., Soukhatcheva, G., Li, L., Wederell, E.D., Thiessen, N., Bilenky, M., Cezard, T., Tam, A., Kamoh, B., Birol, I., Dai, D., Zhao, Y.J., Hirst, M., Verchere, B., Helgason, C.D., Marra, M.A., Jones, S.J.M., and Hoodless, P.A. Locus co-occupancy, nucleosome positioning, and H3K4me1 regulate the functionality of FoxA2-, HNF4A-, and PDX1-bound loci in islets and liver. Genome Res., 2010, 20, 1037-51.[Abstract]  
  18. Hewitt, S.C., Li, Y., Li, L., and Korach, K.S. Estrogen-mediated regulation of Igf1 transcription and uterine growth involves direct binding of estrogen receptor alpha to estrogen-responsive elements. J. Biol. Chem., 2010, 285, 2676-2685.[Abstract]  
  19. Li, L. GADEM: A genetic algorithm guided formation of spaced dyads coupled with an EM algorithm for motif discovery. J. Comput. Biol., 2009, 16, 317-329.[Abstract]  
  20. Lin, R., Dai, S., Irwin, R.D., Heinloth, A.N., Boorman, G.A., and Li, L. Gene set enrichment analysis for non-monotone association and multiple experimental categories. BMC Bioinformatics, 2008, 9, 481.[Abstract]  
  21. Card, D.A.G., Hebbar, P.B., Li, L., and Archer, T.K. Oct4 transcriptionally activates pluripotency-associated miRNAs in human embryonic stem cells. Mol. Cell. Biol., 2008, 28, 6426-6438.  
  22. Gilchrist, D.A., Nechaev, S., Lee, C., Ghosh, S.K.B., Collins, J.B., Li, L., Gilmour, D.S., and Adelman, K. NELF-mediated stalling of Pol II can enhance gene expression by blocking promoter-proximal nucleosome assembly. Genes & Dev., 2008, 22, 1921-1933.[Abstract]  
  23. Li, L., Bass, R.L., and Liang, Y. fdrMotif: identifying cis-elements by an EM algorithm coupled with false discovery rate control. Bioinformatics, 2008, 24, 629-636.[Abstract]  
  24. Li, L., Liang, Y., and Bass, R.L. GAPWM: a genetic algorithm method for optimizing a position weight matrix. Bioinformatics, 2007, 23, 1188-1194.[Abstract]  
  25. Huang, W., Umbach, D.M., Ohler, U., and Li, L. Optimized mixed Markov models for motif identification. BMC Bioinformatics, 2006, 7, 279.[Abstract]  
  26. Huang W., Umbach, D.M., and Li L. Accurate anchoring alignment of divergent sequences. Bioinformatics, 2006, 22, 29-34.[Abstract]  
  27. Liu, D., Peddada, S.D., Li, L., and Weinberg, C.R. Phase analysis of circadian-related genes in two tissues. BMC Bioinformatics, 2006, 7:87.[Abstract]  
  28. Liu, D., Umbach, D., Peddada, S., Li, L., Crockett, P.W., and Weinberg, C.R. A random-periods model for expression of cell cycle genes. Proc. Natl. Acad. Sci. USA, 2004, 101, 7240.[Abstract]  
  29. Li, L., Umbach, D.M., Terry, P., and Taylor, J.A. Application of the GA/KNN method to SELDI proteomics data. Bioinformatics, 2004, 20, 1638.[Abstract]  
  30. Peddada, S.D., Lobenhofer, E.K., Li, L., Afshari, C.A., Weinberg, C.R., and Umbach, D. Selecting and clustering genes using order restricted inference methodology with applications to time-course microarray data. Bioinformatics, 2003, 19, 834.  
  31. Li, L., Weinberg, C.R., Darden, T.A., and Pedersen, L.G. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 2001, 17, 1131.[Abstract]  
  32. Li, L., Darden, T.A., Weinberg, C.R., Levine, A.J., and Pedersen, L.G. Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method. Combinatorial Chemistry and High Throughput Screening, 2001, 4, 727.
