Biostatistics & Computational Biology Branch
Research Summary
Leping Li, Ph.D., is a senior investigator in the Biostatistics and Computational Biology Branch. His research program focuses on computational biology and is carried out by a multidisciplinary team. The group initially focused on developing and implementing computational and statistical methods for mining high-dimensional genomic data.
Historically, our group has focused on bioinformatics approaches to cancer genomics and sleep-related research. Building on this foundation, we have recently launched a new initiative centered on the development of deep learning models for toxicology.
Ongoing projects in the lab include:
Deep Learning on Toxicology Data - The U.S. Tox21 collaboration has generated a large reference library of high-throughput concentration–response assays. Here we present Tox21mer, a 43.5-million-parameter transformer that encodes each Tox21 concentration–response curve together with assay metadata into a 768-dimensional representation. Tox21mer was pretrained on ~2.5 million curves from 102 assay protocols and 6,727 compounds using masked-response reconstruction as the primary objective, with low-weight auxiliary supervision on assay outcome and AC50. To evaluate the learned representation, we trained lightweight probes on frozen embeddings from concentration–response curves of held-out compounds. The representation supported a macro-F1 of 0.985 for three-class outcome prediction (agonist, antagonist, inactive), a binary F1 of 0.994 for active/inactive prediction, and an R² of 0.87 for log10(AC50). The learned embeddings formed coherent groupings by curve-class category. A masked-only pretraining variant retained near-baseline probe performance, indicating that the representation is learned largely from the self-supervised objective rather than from auxiliary labels. Ablation analyses further showed that predictive performance depends mainly on curve-level response-value distributions conditioned on assay context, with limited reliance on detailed within-curve ordering. Tox21mer thus provides a reusable foundation representation for Tox21 concentration–response data that can support extrapolation to untested compounds through integration with chemical features or distillation into chemistry-only student models for large-scale external screening.
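Probing frozen embeddings means the pretrained encoder is left unchanged and only a small model is fit on its outputs. The Python/NumPy sketch below illustrates that evaluation pattern and the metrics named above (macro-F1 and R²). It assumes a softmax (multinomial logistic regression) probe, which the description does not specify; the function names, hyperparameters, 16-dimensional toy vectors (Tox21mer's embeddings are 768-dimensional), and synthetic clusters are all hypothetical stand-ins, not Tox21mer code or data.

```python
# Hypothetical sketch: probe type, names, and data are assumptions, not Tox21mer's code.
import numpy as np

# Hypothetical class order for the three-class outcome task.
CLASSES = ("agonist", "antagonist", "inactive")


def fit_linear_probe(X, y, n_classes, lr=0.01, epochs=500, l2=1e-3):
    """Softmax (multinomial logistic) probe fit by gradient descent on frozen embeddings.

    X has shape (n, d); only W and b are learned, the encoder that produced X is untouched.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets, shape (n, n_classes)
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def r_squared(y_true, y_pred):
    """Coefficient of determination, as reported for a regression probe on log10(AC50)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def toy_embeddings(n_per_class, d=16, seed=0):
    """Synthetic stand-in for frozen embeddings: one Gaussian cluster per class."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(len(CLASSES), d))
    X = np.vstack([c + rng.normal(size=(n_per_class, d)) for c in centers])
    y = np.repeat(np.arange(len(CLASSES)), n_per_class)
    return X, y


if __name__ == "__main__":
    X, y = toy_embeddings(50)
    train = np.arange(len(y)) % 2 == 0  # stand-in for a held-out-compound split
    W, b = fit_linear_probe(X[train], y[train], n_classes=len(CLASSES))
    y_hat = predict(W, b, X[~train])
    print("macro-F1:", macro_f1(y[~train].tolist(), y_hat.tolist(), range(len(CLASSES))))
```

Macro-F1 averages per-class F1 scores without weighting by class size, which matters when one outcome class (for example, inactive curves) greatly outnumbers the others.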
Our group is also building a next-generation foundation model for pathology data from NTP chronic and subchronic assays, one of the world's most extensive toxicology resources. This effort brings together tens of thousands of ultra-high-resolution, gigabyte-scale pathology images and millions of pathology report entries generated through NTP studies. By integrating these large and information-rich datasets, we aim to build AI tools that can change how tissue-level toxicological effects are detected, interpreted, and predicted. This is an ambitious undertaking, and work is well under way. Our ultimate goal is to develop foundation models capable of predicting tissue pathology for previously untested chemicals, accelerating discovery and helping shape the future of predictive toxicology.
Software
- ART: A set of simulation tools.
- coMotif: A three-component mixture framework that models the joint distribution of two motifs, including the situation in which some sequences contain only one of the motifs or neither.
- GADEM: An unbiased de novo motif discovery tool that implements an expectation-maximization (EM) algorithm.
- GA/KNN: Selects the most discriminative variables for sample classification; it can be used to analyze microarray gene expression data, proteomic data, or other high-dimensional data.
- SSAVE: Sleep Cycle and Spectrogram Analysis and Visualization for Electroencephalography Data.
- T-KDE: Identifies the locations of constitutive binding sites. T-KDE combines a binary range tree with a kernel density estimator and is applied to ChIP-seq data from multiple cell lines.
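The GA/KNN entry names two ingredients: a genetic algorithm (GA) that searches over small subsets of variables, and a k-nearest-neighbor (KNN) classifier whose accuracy scores each subset. The Python sketch below illustrates that general idea only and is not the GA/KNN software. The mutation-only GA, the leave-one-out fitness, the tally of best subsets across runs, the toy data, and all function names and parameter values are illustrative assumptions.

```python
# Simplified, hypothetical illustration of GA + KNN variable selection (not the GA/KNN tool).
import random
from collections import Counter


def knn_loo_correct(X, y, features, k=3):
    """Count samples whose label is recovered by leave-one-out k-NN using only `features`."""
    correct = 0
    for i in range(len(X)):
        # (squared distance, label) to every other sample, restricted to the subset.
        neighbors = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), y[j])
            for j in range(len(X))
            if j != i
        )
        votes = Counter(label for _, label in neighbors[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct


def evolve_subset(X, y, subset_size, rng, pop_size=16, generations=10, k=3):
    """Mutation-only GA: keep the fitter half each generation, refill with mutated copies."""
    n_features = len(X[0])

    def fitness(subset):
        return knn_loo_correct(X, y, subset, k)

    pop = [rng.sample(range(n_features), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        survivors = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            # Replace one selected variable with one not already in the subset.
            unused = [f for f in range(n_features) if f not in parent]
            child[rng.randrange(subset_size)] = rng.choice(unused)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


def selection_frequency(X, y, subset_size=2, runs=5, seed=0):
    """Tally how often each variable appears in the best subset of independent GA runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        counts.update(evolve_subset(X, y, subset_size, rng))
    return counts


def make_toy_data(n_pairs=10, n_noise=6):
    """Variables 0 and 1 separate the classes; the noise variables are identical within
    each (class 0, class 1) pair of samples, so they carry no class information."""
    X, y = [], []
    for m in range(n_pairs):
        for label in (0, 1):
            informative = [10.0 * label + 0.1 * m] * 2
            noise = [float((m + f) % 4) for f in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(selection_frequency(X, y).most_common(3))
```

Variables that keep reappearing across independent runs are candidates for the most discriminative ones; with this toy data, variables 0 and 1 should dominate the tally.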
Selected Publications
- Leping Li, Jisoo Hwang, Keith Shockley, Yuanyuan Li. Tox21mer: A transformer foundation model for Tox21 high-throughput concentration–response curve data. Submitted.
- Md Rashidul Hasan, David M. Umbach, Min Shi, Chao Gu, Deryck Yeung, Amlan Talukder, Zheng Fan, and Leping Li. Seasonal variation in sleep apnea and other sleep parameters from diagnostic polysomnography data, 2003-2024. Communications Health, accepted.
- Leping Li, Min Shi, David M. Umbach, and Zheng Fan. Age Trajectories of O2 Saturation and Levels of Serum Bicarbonate or End-Tidal CO2 Across the Life Course of Women and Men: Insights from EHR and PSG Data. Biomolecules. 2025 Jun 17;15(6):884.
- Leping Li, Amlan Talukder, Deryck Yeung, Yuanyuan Li, David M. Umbach, John Gilmore, Zheng Fan. Comparison of overnight trends in relative power for specific frequency bands, sleep stages, and brain regions between patients with depressive disorder and matched control subjects. Psychiatry Res Neuroimaging. 2025 Aug; 351:112021.
- Leping Li, Min Shi, David M. Umbach, Katelyn Bricker, Zheng Fan. Sex- and Age-differences in Supine Positional Obstructive Sleep Apneas in Children and Adults. Sleep Breath. 2025 Feb 17; 29(1):106.
- Deryck Yeung, Amlan Talukder, Min Shi, David M. Umbach, Yuanyuan Li, Alison Motsinger-Reif, Janice J. Hwang, Zheng Fan, and Leping Li. Differences in brain spindle density during sleep between patients with and without type 2 diabetes. Comput Biol Med 2025, 184, 109484. PMID: 39622099.
- Nishanth Anandanadarajah, Deryck Yeung, Amlan Talukder, Yuanyuan Li, David M. Umbach, Zheng Fan, Leping Li. Detection of movement and lead-popping artifacts in electroencephalography data from overnight polysomnography studies. Signals 2024, 5(4), 690-704; PMC11687361.
- Amlan Talukder, Yuanyuan Li, Deryck Yeung, Min Shi, David M. Umbach, Zheng Fan, and Leping Li. OSApredictor: A tool for prediction of moderate to severe obstructive sleep apnea-hypopnea using readily available patient characteristics. Comput Biol Med. 2024 Jun 19;178:108777. PMID: 38901189.
- Amlan Talukder, Deryck Yeung, Yuanyuan Li, Nishanth Anandanadarajah, David M. Umbach, Zheng Fan, Leping Li. Comparison of power spectra from overnight electro-encephalography between patients with Down syndrome and matched control subjects. J Sleep Res. 2024 Feb 27:e14187. PMID: 38410055.
- Li L, Perera L, Varghese SA, Shiloh-Malawsky Y, Hunter SE, Sneddon TP, Powell CM, Matera AG, Fan Z. 2023. A homozygous missense variant in the YG box domain in an individual with severe spinal muscular atrophy: a case report and variant characterization. Front Cell Neurosci. 17:1259380. doi: 10.3389/fncel.2023.1259380.
- Li W, Nakano H, Fan W, Li Y, Sil P, Nakano K, Zhao F, Karmaus PW, Grimm SA, Shi M, Xu X, Mizuta R, Kitamura D, Wan Y, Fessler MB, Cook DN, Shats I, Li X, Li L. 2023. DNASE1L3 enhances antitumor immunity and suppresses tumor progression in colon cancer. JCI Insight. 8(17):e168161. doi: 10.1172/jci.insight.168161.
- Talukder A, Li Y, Yeung D, Umbach DM, Fan Z, Li L. SSAVE: A tool for analysis and visualization of sleep periods using electroencephalography data. 2023. Front Sleep. 2:1102391. doi: 10.3389/frsle.2023.1102391.
- Li L, Umbach DM, Li Y, Halani P, Shi M, Ahn M, Yeung DSC, Vaughn B, Fan ZJ. Sleep apnoea and hypoventilation in patients with five major types of muscular dystrophy. 2023. BMJ Open Respir Res. 10(1):e001506. doi: 10.1136/bmjresp-2022-001506.
- Nodzenski M, Shi M, Krahn JM, Wise AS, Li Y, Li L, Umbach DM, Weinberg CR. GADGETS: a genetic algorithm for detecting epistasis using nuclear families. 2022. Bioinformatics. 38(4):1052-1058. doi: 10.1093/bioinformatics/btab766.
- Tang S, Zhang Z, Oakley RH, Li W, He W, Xu X, Ji M, Xu Q, Chen L, Wellman AS, Li Q, Li L, Li JL, Li X, Cidlowski JA, Li X. 2021. Intestinal epithelial glucocorticoid receptor promotes chronic inflammation-associated colorectal cancer. JCI Insight. 6(24):e151815. doi: 10.1172/jci.insight.151815. PMID: 34784298.
- Li Y, Umbach DM, Krahn JM, Shats I, Li X, Li L. 2021. Predicting tumor response to drugs based on gene-expression biomarkers of sensitivity learned from cancer cell lines. BMC Genomics. 22(1):272.
- Gagliano T, Shah K, Gargani S, Lao L, Alsaleem M, Chen J, Ntafis V, Huang P, Ditsiou A, Vella V, Yadav K, Bienkowska K, Bresciani G, Kang K, Li L, Carter P, Benstead-Hume G, O'Hanlon T, Dean M, Pearl FM, Lee SC, Rakha EA, Green AR, Kontoyiannis DL, Song E, Stebbing J, Giamas G. 2020. PIK3Cδ expression by fibroblasts promotes triple-negative breast cancer progression. J Clin Invest; doi: 10.1172/JCI128313 [Online 3 March 2020].
- Yuanyuan Li, David M. Umbach, Adrienna Bingham, Qi-Jing Li, Yuan Zhuang and Leping Li. Putative Biomarkers for Predicting Tumor Sample Purity Based on Gene Expression Data. BMC Genomics Volume 20, Article number: 1021 (2019).
- Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol., 2019,15(12):e1007510.
- Igor Shats, Jason G. Williams, Juan Liu, Leesa J. Deterding, Chaemin Lim, Xiaojiang Xu, Thomas A. Randall, Ethan Lee, Wenling Li, Wei Fan, Jian-Liang Li, Marina Sokolsky, Alexander V. Kabanov, Leping Li, Jason W. Locasale and Xiaoling Li. Bacteria Boost Mammalian Host NAD Metabolism by Engaging the Deamidated Biosynthesis Pathway. Cell Metabolism, accepted.
- Nguyen TA, Grimm SA, Bushel PR, Li J, Li Y, Bennett BD, Lavender CA, Ward JM, Fargo DC, Anderson CW, Li L, Resnick MA, Menendez D. Revealing a human p53 universe. Nucleic Acids Res, 2018, 46(16):8153-8167.
- Ungewitter EK, Rotgers E, Kang HS, Lichti-Kaiser K, Li L, Grimm SA, Jetten AM, Yao HH. Loss of Glis3 causes dysregulation of retrotransposon silencing and germ cell demise in fetal mouse testis. Sci Rep. 2018, 8(1):9662.
- Miao YL, Gambini A, Zhang Y, Jefferson WN, Padilla-Banks E, Bernhardt ML, Huang W, Li L, Williams CJ. Mediator complex component MED13 regulates the mouse oocyte-to-embryo transition and is required for postimplantation development. Biol Reprod. 2018, 98(4):449-464.
- Roy S, Moore AJ, Love C, Reddy A, Rajagopalan D, Dave S, Li L, Murre C, Zhuang Y. Id proteins suppress E2A-driven innate-like T cell development prior to TCR selection. Front Immunol. 2018, 9:42.
- Li Y, Krahn JM, Flake GP, Umbach DM, Li L. Toward predicting metastatic progression of melanoma based on gene expression data. Pigment cell & melanoma research 2015 28(4):453-463.
- Wells, M.L., Washington, O.L., Hicks, S.N., Nobile, C.J., Hartooni, N., Wilson, G.M., Zucconi, B.E., Huang, W., Li, L., Fargo, D.C., Blackshear, P.J. Post-transcriptional regulation of transcript abundance by a conserved member of the tristetraprolin family in Candida albicans. Mol. Microbiol., 2015, 95(6):1036-1053.
- Choi, Y.-J., Lai, W.S., Fedic, R., Stumpo, D.J., Huang, W., Li, L., Perera, L., Brewer, B.Y., Wilson, G.M., Mason, J.M., Blackshear, P.J. The Drosophila Tis11 protein and its effects on mRNA expression in flies. J. Biol. Chem., 2014, 289(51):35042-60.
- Niu L, Huang W, Umbach DM, Li L. IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data. BMC Genomics, 2014, 15:862.
- Zhang, X., Li, B., Ma, L., Li, L., Zheng, D., Li, W., Chu, M., Mailman, R.B., Archer, T.K., Wang, Y. Transcriptional repression by specific SWI/SNF components affects pluripotency of human embryonic stem cells. Stem Cell Reports, 2014, 3(3):460-474.
- Hewitt, S.C., Li, L., Grimm, S.A., Winuthayanon, W., Hamilton, K.J., Pockette, B., Rubel, C.A., Pedersen, L.C., Fargo, D., Lanz, R.B., DeMayo, F.J., Schutz, G., Korach, K.S. Novel DNA motif binding activity observed in vivo with an estrogen receptor alpha mutant mouse. Mol. Endocrinol. 2014, 28(6):899-911.
- Li, Y., Umbach, D.M., Li, L. T-KDE: A method for analyzing genome-wide protein binding patterns from ChIP-seq data. BMC Genomics, 2014, 15:27.
- Li, Y., Hamilton, K.J., Lai, A.Y., Burns, K.A., Li, L., Wade, P.A., Korach, K.S. Diethylstilbestrol (DES)-stimulated hormonal toxicity is mediated by ERalpha alteration of target gene methylation patterns and epigenetic modifiers (DNMT3A, MBD2, and HDAC2) in the mouse seminal vesicle. Environ. Health Perspect., 2014, 122(3):262-8.
- Madenspacher, J., Azzam, K., Gowdy, K., Malcolm, K., Nick, J., Aloor, D. J., Draper, D., Guardiola, J., Shatz, M., Menendez, D., Lowe, J., Lu, J., Bushel, P., Li, Leping, Merrick, A., Resnick, M.A. and Fessler, M. p53 Integrates host defense and cell fate during bacterial pneumonia. J. Experimental Medicine: 891-904, 2013.
- Tennant, B., Robertson, A.G., Kramer, M., Li, L., Zhang, X., Beach, M., Thiessen, N., Chiu, R., Mungall, K., Whiting, C., Sabatini, P., Kim, A., Gottardo, R., Marra, M., Lynn, F., Jones, S.J.M., Hoodless, P.A., Hoffman, B.G. Identification and analysis of pancreatic islet enhancers. Diabetologia, 2013, 56(3):542-552.
- Li Y, Huang W, Niu L, Umbach DM, Covo S, Li L. Characterization of constitutive CTCF/cohesin loci: a possible role in establishing topological domains in mammalian genomes, BMC Genomics, 2013, 14:553.
- Huang W, Loganantharaj R, Schroeder B, Fargo D, Li L. PAVIS: a tool for Peak Annotation and Visualization, Bioinformatics, 2013, 29(23):3097-9.