Biostatistics & Computational Biology Branch
Complete deconvolution: The branch developed a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we showed that CDSeq performed well in estimating cell-type proportions and cell-type-specific gene expression profiles from heterogeneous tissue samples.
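As a rough illustration of what complete deconvolution asks for, the sketch below factors a bulk expression matrix into cell-type profiles and per-sample proportions when neither is known in advance. It uses generic non-negative matrix factorization with multiplicative updates as a stand-in technique; it is not CDSeq, whose algorithm this overview does not describe. The toy matrices and all function names (`matmul`, `nmf`, `rescale`, `squared_error`) are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(x, w, h):
    """Sum of squared differences between x and the product w @ h."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, n_types, n_iter=300, seed=0, eps=1e-12):
    """Factor x (genes x samples) as w (genes x types) @ h (types x samples)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(n_types)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(n_types)]
    for _ in range(n_iter):
        # Multiplicative update for h: h *= (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_samples)] for k in range(n_types)]
        # Multiplicative update for w: w *= (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(n_types)] for i in range(n_genes)]
    return w, h

def rescale(w, h):
    """Make each profile (column of w) sum to 1 without changing w @ h."""
    sums = [sum(row[k] for row in w) for k in range(len(h))]
    w = [[row[k] / sums[k] for k in range(len(h))] for row in w]
    h = [[v * sums[k] for v in h[k]] for k in range(len(h))]
    return w, h

# Toy ground truth (made up for illustration): 6 genes, 2 cell types, 5 samples.
true_profiles = [[0.40, 0.05], [0.30, 0.05], [0.10, 0.10],
                 [0.10, 0.10], [0.05, 0.30], [0.05, 0.40]]
true_props = [[0.9, 0.7, 0.5, 0.3, 0.1],
              [0.1, 0.3, 0.5, 0.7, 0.9]]
bulk = matmul(true_profiles, true_props)  # the only input the method sees

w0, h0 = nmf(bulk, n_types=2, n_iter=0)            # random starting point
w, h = rescale(*nmf(bulk, n_types=2, n_iter=300))  # estimated profiles, proportions
err_before = squared_error(bulk, w0, h0)
err_after = squared_error(bulk, w, h)
print("reconstruction error:", err_before, "->", err_after)
```

Factorizations of this kind are identified only up to relabeling of the cell types and are not unique in general, so the rows of `h` should be read as candidate proportions rather than definitive estimates.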
Methods and Applications for Sample Classification and Prediction: The branch develops and implements computational and statistical methods for mining high-dimensional data, including the genetic algorithm/k-nearest neighbor (GA/KNN) method and methods based on classification and regression trees (CART). The branch continues to improve and refine these tools as needs change. The CART-based methods are especially powerful for mining big data because they are computationally efficient. Besides efficiency, the CART-based approaches have several other advantages: they handle data containing both categorical and continuous variables, they can capture higher-order interaction effects, and their results are intuitive and interpretable.
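The points about mixed variable types and interaction effects can be made concrete with a minimal sketch of a greedy classification tree that uses Gini impurity. This is a simplified illustration, not the branch's software; the toy data, the feature names `group` and `dose`, and all function names are hypothetical. The outcome is positive only when `group` is "B" and `dose` is high, so a single split cannot classify it perfectly while two nested splits can.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def candidate_splits(rows, feature):
    """Categorical features: one-vs-rest tests. Numeric features: midpoints."""
    values = {r[feature] for r in rows}
    if all(isinstance(v, str) for v in values):
        return [("eq", v) for v in sorted(values)]
    nums = sorted(values)
    return [("le", (a + b) / 2) for a, b in zip(nums, nums[1:])]

def goes_left(row, feature, kind, value):
    return row[feature] == value if kind == "eq" else row[feature] <= value

def build(rows, labels, depth):
    """Return a leaf (class label) or a tuple (feature, kind, value, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    for feature in rows[0]:
        for kind, value in candidate_splits(rows, feature):
            left = [i for i, r in enumerate(rows) if goes_left(r, feature, kind, value)]
            left_set = set(left)
            right = [i for i in range(len(rows)) if i not in left_set]
            if not left or not right:
                continue
            # Weighted impurity of the two children; lower is better.
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, kind, value, left, right)
    if best is None:
        return majority
    _, feature, kind, value, left, right = best
    return (feature, kind, value,
            build([rows[i] for i in left], [labels[i] for i in left], depth - 1),
            build([rows[i] for i in right], [labels[i] for i in right], depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        feature, kind, value, left, right = tree
        tree = left if goes_left(row, feature, kind, value) else right
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

# Toy data (made up): one categorical and one continuous feature.
rows = [{"group": g, "dose": d} for g in ("A", "B")
        for d in (1.0, 2.0, 3.0, 7.0, 8.0, 9.0)]
labels = [1 if r["group"] == "B" and r["dose"] > 5 else 0 for r in rows]

stump = build(rows, labels, depth=1)      # one split: no interaction possible
deep_tree = build(rows, labels, depth=2)  # nested splits: group, then dose
stump_acc = accuracy(stump, rows, labels)
deep_acc = accuracy(deep_tree, rows, labels)
print("depth 1:", stump_acc, "depth 2:", deep_acc)
print("fitted tree:", deep_tree)
```

The fitted tree is a nested tuple of split rules, which is what makes tree-based results easy to read back as plain if/else statements.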
Nuclear–Mitochondrial Epistasis: Approximately 1,800 nuclear genes in the mouse and human encode mitochondrial proteins. The mitochondrial genomes are about 16.5 kb in size and contain 37 genes that encode 13 proteins, 22 tRNAs, and 2 rRNAs. Single nucleotide polymorphisms (SNPs) in both genomes are known to cause diseases. The branch develops statistical and computational methods to test whether interactions between SNPs in the mitochondrial and nuclear genomes are associated with tissue phenotypes or morphological characteristics. The study populations are women with breast cancer, neonates with bronchopulmonary dysplasia, and strains of mice exposed to excessive levels of oxygen. The aim is to identify variants correlated with adverse biological outcomes.
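One simple way to see what an interaction between a nuclear and a mitochondrial SNP means is a difference-of-differences contrast on mean phenotype across the four genotype combinations. The sketch below is an illustration under stated assumptions, not the branch's method: both SNPs are coded 0/1, the phenotype is a single quantitative value, the toy records are made up, and the function name `interaction_contrast` is hypothetical. It computes only the contrast; a formal test would also need a standard error or a fitted model.

```python
from statistics import mean

def interaction_contrast(records):
    """records: iterable of (nuclear_allele, mito_allele, phenotype), alleles coded 0/1.

    Returns (mean11 - mean10) - (mean01 - mean00): the change in the
    mitochondrial allele's effect between the two nuclear genotypes.
    A value near zero is what purely additive effects would give.
    """
    cells = {(n, m): [] for n in (0, 1) for m in (0, 1)}
    for n, m, y in records:
        cells[(n, m)].append(y)
    mu = {key: mean(values) for key, values in cells.items()}
    return (mu[(1, 1)] - mu[(1, 0)]) - (mu[(0, 1)] - mu[(0, 0)])

# Toy records (made up): additive effects only.
additive = [(0, 0, 0.5), (0, 0, 1.5), (0, 1, 1.5), (0, 1, 2.5),
            (1, 0, 2.5), (1, 0, 3.5), (1, 1, 3.5), (1, 1, 4.5)]
# Same records, except the double-variant group is shifted upward.
epistatic = additive[:6] + [(1, 1, 6.5), (1, 1, 7.5)]

additive_contrast = interaction_contrast(additive)
epistatic_contrast = interaction_contrast(epistatic)
print("additive:", additive_contrast, "epistatic:", epistatic_contrast)
```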
Perturbation Network Analysis: Recent advances in single-cell “Omics” technology allow us to obtain highly resolved molecular phenotypes directly from individual cells in patient samples. These phenotypes can be used to define cell states, understand cell circuitry, study developmental processes, including cellular responses and toxicity to drugs or chemicals, and ultimately optimize targeted therapies for individualized medicine. Single-cell data present major methodological challenges in scalability, visualization, interpretability, dynamics and mixtures. Current methods are in their early stages, which limits major types of scientific inquiry. We develop optimal visualization methods and dynamic and spatial network models at multiple scales to study complex perturbations of biological processes or systems.