Broadly Applicable Statistical Methods

Biostatistics & Computational Biology Branch

The following investigators are involved in research related to this area: Pierre Bushel, Keith Shockley, David Umbach, and Shanshan Zhao.

Examples of ongoing projects include:

Disease risk assessment and prediction: With the improved statistical tools we are developing, current population-based data could allow us to assess disease risks in the population efficiently and to provide more accurate personalized risk prediction. For example, with regression methods for multiple time-to-disease outcomes, one could assess the effect of a risk factor on the hazard of each outcome, as well as on inter-disease dependency. Members of BCBB are also developing improved models to characterize the spatial and temporal distribution of disease risks and to identify important underlying environmental risk factors associated with the geographical and temporal patterns. In addition, we are developing methods that use detailed family history data to improve existing breast cancer risk prediction models, and a competing-risks model to handle pre-disease occult conditions (such as DCIS preceding invasive breast cancer). Related topics in risk evaluation are also studied, including ROC analysis and measurement error in mediation analysis.

High-dimensional data analysis: Methods are also being developed for analyzing high-dimensional data, such as those arising in genomic studies (e.g., gene expression, CpG methylation) and toxicology. For example, toxicologists interested in studying the effects of a toxicant on an animal’s genome conduct dose-response microarray studies to compare dose groups in terms of the expression of thousands of genes, resulting in a large number of statistical tests. Quantitative high-throughput screening (qHTS) assays are being developed by toxicologists and pharmacologists in order to screen thousands of compounds efficiently, informatively, and inexpensively. Analysis of the resulting datasets presents numerous challenges because it relies on nonlinear statistical models, such as the Hill model, and the asymptotic p-values obtained from such analyses are not necessarily reliable. Members of the Branch are developing methods for analyzing such complex data.
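
To make the nonlinearity concrete, the following is a minimal sketch of a Hill-type concentration-response curve. It assumes one common four-parameter form; the function name and the parameter names (`baseline`, `top`, `ac50`, `slope`) are illustrative choices and are not taken from the Branch's own methods or software.

```python
def hill_response(conc, baseline, top, ac50, slope):
    """Evaluate a four-parameter Hill curve at one concentration.

    Assumed (illustrative) parameterization:
        response = baseline + (top - baseline) * r / (1 + r),
        where r = (conc / ac50) ** slope

    baseline : response at zero concentration
    top      : response approached at very high concentration
    ac50     : concentration giving the half-maximal response
    slope    : Hill coefficient controlling the steepness
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    if ac50 <= 0 or slope <= 0:
        raise ValueError("ac50 and slope must be positive")
    if conc == 0:
        return float(baseline)
    ratio = (conc / ac50) ** slope
    return baseline + (top - baseline) * ratio / (1.0 + ratio)


# Hypothetical compound: response rises from 0 to 100, half-maximal at 10.
concentrations = [0.0, 1.0, 10.0, 100.0]
responses = [hill_response(c, 0.0, 100.0, 10.0, 2.0) for c in concentrations]
```

Because the response depends on `ac50` and `slope` nonlinearly, fitting this curve to each of thousands of compounds requires iterative estimation, which is one reason large-sample approximations for the resulting p-values may be unreliable in this setting.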