August 14 – 15, 2018
Background and Scientific Need
With rapidly developing technology and more efficient data collection procedures, environmental health scientists are now collecting vast amounts of data. These data sets, termed "Big Data", can be large, complex, multidimensional, and diverse, and are often generated using new technologies. They are associated with basic, translational, clinical, social, behavioral, environmental, or informatics research questions. These data may include imaging, phenotypic, genotypic, molecular, clinical, behavioral, environmental, and many other types of biological and biomedical data. Data science has emerged from its roots in applied statistics, analytics, and bioinformatics as a new area of research that addresses the challenges of sharing, accessing, analyzing, and interpreting Big Data. "Data science", defined as the extraction of useful knowledge directly from data through a process of discovery, or of hypothesis formulation and hypothesis testing (https://bigdatawg.nist.gov/), encompasses the management and execution of end-to-end data processes.
The National Institutes of Health (NIH) made early efforts to close the gap between needed and existing biomedical data science skills through investments in training and education as part of the Big Data to Knowledge (BD2K) Initiative. The programs and Funding Opportunity Announcements released under BD2K had two main, and somewhat separable, goals: 1) improving the big data skills of biomedical scientists; and 2) increasing the number of biomedical data scientists. These NIH-wide efforts were not domain-specific and were intended to develop resources that could benefit all NIH institutes.
This workshop brought together experts from relevant research disciplines to examine existing data science and environmental health science (EHS) resources (trainee pipelines, mentors, research), identify how these resources can address EHS-specific training goals in data science, and make recommendations to the National Institute of Environmental Health Sciences (NIEHS) on data science training.
Workshop Objective and Framework
The overarching goal of the workshop was to develop a strategy for building a data science competent EHS workforce. The workshop was organized into three major sessions. The first session was designed to assess the current state of data science training in the EHS domain and, through the evaluation of representative scientific 'use cases' nominated by the program branches of the Division of Extramural Research and Training (DERT), to identify current limitations for data science training in EHS. The second session examined existing training resources relevant to the intersection of EHS and data science and related EHS training goals to the accomplishments of BD2K. The final session addressed how to build EHS training in data science through a discussion, with participant input, of questions formulated before the workshop and during its planning.