Data Science Initiatives

Data Science Training

The goal of this training series is to increase researchers' skills in data analysis, visualization, machine learning, and graph analytics.

Past training topics include:

  • Databases and Data systems
  • Environmental Health Science Datasets
  • Analysis Methods and Tools using Python
  • Data Products – Overview
  • Graph Analytics
  • Introduction to R
  • Machine Learning using R
  • Introduction to Git and GitHub

Resources for Environmental Health Data Science Training

  • Intro to Statistics: Making Decisions Based on Data
    This course will cover visualization, probability, regression and other topics that will help you learn the basic methods of understanding data with statistics.
  • Introduction to Data Science
    Tour the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression).
  • Introductory Machine Learning
    ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applications. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects.
  • Johns Hopkins Reproducible Research MOOC on YouTube
    This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.
  • Learn the Command Line
    The Command Line is a vital tool, allowing you to run programs, write scripts, automate tasks, and combine simple commands right on your computer. Learn how to use the Command Line to work with data—a tool most developers use every day.
  • Mining Massive Datasets
    This class teaches algorithms for extracting models and other information from very large amounts of data. The emphasis is on techniques that are efficient and that scale well.
  • National Human Genome Research Institute YouTube Channel
  • NIH Bioinformatics at NIAID Training Resources
  • NIH Data Science Training Portal
    This website is for data science courses on the NIH campus. Here, you can discover and register for upcoming short courses. You can give input about what topics you want to learn about and request courses.
  • Tackling the Challenges of Big Data - MIT Professional Education
    This course will survey state-of-the-art topics in Big Data, looking at data collection, data storage and processing and extracting structured data from unstructured data, systems issues, analytics, visualization, and a range of applications.
  • The Data Scientist's Toolbox
    The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.
  • The inTelligence And Machine lEarning (TAME)
    Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research.
  • The Johns Hopkins Data Science Specialization
    This specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results.

Data Science Seminar Series

This seminar series features thought leaders from research universities and the biomedical industry.

Data Management at NIEHS Core Labs

The NIEHS Core Labs use a number of paper-based applications and rely heavily on human labor for data management. As part of a pilot program, the Office of Data Science (ODS) is implementing iRODS technology to help automate aspects of the Core Labs workflow. iRODS uses machine-actionable rules to automate and expedite data processes. The implementation will use the iRODS permissions system and metadata catalogue to enable controlled and, where possible, automatic tagging of the files produced by the workflow. Searching the metadata catalogue by these tags will enable scientific queries that are currently too labor-intensive to perform and will simplify generation of budget, resource, and regulatory reports.
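The idea of rule-driven tagging and tag-based search can be illustrated with a minimal Python sketch. This is not the iRODS API or the Core Labs configuration: the `Rule` class, the `register` and `search` functions, the example rules, and the file paths are all hypothetical, and a plain dictionary stands in for the metadata catalogue.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable

# Hypothetical stand-in for a metadata catalogue: path -> {attribute: value}.
Catalogue = dict[str, dict[str, str]]


@dataclass
class Rule:
    """A machine-actionable rule: if matches(path) is true, attach tags."""
    matches: Callable[[PurePosixPath], bool]
    tags: dict[str, str]


# Illustrative rules only; the actual Core Labs rules are not described here.
RULES = [
    Rule(lambda p: p.suffix == ".fastq", {"data_type": "sequencing"}),
    Rule(lambda p: "imaging" in p.parts, {"core_lab": "imaging"}),
]


def register(catalogue: Catalogue, path: str, rules: list[Rule] = RULES) -> None:
    """Apply every matching rule and store the merged tags for the path."""
    p = PurePosixPath(path)
    tags = catalogue.setdefault(path, {})
    for rule in rules:
        if rule.matches(p):
            tags.update(rule.tags)


def search(catalogue: Catalogue, **criteria: str) -> list[str]:
    """Return sorted paths whose tags contain every attribute=value pair."""
    return sorted(
        path
        for path, tags in catalogue.items()
        if all(tags.get(key) == value for key, value in criteria.items())
    )


catalogue: Catalogue = {}
register(catalogue, "/labs/imaging/run1/scan.tif")      # hypothetical path
register(catalogue, "/labs/genomics/run7/sample.fastq")  # hypothetical path
imaging_files = search(catalogue, core_lab="imaging")
```

Here tags are attached automatically when a file is registered, so a later query such as `search(catalogue, core_lab="imaging")` needs no manual bookkeeping. A production system would also enforce permissions on who may add or change tags, which this sketch omits.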

Environmental Health Language Collaborative

The Office of Data Science is coordinating the Environmental Health Language Collaborative. The Collaborative is a new initiative to advance community development and application of a harmonized language for describing Environmental Health Science (EHS) research. The Collaborative is part of the NIEHS effort to establish standards for EHS data and metadata that are crucial for enabling efficient data sharing, integration, and analysis of environmental data, and for advancing discovery in environmental health research. Learn more about this initiative.
