Data Management and Sharing Plan Development

NIEHS Data Management and Sharing Plan Checklist

A Data Management and Sharing (DMS) Plan describes how scientific data and accompanying metadata will be managed, preserved, and shared. NIH has developed guidance on the recommended elements of a DMS Plan. The plan should describe, in two pages or fewer, the proposed approach to managing, preserving, and sharing the scientific data and accompanying metadata to be generated through the proposed research. NIH has developed an optional DMS Plan format page that aligns with these recommended elements. A preview of this format page is available now, with a final fillable format version available by Fall 2022.

NIEHS encourages data management and sharing practices to be consistent with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The NIEHS Scientific Data Resources webpage lists resources relevant to the development of Data Management and Sharing Plans.

The plan should address each of the following elements:

Element 1: Data Type
Description of the scientific data to be generated and shared throughout the grant.

NIH defines Scientific Data as the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

Metadata are data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).

  • In general terms, describe the types and amount of scientific data to be generated and/or used in the research project (e.g., RNA-seq, targeted LC-MS, and epidemiological survey data of research participants). Descriptions should indicate the data type, level of aggregation (e.g., individual, summarized), and/or the degree of data processing that will occur.
  • Describe which scientific data from the project will be preserved and shared and provide the rationale for the decision. Researchers should decide which data to preserve and share based on NIH goals to maximize data sharing, while accounting for ethical, legal, and technical factors that may impede sharing.
  • Briefly describe the metadata and any other documentation (e.g., study protocols, data collection instruments, data dictionaries) that will be made accessible to allow interpretation of the scientific data.
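
As a hedged illustration of the data dictionary mentioned in the last item, the following Python sketch shows one possible way to record variable-level metadata and check that every column in a dataset is documented. The variable names, units, and the `undocumented_columns` helper are hypothetical; NIH and NIEHS do not prescribe this structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableEntry:
    """One row of a hypothetical data dictionary."""
    name: str
    description: str
    data_type: str
    units: str = ""


# Illustrative entries only; not drawn from any real study.
DATA_DICTIONARY = [
    VariableEntry("participant_id", "De-identified participant code", "string"),
    VariableEntry("sample_date", "Date the sample was collected", "date"),
    VariableEntry("pm25", "Fine particulate matter concentration", "float", "ug/m3"),
]


def undocumented_columns(columns, dictionary):
    """Return dataset columns that have no data dictionary entry, in order."""
    documented = {entry.name for entry in dictionary}
    return [col for col in columns if col not in documented]


# Hypothetical dataset header; "batch" is deliberately left undocumented.
dataset_columns = ["participant_id", "sample_date", "pm25", "batch"]
missing = undocumented_columns(dataset_columns, DATA_DICTIONARY)
print(missing)
```

A check of this kind can be run before deposit so that shared files and their documentation stay in step.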

Element 2: Related Tools, Software and/or Code
Information on related tools, software, and/or code.

  • Indicate whether specialized tools or software are needed to access, manipulate, or reuse shared scientific data. If applicable, list the name(s) of the needed tools or software and specify how they can be accessed (e.g., open source and freely available, generally available for a fee in the marketplace, or available only from the research team).
  • Indicate how new software developed under the grant will be shared. If new software will be created during the project to collect, process, or analyze data, specify how this software will be made available.

Element 3: Standards
Description of standards to be applied to the scientific data and associated metadata.

Data standards are documented agreements on representation, format, definition, structuring, tagging, transmission, manipulation, use, and management of data. Resources such as FAIRsharing and The Digital Curation Centre provide information on available data and metadata standards. The use of Common Data Elements (see NIH Common Data Elements (CDE) Repository), standard data collection tools (see NIH Disaster Research Response (DR2) Resources Portal, PhenX Toolkit (consensus measures for Phenotypes and eXposures)), and existing ontologies (see Environmental Health Language Collaborative) is highly encouraged. See the EHS Ontology Resource Catalog for a compilation of organizations, ontologies/terminologies, and tools useful for harmonizing environmental health research.

NIH has provided additional information to assist in selecting suitable Data Repositories for NIH-funded research (NOT-OD-21-016). Primary consideration should be given to discipline or data-type specific repositories. If no appropriate discipline or data-type specific repository is available, researchers should consider other options, including generalist repositories, institutional repositories, or cloud-based data repositories.

  • Indicate data and metadata standards to be applied in the research project. Standards may include data formats, consensus measures, common vocabularies, and other documentation. While many scientific fields have developed and adopted common data standards, others have not. In such cases, the plan may indicate that no consensus data standards exist. While NIEHS does not generally require specific standards, we are seeking to increase adoption of standards that facilitate data integration and harmonization. You are encouraged to contact NIEHS if you would like help in determining if standards exist and which standards are appropriate.

Element 4: Data Preservation, Access, and Associated Timelines
Plans for data preservation.

  • Provide names of the repositories where scientific data and metadata arising from the project will be archived. NIEHS encourages the use of established data repositories that meet desirable characteristics. You are encouraged to contact NIEHS if you are unable to identify a suitable data repository or have questions on selection of a repository.
  • Describe how data will be findable and identifiable. Indicate how persistent identifiers (PIDs), such as Digital Object Identifiers (DOIs), Open Researcher and Contributor (ORCID) IDs, and Research Organization Registry (ROR) IDs, will be assigned to identify data, people, organizations, or other entities. Indicate whether the data and/or metadata will be indexed in a searchable resource. If use of PIDs is not possible, indicate why they cannot be used.
  • Describe the anticipated timeframes for preserving and sharing scientific data. Specify when the scientific data will be made available to other users (i.e., no later than the time of an associated publication or the end of the performance period, whichever comes first) and for how long the scientific data will be made available. NIEHS encourages researchers to share scientific data as soon as possible and to make scientific data available for as long as they anticipate it being useful for the larger research community, institutions, and/or the broader public.
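
The "whichever comes first" rule in the last item can be sketched as a small date calculation. This is a minimal illustration and not an official tool; the function name `latest_sharing_date` and the dates are hypothetical.

```python
from datetime import date
from typing import Optional


def latest_sharing_date(performance_period_end: date,
                        publication_date: Optional[date] = None) -> date:
    """Return the latest date by which the data should be shared.

    Encodes "no later than the time of an associated publication or the
    end of the performance period, whichever comes first".
    """
    if publication_date is None:
        # No associated publication: the performance period end applies.
        return performance_period_end
    return min(publication_date, performance_period_end)


# Hypothetical dates for illustration only.
period_end = date(2027, 6, 30)
print(latest_sharing_date(period_end, date(2026, 3, 15)))  # publication comes first
print(latest_sharing_date(period_end))                     # no publication yet
```

Because NIEHS encourages sharing as soon as possible, the returned date is an upper bound and not a target.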

Element 5: Access, Distribution, or Reuse Considerations
Description of factors affecting access, distribution, or reuse of scientific data.

  • Describe with whom the data will be shared and under what conditions. Indicate whether access to scientific data will be controlled (i.e., made available only after approval) and, if so, describe the mechanism for requesting access. NIEHS expects researchers to maximize the appropriate sharing of scientific data generated, consistent with privacy, security, informed consent, and proprietary issues.
  • If applicable, provide a rationale for why access, distribution, or reuse of data will be restricted. In cases where data access is controlled, there is still considerable value to the community to freely access summary and aggregate data. Indicate if access to summary and aggregate data will be restricted.
  • Describe any applicable factors affecting access, distribution, or reuse of scientific data. Include information related to:
    • Informed consent (e.g., disease-specific limitations, particular communities’ concerns).
    • Privacy and confidentiality protections (e.g., de-identification, Certificates of Confidentiality, and other protective measures) consistent with applicable federal, Tribal, state, and local laws, regulations, and policies.
    • Restrictions imposed by federal, Tribal, or state laws, regulations, or policies, or by existing or anticipated agreements (e.g., with third-party funders, with partners, with Health Insurance Portability and Accountability Act (HIPAA) covered entities that provide Protected Health Information under a data use agreement, or through licensing limitations attached to materials needed to conduct the research), or any other consideration that may limit the extent of data sharing.
    • Data sharing agreements, licenses, and/or any other considerations that may limit the extent of data sharing or reuse.

To become familiar with how and when NIH expects data to be shared, and to learn how to safeguard the privacy of human participants while sharing scientific data, visit Data Sharing Approaches.

Investigators may request funds toward data management and sharing in the budget and budget justification sections of their applications. To learn more, visit the NIH webpage on Budgeting for Data Management and Sharing.

Element 6: Oversight of Data Management and Sharing
Plans for oversight of data management and sharing.

  • Identify the individual(s) (e.g., titles, roles) who will be responsible for executing the various components of data management over the course of the research program.
  • Describe how compliance with the data management and sharing plan will be monitored and managed.
