
Background information
The Division of Translational Toxicology (DTT), in partnership with ICF and Evidence Prime, developed Dextr, a web-based tool to accelerate data extraction in literature reviews. Dextr uses automated approaches, including machine learning and large language models, to 1) identify and extract entities, such as the animal model or species, and 2) let users review, edit, and confirm the extracted entries. This approach balances automation with expert oversight, providing a more efficient workflow without sacrificing transparency or accuracy.
Key Advantages:
- Maintains accuracy while cutting extraction time nearly in half.
- Extracts complex concepts (e.g., multiple experiments, exposures, and doses within a single study).
- Links extracted elements within studies for richer, machine-readable annotated exports.
- Addresses unique challenges of environmental health literature through a simple user interface.
Features:
- Employs large language models, natural language processing models, and RegEx-based extraction approaches.
- Supports controlled vocabularies for structured categorization.
- Offers single-extractor and quality control (QC) validation modes.
- Extracts data from tables.
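As a rough illustration of how RegEx-based extraction and a controlled vocabulary can work together, the sketch below suggests species and dose entities from a sentence and marks them as unconfirmed so a reviewer can edit or accept them. It is not Dextr's implementation: the vocabulary, the patterns, and the function name `suggest_entities` are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical controlled vocabulary: maps surface forms to a preferred term.
SPECIES_VOCAB = {
    "rat": "Rattus norvegicus",
    "rats": "Rattus norvegicus",
    "mouse": "Mus musculus",
    "mice": "Mus musculus",
}

# Matches a number followed by a dose unit, e.g. "10 mg/kg" or "0.5 mg/kg/day".
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mg/kg(?:/day)?)")

# Matches any vocabulary surface form as a whole word, longest forms first.
SPECIES_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SPECIES_VOCAB, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def suggest_entities(text):
    """Return suggested entities for a reviewer to confirm (illustrative only)."""
    species = sorted(
        {SPECIES_VOCAB[m.group(1).lower()] for m in SPECIES_PATTERN.finditer(text)}
    )
    doses = [
        {"value": float(value), "unit": unit}
        for value, unit in DOSE_PATTERN.findall(text)
    ]
    # Suggestions start unconfirmed; a human reviewer makes the final call.
    return {"species": species, "doses": doses, "confirmed": False}


sentence = "Male rats received 10 mg/kg/day or 50 mg/kg/day; mice received 0.5 mg/kg."
suggestions = suggest_entities(sentence)
```

Mapping surface forms such as "rats" and "mice" to preferred terms keeps the suggested entries consistent across studies, and the `confirmed` flag reflects the review step in which an expert checks each entry.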
For more information or to request access to explore the tool, email Vickie R. Walker ([email protected]).
Documents
- Walker VR, Schmitt CP, Wolfe MS, Nowak AJ, Kulesza K, Williams AR, Shin R, Cohen J, Burch D, Stout MD, Shipkowski KA, Rooney AA. 2022. Evaluation of a semi-automated data extraction tool for public health literature-based reviews: Dextr. Environ Int 159:107025. doi: 10.1016/j.envint.2021.107025.
- Nowak A, Kunstman P. 2018. Team EP at TAC 2018: Automating data extraction in systematic reviews of environmental agents. Paper presented at: National Institute of Standards and Technology Text Analysis Conference. Gaithersburg, MD.