Applications to Investigate Adverse Effects of Chemicals on Human Health and Environment Workshop

Session 1: Chemicals and Similarity: Structures and Bioactivity

Molecular Similarity and Activity Cliffs

José Medina-Franco, Ph.D., F.R.S.C., Universidad Nacional Autónoma de México

We discussed the basis for quantifying structure similarity and its applications in studying structure-property relationships. We discussed the concept of activity and property cliffs, showing specific examples where subtle structural changes are associated with significant and unexpected changes in biological activity.
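
The activity-cliff idea above can be sketched in a few lines. This is a minimal illustration rather than the speaker's method: it assumes each compound is a hypothetical set of fingerprint feature indices with a hypothetical potency value (for example, pIC50), and it flags pairs that are structurally similar but differ strongly in activity. The Tanimoto coefficient and both thresholds are assumptions chosen for the example.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two sets of fingerprint feature indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def find_activity_cliffs(compounds, sim_threshold=0.7, activity_gap=2.0):
    """Return pairs that are similar in structure but far apart in activity.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    cliffs = []
    for (name_a, fp_a, act_a), (name_b, fp_b, act_b) in combinations(compounds, 2):
        sim = tanimoto(fp_a, fp_b)
        gap = abs(act_a - act_b)
        if sim >= sim_threshold and gap >= activity_gap:
            cliffs.append((name_a, name_b, sim, gap))
    return cliffs


# Hypothetical data: (name, fingerprint feature indices, activity).
compounds = [
    ("cpd_A", {1, 2, 3, 4, 5, 6, 7, 8}, 8.5),
    ("cpd_B", {1, 2, 3, 4, 5, 6, 7, 9}, 5.0),
    ("cpd_C", {20, 21, 22}, 5.1),
    ("cpd_D", {1, 2, 3, 4, 5, 6, 7, 8, 9}, 8.3),
]

cliffs = find_activity_cliffs(compounds)
for name_a, name_b, sim, gap in cliffs:
    print(f"{name_a} / {name_b}: similarity={sim:.2f}, activity gap={gap:.1f}")
```

In this toy data set, cpd_B forms a cliff with both cpd_A and cpd_D: it shares most features with each but is far less active.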

Similarity Between Bioactive Molecules From a Structural Bioinformatics Perspective

Esther Kellenberger, Ph.D., Université de Strasbourg

This presentation explored the concept of similarity and focused on bioactivity, with examples from drug design. It also introduced the concepts of endpoint-specific similarity and activity cliffs.

Molecular Descriptors and Fingerprints as a Way to Represent Molecular Structures

Andrea Mauri, Ph.D., Alvascience

This presentation explored how the molecular structure can be represented in different ways. A common representation is made using molecular descriptors and fingerprints. Molecular descriptors and fingerprints can be used to encode specific and general information about the molecular structure. An overview of different molecular representations was provided, focusing on how the molecular structure is transformed into a numerical representation that can be used with machine-learning techniques for similarity-based tasks.
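
As a rough illustration of how a numerical representation supports similarity-based tasks, the sketch below treats a fingerprint as the set of its "on" bit positions and compares two fingerprints with the Tanimoto coefficient. The bit positions are hypothetical; in practice they would come from a fingerprinting tool, and other similarity measures could be used instead.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by all distinct on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints given as sets of on-bit positions.
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8, 9}
fp_z = {2, 3}

sim_xy = tanimoto(fp_x, fp_y)  # 3 shared bits out of 5 distinct bits
sim_xz = tanimoto(fp_x, fp_z)  # no shared bits

print(f"x vs y: {sim_xy:.2f}")  # x vs y: 0.60
print(f"x vs z: {sim_xz:.2f}")  # x vs z: 0.00
```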

Session 2: Supervised and Unsupervised Approaches

Molecular Similarity – in the Eye of the Beholder, and Why Tailoring It to a Problem Is Necessary to Make It Meaningful

Andreas Bender, Ph.D., University of Cambridge

Molecular similarity can be defined in any arbitrary way, and a definition which is detached from a concrete problem statement is usually less useful in practice, and often outright meaningless. In this contribution we reviewed ways to represent molecules in the computer, and how representations detached from a concrete objective (unsupervised approaches) differ from representations which are linked to a concrete objective (supervised approaches).

Univariate vs. Multivariate for Clustering and Classification

Davide Ballabio, Ph.D., Università degli Studi di Milano-Bicocca

In this presentation, we saw how similarity between pairs of chemicals can support subsequent supervised classification and unsupervised clustering. Differences between clustering and classification were highlighted, as well as the benefits of multivariate representations of chemical structures over univariate ones in chemical modelling.

Supervised vs. Unsupervised Approaches: Why One Typically Needs Both Anyway

Denis Fourches, Ph.D., Oerth Bio

The choice of a particular machine-learning approach for analyzing a given dataset is often not a one-way street. In this talk, after a brief summary of some of the most commonly used supervised and unsupervised learning techniques, I discussed several examples to illustrate the synergies between different techniques and why running multiple approaches in parallel can lead to more interpretable and reliable modeling results.

Session 3: How and When to Apply the Different Approaches

Similia Similibus Toxicus?

Alexander Tropsha, Ph.D., University of North Carolina at Chapel Hill

A popular Latin expression, “Similia similibus solvuntur,” implies that similar chemicals have similar solubilities. Can we have the same expectations concerning chemical toxicity? That is, if a chemical is toxic, should we expect a similar chemical to be toxic as well? This expectation underlies the popular “Read-Across” approach to assessing chemical toxicity, which represents an example of unsupervised clustering and enables transparent interpretation. On the other hand, machine-learning approaches using holistic chemical descriptors address the same issue of toxicity prediction using supervised learning methods. I compared and contrasted these approaches and discussed possible means of integrating them to achieve high prediction accuracy while maintaining transparency of model interpretation.

Analogues Selection for Read-across: Combination of Supervised and Unsupervised Selection Methods

Alessandra Roncaglioni, Ph.D., Istituto Mario Negri

This presentation focused on practical examples, the integration of QSAR and read-across, and the influence of the distribution of similar compounds in chemical space.

Best Practices on How to Select, Apply, and Interpret Supervised and Unsupervised Clustering and Classification Approaches

Todd Martin, Ph.D., U.S. Environmental Protection Agency

Many researchers build models based on unsupervised learning using random forest or nearest-neighbor approaches in which the set of molecular descriptors is not optimized for the data set of interest. In supervised feature selection, an iterative approach is used to find a small subset of the descriptors that maximizes the performance of a QSAR method during internal cross-validation. The advantage of supervised feature selection is that the selected descriptors are more relevant to the property or toxicity endpoint being modeled, so external prediction performance is theoretically increased. Another advantage is that supervised feature selection can reduce the set of descriptors to a small enough number that the biological/physicochemical significance of these descriptors can be readily described in a QMRF (QSAR Model Reporting Format) document. This talk outlined best practices for supervised and unsupervised learning of toxicity and physicochemical property data sets. The effect of supervised versus unsupervised feature selection on external prediction performance was evaluated for several different QSAR methods (e.g., nearest analog, k-nearest neighbors, random forest, support vector machines).
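
The iterative, cross-validated descriptor selection described above might look roughly like the following. This is a sketch under stated assumptions, not the procedure used in the talk: it uses greedy forward selection (one of several possible iterative schemes), a nearest-neighbor predictor, leave-one-out cross-validation scored by mean absolute error, and a tiny made-up data set in which descriptor 0 tracks the endpoint and descriptor 1 does not. All function names are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, features, k=1):
    """Predict by averaging the endpoint values of the k nearest training rows."""
    dists = sorted(
        (math.dist([row[f] for f in features], [query[f] for f in features]), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_error(X, y, features, k=1):
    """Leave-one-out mean absolute error using only the given descriptor columns."""
    total = 0.0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        total += abs(knn_predict(train_X, train_y, X[i], features, k) - y[i])
    return total / len(X)


def forward_select(X, y, max_features=2, k=1):
    """Greedily add the descriptor that most reduces the cross-validated error."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        err, feat = min((loo_error(X, y, selected + [f], k), f) for f in remaining)
        if err >= best_err:
            break  # no remaining descriptor improves the internal score
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


# Hypothetical data: two descriptors per chemical and one endpoint value.
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 0.0], [4.0, 3.0], [5.0, 2.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

selected, best_err = forward_select(X, y)
print(selected, best_err)  # [0] 1.0
```

Because the descriptors are chosen using the same data that scores them, the internal error is optimistic; a held-out external set is still needed to estimate external prediction performance.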

Session 4: Applications and Examples

Chemical Similarity Under the EU REACH Regulation

Andrea Gissi, Ph.D., European Chemicals Agency

The presentation described the use of chemical similarity for different purposes under the EU REACH regulation. A combination of structural and mechanistic similarity is crucial for acceptable read-across results submitted by REACH registrants, while the calculation of structural similarity among substances with complex compositions is the key to ECHA’s grouping of substances for screening and prioritization purposes. The presentation also included a description of the relevant functionalities of the OECD QSAR Toolbox, a computer program for grouping substances, which is developed and made freely available by ECHA and OECD.

Practical Perspectives on the Development, Evaluation, and Application of In Silico NAMs for Predicting Toxicity

Grace Patlewicz, Ph.D., U.S. Environmental Protection Agency

Categorisation and read-across approaches continue to play a significant role in informing an array of different research and policy needs within the EPA. This presentation touched upon a handful of projects where grouping approaches have been applied including 1) the structural categorisation approach developed for a PFAS inventory that informed work under the EPA’s National Testing Strategy; 2) new features and functionalities to enhance the Generalised Read-Across (GenRA) approach and the associated webapp; 3) development of a sustainable re-implementation of Analog Identification Methodology (AIM) features that can be used in other modelling pipelines; 4) revisiting the applicability of the Thresholds of Toxicological Concern (TTC) for other routes of entry or other chemistries.

Class-based Approaches to Evaluating Exposure, Hazard, and Risk: The Case for Taking Action on Organohalogen Flame Retardant Subclasses Using the NAS Roadmap

Andrew Rooney, Ph.D., and Suril Mehta, Dr.P.H., M.P.H., NIEHS, and Charles Bevington, M.P.H., U.S. Consumer Product Safety Commission (CPSC)

This joint NIEHS Division of Translational Toxicology (DTT) and CPSC presentation explored class-based approaches (originally outlined in the recently published NAS report) to assessing OFRs, including scoping and evidence mapping to identify and organize existing OFR exposure, mechanistic, and toxicity studies to support research and analysis decisions; investigating and prioritizing OFR carcinogenicity data to conduct cancer hazard evaluations; and exposure, risk, and regulatory considerations of OFR subclasses, building on NAS and OECD approaches.