Shape Screening
Virtually screen billion-compound libraries quickly with 3D shape-based similarity
Screen Purchasable or Synthesizable Compound Libraries Using GPU-Accelerated Shape-Based Alignment
3D shape-based screening approaches have demonstrated the ability to identify hits in virtual screening and are commonly employed to process screening libraries of tens of millions of compounds. Recent studies have demonstrated that increasing the size of the screened chemical space from tens to hundreds of millions of compounds in 3D-based virtual screening produces tighter-binding hits with greater scaffold diversity for hit-to-lead development in drug discovery programs. Screenable chemical space has recently grown to between 10^11 and 10^20 compounds through tools like Schrödinger's Pathfinder and the reaction-based enumeration approaches used to create Enamine REAL and similar libraries. At this scale it is technically and financially intractable to screen with traditional 3D-based virtual screening methods such as pharmacophore modeling and docking. Even CPU-based 3D shape screening becomes costly to run at this scale.
Two trends meet here: screening libraries have become very large, and running algorithms on GPUs yields significant speedups. Schrödinger's GPU-accelerated shape screening algorithm (GPU Shape) addresses the need to screen very large virtual libraries in 3D by leveraging the speed and efficiency improvements GPUs offer over CPUs, without the need for high-cost, high-memory GPU servers. In contrast to a recent docking-based virtual screen of 138 million compounds that required 5 years of CPU time to run, a recent GPU Shape experiment screened a version of the Enamine REAL database with 1.85 billion compound states against the 100 diverse targets from DUD-E. Screening required about 90 hours on a single GPU per target. Because the calculations are highly distributable across a GPU cluster, the same 1.85 billion compound state screen takes about 14 minutes when maximally distributed across 376 GPUs. Most importantly, this speedup is achieved with strong early enrichment and chemical diversity among high 3D-similarity compounds.
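The 14-minute figure follows from dividing the single-GPU time by the number of GPUs. The sketch below assumes ideal, linear scaling with no scheduling or I/O overhead; the function name is illustrative and not part of any Schrödinger tool.

```python
def wall_clock_minutes(single_gpu_hours: float, n_gpus: int) -> float:
    """Ideal wall-clock time in minutes when the work is split evenly across GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return single_gpu_hours * 60.0 / n_gpus


# Figures quoted in the text: about 90 GPU-hours per target, 376 GPUs.
minutes = wall_clock_minutes(90, 376)
print(f"{minutes:.1f} minutes")  # 14.4 minutes
```

Real clusters rarely scale perfectly, so this is best read as a lower bound on the wall-clock time for a given GPU count.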
Table 1. Number of targets that recovered less than 25%, 25-50%, 50-75% and more than 75% of the active test set molecules in the screening of the Enamine REAL library. The top 0.001% and 0.01% correspond to the top-ranked 3221 and 32209 compounds, respectively.
Figure 1. The Bemis-Murcko scaffold/compound ratio of the top 100, 500 and 1000 hits. As reference, the ratio for the DUD-E data set is 0.25.
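The scaffold/compound ratio is the number of distinct Bemis-Murcko scaffolds divided by the number of compounds in a hit list. The sketch below assumes the scaffolds have already been computed by a cheminformatics toolkit and are supplied as canonical strings, one per compound; the input list is made up for illustration.

```python
from typing import Sequence


def scaffold_compound_ratio(scaffolds: Sequence[str]) -> float:
    """Distinct scaffolds divided by number of compounds (one scaffold string per compound)."""
    if not scaffolds:
        raise ValueError("need at least one compound")
    return len(set(scaffolds)) / len(scaffolds)


# Hypothetical hit list: four compounds sharing three distinct scaffolds.
hits = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "c1ccc2ccccc2c1"]
print(scaffold_compound_ratio(hits))  # 0.75
```

A ratio of 1.0 means every hit has its own scaffold; lower values mean more hits share scaffolds.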
The Advantage of Shape Screening
The goal of shape-based screening is simple and straightforward: Given the structure and shape of a compound known to bind to a target, shape-based screens will identify new compounds with shapes (and, if desired, other properties) that are similar to the known binder. The approach is consistent with physical chemistry intuition: a receptor "sees" the shape and electrostatic properties of a molecule that binds to it, so if a new compound matches the shape and electrostatic properties of a known binder then it is likely to bind as well.
Shape Screening is an effective tool for lead optimization studies, where rapid flexible superposition of multiple similar molecules is essential to understanding SAR. Shape Screening is also ideally suited for use in the early stages of lead discovery. Shape Screening does not require a target crystal structure or well-developed SAR sets that might be necessary to create a reliable pharmacophore model. Only a single known active query compound is needed.
Shape Screening can run in shape-only mode, or it can incorporate atom-type similarity when aligning and scoring. Shape Screening also includes a unique mode that describes each structure as a collection of pharmacophore features rather than individual atoms. This pharmacophore-based mode produces the highest database enrichments.
Speed and performance:
Shape Screening can screen approximately 600 conformers per second and has been shown to outperform other shape-based methods in virtual screening enrichment studies for a wide range of targets.1
GPU accelerated:
Shape Screening can be run on CPU or GPU without the need for a dedicated workstation or a preloaded server storing all compounds in expensive RAM. The GPU and CPU versions give identical results in more than 99.9% of ligand comparisons.
A novel method for aligning compounds:
Shape Screening uses pairwise atom distance distributions to identify atom triplets that afford rapid trial alignments between the query compound and the structures being screened. The best trial alignments are subjected to a refinement step that improves the overall superposition and maximizes shape similarity.
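As a rough illustration of the triplet idea, the sketch below pairs a query atom triplet with a candidate atom triplet when their three pairwise distances agree within a tolerance. It is a brute-force toy and not Schrödinger's implementation: the tolerance, the function names, and the exhaustive enumeration are all assumptions, and the refinement step is not shown.

```python
from itertools import combinations, permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Triplet = Tuple[int, int, int]


def side_lengths(a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Pairwise distances of a triplet in a fixed order: a-b, b-c, a-c."""
    return (dist(a, b), dist(b, c), dist(a, c))


def matching_triplets(
    query: Sequence[Point], candidate: Sequence[Point], tol: float = 0.2
) -> List[Tuple[Triplet, Triplet]]:
    """Return (query triplet, candidate triplet) index pairs with matching distances."""
    matches = []
    for qi in combinations(range(len(query)), 3):
        q_sides = side_lengths(query[qi[0]], query[qi[1]], query[qi[2]])
        # Ordered candidate triplets, so each match fixes an atom-to-atom correspondence.
        for ci in permutations(range(len(candidate)), 3):
            c_sides = side_lengths(candidate[ci[0]], candidate[ci[1]], candidate[ci[2]])
            if all(abs(q - c) <= tol for q, c in zip(q_sides, c_sides)):
                matches.append((qi, ci))
    return matches


# Toy coordinates: the candidate holds a translated copy of the query plus one distant atom.
query = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
candidate = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (4.0, 5.0, 1.0), (10.0, 10.0, 10.0)]
print(matching_triplets(query, candidate))  # [((0, 1, 2), (0, 1, 2))]
```

Three non-collinear matched points are enough to define a rigid trial superposition, which is why triplets are a natural unit for generating trial alignments.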
Intuitive overlays:
A benefit of Shape Screening's alignment algorithm is that common scaffolds will in most cases be neatly overlaid, as one would expect in a series of structurally similar lead compounds.
Rapid determination of shape similarity:
Shape Screening uses an empirically verified model of shape similarity, wherein molecular volumes are approximated using rapidly calculated sums of pairwise atomic overlaps.
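A minimal sketch of a pairwise-overlap similarity is given below. It assumes atoms are hard spheres, sums the analytic sphere-sphere intersection volume over all atom pairs, and normalizes by the larger self-overlap. The overlap function, the normalization, and the radii are illustrative assumptions, not the product's exact model.

```python
from math import dist, pi
from typing import Sequence, Tuple

Atom = Tuple[Tuple[float, float, float], float]  # (center, radius)


def sphere_overlap(r1: float, r2: float, d: float) -> float:
    """Intersection volume of two spheres with radii r1, r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0  # no contact
    if d <= abs(r1 - r2):
        return 4.0 / 3.0 * pi * min(r1, r2) ** 3  # smaller sphere fully inside
    return pi * (r1 + r2 - d) ** 2 * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d)


def overlap_sum(mol_a: Sequence[Atom], mol_b: Sequence[Atom]) -> float:
    """Sum of pairwise atomic overlap volumes between two molecules."""
    return sum(sphere_overlap(ra, rb, dist(ca, cb)) for ca, ra in mol_a for cb, rb in mol_b)


def shape_similarity(mol_a: Sequence[Atom], mol_b: Sequence[Atom]) -> float:
    """Cross overlap normalized by the larger self-overlap (assumed normalization)."""
    return overlap_sum(mol_a, mol_b) / max(overlap_sum(mol_a, mol_a), overlap_sum(mol_b, mol_b))


# Toy two-atom molecule and a copy shifted by 0.5 along x (radii of 1.7 are illustrative).
mol = [((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7)]
shifted = [((x + 0.5, y, z), r) for (x, y, z), r in mol]
print(shape_similarity(mol, mol))      # 1.0
print(shape_similarity(mol, shifted))  # a value between 0 and 1
```

Because the score depends only on interatomic distances and radii, it can be re-evaluated cheaply for each trial alignment.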
Efficient generation of bioactive conformers:
Conformer generation is a necessary component of any shape-based screening algorithm. Shape Screening relies on the well-validated program ConfGen.2
Superior enrichments:
As a result of its unique capacity to align pharmacophore features, Shape Screening outperformed competing shape-based methods in virtual database screens involving 11 diverse targets and 25,000 decoys.1 Shape Screening yielded an average enrichment factor in the top 1% (EF(1%)) of 33.2, compared to 25.6 and 23.5 for ROCS-Color and SQW, respectively.
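For reference, the enrichment factor at a given fraction is the hit rate among the top-ranked compounds divided by the hit rate in the whole database. The sketch below uses that standard definition; the function name and the toy ranking are made up for illustration.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """EF at `fraction`: active rate in the top-ranked slice over the active rate overall."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty ranking with at least one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_is_active[:n_top])
    # (hits_top / n_top) / (n_actives / n_total), rearranged into a single division.
    return hits_top * n_total / (n_top * n_actives)


# Toy ranking: 1000 compounds, 10 actives, 5 of them ranked in the top 10 (top 1%).
ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985
print(enrichment_factor(ranking, 0.01))  # 50.0
```

An EF(1%) of 1.0 corresponds to random selection, and the maximum attainable value depends on the ratio of actives to decoys in the database.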
Fully prepared databases of purchasable compounds from Enamine, Enamine REAL, MilliporeSigma, MolPort, Mcule, Mcule ULTIMATE, and WuXi LabNetwork:
Schrödinger has partnered with Enamine, MilliporeSigma, MolPort, WuXi, and Mcule to provide GPU Shape screening bin files for compounds available from their libraries, including the Enamine REAL, WuXi LabNetwork virtual library, and Mcule ULTIMATE databases. The prepared databases are: Enamine's "Stock Screening Compounds Collection" and "Enamine REAL Database"; MilliporeSigma's "Aldrich Market Select"; Mcule's "Screening Collection Phase Database" and "Ultimate Database"; MolPort's "Screening Compound Database"; and WuXi's "Shape Screening Database".
___________
1. Sastry, G.M.; Dixon, S.L.; Sherman, W., "Rapid Shape-Based Ligand Alignment and Virtual Screening Method Based on Atom/Feature-Pair Similarities and Volume Overlap Scoring," J. Chem. Inf. Model., 2011, 51, 2455-2466.
2. Watts, K.S.; Dalal, P.; Murphy, R.B.; Sherman, W.; Friesner, R.A.; Shelley, J.C., "ConfGen: A Conformational Search Method for Efficient Generation of Bioactive Conformers," J. Chem. Inf. Model., 2010, 50, 534-546.
Citations and Acknowledgements
Schrödinger Release 2023-1: Phase, Schrödinger, LLC, New York, NY, 2023.
Sastry, G.M.; Dixon, S.L.; Sherman, W., "Rapid Shape-Based Ligand Alignment and Virtual Screening Method Based on Atom/Feature-Pair Similarities and Volume Overlap Scoring," J. Chem. Inf. Model., 2011, 51, 2455-2466.