Accelerating Exploration of Chemical Matter: A New Paper Showcases the Synergies of the Schrödinger Drug Discovery Platform

Schrödinger recently published a paper in the Journal of Chemical Information and Modeling detailing how the Schrödinger platform blends physics-based methods with machine learning to explore significantly more chemical space in the lead optimization phase of drug discovery. The paper also demonstrates the practical use of a new Schrödinger tool, PathFinder.

Schrödinger’s drug discovery group used these tools to design inhibitors of cyclin-dependent kinase 2 (CDK2) as a proof of concept. The team was able to explore more than 300,000 ideas and identify more than 100 ligands — among them, four unique cores — that are predicted to meet key parameters (including an IC50 <100 nM). A process that would have otherwise taken years was completed in under a week using Schrödinger’s drug discovery platform.

The bottom line: “The rapid turnaround time, and scale of chemical exploration, suggests that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns,” the paper states.

Sathesh Bhat, Ph.D., our senior director of computational chemistry, explains the paper’s key findings:

Explain what this paper means. Why is it so exciting?

It is very exciting, and here’s why: When you’re doing lead optimization on a drug discovery project, you are trying to cast a wide net — but in truth, you are not able to explore much chemical space. Think about it: there are an estimated 10⁶⁰ potential molecules out there in the chemical universe. Yet over the course of a typical drug discovery project, you might synthesize at most 2,000 to 5,000 compounds.

Our paper shows that you can use advanced computing to vastly expand that universe — to profile far more molecules, far more accurately and efficiently than many had thought possible. Our platform enables you to rapidly profile billions of compounds in the lead optimization phase without the massive cost of actually synthesizing all these ideas.

This is important because, for some targets, finding an efficacious molecule is like looking for a needle in a haystack: there are many properties you are trying to optimize simultaneously. If you only pull out 1,000 pieces of hay, you’re probably not going to find it. If you look a billion times, you have a much better chance.

The paper showcases Schrödinger’s new PathFinder technology. Could you explain what PathFinder does?

PathFinder is a reaction-based enumeration tool that enables the rapid exploration of synthetically tractable ligands. The combination of retrosynthetic analysis, reaction-based enumeration, and robust filtering in an easy-to-use graphical user interface differentiates PathFinder from other available tools.

Coupled with multiparameter optimization (MPO), docking, machine learning, and cloud-based free energy perturbation (FEP) simulations, PathFinder provides a streamlined approach to rapidly create and evaluate large sets of synthetically tractable, lead-like, potent ligands that are of significant interest in drug discovery.
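
To make the enumeration step concrete, here is a minimal sketch of reaction-based enumeration with lead-like filtering, written with the open-source RDKit toolkit rather than PathFinder itself; the amide-coupling reaction, building blocks, and property cutoffs are illustrative assumptions, not Schrödinger’s actual rules:

```python
# Minimal reaction-based enumeration sketch (RDKit, not PathFinder).
# The reaction, reagents, and filter thresholds are illustrative only.
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

# Amide coupling: carboxylic acid + amine bearing an N-H -> amide
amide_coupling = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OH].[N!H0:3]>>[C:1](=[O:2])[N:3]"
)

acids = [Chem.MolFromSmiles(s) for s in ("OC(=O)c1ccccc1", "OC(=O)C1CC1")]
amines = [Chem.MolFromSmiles(s) for s in ("NCc1ccncc1", "NC1CCOCC1")]

def lead_like(mol):
    """Simple property window standing in for PathFinder's robust filtering."""
    return (Descriptors.MolWt(mol) <= 450
            and Descriptors.MolLogP(mol) <= 4.0
            and Descriptors.NumRotatableBonds(mol) <= 8)

products = set()
for acid in acids:
    for amine in amines:
        # RunReactants yields one product tuple per matching reagent pair.
        for (prod,) in amide_coupling.RunReactants((acid, amine)):
            Chem.SanitizeMol(prod)
            if lead_like(prod):
                products.add(Chem.MolToSmiles(prod))

print(f"{len(products)} lead-like enumerated products")
```

In spirit, scaling this pattern to many reaction templates and large reagent collections is what makes enumerating hundreds of thousands of synthetically tractable ideas feasible.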

Does the exploration of chemical space described in this paper rely on AI?

It does, but with a very important twist.

Traditional AI, usually meaning machine learning, has a fundamental limitation: the system is only as good as the data sets you use to train it. So, imagine a hot new target comes along that appears to play an important role in driving certain cancers. It’s new, remember, so maybe in the entire scientific literature only one or two compounds that hit this target have ever been described. Existing AI programs won’t be of any help at all if you’re trying to discover compounds that might interact with this target. There’s just not enough data to train them.

Our approach, as documented in this paper, is quite different. It relies on our deep understanding of physics — specifically, the laws of classical and quantum mechanics as they apply to atomistic systems.

Because our platform is able to generate de novo compound designs and then assess them using first-principles, physics-based methods — even if the available scientific literature and ligand datasets around the target in question are sparse — we can pursue projects that companies relying on traditional, data-hungry AI would likely not be able to take on.

Please run us through the numbers to explain how this works in a very concrete way.

Imagine that, using our platform’s de novo design capabilities, we generate a million ideas for compounds that could hit a particular target. We take a random cut to select 1,000 molecules from that initial million. We then run our physics-based FEP+ (free energy perturbation) program to evaluate the potency of the molecules against the target. Now we have a strong idea of which of those 1,000 are most feasible as potential therapeutic compounds. We then feed the data on those 1,000 compounds into our machine learning algorithms. In effect, those virtual compounds become the AI training set. We have now trained our AI program on data that’s highly relevant to our target of interest.

Next, we ask the machine learning algorithm to take another cut at the initial one million compounds the platform generated. Having learned from the virtual, physics-based data set we just fed it, the program is now equipped to take a far more critical look at those molecules. It tells us which are the most plausible therapeutic compounds. We can then take that set, run FEP+ and rank-order them, feed that data back into the AI model, and run the whole cycle again if we’d like. This process is known as active learning, and it is currently one of the cutting-edge approaches in the machine learning field.
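
A schematic of that active-learning loop might look like the sketch below, with a dummy fep_oracle function standing in for FEP+ and a scikit-learn random forest on RDKit Morgan fingerprints standing in for our machine learning models; every name, number, and model choice here is an illustrative assumption, not Schrödinger’s actual code:

```python
# Schematic active-learning loop: expensive oracle scores a batch,
# a cheap surrogate model is retrained, and the surrogate picks the
# next batch from the full enumerated library.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def featurize(smiles_list):
    """Morgan fingerprints as a simple feature representation."""
    feats = []
    for s in smiles_list:
        fp = AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(s), 2, nBits=2048)
        arr = np.zeros((2048,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)
        feats.append(arr)
    return np.array(feats)

def fep_oracle(smiles_batch):
    """Placeholder for the expensive FEP+ potency calculation."""
    rng = np.random.default_rng(0)
    return rng.normal(loc=-8.0, scale=1.5, size=len(smiles_batch))

def active_learning(library, n_seed=1000, n_pick=1000, n_cycles=3):
    scored = {}
    rng = np.random.default_rng(42)
    # Cycle 0: a random slice of the enumerated library goes to the oracle.
    batch = list(rng.choice(library, size=min(n_seed, len(library)),
                            replace=False))
    for _ in range(n_cycles):
        for smi, score in zip(batch, fep_oracle(batch)):
            scored[smi] = score
        # Retrain the cheap surrogate on everything scored so far.
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(featurize(list(scored)), list(scored.values()))
        # Let the surrogate take a critical look at the unscored remainder.
        remaining = [s for s in library if s not in scored]
        if not remaining:
            break
        preds = model.predict(featurize(remaining))
        # Most negative predicted binding free energy = most promising.
        order = np.argsort(preds)
        batch = [remaining[i] for i in order[:n_pick]]
    return scored
```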

Importantly, we also stop at certain stages to synthesize the compounds in a lab and run them through experimental assays so we can confirm our virtual predictions against actual experimental data. That’s a crucial step, which gives us added confidence in our modeling.

How does this new process help you design compounds with optimal molecular properties?

That’s a great question.

The human brain finds it incredibly difficult to optimize for multiple endpoints at once. If I sit you down and tell you to design a compound that’s this particular molecular weight, has this polar surface area, has to fit in a specific pocket in the target, can’t look like any of its competitors, and on and on with dozens of other parameters, you would have a very difficult time generating a large number of ideas. When we’re faced with a long list like that, we typically pick the two or three parameters we think are most important, focus on those and forget the rest. Unfortunately, the parameters that are ignored can often come back to bite us later in the drug discovery process.

The computer, however, will try to optimize all the parameters at once, which is a huge advantage.
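
A toy multiparameter score makes the point: the computer simply folds every property into one objective and evaluates it for every candidate at once. The property windows and equal weighting below are invented for the example and are not the MPO settings used in the paper:

```python
# Toy multiparameter optimization (MPO) score: each property gets a
# desirability between 0 and 1, and the scores are averaged.
from rdkit import Chem
from rdkit.Chem import Descriptors

def desirability(value, low, high):
    """1.0 inside the preferred [low, high] window, falling off linearly outside."""
    if low <= value <= high:
        return 1.0
    edge = low if value < low else high
    return max(0.0, 1.0 - abs(value - edge) / abs(edge))

def mpo_score(mol):
    # Illustrative windows: molecular weight, lipophilicity, polar surface area.
    return (desirability(Descriptors.MolWt(mol), 300, 450)
            + desirability(Descriptors.MolLogP(mol), 1.0, 3.5)
            + desirability(Descriptors.TPSA(mol), 60, 110)) / 3.0

candidates = ["CC(=O)Nc1ccc(O)cc1", "Cc1ccccc1NC(=O)C1CCN(C)CC1"]
ranked = sorted(candidates,
                key=lambda s: -mpo_score(Chem.MolFromSmiles(s)))
print(ranked)
```

In practice, dozens of such terms can be combined and weighted, which is exactly the kind of bookkeeping a human designer cannot sustain across thousands of ideas.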

It’s also important to note that our algorithms are synthetically aware, meaning they have been programmed with crucial information about chemical synthesis. They use that as a filter, ensuring that the molecules they rank highly are, in fact, possible to synthesize without undue struggle.
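
As one rough illustration of such a filter, RDKit ships a contributed synthetic-accessibility (SA) score that can screen out hard-to-make ideas; this heuristic is a stand-in for, not a description of, the synthesis knowledge built into our algorithms, and the cutoff of 4.5 is an arbitrary choice for the example (SA scores run from about 1, easy, to 10, hard):

```python
# Synthetic-accessibility filter sketch using RDKit's contributed SA score.
import os
import sys
from rdkit import Chem, RDConfig

# The SA scorer lives in RDKit's Contrib directory rather than the core API.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer

def synthetically_tractable(smiles, cutoff=4.5):
    """Keep only molecules whose heuristic SA score suggests easy synthesis."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and sascorer.calculateScore(mol) <= cutoff

ideas = ["CCOC(=O)c1ccc(N)cc1", "CC12CCC(CC1)C(C)(C)O2"]
print([s for s in ideas if synthetically_tractable(s)])
```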

This publication set a record, right?

It did, and we’re very proud of it. Previously, no group of authors had run more than several hundred FEP calculations for a publication. We far exceeded that by running more than 5,000 FEP calculations for this paper.

Finally, which Schrödinger programs did you use in this paper?

We used Glide, FEP+, PathFinder, AutoQSAR, LigPrep, and more; together they showcase a major component of our platform.

Any final thoughts?

I started with this thought and I’ll end with it as well: Drug discovery teams have always faced a great many practical restrictions on the amount of chemical space they could explore in search of a potent new therapeutic compound. We’ve broken that constraint — and in this publication, we showed how we did it and why it matters. Ultimately, the goal is that by searching more chemical space, we can arrive at better, safer drugs for patients.