Many physics-based, structure-based drug design (SBDD) methods, such as free energy perturbation (FEP+)¹, require accurate, atomic-level detail of the target protein in complex with a member of the ligand series being modeled to perform optimally. Consequently, the domain of applicability of SBDD is limited by the availability of high-resolution crystal structures of protein-ligand complexes. However, highly similar structures may be available even when the exact protein-ligand complex structure is not. IFD-MD is used to bridge this gap. It takes the available structure and predicts the atomic details of the protein-ligand complex structure needed for SBDD. It is capable of accurately predicting the binding pose of a ligand starting from a structure of the target protein with a very different bound ligand. In combination with Prime Homology Modeling, it can even predict a target protein-ligand complex structure without any X-ray structures of the target protein, using a structure of a highly homologous protein as a starting point.

Often the critical differences in protein conformation between the available structure and the one with the ligand bound are movements of a small number of binding site side chains or a small loop motion. These small changes can have a large impact on the accessible ligand binding modes and the interplay of protein, ligand, and water structures. IFD-MD uses a combination of docking algorithms, water thermodynamics, empirical scoring functions, implicit solvent force field energies, and explicit solvent metadynamics trajectories to explore the motions of the target protein and simultaneously determine their relative energies. This technology allows teams to create accurate “first looks” at the protein-ligand interactions for novel active compounds before a crystal structure is solved, and even allows for accurate predictions of protein-ligand complex structures when starting from homology models.

Predicting Ligand Binding Modes for Novel Chemical Matter

Structure-based hit-to-lead and lead optimization efforts rely on accurate structural models of the ligand binding mode and the surrounding protein environment. For hits originating from virtual screens, the pose determined during screening typically does not account for induced-fit effects, in which the protein rearranges conformationally to accommodate the ligand, and is often less accurate as a result. For hits derived from most high-throughput experimental screens, not even this limited pose information is available. Often the only structural data available at this stage are structures of the protein solved with a natural product or a ligand with an entirely different core. Until a structure can be solved, a process that can take weeks, drug discovery teams have to proceed without many of the structure-based tools that could accelerate their efforts.

IFD-MD fills this gap by generating highly accurate binding poses of novel ligands when the only structures available for the protein of interest have it bound to a ligand different from the hit. This procedure simultaneously predicts the differences in protein conformation when a novel ligand binds and how those differences are likely to affect the binding pose of the novel ligand. As changes in protein conformation are often, but not always, limited to small loop rearrangements and side-chain motions, IFD-MD concentrates on predicting the effect of such protein conformational changes. Its accuracy allows it to predict binding poses affected by induced fit with far more confidence, enabling full structure-based design in hit-to-lead soon after confirmation of a hit.

Figure 1. IFD-MD results for 415 retrospective cross-docking experiments taken from publicly available structures, compared to other docking methods (left) and broken down by protein class (right). In each experiment, the binding pose for one ligand was predicted starting with a holo structure of the target protein bound to a different ligand. In over 90% of cases, a pose within 2.5 Å ligand heavy-atom RMSD of the experimentally determined binding pose was within the top 2 poses predicted by IFD-MD, and in over 80% of those cases, the top-scoring pose was within 2.5 Å ligand heavy-atom RMSD of the experimentally determined binding pose. For comparison, IFD2006 can identify a pose under 2.5 Å RMSD in the top 2 in 60% of these cases and identify such a pose as the top-ranked pose in 50% of these cases.
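The top-1 and top-2 success rates quoted in Figure 1 can be tallied with a few lines of Python. This is a minimal sketch, not the benchmark code: the function name, the input layout (one list of ligand heavy-atom RMSDs per experiment, ordered best score first), and the example values are all assumptions made for illustration, and "within 2.5 Å" is taken to mean less than or equal to the cutoff.

```python
from typing import Sequence

RMSD_CUTOFF = 2.5  # Å, the success threshold quoted in Figure 1


def top_k_success_rate(
    ranked_rmsds: Sequence[Sequence[float]],
    k: int,
    cutoff: float = RMSD_CUTOFF,
) -> float:
    """Fraction of experiments in which at least one of the k best-scored
    poses lies within `cutoff` of the experimental pose.

    Each inner sequence holds the RMSDs of one experiment's poses,
    ordered from best score to worst (assumed input layout).
    """
    if not ranked_rmsds:
        raise ValueError("at least one experiment is required")
    hits = sum(
        1 for poses in ranked_rmsds if any(r <= cutoff for r in poses[:k])
    )
    return hits / len(ranked_rmsds)


# Hypothetical RMSD values for four cross-docking experiments (not real data).
example = [
    [0.8, 3.1, 4.0],  # top-ranked pose is correct
    [3.6, 1.9, 2.2],  # second-ranked pose is correct
    [4.2, 5.0, 1.1],  # only the third-ranked pose is correct
    [1.4, 1.6, 6.3],  # top-ranked pose is correct
]

top1 = top_k_success_rate(example, k=1)
top2 = top_k_success_rate(example, k=2)
```

Note that this computes both rates over all experiments; a conditional rate (top-1 successes among the top-2 successes) would divide `top1` by `top2` instead.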

For example, take the prediction of a binding mode for a P1 heterocycle-Aryl based thrombin inhibitor. A crystal structure of this inhibitor was reported in 2004 (PDB ID 1SL3), but we will predict the binding mode of this inhibitor using the structure of thrombin bound to a D-Phe-Pro-Arg-Type thrombin inhibitor solved in 2003 (PDB ID 1NZQ). Even though these ligands are very different, they do share some protein-ligand interactions. For example, both the P1 heterocycle-Aryl based inhibitor structure used as a template and the D-Phe-Pro-Arg-Type inhibitor whose binding pose is predicted have backbone hydrogen bond interactions with Ser214 and Gly216, even though very different chemical groups are making these interactions. Not only can IFD-MD predict that these hydrogen bonds, and not others that the P1 heterocycle-Aryl based inhibitor is making with thrombin, are key interactions conserved between the binding modes of two highly disparate ligands, it can also predict which chemical groups on the D-Phe-Pro-Arg-Type inhibitor are making these key interactions with thrombin. In addition to these conserved interactions, IFD-MD can predict several interactions unique to the D-Phe-Pro-Arg-Type inhibitor as well as proper vectors for a solubilizing group in the inhibitor. The resulting pose has a ligand heavy-atom RMSD of 1.4 Å and successfully predicts all of the key contacts and vectors for the D-Phe-Pro-Arg-Type inhibitor from a P1 heterocycle-Aryl inhibitor structure. This technology would have allowed a structure-based drug discovery effort on the D-Phe-Pro-Arg-Type inhibitor to proceed before a complex structure of any ligand in that class was solved.
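The ligand heavy-atom RMSD used as the accuracy metric here can be sketched in plain Python. This is an illustrative version under stated assumptions, not the implementation used for the reported numbers: atoms are hypothetical `(element, x, y, z)` tuples, both poses list the same atoms in the same order, the two complexes are already superimposed on the protein so that no ligand realignment is done, and no correction for symmetry-equivalent atoms is applied.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in Å


def heavy_atom_rmsd(predicted: Sequence[Atom], reference: Sequence[Atom]) -> float:
    """In-place RMSD over non-hydrogen atoms of two poses of the same ligand."""
    if len(predicted) != len(reference):
        raise ValueError("poses must contain the same number of atoms")

    squared_sum = 0.0
    n_heavy = 0
    for (elem_p, xp, yp, zp), (elem_r, xr, yr, zr) in zip(predicted, reference):
        if elem_p != elem_r:
            raise ValueError("atom ordering differs between the two poses")
        if elem_p.upper() == "H":
            continue  # hydrogens are excluded from a heavy-atom RMSD
        squared_sum += (xp - xr) ** 2 + (yp - yr) ** 2 + (zp - zr) ** 2
        n_heavy += 1

    if n_heavy == 0:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(squared_sum / n_heavy)


# Hypothetical three-atom ligand: both heavy atoms are displaced by 1 Å along z,
# and the badly placed hydrogen does not contribute.
predicted_pose = [("C", 0.0, 0.0, 0.0), ("N", 1.0, 0.0, 0.0), ("H", 5.0, 5.0, 5.0)]
reference_pose = [("C", 0.0, 0.0, 1.0), ("N", 1.0, 0.0, 1.0), ("H", 0.0, 0.0, 0.0)]

rmsd = heavy_atom_rmsd(predicted_pose, reference_pose)
```

Skipping the ligand realignment is deliberate in this sketch: a pose that has the right shape but sits in the wrong part of the binding site should count as a failure.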

Figure 2. IFD-MD results for the prediction of the binding pose of a novel thrombin inhibitor using the structure of a highly disparate thrombin inhibitor as a starting point. IFD-MD can predict key interactions necessary for effective structure-based drug design before a crystal structure of the complex is solved.

Predicting Ligand Binding Poses using Homology Modeling and Retrospective Affinity Data

FEP+ makes it possible for computational chemists to accurately predict the relative binding affinities of congeneric ligands using a combination of accurate physics and an accurate structure of the protein-ligand complex. However, when it is combined with Prime Homology Modeling and IFD-MD, it can also do the reverse and validate a predicted structure of the protein-ligand complex using the known relative binding affinities of a series of congeneric ligands. This is valuable when assessing a set of potential protein-ligand poses, and it relies on the accuracy of FEP+ to recognize when the physics of protein-ligand binding is not modeled correctly, which leads to poor reproduction of known binding affinities. A homology model that reproduces the binding affinities of known ligands is likely to be accurate and can be used for a range of structure-based drug design methods, including using FEP+ to prospectively predict the binding affinities of novel compounds. This allows SBDD methods to be deployed in cases where no crystal structures of the desired protein are available with any ligand bound.

Figure 3. Flow chart for the prediction of ligand binding poses from homology models and retrospective affinity data using Schrödinger tools (upper) and an example of this protocol (lower). In the example, the homology model of TYK2 complexed with a drug-like ligand was created using a structure of the homologous protein JAK3 bound to a highly disparate ligand as a starting point. Using Prime Homology Modeling coupled with IFD-MD, two potential homology models were built, model #1 and model #2. FEP+ calculations were then run with both of these potential homology models to determine whether either one could retrospectively predict the binding affinities of 14 ligands congeneric to the ligand in the homology model. Potential model #1, but not model #2, produced excellent agreement between the predicted and experimental binding affinities, resulting in an FEP+ validated homology model that can be used to drive subsequent design around the TYK2 ligand using FEP+ or other structure-based drug design methodologies.

As a test, seven proteins were identified that have publicly available affinity data for a congeneric ligand series, a structure of the protein bound to a member of that series, and structures of related proteins with around 30, 40, or 50% sequence identity to the protein of interest. Nineteen homology models were built using a randomly selected template for each protein at each of the sequence identity cutoffs, except in cases where there was no publicly available template near that sequence identity cutoff. Prime Homology Modeling was used to place the backbone, and IFD-MD was used to place the ligand and the surrounding side chains. Each model was evaluated with FEP+ for its ability to accurately predict the retrospective affinities of a small series of compounds around the ligand of interest. The first homology model that could accurately predict the affinities of the retrospective series was selected and compared to the crystal structure. Using templates with 50% sequence identity, all 5 of the proteins that had such a template returned a homology model that was predictive for the retrospective ligand affinity set, and all 5 had a ligand RMSD under 2.5 Å. Using 40% sequence identity templates, predictive homology models were found for only 4 of the 6 proteins, but all 4 of these had a ligand RMSD under 2.5 Å. Using 30% sequence identity templates, predictive homology models were found for only 3 of the 7 proteins, but all 3 of these had a ligand RMSD under 2.5 Å. While this method doesn’t always produce a predictive homology model, when it does produce one, the model is predictive because it has the correct ligand binding pose and is therefore likely to be useful for subsequent design work.
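The selection step in this protocol, taking the first candidate model whose predicted affinities agree with experiment and returning nothing when none does, can be sketched as follows. This is an illustration under assumptions rather than the protocol's actual code: the predicted values stand in for FEP+ output, the agreement criterion (RMSE of at most 1.0 kcal/mol) is a hypothetical choice because the text does not state which criterion was used, and all names and numbers are invented for the example.

```python
import math
from typing import Optional, Sequence, Tuple


def rmse(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental affinities."""
    if not experimental or len(predicted) != len(experimental):
        raise ValueError("affinity lists must be non-empty and of equal length")
    total = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(total / len(experimental))


def first_predictive_model(
    ranked_models: Sequence[Tuple[str, Sequence[float]]],
    experimental: Sequence[float],
    max_rmse: float = 1.0,  # kcal/mol; hypothetical acceptance threshold
) -> Optional[str]:
    """Return the name of the first model, in rank order, that reproduces
    the retrospective affinities, or None (the "no prediction" outcome)."""
    for name, predicted in ranked_models:
        if rmse(predicted, experimental) <= max_rmse:
            return name
    return None


# Hypothetical binding free energies in kcal/mol for four congeneric ligands.
experimental_dg = [-9.0, -8.2, -10.1, -7.5]
candidates = [
    ("model_a", [-7.0, -9.9, -8.0, -9.4]),  # poor agreement, rejected
    ("model_b", [-9.3, -8.0, -9.8, -7.9]),  # good agreement, accepted
]

selected = first_predictive_model(candidates, experimental_dg)
```

Returning `None` instead of the least-bad model mirrors the behavior described above: when no candidate reproduces the known affinities, no homology model is reported.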

Figure 4. Results when using Prime Homology Modeling, IFD-MD, and FEP+ to predict the binding pose of a ligand given the structure of a homologous template protein and the binding affinities of a congeneric series around the ligand of interest. For each example, a series of 10 homology models was built from a template with around 50%, 40%, or 30% sequence identity. Prime Homology Modeling was used to place the backbone, and IFD-MD was used to place the ligand and the surrounding side chains. Each model was evaluated for its ability to accurately predict the retrospective affinities of a small series of compounds around the ligand of interest. The first homology model that could accurately predict the affinities of the retrospective series was selected. The ligand heavy-atom RMSD between that selected homology model and the crystal structure is then reported. In cases reported as “No Template”, there was no publicly available template structure around that sequence identity, so no homology model was attempted. In cases reported as “no prediction”, none of the top 10 poses was able to accurately predict the retrospective affinity data, so no homology model would be returned by this protocol. In the cases where there was a template and a homology model that could correctly predict the retrospective affinities, all had ligand RMSDs below 2.5 Å.


Schrödinger’s IFD-MD provides unprecedented accuracy when predicting the binding modes of novel chemical matter. This accuracy allows structure-based drug design (SBDD) techniques to be confidently deployed even when the only available structures of the target protein contain ligands different from the ligand series being designed. It further enables SBDD techniques to be deployed in some cases where the only available structure of the target protein is a homology model, provided the template protein is similar enough to the target protein, typically with greater than 40% sequence identity. This dramatically expands the applicability domain of SBDD: it can be deployed throughout a structure-enabled design effort, even before the enabling crystal structures are available, and some ligand design efforts without access to high-quality protein-ligand complex structures can still benefit from structure-based design techniques.




  1. Wang, L. et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015, 137, 7, 2695-2703. DOI: 10.1021/ja512751q
  2. Miller, E. et al. A Reliable and Accurate Solution to the Induced Fit Docking Problem for Protein-Ligand Binding. ChemRxiv 2020, Preprint. DOI: 10.26434/chemrxiv.11983845