The energetics of ligand protonation, deprotonation, and tautomerization can significantly affect the binding energy of a compound. In this article, Glide product manager and Director of Structure-Based Research Matt Repasky discusses how incorporating accurate predictions of such state penalties leads to an improvement in docking results.
A significant fraction of drug-like molecules are weak acids or bases with multiple low-energy ionization or tautomeric states. From this pool of candidate states, typically only a single state binds to a protein site. While the bound state is often the lowest energy state in solution, this is not always the case. Thus, when evaluating protein-ligand binding, all energetically accessible ligand ionization/tautomeric states should be considered per coumpound.
Typically, Glide's scoring function, GlideScore, is used to select the optimal state to use in rank-ordering compounds. However, it has not included an evaluation of the energy required for a ligand to attain a given ionization/tautomeric state. Consequently, selecting amongst ionization/tautomeric states of the same compound may not properly reflect the difficulty of generating a given ionization or tautomeric state. In past versions of the software Glide would sometimes select unphysical or high energy ionization or tautomeric states for a compound over more attractive states with slightly worse GlideScores.
In Glide 5.5, this issue has been addressed by incorporating ionization and tautomeric state penalties predicted by Epik, leading to large improvements in enrichment studies. Such improvements are seen across a range of targets including metalloproteins, as Epik 2.0 is able to generate ligand states best suited for interacting with metal ions.
Epik State Penalties
Epik employs a Hammet and Taft based approach to predicting energetically accessible ionization and tautomeric states. From the population of predicted ionization and tautomeric states for a compound, Epik determines the state penalty for each state by:
Epik State Penalty = -kBTln(Pcj)
Here Pcj is the population of one state that results from Epik's iterative algorithm, where at each step tautomerization and ionization are repeated until either the maximum number of iterations is achieved or the collection of structures does not change.
Epik 2.0, released as part of Schrödinger Suite 2009, can now generate ligand states that are best suited to interact with metal ions in a binding site, and predict corresponding state penalties. When enabled by selecting the "Add metal-binding states" toggle in the Epik interface, the program will also generate states optimal for interacting with metal ions in addition to the standard states. States generated specifically for metal chelation include the state penalty for the current state, generally large, and adjusted metal state penalties to be used in case of metal interaction with one or more ligand atoms. The adjusted metal state penalties reflect the reduced energy required to deprotonate chelating centers in the presence of a metal ion.
Combining Epik State Penalties with Glide
Glide users can choose to apply Epik state penalties during ligand docking jobs by selecting the "Add Epik state penalties to docking score" toggle in the Settings tab. Epik state penalties are combined with the GlideScore scoring function as follows:
DockingScore = GlideScore + Epik state penalty
Treating the combined state penalty and GlideScore as a separate scoring function, DockingScore, has several advantages. Users have the ability to work up virtual screening results based on either GlideScore or DockingScore. When the Epik State Penalties feature is active and the penalty has not been computed for a ligand, Glide reverts to the default behavior and sets DockingScore equal to the GlideScore. By default, poses returned by Glide are sorted by DockingScore in the output structure and report files, but because the terms are treated separately it is trivial to re-sort poses by GlideScore using the glide_sort utility.
When states have been generated by Epik's "Add metal-binding states" feature and there are no metal ions in the Glide enclosing box, states generated specifically for metal chelation will be skipped. This enables users to prepare a single ligand library suitable for docking into both metalloproteins and non-metalloproteins alike. When a metal ion is found in the enclosing box, if a ligand atom with a metal state penalty is found to lie within 3.0 A of a metal ion after docking, the metal state penalty from that atom is employed in computing DockingScore rather then the normal ligand state penalty. The metal state penalty accounts for the fact that the metal-binding ligand state, while high energy in solution, is much lower energy when interacting through the relevant functional group, which leads to a reduction in the magnitude of the penalty. This effect is illustrated in Figure 1 where the docked ligand has a state penalty of 2.89 kcal/mol, due to formation of the higher-energy hydroxamate state. However, the hydroxamate oxygen has a metal state penalty of only 0.072, and since that atom interacts with the zinc ion, the much smaller 0.072 state penalty is added to the GlideScore to generate the DockingScore.
Figure 1: Top-ranked pose showing active ligand 15507 interacting with a zinc ion in the binding site of a TNF-alpha converting enzyme (1ZXC). The hydroxamate group interacts with the zinc ion, causing Glide to apply a metal-binding state penalty of 0.07 kcal/mol rather than the state penalty of 2.89 kcal/mol that would otherwise have been assigned if the hydroxamate group did not interact with a metal ion.
Validation Results
Docking accuracy of Glide is unaffected since the Emodel scoring function has not been changed. Enrichment experiments were performed over 68 protein systems, using 997 unique decoy compounds and varying numbers of active compounds, both sets of which were subject to Epik state predictions. Epik state enumeration and penalty calculation was performed using the "Add metal-binding states" option enabled, with all other options at their default values. Epik expanded the number of decoy ligands to be docked to 2580 (2.58 states generated per input compound). Epik expanded known actives at a similar rate of 2.57 states per input compound, for a total of 2,571 ligands over the 68 screens. When selecting compounds for final ranking, the docked pose with the most negative GlideScore or DockingScore was kept. Otherwise, previously published enrichment calculation methodologies were employed.2
Table 1 compares the average performance of SP Glide 5.5 in enrichment experiments, using as definitions of enrichment EF'(1%), which measures the enrichment for assaying the top 1% of a ranked database, and ORD(40), which measures the average number of decoy ligands that outrank the top-ranked 40% of known actives. EF'(1%) is a measure of early enrichment, while ORD(40) is a more global enrichment metric. Enrichment improves with larger values of EF'(1%) and smaller values of ORD(40). The DockingScore experiments were performed employing Epik state penalties as described above.

For the 56 non-metalloprotein systems, rank-ordering compounds using DockingScore instead of GlideScore improves early enrichment by an average of over 50%. The median change observed was 24.8% for this dataset, indicating nearly half the systems studied found enrichment improvements of greater than 25%. Even larger improvements were observed for the twelve metalloproteins, with early enrichment improving by 85% on average and a median improvement of 45%. The distribution of changes in enrichment is shown in Figure 2. Nearly three times more systems saw early enrichment improved by more than 5% than were degraded by more than5%. Results from the ORD(40) metric are generally similar to those from early enrichment. Docking with HTVS sampling shows similar distribution of enrichment improvements, though the effect is smaller with average improvement in EF'(1%) of 24% across all systems.

What causes these large improvements in enrichment? It appears the largest effect comes from penalizing decoy ligands with good GlideScores that are in high-energy ionization or tautomeric states. Degradation of decoy DockingScores is larger then degradation in known actives, as most known actives interact in lower energy ionization or tautomeric states or are strong enough binders that they can absorb the state penalty and continue to rank well. An example is shown in Figure 3 where a decoy ligand docked into an LCK Kinase (1QPE) has two protonated tertiary amines with a state penalty of 1.29 kcal/mol. This ligand receives a GlideScore of -8.68 kcal/mol, and thus a DockingScore of -7.93 kcal/mol. Rank-ordering compounds by GlideScore, this ligand outranks all but eight known actives, but by DockingScore this ligand is outranked by forty known actives.

Figure 3: Docked pose for decoy ligand 700305 interacting with LCK kinase (1QPE). The state penalty is 1.29 for the formation of two charged tertiary amines. The amines do not interact with complementary groups on the protein.
In summary, Glide 5.5 has been modified to incorporate Epik state penalties to energetically penalize higher-energy ionization and tautomeric states. Glide 5.5 also takes advantage of improvements in Epik 2.0, allowing the creation of states specifically for binding to metalloproteins. The GlideScore and Epik State penalties are combined in the DockingScore. Rank-ordering compounds by DockingScore leads to significant improvements in enrichment compared to rank-ordering by GlideScore, for both metalloproteins and non-metalloproteins. A paper that describes the methodology in greater detail is in progress.
1. Shelley, J. C.; Cholleti, A.; Frye, L. L.; Greenwood, J. R.; Timlin, M. R.; Uchimaya, M. "Epik: a software program for pKa prediction and protonation state generation for drug-like molecules." J. Comp.-Aided Mol. Des. 2007, 21, 681-691.
2. Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L. "Glide: A New Approach for Rapid Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening." J. Med. Chem. 2004, 47, 1750-1759.
3. Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T. "Extra Precision Glide: Docking and Scoring Incorporating a Model of HydrophobicEnclosure for Protein-Ligand Complexes." J. Med. Chem. 2006, 49, 6177-6196.
