Extra Precision (XP) Docking and Scoring: An Overview
Professor Friesner is a founder of Schrödinger and Professor of Chemistry and Director of
the Center for Biomolecular Simulations at Columbia University. As
chairman of Schrödinger’s Scientific Advisory Board,
Professor Friesner provides strategic vision and guidance for
Schrödinger's scientific advancements. In this installment of Rich's
column, he describes advances in virtual screening methodology that
have been incorporated into the Glide docking suite.
Since its introduction several years ago, Glide's Extra Precision (XP)
mode has been used by many of you for enhanced docking accuracy. Glide XP
is one of Schrödinger's main research and development projects, and one in
which we continue to invest major resources. It is also a project that I
spend quite a bit of time working on myself.
The first detailed write-up of the XP methodology has just appeared in the Journal of Medicinal Chemistry.
It contains a description of the XP sampling algorithms and scoring
function, with the principal emphasis on the latter. As an integral
part of the project, we have assembled large data sets from the
literature and the Protein Data Bank, which we have used to evaluate XP
performance for docking accuracy and enrichment in virtual screening. I
encourage those of you who are interested in understanding the details
of XP to consult this article, which can be found here.
The present discussion will give an overview of the results described in
the XP publication. I will also briefly discuss some recent results
that we have obtained at Columbia, in which we validate new XP binding
affinity terms using explicit water molecular dynamics simulations. The
paper describing these results is currently in press in the Proceedings of the National Academy of Sciences, and should be available soon.
The starting point of Glide XP scoring consists of terms that are common to
empirical scoring functions present in most docking software: an
atom-atom pair score that rewards contacts between lipophilic atoms on
the protein and ligand, a term favoring protein-ligand hydrogen bonds,
and an entropic penalty based upon the number of rotatable bonds in the
ligand. XP also imposes desolvation penalties for burial of protein or
ligand polar and charged groups.
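The additive structure of such a scoring function can be sketched in a few lines of Python. This is only an illustration of how the terms listed above combine, not the actual XP implementation: the `PoseFeatures` fields, the weight values, and the sign convention (more negative is better) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PoseFeatures:
    """Per-pose quantities a docking program might compute (hypothetical)."""
    lipophilic_contacts: float   # summed lipophilic atom-atom pair reward
    hbonds: int                  # protein-ligand hydrogen bonds
    rotatable_bonds: int         # ligand rotatable bonds
    desolvation_penalty: float   # penalty for buried polar/charged groups


# Hypothetical weights; real scoring functions fit these to binding data.
W_LIPO = -1.0    # reward per unit of lipophilic contact
W_HBOND = -1.0   # reward per hydrogen bond
W_ROT = 0.3      # entropic penalty per rotatable bond


def empirical_score(f: PoseFeatures) -> float:
    """Sum the individual terms; lower (more negative) means better."""
    return (
        W_LIPO * f.lipophilic_contacts
        + W_HBOND * f.hbonds
        + W_ROT * f.rotatable_bonds
        + f.desolvation_penalty
    )
```

Under these assumptions, adding a hydrogen bond lowers the score, while adding rotatable bonds or burying more polar groups raises it.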
The desolvation penalty is assessed by adding explicit waters to promising
docked poses using a fast grid-based method, and then counting the
number of water molecules in the first and second shells of each polar
and charged group. These counts are compared with statistical averages
of water shells for analogous groups in known active compounds, and
penalties are assessed accordingly. The desolvation term plays a major
role in reducing false positives in XP virtual screening.
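The shell-counting idea can be illustrated with a minimal sketch. The shell radii, the per-water penalty, and the reference counts below are placeholder values chosen for the example; they are not the parameters XP uses, and the grid-based water placement step is assumed to have already produced the water coordinates.

```python
import math

# Hypothetical shell boundaries in angstroms.
FIRST_SHELL = 3.5
SECOND_SHELL = 5.5


def shell_counts(group_xyz, waters):
    """Count waters in the first and second shells around one polar group."""
    first = second = 0
    for w in waters:
        d = math.dist(group_xyz, w)
        if d <= FIRST_SHELL:
            first += 1
        elif d <= SECOND_SHELL:
            second += 1
    return first, second


def desolvation_penalty(counts, reference, per_water=0.5):
    """Penalize the shortfall of waters relative to reference averages.

    `reference` holds the (first shell, second shell) averages for an
    analogous group; `per_water` is an assumed penalty per missing water.
    """
    deficit = sum(max(0.0, ref - n) for n, ref in zip(counts, reference))
    return per_water * deficit
```

A group that keeps its usual complement of waters incurs no penalty in this sketch; one that is buried with few waters nearby is penalized in proportion to the shortfall.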
The principal driving force for protein-ligand binding is the displacement
of water molecules from the protein active site. Water molecules in
hydrophobic environments have a tendency to lose orientational
flexibility and hence entropy — a classic example is water at a
hydrophobic wall, where the water molecules preserve their average
number of hydrogen bonds, but in order to do so preferentially exclude
geometries in which a hydrogen is pointing at the wall. Displacing such
waters into bulk solution, and replacing them with a ligand that is
well matched to the protein environment, thus yields a gain in free
energy, one that is relatively small per water molecule, but can add up
to a substantial value when integrated over the entire volume occupied
by the ligand.
The lipophilic atom-atom pair term discussed above provides a heuristic
representation of this effect, and is typically parametrized based on
fitting to binding affinity data for a large number of protein-ligand
complexes. As such, the calculated value will be reflective of an
“average” protein active site environment. Similarly, the hydrogen
bonding term captures the free energy gained when water molecules that
would otherwise have to make hydrogen bonds to the protein are replaced
by ligand groups that can do so with a smaller loss of entropy.
An empirical scoring function of this type can perform well for some
fraction of protein-ligand complexes. However, we have found that there
are environments that deviate substantially from the average, so much
so, in fact, that the standard approximations become highly inaccurate.
These environments are characterized by hydrophobic enclosure of the
ligand, in which a cluster of hydrophobic atoms on the ligand
(typically an aromatic ring, but other functional groups can also
exhibit this behavior) is “surrounded” on two sides by hydrophobic
protein groups (see the figure below). The enclosure implies that the water molecules
displaced by the ligand in this region would have particularly
unfavorable free energies, possibly due to a greater entropy loss than
is usual, or even the actual loss of a hydrogen bond (or, in the most
extreme case, dewetting of the cavity). Glide XP contains algorithms
that recognize such regions automatically and assign additional
favorable scores for binding affinity based on the geometry of the
cavity and structure of the ligand.

Figure: Hydrophobic enclosure in 1aq1.
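One simple way to picture a geometric test for enclosure is sketched below. It is not the XP recognition algorithm, whose details are given in the publication; the distance cutoff, the angle threshold, and the function name `is_enclosed` are assumptions for illustration. The sketch asks whether lipophilic protein atoms lie close to a ligand hydrophobic group on roughly opposite sides of it.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def is_enclosed(ligand_center, protein_lipophilic_atoms,
                cutoff=5.0, min_angle_deg=150.0):
    """True if two nearby lipophilic protein atoms flank the ligand group.

    `ligand_center` is the centroid of the ligand's hydrophobic cluster.
    `cutoff` (angstroms) and `min_angle_deg` are hypothetical thresholds.
    """
    directions = []
    for p in protein_lipophilic_atoms:
        d = math.dist(ligand_center, p)
        if 0.0 < d <= cutoff:
            directions.append(
                _unit(tuple(pc - lc for pc, lc in zip(p, ligand_center)))
            )
    # Two directions are "opposite" if the angle between them is large,
    # i.e. their dot product is at or below cos(min_angle).
    cos_limit = math.cos(math.radians(min_angle_deg))
    for i in range(len(directions)):
        for j in range(i + 1, len(directions)):
            dot = sum(a * b for a, b in zip(directions[i], directions[j]))
            if dot <= cos_limit:
                return True
    return False
```

A pose flagged this way would then receive an additional favorable score; how large that score should be depends on the cavity geometry and ligand structure, which this sketch does not attempt to model.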
A particularly interesting structural motif occurs when hydrophobic
enclosure is combined with a small number of protein groups that
require hydrogen bonds. The hinge binding region in kinases is one such
example of this motif, where an aromatic ring of the ligand typically
makes 1-3 hydrogen bonds with protein backbone groups. XP recognizes
this region and assigns additional binding affinity to ligands that
form the necessary hydrogen bonds but are otherwise hydrophobic. The
idea is that water molecules that make hydrogen bonds to multiple,
closely spaced protein groups in a highly hydrophobic environment would
experience a substantially larger than usual entropy loss when
displaced by the ligand.
While we have developed and validated XP parameters by examining the
performance of the methodology for a large number of virtual screening
data sets, the effects postulated above should be manifested in
accurate, all-atom simulations of the appropriate systems using an
explicit solvation model. In collaboration with Bruce Berne’s
group at Columbia, we have carried out such simulations for a number of
systems identified by XP as having regions of hydrophobic enclosure.
Our protocol is to remove the ligand from these systems, perform
molecular dynamics simulations, and assess the distribution of water
molecules via various statistical techniques (which can also be used to
estimate the entropy of waters in various locations).
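To make the analysis step concrete, here is a minimal sketch of two statistics one might compute from such a trajectory: how often a site is occupied by water, and a Shannon entropy over a histogram of some water property (for example, binned orientations). These are generic illustrations and not necessarily the techniques used in the study; the frame format (a list of water oxygen coordinates per frame), the site radius, and the function names are assumptions.

```python
import math


def site_occupancy(frames, center, radius=3.0):
    """Fraction of frames in which at least one water lies within the site.

    `frames` is a list of frames; each frame is a list of (x, y, z)
    water oxygen positions. `radius` (angstroms) is a hypothetical choice.
    """
    if not frames:
        raise ValueError("need at least one frame")
    occupied = sum(
        1 for waters in frames
        if any(math.dist(center, w) <= radius for w in waters)
    )
    return occupied / len(frames)


def shannon_entropy(bin_counts):
    """Entropy (in units of k_B) of a histogram, -sum(p * ln p)."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return -sum((c / total) * math.log(c / total)
                for c in bin_counts if c > 0)
```

In this picture, a cavity that dewets shows an occupancy near zero, while a water locked into a single orientation gives a histogram concentrated in one bin and an entropy near zero, well below that of a freely rotating water whose histogram is spread across many bins.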
For the active site of the COX-2 receptor, which is highly hydrophobic, we
see dewetting of the cavity even on a short simulation timescale. For
the streptavidin/biotin complex, upon removal of biotin, 5 water
molecules form an ice-like ring which hydrogen bonds to the protein in
an enclosed region which formerly was occupied by the biotin ligand.
These water molecules have very low entropies as compared to bulk; in
effect, they have been frozen into position at room temperature. As a
result, their displacement by biotin leads to an exceptionally large
free energy of binding (~18 kcal/mol) — the largest of any complex in
the PDB, despite the small size of the ligand.
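To put a binding free energy of this size in perspective, it can be converted to a dissociation constant with the standard relation ΔG° = RT ln Kd. The short calculation below assumes a binding free energy of -18 kcal/mol (taking the quoted value as a magnitude), a temperature of 298.15 K, and a 1 M standard state; the function name is just for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature=298.15):
    """Kd in mol/L from a binding free energy, using dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))


kd = dissociation_constant(-18.0)
print(f"Kd ~ {kd:.1e} M")
```

Under these assumptions the result falls between 10^-14 and 10^-13 M, which illustrates how unusually tight this interaction is for so small a ligand.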
These simulations demonstrate that Glide XP has a sound basis in the
atomistic physical chemistry of protein-ligand complexes, and that it
succeeds in explaining a wide range of empirical data. Glide XP has
already been used by many groups in the pharmaceutical and
biotechnology industry to facilitate both lead discovery and lead
optimization efforts. Ongoing improvements of both the scoring function
and sampling algorithms should make this technology even more
compelling in subsequent releases.