Binding Site Identification, Characterization, and Druggability Assessment with SiteMap
SiteMap is Schrödinger's program for identifying and characterizing ligand binding sites. In this article, Dr. Tom Halgren and Dr. Woody Sherman discuss the importance of target selection and characterization, the implementation of SiteMap, and the results of validation tests that demonstrate SiteMap's accuracy across a variety of targets.
The ability to identify and characterize binding sites is a key step in structure-based drug design. In a new project with a novel target, the primary binding site may not be known. It is also often desirable to find new sites on existing targets in the hope of gaining affinity or specificity, or of extending the intellectual-property space through a different mode of binding. Conversely, even in a discovery project where the binding site is known, it is still important to predict how druggable that site is. An estimated 60% of small-molecule drug-discovery projects fail because the target turns out not to be druggable [1]. Moreover, it is estimated that only 10% of the proteins encoded in the human genome are druggable by oral small molecules [2].
While a number of academic and commercial tools exist to help researchers investigate these questions, there is still a need for an intuitive, easy-to-use, and accurate physics-based tool for binding-site identification and druggability assessment. To address this need, we have developed SiteMap [3], which identifies binding sites and predicts target druggability. SiteMap presents its results as graphical maps of the binding site together with calculated site properties. Together, this information brings new insight into the target system and can foster collaboration among colleagues in structure-based drug design.
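SiteMap expands upon a previously published procedure for characterizing binding sites [4]. In this capacity, it operates in a manner similar to Goodford’s GRID algorithm [5], but it employs a distinctive definition of hydrophobicity, constructed by adding an oppositely signed (positive) “electric-field penalty” term to the vdW term:

$$G_{\mathrm{phobic}} = E_{\mathrm{vdW}} - 0.30\,E_{\mathrm{dipole}}$$

where $E_{\mathrm{vdW}}$ is the van der Waals energy at the grid point and $E_{\mathrm{dipole}}$ is the energy of an oriented dipole in the local electric field. Because $E_{\mathrm{dipole}}$ is negative (favorable) wherever a field is present, subtracting $0.30\,E_{\mathrm{dipole}}$ adds a positive penalty that grows with the field strength.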
Thus, according to SiteMap, hydrophobic regions are defined as regions in which "something" would like to be (as evidenced by a favorable vdW term), but water would not (as indicated by the lack of an appreciable electric field).
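To make the sign convention concrete, here is a minimal Python sketch of the hydrophobic grid term as written above. The function and argument names are hypothetical, and the inputs are assumed to be energies already computed at a grid point (negative = favorable); how SiteMap itself sets up the grid and evaluates these energies is not described here.

```python
def hydrophobic_potential(vdw_energy: float, oriented_dipole_energy: float,
                          field_weight: float = 0.30) -> float:
    """Hydrophobic grid term: vdW energy plus an electric-field penalty.

    Both energies use the convention that negative values are favorable.
    An oriented dipole is stabilized by any field (oriented_dipole_energy <= 0),
    so subtracting field_weight times it adds a non-negative penalty that
    grows with the local electric field.
    """
    return vdw_energy - field_weight * oriented_dipole_energy


# Two hypothetical grid points with the same favorable vdW contact:
apolar_point = hydrophobic_potential(-2.0, 0.0)   # no field: stays at -2.0
polar_point = hydrophobic_potential(-2.0, -5.0)   # strong field: about -2.0 + 1.5 = -0.5
```

The polar point scores as much less hydrophobic even though its vdW contact is identical, which is the behavior the definition is meant to capture: "something" likes to be there, but so would water.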
A SiteMap calculation has three stages. First, site points that lie outside the protein, are reasonably enclosed, and have a vdW interaction potential above a defined threshold are grouped into sets that define the ‘sites’. Next, contour maps that express the character of each site are prepared. Finally, the sites produced in the first stage and the energetic properties computed in the second stage are used to evaluate a final score. These properties include:
- size of the site (number of site points);
- amount of exposure to solvent;
- degree of enclosure by the protein;
- average grid contact strength with the protein;
- hydrophilicity;
- hydrophobicity;
- and ratio of hydrogen bond donor to acceptor regions.
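The first stage can be pictured as a filter followed by a clustering step. The Python sketch below assumes each grid point already carries three hypothetical precomputed attributes (whether it lies inside the protein, an enclosure fraction, and a vdW contact strength) and groups the surviving points by single-linkage clustering. The thresholds, the linkage distance, and the clustering method itself are illustrative assumptions, not SiteMap's actual parameters or algorithm.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class GridPoint:
    xyz: tuple[float, float, float]
    inside_protein: bool  # hypothetical: point overlaps protein atoms
    enclosure: float      # hypothetical: fraction of directions blocked by protein (0-1)
    vdw_strength: float   # hypothetical: magnitude of the favorable vdW interaction


def select_site_points(points, min_enclosure=0.5, min_vdw=1.0):
    """Keep points outside the protein that are enclosed and in good vdW contact."""
    return [p for p in points
            if not p.inside_protein
            and p.enclosure >= min_enclosure
            and p.vdw_strength >= min_vdw]


def group_into_sites(points, link_distance=1.5):
    """Single-linkage grouping: points within link_distance end up in the same site."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i].xyz, points[j].xyz) <= link_distance:
                parent[find(i)] = find(j)

    sites = {}
    for i, p in enumerate(points):
        sites.setdefault(find(i), []).append(p)
    # Largest site first; site size is the number of site points.
    return sorted(sites.values(), key=len, reverse=True)


grid = [
    GridPoint((0.0, 0.0, 0.0), False, 0.8, 2.0),
    GridPoint((1.0, 0.0, 0.0), False, 0.7, 1.5),
    GridPoint((2.0, 0.0, 0.0), False, 0.9, 1.8),
    GridPoint((10.0, 0.0, 0.0), False, 0.6, 1.2),
    GridPoint((11.0, 0.0, 0.0), True, 0.9, 3.0),   # inside the protein: rejected
    GridPoint((20.0, 0.0, 0.0), False, 0.1, 2.0),  # solvent-exposed: rejected
]
sites = group_into_sites(select_site_points(grid))
print([len(s) for s in sites])  # [3, 1]
```

The all-pairs loop is quadratic and is only meant to show the idea; a real grid would call for a spatial index or neighbor lists on the grid itself.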
Based on a subset of the above properties, the SiteScore is calculated as:
$$\mathrm{SiteScore} = 0.0733\,\sqrt{n} + 0.6688\,e - 0.20\,p$$
where n is the number of site points (capped at 100), e is the enclosure score, and p is the hydrophilic score, which is capped at 1.0 (the average value for the submicromolar sites examined) to limit the impact of hydrophilicity in charged and highly polar sites. The SiteScore is normalized so that its average across all submicromolar sites (drawn from more than 500 proteins) is approximately 1.0.
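The formula translates directly into code. This Python sketch applies the two caps described above; the function name is hypothetical, and the three property values are assumed to come from a SiteMap calculation rather than being computed here.

```python
from math import sqrt


def site_score(n_site_points: int, enclosure: float, hydrophilic: float) -> float:
    """SiteScore = 0.0733*sqrt(n) + 0.6688*e - 0.20*p, with n capped at 100 and p at 1.0."""
    n = min(n_site_points, 100)  # size contribution saturates at 100 site points
    p = min(hydrophilic, 1.0)    # limits the penalty for charged or very polar sites
    return 0.0733 * sqrt(n) + 0.6688 * enclosure - 0.20 * p
```

For example, site_score(156, 0.807, 0.926), using the druggable-class averages from Table 2, evaluates to about 1.088. This is close to, but not by construction equal to, the tabulated average SiteScore, because the score of averaged properties need not equal the average of per-site scores. Because of the caps, enlarging a site beyond 100 points or raising p above 1.0 leaves the score unchanged.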
In a large-scale study of 538 proteins taken from the PDBbind database [6], SiteMap correctly identified the known binding site as the top-ranked site in 86% of cases (see Table 1). It was considerably more accurate for sites that bind high-affinity ligands, ranking the known site first in more than 98% of cases when the co-crystallized ligand binds with subnanomolar affinity.
Table 1. Percent (%) success of SiteMap in locating co-crystallized sites in 538 proteins

| Comparison | All sites | Subnanomolar ligands (< 1 nM) |
|---|---|---|
| Best-scoring site is correct | 85.9 | 98.5 |
| Largest site is correct | 78.1 | 89.6 |

A site is considered correct if at least one atom of the co-crystallized ligand lies within 4 Å of the centroid of the site points.
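The success criterion used in Table 1 is simple enough to state as code. The Python sketch below assumes site points and ligand atoms are given as (x, y, z) tuples in ångströms; the function names are hypothetical.

```python
from math import dist


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(c[k] for c in points) / n for k in range(3))


def site_is_correct(site_points, ligand_atoms, cutoff=4.0):
    """True if any ligand atom lies within `cutoff` Å of the centroid of the site points."""
    center = centroid(site_points)
    return any(dist(atom, center) <= cutoff for atom in ligand_atoms)


site = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
print(site_is_correct(site, [(3.0, 0.0, 0.0), (9.0, 9.0, 9.0)]))  # True: first atom is 3 Å away
print(site_is_correct(site, [(5.0, 0.0, 0.0)]))                   # False: 5 Å away
```

Note that a single ligand atom near the centroid is enough, so a large site whose centroid happens to fall near the edge of the ligand still counts as correct.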
In addition to the score used for binding-site identification, SiteMap provides a score that can accurately classify the druggability of proteins, as measured by their ability to bind passively absorbed small molecules tightly. This druggability score, Dscore, uses the same properties as SiteScore but with different coefficients:
$$\mathrm{Dscore} = 0.094\,\sqrt{n} + 0.60\,e - 0.324\,p$$
The hydrophilic score is not capped in this case, ensuring that hydrophobicity plays a larger role in assessing druggability than in identifying binding sites. Table 2 shows the average SiteMap properties computed for a set of 63 targets (the Cheng druggability set [7]) representing 27 proteins, 22 of which had marketed drugs or advanced-stage drug candidates as of November 2005. Among other differences, this table shows that “undruggable” and “difficult” sites are typically much more hydrophilic and much less hydrophobic than “druggable” sites.
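A minimal Python sketch of Dscore follows, mainly to show how dropping the hydrophilic cap changes the behavior relative to SiteScore. The function name is hypothetical, and the cap of n at 100 is an assumption carried over from SiteScore (the text above only says that p is uncapped here).

```python
from math import sqrt


def dscore(n_site_points: int, enclosure: float, hydrophilic: float) -> float:
    """Dscore = 0.094*sqrt(n) + 0.60*e - 0.324*p, with p left uncapped."""
    n = min(n_site_points, 100)  # assumption: same size cap as SiteScore
    # No cap on p: very polar sites keep being penalized as p grows.
    return 0.094 * sqrt(n) + 0.60 * enclosure - 0.324 * hydrophilic
```

Plugging in the Table 2 class averages gives roughly 0.66 for the undruggable class (n = 61, e = 0.698, p = 1.522) and 1.12 for the druggable class (n = 156, e = 0.807, p = 0.926). These differ somewhat from the tabulated averages because a score computed from averaged properties is not the average of per-site scores. Unlike SiteScore, dscore(100, 0.8, 2.0) is lower than dscore(100, 0.8, 1.0), which is exactly the extra sensitivity to hydrophilicity described above.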
Table 2. Average SiteMap values across the Cheng druggability set

| Class | Dscore | SiteScore | Size (number of site points) | Enclosure | Hydrophilic | Hydrophobic |
|---|---|---|---|---|---|---|
| undruggable | 0.631 | 0.827 | 61 | 0.698 | 1.522 | 0.336 |
| difficult | 0.871 | 0.995 | 140 | 0.799 | 1.385 | 0.413 |
| druggable | 1.108 | 1.091 | 156 | 0.807 | 0.926 | 1.374 |
| all cases | 1.011 | 1.048 | 143 | 0.793 | 1.099 | 1.061 |

Figure 1. Percentage of correct SiteMap predictions for the Cheng druggability set.
Across these 63 targets, SiteMap proved highly effective at classifying druggability. As Figure 1 shows, SiteMap correctly classifies all of the “undruggable” targets and performs very well on the other two classes. Some of the classifications in the Cheng dataset are debatable, but overall it is a high-quality dataset whose classifications were assigned with considerable care.
Finally, in characterizing binding sites, SiteMap provides a wealth of quantitative and graphical information. This information can help guide the critical assessment of virtual hits during lead discovery, or direct ligand structural modifications toward enhanced potency or improved physical properties during lead optimization. These capabilities allow SiteMap to complement techniques such as docking and computational lead optimization in structure-based drug design. Figure 2 shows an example SiteMap surface, with site points, for thrombin, and Figure 3 shows the SiteMap hydrophobic, donor, and acceptor maps for the same target. Both figures were generated from PDB entry 1ett.
Figure 2. SiteMap surface and site points (white spheres) for the thrombin specificity pocket.
Figure 3. Hydrophobic (yellow), donor (blue), and acceptor (red) maps for the thrombin specificity pocket.
In summary, SiteMap combines a novel and highly effective algorithm for rapid binding-site identification with easy-to-use property and visualization tools. It correctly identifies the known site as the top-scoring site in 86% of a set of 538 complexes taken from the PDBbind database. Moreover, its accuracy increases to 88% when only proteins that bind their co-crystallized ligands with submicromolar affinity are considered, and to 98% when the affinity is subnanomolar. In addition, SiteMap calculates a druggability score (Dscore) that accurately reproduces the division of sites into “druggable”, “difficult”, and “undruggable” targets. This score also provides insight into the physical basis of these classifications. For binding-site analysis, SiteMap provides a wealth of information in the form of computed properties and graphical contour maps that distinguish hydrophobic, hydrogen-bond donor, hydrogen-bond acceptor, and metal-binding regions. This information can be used in lead discovery to quickly evaluate docking hits, or in lead optimization to suggest how a ligand structure might be modified to increase its binding affinity or improve its physical properties. SiteMap calculations typically take a few minutes on a standard workstation for proteins with 5,000–10,000 atoms (including hydrogens). A paper that describes SiteMap in more detail has recently been submitted to the Journal of Chemical Information and Modeling.
[1] Brown D, Superti-Furga G. “Rediscovering the sweet spot in drug discovery.” Drug Discov. Today 2003, 8, 1067–1077.
[2] Hopkins AL, Groom CR. “The druggable genome.” Nat. Rev. Drug Discov. 2002, 1, 727–730.
[3] Halgren T. “New method for fast and accurate binding-site identification and analysis.” Chem. Biol. Drug Des. 2007, 69, 146–148.
[4] Weber A, Halgren TA, Doyle JJ, Lynch RJ, Siegl PK, Parsons WH, Greenlee WJ, Patchett AA. “Design and synthesis of P2-P1’-linked macrocyclic human renin inhibitors.” J. Med. Chem. 1991, 34, 2692–2701.
[5] Goodford PJ. “A computational procedure for determining energetically favorable binding sites on biologically important macromolecules.” J. Med. Chem. 1985, 28, 849–857.
[6] Wang R, Fang X, Lu Y, Yang C-Y, Wang S. “The PDBbind database: methodologies and updates.” J. Med. Chem. 2005, 48, 4111–4119. (PDBbind version 2004 was used in this work.)
[7] Cheng AC, Coleman RG, Smyth KT, Cao Q, Soulard P, Caffrey DR, Salzberg AC, Huang ES. “Structure-based maximal affinity model predicts small-molecule druggability.” Nat. Biotechnol. 2007, 25, 71–75.