Professor Friesner is a co-founder of Schrödinger and Director of the Center for Biomolecular Simulation at Columbia University. As Chairman of Schrödinger's Scientific Advisory Board, Professor Friesner provides strategic vision and guidance for Schrödinger's scientific advancements. In this installment of Rich's column, he describes ongoing research to improve the prediction of long loops and side chains in Prime.
Predicting the structure of loop regions in proteins has been a central objective of biomolecular modeling for the past several decades. Loop structures can often assume several different low-energy conformations, some of which are considerably different from others – for example, the DFG-in and DFG-out activation loop in kinases such as ABL and p38. Structure-based drug design against these targets can benefit from the ability to accurately access the different loop conformations, particularly in the absence of a crystal structure. Similarly, the loop regions of homologous proteins show significant variations, and methods capable of modeling these variations will yield superior structures for virtual screening and lead optimization. In what follows, I focus on loop prediction in the context of the native protein environment; applications to homology modeling, which are in progress, will be the subject of a future article.
From its inception, Prime has demonstrated substantial improvements in loop prediction compared with alternative methods in the literature. Initially, accurate results were limited to loops of up to roughly 10 residues. As we improved both the sampling algorithms and the energy model in Prime, reliability in predicting these short loops improved considerably, and the ability to model longer loops also advanced significantly. Table 1 compares our latest development version of Prime with results from Prime version 1.5, taken from a paper [1] in the Journal of Chemical Theory and Computation.
The dramatic reduction in errors for longer loops is in great part due to breakthroughs in developing more accurate models of continuum solvation. These models are briefly explained below; those interested in further details can consult refs. [1] and [2].
| Loop length | Uniform Dielectric | Variable Dielectric | Uniform Dielectric + Hydrophobic | Variable Dielectric + Hydrophobic | Variable Dielectric + OptHydrophobic |
|---|---|---|---|---|---|
| 6 residue | 0.48 | 0.40 | 0.46 | 0.41 | 0.39 |
| 8 residue | 0.84 | 0.79 | 0.76 | 0.74 | 0.68 |
| 10 residue | 1.27 | 0.73 | 1.05 | 0.76 | 0.80 |
| 13 residue | 2.73 | 1.62 | 1.29 | 1.08 | 1.00 |
Table 1. The RMSD is the loop backbone RMSD, computed after superimposing the rest of the protein. The first two columns show the results with the uniform dielectric model and the variable dielectric model. The next two columns show the results when these two models are combined with the hydrophobic term. The last column shows the results after optimizing the hydrophobic term for the variable dielectric model by taking lysines out of the hydrophobic term. Hydrophobic and OptHydrophobic denote the original hydrophobic term and the optimized hydrophobic term, respectively.
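To put the Table 1 numbers in relative terms, the short sketch below computes the percentage drop in RMSD between the first column (Uniform Dielectric) and the last column (Variable Dielectric + OptHydrophobic) for each loop length. The values are transcribed from the table; the function and variable names are illustrative only.

```python
# RMSD values transcribed from Table 1.
# Keys are loop lengths in residues; tuple order follows the table columns.
COLUMNS = (
    "Uniform Dielectric",
    "Variable Dielectric",
    "Uniform Dielectric + Hydrophobic",
    "Variable Dielectric + Hydrophobic",
    "Variable Dielectric + OptHydrophobic",
)
TABLE_1 = {
    6: (0.48, 0.40, 0.46, 0.41, 0.39),
    8: (0.84, 0.79, 0.76, 0.74, 0.68),
    10: (1.27, 0.73, 1.05, 0.76, 0.80),
    13: (2.73, 1.62, 1.29, 1.08, 1.00),
}


def percent_reduction(baseline: float, improved: float) -> float:
    """Return how much lower `improved` is than `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def reductions_vs_first_column(table):
    """Compare the last column of each row against the first column."""
    return {n: percent_reduction(row[0], row[-1]) for n, row in table.items()}


if __name__ == "__main__":
    for n, pct in reductions_vs_first_column(TABLE_1).items():
        print(f"{n:>2}-residue loops: {pct:.0f}% lower RMSD")
```

For 13-residue loops the RMSD falls from 2.73 to 1.00, a reduction of about 63%, while for 6-residue loops the change from 0.48 to 0.39 is about 19%. The gain is concentrated in the longer loops.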
Our older results used a previous generation of the generalized Born/surface area (GB/SA) model, one that is substantially similar to those currently used in other programs. In these results the average RMSD from experiment increases noticeably beginning at ~10 residues and becomes a serious accuracy problem at 13 residues. Other groups have observed similar difficulties. The origin of this increased error is not difficult to understand. For shorter loops, the difference between the loop length and the end-to-end distance between the loop endpoints is typically rather small; the loop therefore has little “play”, and its conformational space is greatly constrained by the need to satisfy the attachment points. In contrast, at around 10 residues the length of the average loop begins to significantly exceed the distance between the attachment points, and at 13 residues there is typically considerable excess length, which leads to an explosion in the size of the phase space available to the loop. This explosion makes the sampling problem much more difficult and also creates the possibility of a much larger number of incorrect structures, any of which may score better than the native structure because of problems with the scoring function.
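The “play” argument above can be made concrete with a back-of-the-envelope estimate. The sketch below is only an illustration, not part of Prime: it assumes the usual spacing of about 3.8 Å between consecutive C-alpha atoms, treats the fully stretched chain as an upper bound on what a loop can span, and uses made-up anchor separations.

```python
CA_CA_DISTANCE = 3.8  # Å; approximate distance between consecutive C-alpha atoms


def max_span(n_loop_residues: int) -> float:
    """Upper bound (Å) on the anchor-to-anchor distance a loop can bridge.

    A loop of n residues is connected to its two anchor residues by
    n + 1 C-alpha to C-alpha virtual bonds, so it can never span more
    than (n + 1) * 3.8 Å.
    """
    if n_loop_residues < 1:
        raise ValueError("a loop needs at least one residue")
    return (n_loop_residues + 1) * CA_CA_DISTANCE


def excess_length(n_loop_residues: int, anchor_distance: float) -> float:
    """Slack (Å) left over once the loop has closed the gap between anchors."""
    span = max_span(n_loop_residues)
    if anchor_distance > span:
        raise ValueError("anchors are too far apart for this loop to close")
    return span - anchor_distance


if __name__ == "__main__":
    # Hypothetical anchor separations (Å), chosen only for illustration.
    examples = {6: 18.0, 8: 20.0, 10: 16.0, 13: 14.0}
    for n, d in examples.items():
        print(f"{n:>2} residues, anchors {d:4.1f} Å apart: "
              f"excess length {excess_length(n, d):5.1f} Å")
```

The larger the excess length, the more ways the chain can fold while still meeting both attachment points, which is why the number of candidate conformations grows so quickly for long loops.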
To eliminate these incorrect structures, we have made two major modifications to the “standard” GB/SA continuum model. First, we recognized that the surface area component of the model, while it addresses hydrophobic effects in small-molecule solvation free energy calculations, can yield very large errors in the context of larger-scale structures such as proteins. Specifically, removing a loop from the body of the protein pulls out hydrophobic side chains of the loop that were “docked” into the hydrophobic core of the protein, leaving behind hydrophobic holes on the Ångström scale. The “standard” GB/SA model grossly underestimates the free energy penalty associated with these holes: if one or a few water molecules were to occupy such a hole, they would be unable to make one or more of their normal complement of hydrogen bonds. A continuum model cannot compute this correctly because it models water molecules as infinitesimal dipoles. Such cavities simply do not arise in small-molecule solvation free energy calculations, because small molecules generally do not form cavities of this type. There are a number of ways to approach this problem, and we chose the simplest: adding an empirical hydrophobic term to the energy function, similar to the terms used to score protein-ligand docking. As shown in ref. [2], this yielded greatly improved prediction of long loops.
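To give a feel for what an empirical hydrophobic term of the docking-score type can look like, here is a minimal sketch. It is not the term used in Prime, whose functional form is given in ref. [2]. It assumes a piecewise-linear contact reward of the kind found in some docking scoring functions, and the cutoffs (`r_full`, `r_zero`) and per-contact reward (`reward`) are hypothetical values chosen for illustration.

```python
import math


def contact_weight(r: float, r_full: float = 4.5, r_zero: float = 7.5) -> float:
    """Piecewise-linear contact function (cutoffs are hypothetical).

    Returns 1 for pairs closer than r_full, 0 beyond r_zero, and a linear
    ramp in between.
    """
    if r <= r_full:
        return 1.0
    if r >= r_zero:
        return 0.0
    return (r_zero - r) / (r_zero - r_full)


def hydrophobic_energy(loop_atoms, core_atoms, reward: float = -0.5) -> float:
    """Sum a favorable (negative) reward over loop/core hydrophobic atom pairs.

    `loop_atoms` and `core_atoms` are sequences of (x, y, z) coordinates in Å
    for hydrophobic atoms only; `reward` is a hypothetical per-contact weight.
    """
    total = 0.0
    for a in loop_atoms:
        for b in core_atoms:
            total += contact_weight(math.dist(a, b))
    return reward * total


if __name__ == "__main__":
    core = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    docked_loop = [(0.0, 4.0, 0.0)]      # side chain packed against the core
    displaced_loop = [(0.0, 12.0, 0.0)]  # side chain pulled out into solvent
    print("docked:   ", hydrophobic_energy(docked_loop, core))
    print("displaced:", hydrophobic_energy(displaced_loop, core))
```

In this toy setup a loop whose hydrophobic side chains stay packed against the core earns the contact reward, while a loop that has pulled them out earns none. That penalizes the conformations that leave hydrophobic holes behind.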
The second problem we identified with “standard” GB/SA (and PB/SA) models is the treatment of the internal dielectric constant of the protein. Various groups have used values ranging from 1 to 20, but none has proven entirely satisfactory. In ref. [1] we argue that the internal dielectric of the protein should depend upon which residues are interacting: charged residues induce a higher degree of polarization in their surroundings, and hence interactions involving a charged residue should have a correspondingly higher internal dielectric constant. We refer to our implementation of this idea as a variable dielectric model; in this model an effective internal dielectric is defined for each pair of interacting residues. Using this model we obtained substantial improvements in the prediction of charged side chains without reducing effectiveness in predicting neutral side chains, and this in turn further improved long loop prediction. The most dramatic effect of the new model is shown in Figure 1 below, which presents the distribution of NH4+–COO- distances between lysine and carboxylate residues obtained from experimental crystal structures, compared with single side-chain predictions from the fixed and variable dielectric models. The standard single dielectric model drastically overestimates the formation of salt bridges, as well as the N–O distance observed in these salt bridges. The variable dielectric model, while not perfect, nevertheless represents a dramatic improvement.
Figure 1. The distribution of NH4+–COO- distances between lysine and carboxylate residues. The predictions of the uniform dielectric model (dielectric of 1) and the variable dielectric model are compared with native structures. The variable dielectric model eliminates the overprediction of salt bridges seen with the uniform dielectric model.
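The effect of a pair-dependent internal dielectric on a salt bridge can be illustrated with a bare Coulomb term. This is a deliberately simplified sketch and not the model of ref. [1]: it ignores the generalized Born solvation terms, the per-residue dielectric values are hypothetical, and taking the larger of the two residue values for a pair is an assumed combination rule.

```python
COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), converts q*q/r to kcal/mol

# Hypothetical internal dielectric values: higher for charged residues,
# 1.0 for everything else. These are illustrative, not fitted parameters.
RESIDUE_DIELECTRIC = {"LYS": 4.0, "ARG": 4.0, "ASP": 4.0, "GLU": 4.0}
DEFAULT_DIELECTRIC = 1.0


def pair_dielectric(res_i: str, res_j: str) -> float:
    """Effective dielectric for a residue pair (assumed rule: take the larger)."""
    eps_i = RESIDUE_DIELECTRIC.get(res_i, DEFAULT_DIELECTRIC)
    eps_j = RESIDUE_DIELECTRIC.get(res_j, DEFAULT_DIELECTRIC)
    return max(eps_i, eps_j)


def coulomb_energy(q_i: float, q_j: float, r: float, eps: float) -> float:
    """Coulomb energy (kcal/mol) of two point charges r Å apart in dielectric eps."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q_i * q_j / (eps * r)


if __name__ == "__main__":
    r = 3.0  # Å, a hypothetical N-O separation in a Lys/Glu salt bridge
    uniform = coulomb_energy(+1.0, -1.0, r, eps=1.0)
    variable = coulomb_energy(+1.0, -1.0, r, eps=pair_dielectric("LYS", "GLU"))
    print(f"uniform dielectric:  {uniform:8.1f} kcal/mol")
    print(f"variable dielectric: {variable:8.1f} kcal/mol")
```

With a uniform dielectric of 1 the charge-charge attraction is several times stronger than with the larger pair dielectric, which is consistent with the overprediction of salt bridges described above. Interactions between neutral residues are unchanged in this sketch, because their pair dielectric stays at 1.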
When these new energetic terms are coupled to the increasingly powerful conformational sampling algorithms being built into Prime, prediction of increasingly longer loop structures becomes possible. The next release of Prime will contain technology that is reliable for loops of up to 13 residues. New developments in my academic group at Columbia have been successful in predicting 15-residue loops with good robustness, and we have had significant success for loops in the 18-20 residue range. Thus, continued technological progress can be expected in the coming years, and it will be delivered in the integrated Schrödinger software suite.
[1] Zhu, K.; Shirts, M.; Friesner, R. “Improved Methods for Side Chain and Loop Predictions via the Protein Local Optimization Program: Variable Dielectric Model for Implicitly Improving the Treatment of Polarization Effects.” J. Chem. Theory Comput. 2007, 3, 2108-2119.
[2] Zhu, K.; Pincus, D.L.; Zhao, S.; Friesner, R.A. “Long Loop Prediction Using the Protein Local Optimization Program.” Proteins 2006, 65, 438-452.
Comments and questions on Dr. Friesner's column are welcome. Please send these via email to ask-rich@schrodinger.com, and we'll address particularly interesting topics in future newsletters.
