Advances in Biologics Modeling with BioLuminate
Over the past decade, the growth of biopharmaceuticals (biologics) as a fraction of total drug sales has been nothing short of spectacular. With this growth, we have witnessed concomitant increases in the resources applied to the discovery and development of new biologics within the pharmaceutical industry. In recognition of this shift, Schrödinger has affirmed its commitment to advancing the field of biologics modeling, and this commitment is reflected by a number of important publications in this area over the past few years, as well as the development of a number of new computational tools. In this article we highlight these publications and tools.
First, it is useful to provide an overview of BioLuminate, Schrödinger’s biologics modeling platform. BioLuminate is a comprehensive modeling package for biologics, with advanced simulation methods deployed through an intuitive user interface that is specifically designed for biologics. It is the first integrated package to specifically address the key questions associated with the molecular design of biologics, and it provides access to tools for protein engineering, residue/alanine scanning, analysis of protein-protein interfaces, antibody modeling, protein aggregation prediction, identification of reactive hotspots (proteolysis, glycosylation, deamidation, and oxidation), and more. BioLuminate also serves as the entry point to PIPER, the protein-protein docking tool developed by the Vajda group at Boston University.1 The core algorithms in PIPER consistently perform best in CAPRI competitions when compared with other automated protein-protein docking servers (note: ClusPro is the automated server based on the algorithms in PIPER).2 For a more complete listing of BioLuminate features, please visit the product website.
In the rest of this article, we will focus on four primary application areas:
1. Antibody modeling
2. Protein-protein binding
3. Protein stabilization
4. Enzyme design
1. Antibody modeling
Antibodies are a critically important class of biopharmaceuticals, owing both to their ability to bind target antigen proteins tightly and with relatively high specificity, and to the ability to raise antibodies to nearly any target protein of interest. These binding capabilities are facilitated by the sequence and conformational diversity of the six loops constituting the complementarity determining region (CDR), the so-called “hypervariable loops”. As such, the ability to accurately predict the structure of the hypervariable loops is critical for structural modeling of antibodies. Among the six loops, a single loop, designated “H3,” is the most diverse in structure, length, and sequence. Prediction of the three-dimensional structures of antibodies, especially the CDR loops, is an important step in the computational design and engineering of novel antibodies for improved affinity and specificity, as well as for engineering out potential liabilities such as aggregation. Although it has been demonstrated that the conformation of the five non-H3 loops can typically be accurately predicted by comparing their sequences against databases of canonical loop conformations, no such reliable connection has been established for H3 loops. We have published results for ab initio structure prediction of the H3 loop using conformational sampling and energy calculations performed with the program Prime on a dataset of 53 loops ranging in length from 4 to 22 residues.3 When the predictions are performed in the crystal environment, including symmetry mates, the median backbone root mean square deviation (RMSD) relative to the crystal structures is 0.5 Å, with 91% of cases having an RMSD of less than 2.0 Å. These results show promise for ab initio loop predictions applied to modeling of antibodies.
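The RMSD statistics quoted above can be illustrated with a minimal sketch. It assumes the predicted and crystal coordinates already share a frame (as when a loop is rebuilt in place within a fixed framework), so no superposition is performed. The function names and coordinates below are hypothetical and are not part of Prime or BioLuminate.

```python
import math
from statistics import median


def rmsd(predicted, reference):
    """RMSD between two equal-length lists of (x, y, z) points, in the input units.

    No superposition is performed: both coordinate sets are assumed to be
    expressed in the same frame already.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def summarize(rmsds, cutoff=2.0):
    """Return (median RMSD, fraction of cases with RMSD below the cutoff)."""
    below = sum(1 for value in rmsds if value < cutoff)
    return median(rmsds), below / len(rmsds)


# Hypothetical backbone coordinates (in Å), for illustration only.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
loop_rmsd = rmsd(model, crystal)
```

Applying `summarize` to one RMSD value per loop gives the kind of summary reported here: a median and the share of cases under a 2.0 Å cutoff.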
Recently, Schrödinger contributed blinded antibody structure predictions to the Second Antibody Modeling Assessment (AMA-II) using a fully automatic antibody structure prediction method implemented in the BioLuminate software package. We employed a knowledge-based approach to modeling the CDR loops, using a combination of sequence similarity, geometry matching, and clustering of database structures. The homology models were further optimized with a physics-based energy function (VSGB2.0), which improves the model quality significantly. While H3 loop modeling remains a very challenging task, our ab initio loop prediction performed better than any other systematic approach for predicting the H3 loop conformation in the crystal structure context, and this approach allows improved results when refining the H3 loop in the context of a homology model. For the 10 human and mouse-derived antibodies in this assessment, the average RMSDs for the homology model Fv and framework regions are 1.19 Å and 0.74 Å, respectively (see Figure 1). The average RMSDs for the five non-H3 CDR loops range from 0.61 Å to 1.05 Å, and the H3 loop average RMSD is 2.71 Å using our knowledge-based loop prediction approach. However, when we subsequently apply the Prime ab initio approach to predicting the H3 conformations in the context of the crystal structure, the average RMSD for H3 drops substantially to 1.40 Å. Notably, our method for predicting the H3 loop in the crystal structure environment ranked first among the seven participating groups in AMA-II, and our method made the best prediction among all participants for seven of the ten targets.
Figure 1. Predicted H3 loop structures (dark blue) from the blinded AMA-II antibody prediction assessment and corresponding crystal structures (turquoise). From left to right: top: AM2-AM6; bottom: AM7-AM11.4
2. Protein-protein binding
Predicting changes in protein-protein binding affinity due to single amino acid mutations helps us better understand the driving forces underlying protein-protein interactions and design improved biotherapeutics. In BioLuminate, we use the MM-GBSA approach with the OPLS2005 force field5,6 and the VSGB2.0 solvent model7 to calculate differences in binding free energy between wild-type and mutant proteins. While this physics-based approach was originally developed and validated for calculating the energetics of small molecule binding, we have demonstrated that, without any changes, this model is also predictive for calculating changes to protein-protein binding affinity. For protein-protein binding validation, we compared predictions to experimental data for a set of 418 single residue mutations in 21 targets and found that the MM-GBSA model, on average, performs well at scoring these single protein residue mutations.6 The correlation between the predicted and experimental change in binding affinity is statistically significant, and the model performs well at picking “hotspots,” or mutations that change binding affinity by more than 1.0 kcal/mol. The promising performance of this physics-based method with no tuned parameters for predicting binding energies suggests that it can be transferred to other protein engineering problems. Example correlations between predicted and experimental binding energies for the T-cell receptor beta complex with SEC3 superantigen (1JCK) and for the antibody D1.3 bound to hen egg white lysozyme (1VFB) are shown in Figure 2.
Figure 2. Examples of predicted versus experimental binding energies using the MM-GBSA method in BioLuminate Residue Scanning (left 1JCK, right 1VFB).8
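The bookkeeping behind a wild-type versus mutant comparison can be sketched as follows. This is an illustrative outline only: it assumes the binding energy is taken as the energy of the complex minus the energies of the two separated partners, and it reads the hotspot definition above as a change of more than 1.0 kcal/mol in either direction. The function names and energy values are hypothetical, and the sketch does not call BioLuminate or perform any MM-GBSA calculation itself.

```python
def binding_energy(e_complex, e_partner_a, e_partner_b):
    """dG_bind = E(complex) - E(partner A) - E(partner B), in kcal/mol."""
    return e_complex - e_partner_a - e_partner_b


def ddg_bind(wild_type, mutant):
    """Change in binding energy on mutation; positive means weaker binding.

    Each argument is a tuple (E_complex, E_partner_a, E_partner_b).
    """
    return binding_energy(*mutant) - binding_energy(*wild_type)


def is_hotspot(ddg, threshold=1.0):
    """Flag mutations whose binding energy changes by more than the threshold."""
    return abs(ddg) > threshold


# Hypothetical energies (kcal/mol), for illustration only.
wild_type = (-250.0, -120.0, -90.0)
weaker_mutant = (-246.5, -119.0, -90.0)
neutral_mutant = (-250.4, -120.0, -90.0)

ddg_weaker = ddg_bind(wild_type, weaker_mutant)    # 2.5 kcal/mol
ddg_neutral = ddg_bind(wild_type, neutral_mutant)  # about -0.4 kcal/mol
```

With these made-up numbers, `is_hotspot(ddg_weaker)` is true and `is_hotspot(ddg_neutral)` is false.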
3. Protein stabilization
Protein engineering remains an area of growing importance in pharmaceutical and biotechnology research. Stabilizing the folded protein conformation is a frequent goal in projects that deal with affinity optimization, enzyme design, protein construct design, reducing the size of functional proteins, and crystallization. One way to stabilize a protein is through the introduction of disulfide bonds. We have developed a method for identifying positions in the protein where the introduction of cysteine residues would encourage the formation of stabilizing disulfide bonds. This approach combines a physics-based implicit solvent scoring function with a novel knowledge-based scoring function derived from an analysis of the geometries of disulfide bonds in protein structures available in the PDB. We assign relative weights to the terms that comprise our scoring function using a genetic algorithm and find that the native disulfide in the wild-type proteins is scored well, on average (within the top 6% of the reasonable pairs of residues that could form a disulfide bond). Overall, the benchmark results using this approach suggest it should be useful for triaging possible pairs of mutations for disulfide bond formation to improve protein stability. An example of a predicted disulfide bond is shown in Figure 3.
Figure 3. Comparison of the disulfide bond in the predicted (orange carbons) and X-ray (cyan carbons) structure of the lectin domain of the F17G fimbrial adhesin (1ZK5). Using the Cysteine Scanning module in BioLuminate, the true disulfide ranked first out of all possible disulfide-forming residue pairs. Cysteine residues are represented as ‘ball-and-stick’, residues adjacent to the cysteines as ‘thin-tube’, and all other residues as ‘wire.’9
Recently, Michael Hanson and colleagues at Receptos were able to crystallize a new GPCR structure (Human Lysophosphatidic Acid Receptor 1, also known as LPA1) after introducing a stabilizing disulfide bond predicted by BioLuminate.10 In short, candidates for introducing a stabilizing disulfide bond were generated using the Cysteine Scanning module in BioLuminate, and five potential pairs of residues for cysteine mutation and stabilization were identified on the extracellular half of the receptor. After generation of the mutant constructs and testing for the best expression profile, the double mutant D204C and V282C was selected and successfully crystallized. Identification and synthesis of this double mutant yielded a crystallizable construct for a project where previous efforts had not been successful.
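The idea of combining a physics-based term with a knowledge-based term, and then asking where the native disulfide lands in the ranking, can be sketched as below. This is a simplified illustration under stated assumptions: it uses only two terms, treats lower values as better, and uses made-up weights, whereas the published method fits the weights of its own terms with a genetic algorithm (not reproduced here). All class names, function names, residue labels, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairScore:
    pair: tuple      # residue labels, e.g. ("A12", "A47")
    physics: float   # implicit-solvent energy term; lower is better (assumed)
    geometry: float  # knowledge-based geometry penalty; lower is better (assumed)


def combined_score(entry, w_physics, w_geometry):
    """Weighted sum of the two terms for one candidate residue pair."""
    return w_physics * entry.physics + w_geometry * entry.geometry


def rank_pairs(entries, w_physics=1.0, w_geometry=1.0):
    """Sort candidate pairs from best (lowest combined score) to worst."""
    return sorted(entries, key=lambda e: combined_score(e, w_physics, w_geometry))


def top_fraction(ranked, pair):
    """Position of a pair in the ranking, as a fraction of the list (rank / total)."""
    position = [entry.pair for entry in ranked].index(pair) + 1
    return position / len(ranked)


# Hypothetical candidates; ("A12", "A47") plays the role of the native disulfide.
candidates = [
    PairScore(("A30", "A88"), physics=-1.0, geometry=1.5),
    PairScore(("A12", "A47"), physics=-3.0, geometry=0.2),
    PairScore(("A5", "A60"), physics=-4.0, geometry=2.0),
    PairScore(("A71", "A90"), physics=0.5, geometry=0.4),
]
ranked = rank_pairs(candidates, w_physics=1.0, w_geometry=2.0)
native_fraction = top_fraction(ranked, ("A12", "A47"))
```

Averaging a quantity like `native_fraction` over a benchmark set of proteins gives a figure comparable in spirit to the "top 6%" result quoted above.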
4. Enzyme design
Computational enzyme design is an emerging field that has yielded promising success stories, but where numerous challenges remain. Accurate methods to rapidly evaluate possible enzyme design variants could provide significant value when combined with experimental efforts, both by reducing the number of variants that need to be synthesized and by shortening the time required to reach the desired endpoint of the design process. To that end, extending our computational methods to model the fundamental physicochemical principles that regulate activity in a protocol that is automated and accessible to a broad population of enzyme design researchers is essential. Within BioLuminate, we employ a physics-based implicit solvent MM-GBSA scoring approach for enzyme design. We have published several works that benchmark the computational predictions, applied to enzyme systems, against experimentally determined activities.10, 11, 12 Benchmark systems include: steroid binder protein; catalytic turnover for a Kemp eliminase; and catalytic activity for α-Gliadin peptidase variants. In each case, we find that we can accurately identify the most experimentally active enzyme variants, suggesting that this approach could provide enrichment of active variants in real-world enzyme design applications.
In a collaboration with colleagues at Pfizer,13 we explored the molecular basis of substrate recognition and binding in an S-stereoselective ω-aminotransferase (ω-AT), which naturally catalyzes the transamination of pyruvate into alanine. Our goal was to predict mutations that enhance the catalytic efficiency of the enzyme. The conversion of (R)-ethyl 5-methyl-3-oxooctanoate to (3S,5R)-ethyl 3-amino-5-methyloctanoate in the context of several ω-AT mutants was evaluated using the MM-GBSA protocol described above. We were able to correctly identify the mutations that yielded the greatest improvements in enzyme activity (20- to 60-fold improvement over wild type) and confirmed that the computationally predicted structure of a highly active mutant reproduced key structural aspects of the variant, including side chain conformational changes, as determined by X-ray crystallography. Overall, the MM-GBSA protocol has yielded encouraging results and suggests that computational approaches can aid in the redesign of enzymes with improved catalytic efficiency.
Figure 4. Superimposition of the X-ray and predicted structures of the most active variant of ω-AT (#414). The computational prediction with the lowest RMSD (1.0 Å) is shown with cyan carbon atoms and the one with the highest RMSD (1.4 Å) with tan carbon atoms.14
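One common way to quantify "enrichment of active variants" is an enrichment factor: the fraction of actives among the top-scored variants divided by the fraction of actives in the whole set. The sketch below is a generic illustration of that metric, not the analysis used in the cited publications. It assumes that a lower score means a better predicted variant, and the variant names, scores, and activity labels are hypothetical.

```python
def enrichment_factor(variants, top_n):
    """Enrichment of actives in the top_n best-scored variants.

    variants: list of (name, score, is_active); lower score is assumed better.
    Returns (active fraction in top_n) / (active fraction overall).
    """
    if not 0 < top_n <= len(variants):
        raise ValueError("top_n must be between 1 and the number of variants")
    total_active = sum(1 for _, _, active in variants if active)
    if total_active == 0:
        raise ValueError("at least one active variant is required")
    ranked = sorted(variants, key=lambda v: v[1])
    top_active = sum(1 for _, _, active in ranked[:top_n] if active)
    return (top_active / top_n) / (total_active / len(variants))


# Hypothetical variants: (name, predicted score in kcal/mol, experimentally active?)
variants = [
    ("V1", -52.0, True),
    ("V2", -48.5, True),
    ("V3", -41.0, False),
    ("V4", -39.2, False),
    ("V5", -37.8, False),
    ("V6", -30.1, False),
]
ef_top2 = enrichment_factor(variants, top_n=2)
```

A value above 1 means the scoring places actives near the top more often than random selection would. In this made-up set both actives are ranked first, so `ef_top2` takes the maximum possible value for two actives among six variants.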
As suggested by the examples above, we are making great strides forward in the development and application of structure-based methods for biopharmaceutical design. These improvements reflect both the availability of an integrated software platform for such calculations (BioLuminate), and methodological improvements in approaches for specific problems. Looking ahead, we anticipate that these tools will become increasingly important to biologics design, where the integration of computational approaches during discovery and development is still in the ascendancy.
1. Kozakov, D.; Brenke, R.; Comeau, S.R.; Vajda, S. PIPER: An FFT-based protein docking program with pairwise potentials. Proteins, 2006, 65(2), 392–406
2. Janin, J. Protein–protein docking tested in blind predictions: the CAPRI experiment. Mol. BioSyst., 2010, 6, 2351-2362
3. Zhu, K.; Day, T. Ab initio structure prediction of the antibody hypervariable H3 loop. Proteins, 2013, 81(6), 1081-108
4. Zhu, K.; Day, T.; Warshaviak, D.; Murrett, C.; Friesner, R.; Pearlman, D.A. Antibody structure determination using a combination of homology modeling, energy-based refinement, and loop prediction. Proteins, 2014, 82(8), 1646–1655
5. Jorgensen, W.L.; Maxwell, D.S.; Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc, 1996, 118, 11225–11236
6. Shivakumar, D.; Williams, J.; Wu, Y.; Damm, W.; Shelley, J.; Sherman, W. Prediction of absolute solvation free energies using molecular dynamics free energy perturbation and the OPLS force field. J Chem Theory Comput, 2010, 6, 1509–1519
7. Li, J.; Abel, R.; Zhu, K.; Cao, Y.; Zhao, S.; Friesner, R. The VSGB 2.0 model: A next generation energy model for high resolution protein structure modeling. Proteins: Structure, Function, and Bioinformatics, 2011, 79(10), 2794–2812
8. Beard, H.; Cholleti, A.; Pearlman, D.; Sherman, W.; Loving, K.A. Applying physics-based scoring to calculate free energies of binding for single amino acid mutations in protein-protein complexes. PLoS ONE, 2013, 8(12), e82849. doi:10.1371/journal.pone.0082849
9. Salam, N.; Adzhigirey, M.; Sherman, W.; Pearlman, D.A. Structure-based approach to the prediction of disulfide bonds in proteins. PEDS, 2014, 27(10), 365-374
10. Chrencik et al. Crystal Structure of Antagonist Bound Human Lysophosphatidic Acid Receptor 1. Cell, 2015, 161(7), 1633–1643
12. Gannavaram, S.; Sirin, S.; Sherman, W.; Gadda, G. Mechanistic and Computational Studies of the Reductive Half-Reaction of Tyrosine to Phenylalanine Active Site Variants of d-Arginine Dehydrogenase. Biochemistry, 2014, 53(41), 6574-6583
13. Gannavaram, S.; Sirin, S.; Gadda, G. Mechanistic and computational studies on C-N bond oxidation in D-amino acids catalyzed by D-arginine dehydrogenase Y53F and Y249F (584.4). FASEB J., 2014, 28(1), Supplement 584.4
14. Sirin, S.; Kumar, R.; Martinez, C.; Karmilowicz, M.J.; Ghosh, P.; Abramov, Y.A.; Martin, V.; Sherman, W. A Computational Approach to Enzyme Design: Predicting ω-Aminotransferase Catalytic Activity Using Docking and MM-GBSA Scoring. J. Chem. Inf. Model., 2014, 54(8), 2334-2346