The Role of Conformations in DFT-based pKa Prediction
Background
In contrast to empirical pKa prediction, where the information about the molecule is normally conveyed by the types of atoms it contains and connectivities between those atoms (the so-called 2D structure), DFT-based pKa prediction deals with actual 3D structures. This means that the pKa values computed by a program that conducts DFT calculations might depend on the conformation of the input structure. How strong is the effect and how important is it to use low energy conformations as the input? In order to answer these questions let us consider the workflow of Schrödinger’s DFT-based pKa predictor, Jaguar pKa, which should have the same essential dependence on conformational effects as other DFT-based pKa predictors.
An important part of the Jaguar pKa workflow1 is the computation of the so-called “raw” pKa value which is assembled directly from DFT energies. This raw pKa is not the pKa that can be directly compared to the experimental value. In order to be on the same footing with the experimentally measured pKa the raw pKa still needs to be adjusted by empirical parameters that compensate for various errors of DFT and the associated solvation model. Because these empirical parameters can hardly account for conformational phenomena it is important to absorb conformational effects in raw pKa.
The formula for the raw pKa is:
where G is the free energy of the deprotonation of the molecule in solution, R is the universal gas constant, and T is the temperature. G is approximated by essentially a combination of the gas phase and solution phase DFT energies of the protonated and the deprotonated forms of the molecule:
Here E(A-)g, E(AH)g, E(AH)solv, and E(A-)solv are the gas phase DFT energy of the deprotonated form, the gas phase energy of the protonated form, the solvation energy of the protonated form, and the solvation energy of the deprotonated form, respectively. The non-essential for this discussion Constis unchanged from one pKa calculation to another and absorbs the effects related to the solvation of the proton.
The denominator in Equation 1 at room temperature (T = 298K) equals approximately 1.36 kcal/mol. This implies that an error in G equal to 1.36 kcal/mol will introduce an error of one pKa unit to the raw pKa value. Errors coming from inherent inaccuracies in DFT energies and the solvation model are responsible for the raw pKa being typically several pKa units larger than the final, corrected pKa value. The errors of DFT and the solvation model which do not fortuitously cancel in Equation 2 are more or less systematic and depend mostly on the local environment of the atom involved in the protonation/deprotonation process. This is why a parameterization model based on the linear empirical parameters a and b
tied to a specific functional group is successful in predicting pKa’s with high accuracy on the basis of Equations 1-2. Jaguar pKa contains sets of parameters a and b for over a hundred different functional groups, ranging from very specific to generic.
The errors related to conformational changes are much harder to correct. The nature of the conformational effects is global, not local, as the change in the global shape of the molecule might have a profound, unbalanced, and hardly predictable effect on the energies from Equation 2. As such, conformational errors cannot be accounted for by the simple empirical parameters associated with the functional group which undergoes a protonation event. Working erroneously with high-energy conformations that do not sufficiently characterize the protonated and the deprotonated forms of the molecule, as they exist in solution, might introduce unexpected errors in some or all of the energy terms of Equation 2. As we have seen from the previous paragraph, a G error on the order of several kcal/mol will result in an error of several pKa units in the final, empirically adjusted, pKa value. In practice, an erroneous choice of a conformation that turns out to be several kcal/mol higher in energy than the lowest conformation in the solution phase will often result in a outlier pKa value. Therefore, it is essential that DFT-based pKa calculations should be conducted with lowest energy conformations.
Improvements on the previous model
Even though a Jaguar pKa calculation could be automatically preceded by a conformational search conducted in the MacroModel program, practice showed that this procedure was still insufficient for accurate pKa predictions involving large, highly flexible molecules. At least three types of errors related to conformational effects were keeping the results sub-optimal.
(i) The MacroModel parameters chosen for the Jaguar pKa package were optimized for speed, not accuracy, thus often discarding low lying conformations and often not reaching these conformations because of cutoffs in the parameters.
(ii) Only the lowest lying conformations (one for the protonated form and one for the deprotonated form) were chosen after the MacroModel calculation and used in the subsequent DFT calculations. Thus, we were relying solely on the force field to find the lowest lying energy conformations. An inherent lower accuracy of the force field might occasionally yield a high energy conformation, which would compromise the DFT attempts at higher accuracy in the following steps of the workflow.
(iii) In cases where several low lying conformations were present their possibly slightly different contributions to the macroscopic pKa were discarded as no more that one conformation per protonation form was taken into account.
No doubt, these deficiencies contributed to somewhat disappointing results of an early version of Jaguar pKa.2 As we continued to enhance the technical and scientific aspects of the Jaguar pKa program in the course of the past few years a series of improvements aimed at eliminating errors (i)-(iii) have been carried out. New versions of Jaguar pKa treat the conformational space more thoroughly than before. The measures taken to address errors (i)-(iii) are indicated in the red font in the following illustration of the new, conformationally-aware Jaguar pKa workflow below.
Error (i) was easy to address after analyzing the effects that the settings and cutoffs previously employed in the built-in MacroModel component had on the generation of the conformational space. Several MacroModel settings were adjusted which resulted in an “aggressive” conformational search that substantially enriched the resulting conformational spaces. These adjustments, relaxing constraints, and increasing accuracy slowed down the conformational search step considerably, but the increase of timing was negligible compared to the total time the subsequent DFT calculations take.
The next improvement, addressing error (ii), involved re-optimizing the geometries of a pool of low-energy MacroModel conformations at the DFT level in conjunction with the Jaguar implementation of the PBF solvation model. A typical pool of conformation consisted of ten lowest-energy (according to the force field used in MacroModel) protonated and ten de-protonated conformations. This “re-scoring” allowed us to generate more realistic 3D structures of the conformations than those produced by the force field. In evaluating the effects of this step we noticed that the DFT method would not only adjust the atomic positions in some of the structures quite noticeably but would also often re-rank the conformations, so that the conformation that might not be considered lowest energy by the force field would be ranked as lowest energy by the DFT method. This step, unfortunately, increased the total time of the pKa calculation significantly, by a factor of the number of the conformations treated with the DFT method. However, as these DFT calculations are completely independent of one another, their parallelization is trivial and would not result in the increase of the overall computational time in comparison with the old protocol if an additional number of CPUs is available.
Lastly, to address error (iii), multiple conformations were given a chance to contribute to the final pKa value. This was achieved by “weighing”, or statistically “averaging” pKa contributions coming from each pair of protonated and deprotonated conformations. The lower the energy of the conformation the greater contribution to the final pKa value it had. The mathematical details and a description of the implementation of this new, improved protocol will be published elsewhere.4
Conclusion
In order to evaluate the impact of the improvements described in the previous section, we chose two sets of flexible molecules which had previously posed problems in DFT and empirical pKa predictions, apparently due to an inadequate treatment of the conformational effects.
The first test set contained flexible amidines which were considered as BACE-1 inhibitors with the common chemical formula.5
This set of molecules used to be a challenge for the old Jaguar pKa protocol which was suffering from its inability to account for errors (i)-(iii). Even though the overall performance of the old Jaguar pKa protocol was quite good (with the MUE on the order of 0.6 pKa units; see the table below) two types of problems were still evident. The first one was that the predicted pKa might depend strongly on the initial input conformation (see the values for the conformations A and B shown in the table; conformations A and B differ by a 180o flip of the amide bond in the chemical formula shown above). For example, the input conformation A of structure 67 yielded pKa = 6.0 whereas the input conformation B of the same structure produced pKa = 7.0 even when preceded with the built-in MacroModel conformational search. Obviously, this effect is highly undesirable as it creates a significant uncertainty in the final results and poses barriers for the reproducibility of the results by different groups of scientists. The second problem was that there were still several outliers (values with an error of more than 1 pKa unit; shown in red in the table). We believe that these problems were mostly the consequence of deficiencies (i) and (ii). Computational experiments with the new protocol suggested that deficiency (iii) had played a less prominent role.
Structure | Experimental pKa | Old Jaguar pKa, conf A | Old Jaguar pKa, conf B | New Jaguar pKa, conf A | New Jaguar pKa, conf B |
1b | 9.7 | 9.0 | 8.9 | 9.7 | 9.6 |
66 | 8.1 | 7.7 | 7.8 | 8.2 | 8.2 |
67 | 7.4 | 6.0 | 7.0 | 7.4 | 7.5 |
69 | 7.3 | 7.3 | 7.8 | 7.7 | 7.7 |
109 | 7.3 | 5.4 | 5.4 | 7.0 | 7.0 |
89 | 7.0 | 6.8 | 6.9 | 7.3 | 7.3 |
68 | 6.7 | 5.0 | 7.0 | 7.3 | 7.3 |
88 | 6.3 | 6.1 | 5.9 | 6.3 | 6.3 |
111 | 5.9 | 5.8 | 5.7 | 6.0 | 6.4 |
110 | 5.8 | 5.2 | 5.3 | 6.0 | 6.1 |
14d | 5.8 | 5.8 | 6.1 | 6.3 | 6.3 |
70 | 5.1 | 4.3 | 3.9 | 4.7 | 4.5 |
1a | 9.8 | 9.4 | 9.1 | 9.8 | 9.8 |
14a | 6.3 | 6.2 | 6.3 | 6.4 | 6.4 |
112 | 9.0 | 9.5 | 8.1 | 9.0 | 9.1 |
113 | 6.1 | 6.1 | 6.0 | 6.3 | 6.4 |
MUE | 0.56 | 0.54 | 0.20 | 0.27 |
The last two columns of the table show the performance of the new, conformationally-aware protocol which was described in the previous section. For these calculations, we used a pool of ten protonated and ten deprotonated conformations. From the table, we can see that both problems plaguing the previous results have now been essentially eliminated. The dependence of the final pKa results on the initial conformation is negligible (with the biggest difference being 0.4 pKa units for structure 111), and the number of outliers is now zero. Not surprisingly, the overall performance improved dramatically: the mean unsigned errors (MUE) have now dropped to the impressive low error of 0.2 - 0.3 pKa units, which was impossible to achieve with any of the previous protocols, empirical or non-empirical, that we have tried before.
The second test of the new protocol set a goal of being able to describe the subtle effects of stereochemistry on the pKa. Although quite rare, the effect is nevertheless observable in the experiment and is created by the different electrostatic environment in the vicinity of the dissociating proton, due to different multipole moments and even hydrogen bond patterns of the stereoconformations that are not mirror images of one another. The better performance of the new Jaguar pKa protocol with respect to describing these subtle effects could already be visible from the BACE-1 results above, as structures 66 and 67 are stereoisomers:
Structure | 67 | 67 |
Experimental pKa | 8.1 | 7.4 |
Old Jaguar pKa | 7.7, 7.8 | 6.0, 7.0 |
New Jaguar pKa | 8.2, 8.2 | 7.4, 7.5 |
The following four molecules are two pairs of stereoisomers, where each pair has a different experimental pKa, doubtless owing to the neighboring OH group that can interact with the protonatable secondary or tertiary amine group via a hydrogen bond. One stereoconfiguration can form such a bond directly whereas the other one, in which the amino-group and the hydroxy-group are separated by a greater distance, cannot.
Experimental pKa | 8.0 | 8.8 | 8.1 | 8.6 |
Old Jaguar pKa | 8.2 | 8.4 | 8.5 | 7.9 |
New Jaguar pKa | 7.7 | 9.4 | 8.7 | 9.2 |
Comparing the results produced by the old and the new Jaguar pKa protocols we can conclude that the old protocol, although quite accurate in predicting the absolute values (MUE = 0.43), fails to reproduce the trends in the pKa dependence on the stereoconfiguration. The new protocol, making use of ten protonated and ten deprotonated conformations, reproduces the pKa dependence correctly and yields accurate absolute pKa values. The MUE of the new protocol on these four molecules is 0.53, a little larger than that observed for the set of the BACE inhibitors, is probably due to the non-trivial electrostatic and solvation effects associated with the hydrogen bonds. Nevertheless, the performance of the new, conformationally-aware workflow is impressive. More testing of the new protocol would be desirable and will be reported in an upcoming publication.
References
Klicić, J. J.; Friesner, R. A.; Liu, S.-Y.; Guida, W. C., Accurate prediction of acidity constants in aqueous solution via density functional theory and self-consistent reaction field methods. J. Phys. Chem. A, 2002, 106, 1327–1335.
Liao, C.; Nicklaus M. C. Comparison of Nine Programs Predicting pKa Values of Pharmaceutical Substances. J. Chem. Inf. Model. 2009, 49, 2801–2812.
Marten, B.; Kim, K.; Cortis, C.; Friesner, R. A.; Murphy, R. B.; Ringnalda, M. N.; Sitkoff, D.; Honig, B., New Model for Calculation of Solvation Free Energies: Correction of Self-Consistent Reaction Field Continuum Dielectric Theory for Short-Range Hydrogen-Bonding Effects. J. Phys. Chem. 1996, 100, 11775-11778.
Bochevarov, A. D.; Watson, M. A.; Greenwood, J. R.; Philipp, D. M., Multiconformation, Density Functional Theory-Based pKa Prediction in Application to Large, Flexible Organic Molecules with Diverse Functional Groups, J. Chem. Theory Comput., 2016, 12 (12), 6001–6019.
Hilpert, H. et al., β‑Secretase (BACE1) Inhibitors with High in Vivo Efficacy Suitable for Clinical Evaluation in Alzheimer’s Disease. J. Med. Chem. 2013, 56, 3980−3995.