PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results

  • Original Paper
Journal of Computer-Aided Molecular Design

Summary

We introduce PHASE, a highly flexible system for common pharmacophore identification and assessment, 3D QSAR model development, and 3D database creation and searching. The primary workflows and tasks supported by PHASE are described, and details of the underlying scientific methodologies are provided. Using results from previously published investigations, PHASE is compared directly to other ligand-based software for its ability to identify target pharmacophores, rationalize structure-activity data, and predict activities of external compounds.

References

  1. Guner OF (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA

  2. Van Drie JH (2003) Curr Pharm Design 9:1649

  3. Topliss JG (1983) Quantitative structure-activity relationships of drugs, vol 19. Academic Press, New York

  4. Martin YC (1978) Quantitative drug design: a critical introduction. Marcel Dekker, New York

  5. Hansch C, Fujita T (1964) J Am Chem Soc 86:1616

  6. Gund P, Wipke WT, Langridge R (1974) Computer searching of a molecular structure file for pharmacophoric patterns, vol 3. Elsevier, Amsterdam, pp 33–39

  7. Kier LB, Hall LH (1976) Molecular connectivity in chemistry and drug research. Academic Press, London

  8. Hansch C, Leo A (1979) Substituent constants for correlation analysis in chemistry and biology. Wiley, New York

  9. Hopfinger AJ (1980) J Am Chem Soc 102:7196

  10. Van Drie JH, Weininger D, Martin YC (1989) J Comput-Aided Mol Design 3:225

  11. Lauri G, Bartlett PA (1994) J Comput-Aided Mol Design 8:51

  12. Van Drie JH (1997) J Comput-Aided Mol Design 11:39

  13. Chen X, Rusinko A III, Young SS (1998) J Chem Inf Comput Sci 38:1054

  14. Chen X, Rusinko A III, Tropsha A, Young SS (1999) J Chem Inf Comput Sci 39:887

  15. Greene J, Kahn S, Savoj H, Sprague P, Teig S (1994) J Chem Inf Comput Sci 34:1297

  16. Barnum D, Greene J, Smellie A, Sprague P (1996) J Chem Inf Comput Sci 36:563

  17. Martin YC, In Hansch C, Fujita T (eds) (1995) Classical and 3D QSAR in agrochemistry. American Chemical Society, Washington, DC, pp 318–329

  18. Jones G, Willett P, Glen RC (1995) J Comput-Aided Mol Design 9:532

  19. Cramer RD, Patterson DE, Bunce JD (1988) J Am Chem Soc 110:5959

  20. Van Drie JH, In Guner OF (ed) (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 517–530

  21. Ligprep 2.0 (2006) Schrodinger, LLC, New York, NY

  22. MacroModel 9.1 (2006) Schrodinger, LLC, New York, NY

  23. Halgren TA (1996) J Comput Chem 17:520

  24. MacroModel 2.0 (2006) User Manual, Schrodinger LLC, New York, NY

  25. Chang G, Guida W, Still WC (1989) J Am Chem Soc 111:4379

  26. Kolossvary I, Guida WC (1996) J Am Chem Soc 118:5011

  27. SMARTS – Language for Describing Molecular Patterns, Daylight Chemical Information Systems, Inc., Aliso Viejo, CA

  28. Marshall GR, Barry CD, Bosshard HE, Dammkoehler RA, Dunn DA, In Olson EC, Christoffersen RE (eds) (1979) Computer-assisted drug design. American Chemical Society, Washington, DC, pp 205–226

  29. Beusen DD, Marshall GR, In Guner OF (ed) (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 23–45

  30. Van Drie JH (1997) J Chem Inf Comput Sci 37:38

  31. Patel Y, Gillet VJ, Bravi G, Leach AR (2002) J Comput-Aided Mol Design 16:653

  32. Suling WJ, Reynolds RC, Barrow EW, Wilson LN, Piper JR, Barrow WW (1998) J Antimicrob Chemother 42:811

  33. Suling WJ, Seitz LE, Pathak V, Westbrook L, Barrow EW, Zywno-Van-Ginkel S, Reynolds RC, Piper JR, Barrow W (2000) Antimicrob Agents Chemoth 44:2784

  34. Debnath AK (2002) J Med Chem 45:41

  35. Maestro 7.5 (2006) Schrodinger, LLC, New York, NY

  36. World Drug Index (2001) Thomson Scientific

  37. Wold H, In Gani J (ed) (1975) Perspectives in probability and statistics, Papers in Honour of Bartlett MS on the Occasion of His Sixty-Fifth Birthday, Academic Press, London, pp 117–142

  38. Wold S, Ruhe A, Wold H, Dunn WJ III (1984) SIAM J Scientific Stat Comput 5:735

Author information

Corresponding author

Correspondence to Steven L. Dixon.

Appendices

Appendix A: selectivity estimation

In PHASE, the selectivity of a pharmacophore hypothesis H is defined as follows:

$$ {\rm Selectivity}({\rm H})=-\log_{10} [p({\rm H})], $$
(A1)

where p(H) is the probability that a random drug-like molecule will match the hypothesis, irrespective of any activity exhibited by that molecule toward the biological target in question. Given a database of drug-like molecules, it is straightforward to search that database for matches to a hypothesis, and thereby arrive at an estimate of selectivity based on that particular sample population of molecules. However, application of such a procedure is far too time-consuming to be practical when scoring a large number of hypotheses, so a rapid means of estimating selectivity based on the physical characteristics of a hypothesis is sought.

Van Drie [12] has shown that selectivities of two-point pharmacophores can be reliably estimated with respect to a given database using pre-tabulated probabilities that cover discrete distance ranges. He went on to show that highly selective three-point pharmacophores can be constructed by combining two-point pharmacophores with the highest selectivities. This is a natural consequence of the fact that the probability of matching a k-point pharmacophore \(\hbox{H}^{\langle k \rangle}\) is less than or equal to the probability of matching all \(k(k-1)/2\) two-point pharmacophores embedded within \(\hbox{H}^{\langle k \rangle}\):

$$ p({\rm H}^{\langle k \rangle})\le p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right) $$
(A2)

Strict equality is not preserved because a given molecule may match each of the two-point pharmacophores even if it fails to contain a single arrangement of k features that matches \(\hbox{H}^{\langle k \rangle}\). Nevertheless, since matching the two-point pharmacophores is a necessary condition for matching \(\hbox{H}^{\langle k \rangle}\), the right-hand side of Eq. A2 is of interest for purposes of estimating selectivity.

If the two-point probabilities are independent, then the following relation holds:

$$ p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right)=\prod\limits_{i < j\le k} p({\rm H}_{ij}^{\langle k \rangle}) $$
(A3)

Further, if sites i and j are separated by a distance of \(d_{ij}\), and their pharmacophore feature types are α(i) and α(j), respectively, then Eq. A3 can be rewritten in terms of probabilities of matching specific inter-feature distances to within a tolerance Δd:

$$ p\left(\bigcap\limits_{i < j\le k} {\rm H}_{ij}^{\langle k \rangle} \right)=\prod\limits_{i < j\le k} {p\left(d_{\alpha(i)\alpha(j)} \in [d_{ij} -\Delta d,\,d_{ij} +\Delta d]\right)} $$
(A4)

Given a population of drug-like molecules and a pair of feature types x and y, there is a probability density \(p^{\ast}(d_{xy})\) that describes the distribution of xy pharmacophores within that population. While \(p^{\ast}(d_{xy})\) may be complex and possibly discontinuous, for purposes of estimating selectivity a simple Gaussian dependence is assumed, so that the probability density may be written as:

$$ p^{\ast} (d_{xy})=\frac{1}{\sigma_{xy} \sqrt{2\pi}}\exp \left[ -\frac{(d_{xy}-\mu_{xy})^{2}}{2\sigma_{xy}^{2}}\right] $$
(A5)

For small values of Δd, the following approximation can be made:

$$ p\left({d_{xy} \in [d-\Delta d,\,d+\Delta d]} \right) \approx \Delta d\cdot p^{\ast} (d_{xy})\left|_{d_{xy}=d} \right. $$
(A6)

Substituting A5 and A6 into A4 yields

$$ p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right)\approx \prod\limits_{i < j\le k} {\frac{\Delta d}{\sigma_{\alpha (i)\alpha (j)} \sqrt {2\pi}}\exp \left[ {-\frac{(d_{ij} -\mu_{\alpha (i)\alpha (j)})^{2}}{2\sigma_{\alpha (i)\alpha (j)}^{2}}} \right]} $$
(A7)

Taking logarithms,

$$ -\log_{10} \left[ {p\left({\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}}} \right)} \right]\approx \sum\limits_{i < j\le k} {\left\{ {-\log_{10} \left[ {\frac{\Delta d}{\sigma_{\alpha (i)\alpha (j)} \sqrt {2\pi}}} \right]+\frac{(d_{ij} -\mu_{\alpha (i)\alpha (j)})^{2}}{\ln (10)\cdot 2\sigma_{\alpha (i)\alpha (j)}^{2}}} \right\}} $$
(A8)
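
The Gaussian estimate can be checked numerically. The following is a minimal sketch, not PHASE code: it evaluates the product form (Eq. A7) and the summed logarithmic form (Eq. A8) for a list of site pairs and confirms that they agree. The function names, the `params` dictionary layout, and all numerical values of μ, σ and Δd are hypothetical and chosen only for illustration.

```python
from math import exp, log, log10, pi, sqrt


def pair_probability(d, mu, sigma, delta_d):
    """Single factor of Eq. A7: delta_d times the Gaussian density at d."""
    return delta_d / (sigma * sqrt(2.0 * pi)) * exp(-((d - mu) ** 2) / (2.0 * sigma ** 2))


def selectivity_from_product(pairs, params, delta_d):
    """-log10 of the product over all site pairs (Eq. A7)."""
    p = 1.0
    for type_i, type_j, d in pairs:
        mu, sigma = params[tuple(sorted((type_i, type_j)))]
        p *= pair_probability(d, mu, sigma, delta_d)
    return -log10(p)


def selectivity_from_sum(pairs, params, delta_d):
    """Sum of per-pair terms (Eq. A8); log(10) is the natural logarithm of 10."""
    total = 0.0
    for type_i, type_j, d in pairs:
        mu, sigma = params[tuple(sorted((type_i, type_j)))]
        total += -log10(delta_d / (sigma * sqrt(2.0 * pi)))
        total += (d - mu) ** 2 / (log(10.0) * 2.0 * sigma ** 2)
    return total


# Hypothetical (mu, sigma) values in Angstroms, keyed by sorted feature-type pair.
example_params = {("A", "D"): (6.0, 2.0), ("A", "R"): (7.0, 2.5), ("D", "R"): (5.5, 1.8)}
# A three-point hypothesis (A, D, R) has three embedded two-point pharmacophores.
example_pairs = [("A", "D", 4.8), ("A", "R", 7.9), ("R", "D", 6.1)]

s_product = selectivity_from_product(example_pairs, example_params, delta_d=0.5)
s_sum = selectivity_from_sum(example_pairs, example_params, delta_d=0.5)
```

Working with the sum rather than the product avoids floating-point underflow when a hypothesis contains many sites.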

Although it is certainly possible to estimate the univariate parameters \(\sigma_{\alpha(i)\alpha(j)}\) and \(\mu_{\alpha(i)\alpha(j)}\) for each possible pair of feature types, it is advantageous to treat the right-hand side of Eq. A8 as a general polynomial in \(d_{ij}\), and fit the associated coefficients to observed probabilities for a large number and variety of pharmacophores:

$$ -\log_{10} \left[ {p({\rm H}^{\langle k \rangle})} \right]\approx \sum\limits_{i < j\le k} {\left({A_{\alpha (i)\alpha (j)} +B_{\alpha (i)\alpha (j)} d_{ij} +C_{\alpha (i)\alpha (j)} d_{ij}^{2}} \right)} $$
(A9)

This treatment can help overcome certain deficiencies in the model, such as the assumption that the two-point probabilities are independent of each other (Eq. A3). In practice, the second-order terms in Eq. A9 do not add much statistically independent information to the model, and we have found a first-order approximation to be satisfactory:

$$ -\log_{10} \left[ {p({\rm H}^{\langle k \rangle})} \right]\approx \sum\limits_{i < j\le k} {\left({A_{\alpha (i)\alpha (j)} +B_{\alpha (i)\alpha (j)} d_{ij} } \right)} $$
(A10)
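
As a reading aid, and not as part of the original derivation, expanding the square in a single term of Eq. A8 shows how the polynomial coefficients of Eq. A9 would relate to the Gaussian parameters if the model assumptions held exactly. Here μ and σ abbreviate \(\mu_{\alpha(i)\alpha(j)}\) and \(\sigma_{\alpha(i)\alpha(j)}\), d abbreviates \(d_{ij}\), and ln denotes the natural logarithm:

$$ -\log_{10} \left[ \frac{\Delta d}{\sigma \sqrt{2\pi}} \right]+\frac{(d-\mu)^{2}}{2\sigma^{2}\ln 10} = \underbrace{-\log_{10} \left[ \frac{\Delta d}{\sigma \sqrt{2\pi}} \right]+\frac{\mu^{2}}{2\sigma^{2}\ln 10}}_{A} + \underbrace{\left(-\frac{\mu}{\sigma^{2}\ln 10}\right)}_{B} d + \underbrace{\frac{1}{2\sigma^{2}\ln 10}}_{C} d^{2} $$

Coefficients obtained by fitting need not satisfy these relations, since the fit also absorbs departures from independence and from the Gaussian form. Dropping the \(C d^{2}\) term gives the first-order model of Eq. A10.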

To determine appropriate values for the A and B parameters, a training set was assembled by randomly selecting 1000 minimized structures from a conformational database of the World Drug Index [36], then randomly choosing between two and seven pharmacophore sites from each structure. This yielded a training set of 1000 pharmacophores containing varying numbers of sites and different combinations of the features A, D, H, N, P, and R. A sample probability was computed for each pharmacophore \({\rm H}_\lambda\) by determining the number of structures \(M_\lambda\) out of the original 1000 that matched the pharmacophore to within a tolerance of 2.0 Å in all intersite distances:

$$ p\left({{\rm H}_\lambda} \right)\equiv \frac{M_\lambda}{1000} $$
(A11)
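
The counting behind Eq. A11 can be sketched as follows. This is an illustrative brute-force matcher and not the PHASE search engine: it assumes one conformer per structure, represents each site as a hypothetical `(feature_type, (x, y, z))` tuple, and compares intersite distances only, so it cannot distinguish mirror-image arrangements.

```python
from itertools import permutations
from math import dist, log10


def matches(hypothesis, structure, tol=2.0):
    """True if some assignment of hypothesis sites to same-type structure sites
    reproduces every intersite distance to within tol (Angstroms)."""
    k = len(hypothesis)
    for assignment in permutations(range(len(structure)), k):
        # Every hypothesis site must map to a structure site of the same type.
        if any(hypothesis[a][0] != structure[s][0] for a, s in enumerate(assignment)):
            continue
        if all(
            abs(dist(hypothesis[a][1], hypothesis[b][1])
                - dist(structure[assignment[a]][1], structure[assignment[b]][1])) <= tol
            for a in range(k)
            for b in range(a + 1, k)
        ):
            return True
    return False


def sample_probability(hypothesis, structures, tol=2.0):
    """Fraction of structures that match the hypothesis (Eq. A11 with N structures)."""
    n_matched = sum(matches(hypothesis, s, tol) for s in structures)
    return n_matched / len(structures)


# Toy data: a two-point A...D hypothesis with the sites 5 Angstroms apart.
toy_hypothesis = [("A", (0.0, 0.0, 0.0)), ("D", (5.0, 0.0, 0.0))]
toy_structures = [
    [("A", (0.0, 0.0, 0.0)), ("D", (4.0, 0.0, 0.0))],  # distance 4.0: match
    [("A", (0.0, 0.0, 0.0)), ("D", (9.0, 0.0, 0.0))],  # distance 9.0: no match
    [("A", (0.0, 0.0, 0.0)), ("R", (5.0, 0.0, 0.0))],  # wrong feature type
    [("D", (1.0, 1.0, 1.0)), ("A", (1.0, 1.0, 6.5))],  # distance 5.5: match
]
toy_p = sample_probability(toy_hypothesis, toy_structures)
toy_selectivity = -log10(toy_p)  # Eq. A1; undefined if no structure matches
```

The permutation search grows factorially with the number of sites, which is acceptable for a small illustration but is one reason a fast regression estimate of selectivity is attractive.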

Since there were six types of features in the sampled pharmacophores, the number of unique feature pairs was 21, requiring a total of 42 adjustable parameters. No attempt was made to optimize all of these independently because of the possibility of only limited information for certain pairs of features. For example, pharmacophores that contain both negative and positive ionizable features tend to be very rare among drug-like structures, so they cannot be expected to be well-represented in a relatively small population sample. Therefore, parameter values were determined by applying a partial least-squares (PLS) procedure to fit the \(-\log_{10}[p({\rm H}_\lambda)]\) values in terms of latent factors constructed from the pool of 42 variables. Details of the PLS algorithm used in PHASE are provided in Appendix B.
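
Because Eq. A10 is linear in its parameters, each pharmacophore can be encoded as a row of 42 variables: for every unordered feature pair, the number of site pairs of that type (multiplying A) and the sum of their distances (multiplying B). The sketch below shows one such encoding. The ordering of the variables, the function names, and the site representation are assumptions made for illustration and are not taken from PHASE.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

FEATURE_TYPES = "ADHNPR"
# Unordered pairs with repetition: 6 * 7 / 2 = 21 pair types.
PAIR_TYPES = list(combinations_with_replacement(FEATURE_TYPES, 2))
PAIR_INDEX = {pair: n for n, pair in enumerate(PAIR_TYPES)}


def descriptor_row(sites):
    """Encode a pharmacophore as 42 values: for pair type n, element 2n is the
    pair count and element 2n + 1 is the sum of the intersite distances."""
    row = [0.0] * (2 * len(PAIR_TYPES))
    for (type_i, xyz_i), (type_j, xyz_j) in combinations(sites, 2):
        n = PAIR_INDEX[tuple(sorted((type_i, type_j)))]
        row[2 * n] += 1.0
        row[2 * n + 1] += dist(xyz_i, xyz_j)
    return row


def linear_selectivity(row, coefficients):
    """Right-hand side of Eq. A10, with coefficients ordered A, B per pair type."""
    return sum(c * v for c, v in zip(coefficients, row))


# Three-point example: two acceptors and one donor on a 3-4-5 triangle.
example_sites = [("A", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("A", (0.0, 4.0, 0.0))]
example_row = descriptor_row(example_sites)
```

Rows of this kind, stacked for all training pharmacophores, would form the data matrix to which the PLS procedure is applied.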

To arrive at an appropriate number of PLS factors to include in the model, predictions were made for a test set of 500 pharmacophores drawn from the same sample population of 1000 WDI structures. As successively more PLS factors were incorporated into the model, test set errors trended downward until reaching a minimum at 23 factors. At this point, the test set RMSE was 0.372 log units and \(Q^{2}\) was 0.786. This compared to a training set RMSE of 0.343 and \(R^{2}\) of 0.826. This model has been integrated into PHASE for computation of the Selectivity_Score term that appears in Eq. 7.
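
The factor-selection step can be illustrated with a short sketch. The text does not state the exact formulas used for RMSE and \(Q^{2}\), so the definitions below are assumptions: RMSE is the root-mean-square residual, and \(Q^{2}\) is one minus the ratio of the residual sum of squares to the total sum of squares about a reference mean (here the mean of the supplied observations unless another is given). All names and numbers are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def q_squared(y_true, y_pred, reference_mean=None):
    """1 - SS_res / SS_tot, with SS_tot taken about reference_mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if reference_mean is None:
        reference_mean = y_true.mean()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - reference_mean) ** 2)
    return float(1.0 - ss_res / ss_tot)


def best_factor_count(test_errors):
    """Number of factors (1-based) giving the smallest test set error."""
    return int(np.argmin(test_errors)) + 1


# Hypothetical test set RMSE values for models with 1, 2, 3 and 4 factors.
illustrative_errors = [0.50, 0.40, 0.38, 0.41]
chosen = best_factor_count(illustrative_errors)
```

Applied to the same predictions on training data, `q_squared` gives the corresponding \(R^{2}\) under the same assumed definition.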

It is worth noting that training sets containing as many as 5000 structures were also investigated, and no significant improvement in the test set predictions was observed. The protocol of using 1000 structures was adopted because it is far less computationally demanding, and therefore represents a practical approach for users who wish to calibrate selectivity models based on a different set of structures.

Appendix B: partial least-squares regression

PHASE utilizes a standard recursive procedure for extracting orthogonal latent factors from a data matrix in a predetermined number of steps. It is distinguished from the NIPALS algorithm [37, 38], which is an iterative approach with a user-defined stopping criterion, but no absolute control over the total number of steps.

Let \({\bf X}\in {\bf R}^{n\times m}\) represent the independent variable data matrix for a training set of n observations and a pool of m variables. Let \({\bf y}\in {\bf R}^{n\times 1}\) represent the training set dependent data, which will be estimated using latent factors extracted from X. Creation of the PLS regression model proceeds as follows:

Center each column of X:

$$ \begin{array}{l}
\hbox{for } i = 1,\ldots,m \\
\quad \mu_{i}^{x}=\frac{1}{n}\sum\limits_{k=1}^{n} {{\bf X}(k,i)} \\
\quad \hbox{for } k=1,\ldots,n \\
\quad\quad {\bf X}(k,i)\to {\bf X}(k,i)-\mu_{i}^{x} \\
\quad \hbox{next } k \\
\hbox{next } i
\end{array} $$

Center y:

$$ \begin{array}{l}
\mu^{y}=\frac{1}{n}\sum\limits_{k=1}^{n} {{\bf y}(k)} \\
\hbox{for } k=1,\ldots ,n \\
\quad {\bf y}(k)\to {\bf y}(k)-\mu^{y} \\
\hbox{next } k
\end{array} $$

Determine PLS factors and regression coefficients for up to M PLS factors (\(M \le m\)):

$$ \begin{array}{l}
{\bf X}_{1} = {\bf X} \\
\hbox{for } i = 1,\ldots,M \\
\quad \hbox{Compute the vector of weights that define PLS factor } i: \\
\quad\quad {\bf w}_{i} ={\bf X}_{i}^{\rm T} {\bf y}/ \vert{\bf X}_{i}^{\rm T} {\bf y}\vert \quad ({\bf w}_{i} \in {\bf R}^{m\times 1}) \\
\quad \hbox{Project the rows of } {\bf X}_{i} \hbox{ onto factor } i: \\
\quad\quad {\bf t}_{i} ={\bf X}_{i} {\bf w}_{i} \quad ({\bf t}_{i} \in {\bf R}^{n\times 1}) \\
\quad \hbox{Project } {\bf t}_{i} \hbox{ onto each column of } {\bf X}_{i}: \\
\quad\quad {\bf p}_{i} ={\bf X}_{i}^{\rm T} {\bf t}_{i}/ \vert {\bf t}_{i}^{\rm T} {\bf t}_{i} \vert \quad ({\bf p}_{i} \in {\bf R}^{m\times 1}) \\
\quad \hbox{Compute the } i^{\rm th} \hbox{ PLS regression coefficient by projecting } {\bf t}_{i} \hbox{ onto } {\bf y}: \\
\quad\quad {\bf b}(i)={\bf t}_{i}^{\rm T} {\bf y}/ \vert {\bf t}_{i}^{\rm T} {\bf t}_{i} \vert \quad ({\bf b}\in {\bf R}^{M\times 1}) \\
\quad \hbox{Orthogonalize } {\bf X}_{i} \hbox{ w.r.t. PLS factor } i: \\
\quad\quad {\bf X}_{i+1} ={\bf X}_{i}-{\bf t}_{i} {\bf p}_{i}^{\rm T} \\
\hbox{next } i
\end{array} $$

For a regression with M PLS factors, the estimates \({\hat{{\bf y}}}\) are then given by:

$$ \hat{{\bf y}}(k)=\mu^{y}+\sum\limits_{i=1}^{M} {{\bf b}(i){\bf t}_{i} (k)}\quad k=1,\ldots,n $$

To apply the M-factor PLS model to a new set of \({\tilde{n}}\) observations with data matrix \({\tilde{{\bf X}}}\), the regression coefficients b must first be translated back to the space of the original X variables:

Define

$$ \begin{array}{ll} {\bf W}\equiv \left[ {{\bf w}_{1} \ldots {\bf w}_{M}}\right] &({\bf W}\in {\bf R}^{m\times M})\\ {\bf P}\equiv \left[ {{\bf p}_{1} \ldots {\bf p}_{M}}\right] &({\bf P}\in {\bf R}^{m\times M})\\ {\bf b}^{x}\equiv {\bf W}\left({{\bf P}^{\rm T}{\bf W}} \right)^{-1}{\bf b} &({\bf b}^{x}\in {\bf R}^{m\times 1}) \end{array} $$

The coefficients b x may then be used to make estimates for the new observations as follows:

$$ \tilde{{\bf y}}(k)=\mu^{y}+\sum\limits_{i=1}^{m} \left[ {\tilde{{\bf X}}(k,i)-\mu_{i}^{x}}\right]{\bf b}^{x}(i) \quad k=1,\ldots, \tilde{n} $$
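
The procedure of this appendix can be transcribed into NumPy as shown below. This is a minimal sketch written from the steps above, not the PHASE source: the function names and the dictionary used to hold the model are hypothetical, and no safeguard is included for the degenerate case in which \({\bf X}_{i}^{\rm T}{\bf y}\) vanishes before M factors have been extracted.

```python
import numpy as np


def pls_fit(X, y, n_factors):
    """Extract n_factors PLS factors from centered X and y, deflating X only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x = X.mean(axis=0)
    mu_y = y.mean()
    Xi = X - mu_x            # X_1: column-centered data matrix
    yc = y - mu_y            # centered dependent data
    n, m = Xi.shape
    W = np.zeros((m, n_factors))
    P = np.zeros((m, n_factors))
    T = np.zeros((n, n_factors))
    b = np.zeros(n_factors)
    for i in range(n_factors):
        w = Xi.T @ yc
        w = w / np.linalg.norm(w)      # weights defining factor i
        t = Xi @ w                     # scores: rows of X_i projected onto w
        tt = t @ t
        p = Xi.T @ t / tt              # loadings: t projected onto columns of X_i
        b[i] = (t @ yc) / tt           # regression coefficient for factor i
        Xi = Xi - np.outer(t, p)       # orthogonalize X_i w.r.t. factor i
        W[:, i], P[:, i], T[:, i] = w, p, t
    # Translate b back to the original variables: b_x = W (P^T W)^-1 b.
    b_x = W @ np.linalg.solve(P.T @ W, b)
    return {"mu_x": mu_x, "mu_y": mu_y, "T": T, "b": b, "b_x": b_x}


def pls_predict(model, X_new):
    """Estimates for new observations from the coefficients b_x."""
    X_new = np.asarray(X_new, dtype=float)
    return model["mu_y"] + (X_new - model["mu_x"]) @ model["b_x"]
```

Applied to the training matrix itself, `pls_predict` should reproduce the estimates obtained from the scores, `mu_y + T @ b`, which provides a simple consistency check on the coefficient translation.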

About this article

Cite this article

Dixon, S.L., Smondyrev, A.M., Knoll, E.H. et al. PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20, 647–671 (2006). https://doi.org/10.1007/s10822-006-9087-6
