PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results

Dixon, Steven L.; Smondyrev, Alexander M.; Knoll, Eric H.; Rao, Shashidhar N.; Shaw, David E.; Friesner, Richard A.

doi:10.1007/s10822-006-9087-6


Original Paper
Published: 24 November 2006

Volume 20, pages 647–671, (2006)

Journal of Computer-Aided Molecular Design


Summary

We introduce PHASE, a highly flexible system for common pharmacophore identification and assessment, 3D QSAR model development, and 3D database creation and searching. The primary workflows and tasks supported by PHASE are described, and details of the underlying scientific methodologies are provided. Using results from previously published investigations, PHASE is compared directly to other ligand-based software for its ability to identify target pharmacophores, rationalize structure-activity data, and predict activities of external compounds.


References

1. Guner OF (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA
2. Van Drie JH (2003) Curr Pharm Design 9:1649
3. Topliss JG (1983) Quantitative structure-activity relationships of drugs, vol 19. Academic Press, New York
4. Martin YC (1978) Quantitative drug design: a critical introduction. Marcel Dekker, New York
5. Hansch C, Fujita T (1964) J Am Chem Soc 86:1616
6. Gund P, Wipke WT, Langridge R (1974) Computer searching of a molecular structure file for pharmacophoric patterns, vol 3. Elsevier, Amsterdam, pp 33–39
7. Kier LB, Hall LH (1976) Molecular connectivity in chemistry and drug research. Academic Press, London
8. Hansch C, Leo A (1979) Substituent constants for correlation analysis in chemistry and biology. Wiley, New York
9. Hopfinger AJ (1980) J Am Chem Soc 102:7196
10. Van Drie JH, Weininger D, Martin YC (1989) J Comput-Aided Mol Design 3:225
11. Lauri G, Bartlett PA (1994) J Comput-Aided Mol Design 8:51
12. Van Drie JH (1997) J Comput-Aided Mol Design 11:39
13. Chen X, Rusinko A III, Young SS (1998) J Chem Inf Comput Sci 38:1054
14. Chen X, Rusinko A III, Tropsha A, Young SS (1999) J Chem Inf Comput Sci 39:887
15. Greene J, Kahn S, Savoj H, Sprague P, Teig S (1994) J Chem Inf Comput Sci 34:1297
16. Barnum D, Greene J, Smellie A, Sprague P (1996) J Chem Inf Comput Sci 36:563
17. Martin YC (1995) In: Hansch C, Fujita T (eds) Classical and 3D QSAR in agrochemistry. American Chemical Society, Washington, DC, pp 318–329
18. Jones G, Willett P, Glen RC (1995) J Comput-Aided Mol Design 9:532
19. Cramer RD, Patterson DE, Bunce JD (1988) J Am Chem Soc 110:5959
20. Van Drie JH (2000) In: Guner OF (ed) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 517–530
21. Ligprep 2.0 (2006) Schrödinger, LLC, New York, NY
22. MacroModel 9.1 (2006) Schrödinger, LLC, New York, NY
23. Halgren TA (1996) J Comput Chem 17:520
24. MacroModel 2.0 (2006) User Manual, Schrödinger, LLC, New York, NY
25. Chang G, Guida W, Still WC (1989) J Am Chem Soc 111:4379
26. Kolossvary I, Guida WC (1996) J Am Chem Soc 118:5011
27. SMARTS – Language for Describing Molecular Patterns, Daylight Chemical Information Systems, Inc., Aliso Viejo, CA
28. Marshall GR, Barry CD, Bosshard HE, Dammkoehler RA, Dunn DA (1979) In: Olson EC, Christoffersen RE (eds) Computer-assisted drug design. American Chemical Society, Washington, DC, pp 205–226
29. Beusen DD, Marshall GR (2000) In: Guner OF (ed) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 23–45
30. Van Drie JH (1997) J Chem Inf Comput Sci 37:38
31. Patel Y, Gillet VJ, Bravi G, Leach AR (2002) J Comput-Aided Mol Design 16:653
32. Suling WJ, Reynolds RC, Barrow EW, Wilson LN, Piper JR, Barrow WW (1998) J Antimicrob Chemother 42:811
33. Suling WJ, Seitz LE, Pathak V, Westbrook L, Barrow EW, Zywno-Van-Ginkel S, Reynolds RC, Piper JR, Barrow W (2000) Antimicrob Agents Chemother 44:2784
34. Debnath AK (2002) J Med Chem 45:41
35. Maestro 7.5 (2006) Schrödinger, LLC, New York, NY
36. World Drug Index (2001) Thomson Scientific
37. Wold H (1975) In: Gani J (ed) Perspectives in probability and statistics, Papers in Honour of M. S. Bartlett on the Occasion of His Sixty-Fifth Birthday. Academic Press, London, pp 117–142
38. Wold S, Ruhe A, Wold H, Dunn WJ III (1984) SIAM J Scientific Stat Comput 5:735


Author information

Authors and Affiliations

Schrödinger, Inc., 120 W. 45th St., 29th Floor, New York, NY, 10036, USA
Steven L. Dixon, Alexander M. Smondyrev, Eric H. Knoll, Shashidhar N. Rao, David E. Shaw & Richard A. Friesner
Department of Chemistry, Columbia University, New York, NY, 10027, USA
Eric H. Knoll & Richard A. Friesner
D E Shaw & Co, 120 W. 45th St., 39th Floor, New York, NY, 10036, USA
David E. Shaw


Corresponding author

Correspondence to Steven L. Dixon.

Appendices

Appendix A: selectivity estimation

In PHASE, the selectivity of a pharmacophore hypothesis H is defined as follows:

$$ {\rm Selectivity}({\rm H})=-\log_{10} [p({\rm H})], $$

(A1)

where p(H) is the probability that a random drug-like molecule will match the hypothesis, irrespective of any activity exhibited by that molecule toward the biological target in question. Given a database of drug-like molecules, it is straightforward to search that database for matches to a hypothesis, and thereby arrive at an estimate of selectivity based on that particular sample population of molecules. However, application of such a procedure is far too time-consuming to be practical when scoring a large number of hypotheses, so a rapid means of estimating selectivity based on the physical characteristics of a hypothesis is sought.

Van Drie [12] has shown that selectivities of two-point pharmacophores can be reliably estimated with respect to a given database using pre-tabulated probabilities that cover discrete distance ranges. He went on to show that highly selective three-point pharmacophores can be constructed by combining two-point pharmacophores with the highest selectivities. This is a natural consequence of the fact that the probability of matching a k-point pharmacophore $\hbox{H}^{\langle k \rangle}$ is less than or equal to the probability of matching all k(k−1)/2 two-point pharmacophores embedded within $\hbox{H}^{\langle k \rangle}$:

$$ p({\rm H}^{\langle k \rangle})\le p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right) $$

(A2)

Strict equality is not preserved because a given molecule may match each of the two-point pharmacophores even if it fails to contain a single arrangement of k features that matches $\hbox{H}^{\langle k \rangle}$. Nevertheless, since matching the two-point pharmacophores is a necessary condition for matching $\hbox{H}^{\langle k \rangle}$, the right-hand-side of Eq. A2 is of interest for purposes of estimating selectivity.
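The gap between the two sides of Eq. A2 can be illustrated with a small, made-up example. The sketch below assumes that a molecule is represented by a list of typed feature coordinates and that matching is judged on inter-site distances alone; the function name `matches`, the coordinates, and the 1.0 Å tolerance are illustrative choices, not part of PHASE.

```python
import math
from itertools import combinations, permutations

def matches(hypothesis, features, tol):
    """Return True if some assignment of molecule features to hypothesis
    sites has matching types and reproduces every inter-site distance
    to within tol. Both arguments are lists of (type, (x, y, z))."""
    k = len(hypothesis)
    for cand in permutations(features, k):
        # Feature types must agree site by site.
        if any(f[0] != h[0] for h, f in zip(hypothesis, cand)):
            continue
        # Every pairwise distance must agree within the tolerance.
        if all(abs(math.dist(hypothesis[i][1], hypothesis[j][1])
                   - math.dist(cand[i][1], cand[j][1])) <= tol
               for i, j in combinations(range(k), 2)):
            return True
    return False

# Hypothetical three-point hypothesis: A, D, R on a triangle with 5 Å sides.
hyp = [("A", (0.0, 0.0, 0.0)), ("D", (5.0, 0.0, 0.0)), ("R", (2.5, 4.33, 0.0))]

# Hypothetical molecule with collinear features spaced 5 Å apart.
mol = [("A", (0.0, 0.0, 0.0)), ("D", (5.0, 0.0, 0.0)),
       ("R", (10.0, 0.0, 0.0)), ("A", (15.0, 0.0, 0.0))]

# Each embedded two-point pharmacophore is matched somewhere in the molecule...
all_pairs_match = all(matches([hyp[i], hyp[j]], mol, tol=1.0)
                      for i, j in combinations(range(3), 2))
# ...but no single A, D, R triple reproduces all three distances at once.
full_match = matches(hyp, mol, tol=1.0)
print(all_pairs_match, full_match)
```

Here the A-R distance is satisfied only by the second acceptor, while the A-D distance is satisfied only by the first, so the molecule counts toward the right-hand side of Eq. A2 but not the left.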

If the two-point probabilities are independent, then the following relation holds:

$$ p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right)=\prod\limits_{i < j\le k} p({\rm H}_{ij}^{\langle k \rangle}) $$

(A3)

Further, if sites i and j are separated by a distance of $d_{ij}$, and their pharmacophore feature types are α(i) and α(j), respectively, then Eq. A3 can be rewritten in terms of probabilities of matching specific inter-feature distances to within a tolerance Δd:

$$ p\left(\bigcap\limits_{i < j\le k} {\rm H}_{ij}^{\langle k \rangle} \right)=\prod\limits_{i < j\le k} {p\left(d_{\alpha(i)\alpha(j)} \in [d_{ij} -\Delta d,\,d_{ij} +\Delta d]\right)} $$

(A4)

Given a population of drug-like molecules and a pair of feature types x and y, there is a probability density $p^{\ast}(d_{xy})$ that describes the distribution of xy pharmacophores within that population. While $p^{\ast}(d_{xy})$ may be complex and possibly discontinuous, for purposes of estimating selectivity a simple Gaussian dependence is assumed, so that the probability density may be written as:

$$ p^{\ast} (d_{xy})=\frac{1}{\sigma_{xy} \sqrt{2\pi}}\exp \left[ -\frac{(d_{xy}-\mu_{xy})^{2}}{2\sigma_{xy}^{2}}\right] $$

(A5)

For small values of Δd, the following approximation can be made:

$$ p\left(d_{xy} \in [d-\Delta d,\,d+\Delta d]\right) \approx \Delta d\cdot \left. p^{\ast}(d_{xy})\right|_{d_{xy}=d} $$

(A6)

Substituting A5 and A6 into A4 yields

$$ p\left(\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}} \right)\approx \prod\limits_{i < j\le k} {\frac{\Delta d}{\sigma_{\alpha (i)\alpha (j)} \sqrt {2\pi}}\exp \left[ {-\frac{(d_{ij} -\mu_{\alpha (i)\alpha (j)})^{2}}{2\sigma_{\alpha (i)\alpha (j)}^{2}}} \right]} $$

(A7)

Taking logarithms,

$$ -\log_{10} \left[ {p\left({\bigcap\limits_{i < j\le k} {{\rm H}_{ij}^{\langle k \rangle}}} \right)} \right]\approx \sum\limits_{i < j\le k} {\left\{ {-\log_{10} \left[ {\frac{\Delta d}{\sigma_{\alpha (i)\alpha (j)} \sqrt {2\pi}}} \right]+\frac{(d_{ij} -\mu_{\alpha (i)\alpha (j)})^{2}}{\ln (10)\cdot 2\sigma_{\alpha (i)\alpha (j)}^{2}}} \right\}} $$

(A8)
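As a quick numerical check of the step from Eq. A7 to Eq. A8, the following sketch evaluates both forms for a hypothetical three-point pharmacophore. The (d, μ, σ) triples and the value of Δd are invented for illustration; no fitted Gaussian parameters are given in the text.

```python
import math

def probability_a7(pairs, delta_d):
    """Right-hand side of Eq. A7: product of Gaussian two-point estimates.
    pairs is a list of (d_ij, mu, sigma) tuples, one per site pair."""
    p = 1.0
    for d, mu, sigma in pairs:
        density = (math.exp(-(d - mu) ** 2 / (2.0 * sigma ** 2))
                   / (sigma * math.sqrt(2.0 * math.pi)))
        p *= delta_d * density
    return p

def neg_log10_a8(pairs, delta_d):
    """Right-hand side of Eq. A8: the same quantity as a sum of
    constant and quadratic terms in d_ij."""
    total = 0.0
    for d, mu, sigma in pairs:
        total -= math.log10(delta_d / (sigma * math.sqrt(2.0 * math.pi)))
        # Natural log of 10 converts the exponent from base e to base 10.
        total += (d - mu) ** 2 / (math.log(10.0) * 2.0 * sigma ** 2)
    return total

# Hypothetical (d_ij, mu, sigma) values in Å for the three site pairs.
pairs = [(4.0, 6.0, 2.5), (7.5, 6.5, 3.0), (9.0, 7.0, 2.0)]
delta_d = 0.5

print(-math.log10(probability_a7(pairs, delta_d)), neg_log10_a8(pairs, delta_d))
```

The two printed values agree to rounding error, which confirms that the quadratic term in Eq. A8 carries a factor of the natural logarithm of 10.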

Although it is certainly possible to estimate the univariate parameters $\sigma_{\alpha(i)\alpha(j)}$ and $\mu_{\alpha(i)\alpha(j)}$ for each possible pair of feature types, it is advantageous to treat the right-hand side of Eq. A8 as a general polynomial in $d_{ij}$, and fit the associated coefficients to observed probabilities for a large number and variety of pharmacophores:

$$ -\log_{10} \left[ {p({\rm H}^{\langle k \rangle})} \right]\approx \sum\limits_{i < j\le k} {\left({A_{\alpha (i)\alpha (j)} +B_{\alpha (i)\alpha (j)} d_{ij} +C_{\alpha (i)\alpha (j)} d_{ij}^{2}} \right)} $$

(A9)

This treatment can help overcome certain deficiencies in the model, such as the assumption that the two-point probabilities are independent of each other (Eq. A3). In practice, the second-order terms in Eq. A9 do not add much statistically independent information to the model, and we have found a first-order approximation to be satisfactory:

$$ -\log_{10} \left[ {p({\rm H}^{\langle k \rangle})} \right]\approx \sum\limits_{i < j\le k} {\left({A_{\alpha (i)\alpha (j)} +B_{\alpha (i)\alpha (j)} d_{ij} } \right)} $$

(A10)

To determine appropriate values for the A and B parameters, a training set was assembled by randomly selecting 1000 minimized structures from a conformational database of the World Drug Index [36], then randomly choosing between two and seven pharmacophore sites from each structure. This yielded a training set of 1000 pharmacophores containing varying numbers of sites and different combinations of the features A, D, H, N, P, and R. A sample probability was computed for each pharmacophore $\hbox{H}_\lambda$ by determining the number of structures $M_\lambda$ out of the original 1000 that matched the pharmacophore to within a tolerance of 2.0 Å in all intersite distances:

$$ p\left({{\rm H}_\lambda} \right)\equiv \frac{M_\lambda}{1000} $$

(A11)
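The sampling protocol and Eq. A11 can be sketched as follows. The text does not say how the number of sites was drawn, so a uniform choice between two and seven is assumed here, and all function names are hypothetical. Each structure is assumed to expose at least two sites.

```python
import math
import random

def sample_sites(sites, rng, k_min=2, k_max=7):
    """Pick between k_min and k_max distinct sites from one structure.
    Assumption: the site count is drawn uniformly; needs len(sites) >= k_min."""
    k = rng.randint(k_min, min(k_max, len(sites)))
    return rng.sample(sites, k)

def sample_probability(match_flags):
    """Eq. A11: fraction of the sampled structures that match."""
    return sum(match_flags) / len(match_flags)

def selectivity(p):
    """Eq. A1; a hypothesis that nothing matches gets infinite selectivity."""
    return math.inf if p == 0 else -math.log10(p)

# Hypothetical structure with ten sites, labelled only by type and index.
rng = random.Random(0)
structure = [("A", i) for i in range(10)]
pharmacophore = sample_sites(structure, rng)

# 12 matching structures out of 1000 gives p = 0.012.
flags = [True] * 12 + [False] * 988
print(len(pharmacophore), selectivity(sample_probability(flags)))
```

Because each pharmacophore is drawn from one of the 1000 structures, at least that structure matches, so in this protocol the sample probability is never below 1/1000 and the selectivity never exceeds 3.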

Since there were six types of features in the sampled pharmacophores, the number of unique feature pairs was 21, requiring a total of 42 adjustable parameters. No attempt was made to optimize all of these independently because of the possibility of only limited information for certain pairs of features. For example, pharmacophores that contain both negative and positive ionizable features tend to be very rare among drug-like structures, so they cannot be expected to be well-represented in a relatively small population sample. Therefore, parameter values were determined by applying a partial least-squares (PLS) procedure to fit the $-\log_{10}[p(\hbox{H}_\lambda)]$ values in terms of latent factors constructed from the pool of 42 variables. Details of the PLS algorithm used in PHASE are provided in Appendix B.
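One way to arrive at 42 variables, sketched below, follows from grouping the terms of Eq. A10 by feature-pair type: each A parameter multiplies the number of site pairs of that type, and each B parameter multiplies the sum of their distances. This grouping and the variable ordering are an interpretation of Eq. A10, not a description of PHASE internals, and the names are hypothetical.

```python
import math
from itertools import combinations, combinations_with_replacement

FEATURE_TYPES = "ADHNPR"
# 21 unordered type pairs: AA, AD, AH, ..., RR.
PAIR_TYPES = list(combinations_with_replacement(FEATURE_TYPES, 2))

def pair_variables(sites):
    """Return 42 values for one pharmacophore: 21 pair counts (for the A
    parameters) followed by 21 distance sums (for the B parameters).
    sites is a list of (type, (x, y, z))."""
    counts = dict.fromkeys(PAIR_TYPES, 0)
    dist_sums = dict.fromkeys(PAIR_TYPES, 0.0)
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(sites, 2):
        key = tuple(sorted((type_a, type_b)))
        counts[key] += 1
        dist_sums[key] += math.dist(xyz_a, xyz_b)
    return [counts[p] for p in PAIR_TYPES] + [dist_sums[p] for p in PAIR_TYPES]

def estimate_selectivity(sites, coefficients):
    """Eq. A10 as a dot product of 42 coefficients with the 42 variables."""
    return sum(c * v for c, v in zip(coefficients, pair_variables(sites)))

# Hypothetical three-site pharmacophore: two acceptors and one donor.
sites = [("A", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("A", (0.0, 4.0, 0.0))]
x = pair_variables(sites)
print(len(PAIR_TYPES), len(x))
```

With this layout a row of 42 values per training pharmacophore forms the data matrix that is passed to the PLS procedure.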

To arrive at an appropriate number of PLS factors to include in the model, predictions were made for a test set of 500 pharmacophores drawn from the same sample population of 1000 WDI structures. As successively more PLS factors were incorporated into the model, test set errors trended downward until reaching a minimum at 23 factors. At this point, the test set RMSE was 0.372 log units and Q² was 0.786. This compared to a training set RMSE of 0.343 and R² of 0.826. This model has been integrated into PHASE for computation of the Selectivity_Score term that appears in Eq. 7.
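The factor-selection step can be sketched with plain error metrics. The exact definition of Q² is not given in this appendix; the version below assumes the common form 1 − SS_res/SS_tot computed on the set being scored, and the function names and numbers are illustrative.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot about the mean of the observed values.
    Assumed to serve as R^2 on the training set and Q^2 on the test set."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def best_factor_count(test_rmse_by_factor):
    """Number of PLS factors with the lowest test set RMSE, where entry i
    of the list holds the RMSE obtained with i + 1 factors."""
    best_index = min(range(len(test_rmse_by_factor)),
                     key=test_rmse_by_factor.__getitem__)
    return best_index + 1

# Made-up test set errors for models with 1 to 4 factors.
print(best_factor_count([0.9, 0.5, 0.4, 0.45]))
```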

It is worth noting that training sets containing as many as 5000 structures were also investigated, and no significant improvement in the test set predictions was observed. The protocol of using 1000 structures was adopted because it is far less computationally demanding, and therefore represents a practical approach for users who wish to calibrate selectivity models based on a different set of structures.

Appendix B: partial least-squares regression

PHASE utilizes a standard recursive procedure for extracting orthogonal latent factors from a data matrix in a predetermined number of steps. It is distinguished from the NIPALS algorithm [37, 38], which is an iterative approach with a user-defined stopping criterion, but no absolute control over the total number of steps.

Let ${\bf X}\in {\bf R}^{n\times m}$ represent the independent variable data matrix for a training set of n observations and a pool of m variables. Let ${\bf y}\in {\bf R}^{n\times 1}$ represent the training set dependent data, which will be estimated using latent factors extracted from ${\bf X}$. Creation of the PLS regression model proceeds as follows:

Center each column of X:

$$ \begin{array}{l}
\hbox{for }i=1,\ldots,m \\
\quad \mu_{i}^{x}=\frac{1}{n}\sum\limits_{k=1}^{n} {{\bf X}(k,i)} \\
\quad \hbox{for }k=1,\ldots,n \\
\quad\quad {\bf X}(k,i)\to {\bf X}(k,i)-\mu_{i}^{x} \\
\quad \hbox{next }k \\
\hbox{next }i
\end{array} $$

Center y:

$$ \begin{array}{l}
\mu^{y}=\frac{1}{n}\sum\limits_{k=1}^{n} {{\bf y}(k)} \\
\hbox{for }k=1,\ldots,n \\
\quad {\bf y}(k)\to {\bf y}(k)-\mu^{y} \\
\hbox{next }k
\end{array} $$

Determine PLS factors and regression coefficients for up to M PLS factors (M ≤ m):

$$ \begin{array}{l}
{\bf X}_{1}={\bf X} \\
\hbox{for }i=1,\ldots,M \\
\quad \hbox{Compute the vector of weights that define PLS factor }i: \\
\quad\quad {\bf w}_{i}={\bf X}_{i}^{\rm T}{\bf y}/\vert {\bf X}_{i}^{\rm T}{\bf y}\vert \quad ({\bf w}_{i}\in {\bf R}^{m\times 1}) \\
\quad \hbox{Project the rows of }{\bf X}_{i}\hbox{ onto factor }i: \\
\quad\quad {\bf t}_{i}={\bf X}_{i}{\bf w}_{i} \quad ({\bf t}_{i}\in {\bf R}^{n\times 1}) \\
\quad \hbox{Project }{\bf t}_{i}\hbox{ onto each column of }{\bf X}_{i}: \\
\quad\quad {\bf p}_{i}={\bf X}_{i}^{\rm T}{\bf t}_{i}/\vert {\bf t}_{i}^{\rm T}{\bf t}_{i}\vert \quad ({\bf p}_{i}\in {\bf R}^{m\times 1}) \\
\quad \hbox{Compute the }i^{\rm th}\hbox{ PLS regression coefficient by projecting }{\bf t}_{i}\hbox{ onto }{\bf y}: \\
\quad\quad {\bf b}(i)={\bf t}_{i}^{\rm T}{\bf y}/\vert {\bf t}_{i}^{\rm T}{\bf t}_{i}\vert \quad ({\bf b}\in {\bf R}^{M\times 1}) \\
\quad \hbox{Orthogonalize }{\bf X}_{i}\hbox{ w.r.t. PLS factor }i: \\
\quad\quad {\bf X}_{i+1}={\bf X}_{i}-{\bf t}_{i}{\bf p}_{i}^{\rm T} \\
\hbox{next }i
\end{array} $$

For a regression with M PLS factors, the estimates ${\hat{{\bf y}}}$ are then given by:

$$ \hat{{\bf y}}(k)=\mu^{y}+\sum\limits_{i=1}^{M} {{\bf b}(i){\bf t}_{i} (k)}\quad k=1,\ldots,n $$

To apply the M-factor PLS model to a new set of ${\tilde{n}}$ observations with data matrix ${\tilde{{\bf X}}}$, the regression coefficients b must first be translated back to the space of the original X variables:

Define

$$ \begin{array}{ll} {\bf W}\equiv \left[ {{\bf w}_{1} \ldots {\bf w}_{M}}\right] &({\bf W}\in {\bf R}^{m\times M})\\ {\bf P}\equiv \left[ {{\bf p}_{1} \ldots {\bf p}_{M}}\right] &({\bf P}\in {\bf R}^{m\times M})\\ {\bf b}^{x}\equiv {\bf W}\left({{\bf P}^{\rm T}{\bf W}} \right)^{-1}{\bf b} &({\bf b}^{x}\in {\bf R}^{m\times 1}) \end{array} $$

The coefficients ${\bf b}^{x}$ may then be used to make estimates for the new observations as follows:

$$ \tilde{{\bf y}}(k)=\mu^{y}+\sum\limits_{i=1}^{m} \left[ {\tilde{{\bf X}}(k,i)-\mu_{i}^{x}}\right]{\bf b}^{x}(i) \quad k=1,\ldots, \tilde{n} $$
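The steps of this appendix translate fairly directly into array operations. The following is a minimal sketch using NumPy, which is an assumed dependency; the function names `pls_fit` and `pls_predict` and the dictionary layout are hypothetical, and this is not the PHASE implementation.

```python
import numpy as np

def pls_fit(X, y, n_factors):
    """Extract n_factors PLS factors from X (n x m) to model y (n,)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x = X.mean(axis=0)           # column means of X
    mu_y = y.mean()                 # mean of y
    Xi = X - mu_x                   # centered X, deflated in place of X_i
    yc = y - mu_y                   # centered y
    W, P, T, b = [], [], [], []
    for _ in range(n_factors):
        w = Xi.T @ yc
        w = w / np.linalg.norm(w)   # weights defining the factor
        t = Xi @ w                  # scores: rows of X_i projected onto w
        tt = t @ t
        p = Xi.T @ t / tt           # loadings: t projected onto columns of X_i
        b.append(t @ yc / tt)       # regression coefficient for this factor
        Xi = Xi - np.outer(t, p)    # orthogonalize X_i w.r.t. this factor
        W.append(w)
        P.append(p)
        T.append(t)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    b = np.array(b)
    # Translate b back to the original variables: b_x = W (P^T W)^-1 b.
    b_x = W @ np.linalg.solve(P.T @ W, b)
    return {"mu_x": mu_x, "mu_y": mu_y, "b_x": b_x, "T": T, "b": b}

def pls_predict(model, X_new):
    """Apply the fitted model to new observations (rows of X_new)."""
    X_new = np.asarray(X_new, dtype=float)
    return model["mu_y"] + (X_new - model["mu_x"]) @ model["b_x"]
```

Two properties are useful as sanity checks: predictions for the training rows reproduce the factor-space estimates, and with M = m factors on a full-rank matrix the model coincides with ordinary least squares.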


About this article

Cite this article

Dixon, S.L., Smondyrev, A.M., Knoll, E.H. et al. PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20, 647–671 (2006). https://doi.org/10.1007/s10822-006-9087-6


Received: 28 June 2006
Accepted: 17 October 2006
Published: 24 November 2006
Issue Date: October 2006
DOI: https://doi.org/10.1007/s10822-006-9087-6
