# AutoQSAR/DeepChem

Scalable, Intuitive, Deep-Learning QSAR Models for Big Data Applications

**Background**

Identifying Quantitative Structure-Activity Relationships (QSAR) has been a powerful technique in researchers’ computational arsenal for decades. It’s widely used in lead optimization, ADME/Tox modeling, genotypic and phenotypic screening analysis, and many other applications. Schrödinger's AutoQSAR application democratizes the creation and application of QSAR models through automation, following a best-practices QSAR modeling workflow to create best-of-breed predictive models without requiring detailed knowledge of machine learning methods or QSAR modeling [1].

Ever-larger public and private datasets have become available in recent years [2], including repositories of pharmaceutical interest such as ChEMBL, PubChem, and ChemSpider. While AutoQSAR performs well in creating predictive QSAR models for datasets with fewer than approximately 5,000 compounds, the computational cost of its machine learning methods has made it impractical to train AutoQSAR models on larger datasets of hundreds of thousands of molecules or more. However, deep learning has recently emerged as a powerful approach for efficiently training predictive models on very large datasets: from image recognition to machine translation, deep learning has transformed the world of artificial intelligence [3].

The DeepChem project has been creating high-performance implementations of deep learning algorithms for drug discovery, quantum chemistry, materials science and biology [4,5]. Built upon Google’s TensorFlow framework [6], DeepChem is optimized for large datasets and can produce highly predictive QSAR models from datasets with tens of thousands, or even hundreds of thousands, of data points. By integrating DeepChem technology into AutoQSAR, a new AutoQSAR feature for creating and applying deep learning QSAR models has been developed. This feature matches the predictive accuracy of AutoQSAR/Traditional on smaller datasets but dramatically outperforms it in big-data scenarios, enabling users to create predictive QSAR models on datasets of hundreds of thousands of molecules or more.

**Approach**

The DeepChem approach to QSAR is to create a convolutional neural network similar to those that have achieved tremendous recent success in image and video processing, except that the “convolution” is applied to atoms instead of pixels. For those unfamiliar with the terminology, a convolution is a “filter” matrix that is progressively slid across the input data, multiplying and summing the values it covers at each position; edge detection and posterization are examples of image convolutions intuitively familiar to anyone who has used photo-editing software (Figure 1). Convolutional neural networks work by “learning” a large number of these convolutions, starting from random values, by fitting to an objective function using gradient-descent optimization. Thus, convolutional neural networks are highly parameterized statistical models that can discover patterns in input data using relatively simple mathematical techniques.

**Figure 1:** *Edge detection is an example of a convolution. Every pixel in the input image (left) is multiplied by the filter matrix (middle), to produce the output image (right). Analogous methods are used by convolutional neural networks to “learn” features from input data.*
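To make the “slide and multiply” description concrete, here is a minimal NumPy sketch of a 2D convolution using a classic Laplacian edge-detection kernel. The helper and kernel values are illustrative only; they are not AutoQSAR or DeepChem code.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter kernel across the image ("valid" mode): each output
    pixel is the elementwise product of the kernel and the image patch it
    currently covers, summed."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic Laplacian edge-detection kernel: its entries sum to zero,
# so flat regions produce no response and edges produce a strong one.
edge_kernel = np.array([[ 0, -1,  0],
                        [-1,  4, -1],
                        [ 0, -1,  0]], dtype=float)

flat = np.ones((5, 5))          # a featureless image
print(convolve2d(flat, edge_kernel))  # all zeros: no edges detected
```

A convolutional network learns kernels like `edge_kernel` from data rather than hand-specifying them.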

Moving from image processing to the chemical realm, DeepChem treats a small molecule as a graph, where nodes are atoms and edges are bonds. Chemical features of the molecule (atom type, valence, charge, etc.) are attached to each node in the graph. The convolution, in this case, applies filters to neighboring atoms instead of (as in the case of images) neighboring pixels (Figure 2).

**Figure 2:** *Graph convolution on a small molecule. The feature set for an atom is a function of the sum of the feature sets for neighboring atoms [7].*
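A single graph-convolution step of the kind sketched in Figure 2 can be written in a few lines of NumPy. This is a simplified sketch in the spirit of the graph-convolution literature [7,8], not DeepChem's actual layer implementation; the weight matrix stands in for learned parameters.

```python
import numpy as np

def graph_conv_layer(atom_features, adjacency, weights):
    """One graph-convolution step: each atom's new feature vector is a
    learned transform of the sum of its own features and those of its
    bonded neighbors, followed by a ReLU nonlinearity."""
    n = adjacency.shape[0]
    # Adding the identity to the adjacency matrix includes each atom's
    # own features in its neighborhood sum.
    neighborhood_sum = (adjacency + np.eye(n)) @ atom_features
    return np.maximum(0.0, neighborhood_sum @ weights)

# Toy molecule: a three-atom chain (e.g. C-C-O) with 4 features per atom.
rng = np.random.default_rng(0)
features = rng.random((3, 4))            # one feature row per atom
adjacency = np.array([[0, 1, 0],         # bonds: atom0-atom1, atom1-atom2
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
weights = rng.random((4, 4))             # stand-in for learned parameters

out = graph_conv_layer(features, adjacency, weights)
print(out.shape)  # (3, 4): one updated feature vector per atom
```

Stacking several such layers lets information propagate across progressively larger chemical neighborhoods, analogous to the growing receptive field of an image convolution stack.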

**Superior Big-data Performance**

AutoQSAR/DeepChem’s principal advantage over AutoQSAR/Traditional lies in its ability to fully exploit large datasets. In Figure 3, the results of AutoQSAR/Traditional and AutoQSAR/DeepChem are presented in a way that demonstrates the advantage of additional training data: AutoQSAR/DeepChem is used to train models for the full MUV and HIV benchmarks (~14.7k and ~40.4k points, respectively), as well as for randomly selected 5,000-point subsets of each set (a threshold selected to represent the approximate practical limit of AutoQSAR/Traditional). The two methods achieve similar performance on the smaller training sets, but the deep-learning-based method achieves dramatically better results when trained on the full data sets: AutoQSAR/DeepChem reaches a mean AUC of 0.72 for MUV and 0.77 for the HIV benchmark, versus 0.50 and 0.68 for the traditional approach.

**Figure 3:** *AutoQSAR/DeepChem results are enhanced by larger datasets. The full MUV data set is 14,700 points; the HIV set is 40,426 points.*

Figure 4 shows a head-to-head comparison of AutoQSAR/DeepChem and AutoQSAR/Traditional on three multitask learning benchmarks: ToxCast (~1.5M points; 617 properties), Tox21 (~79.5k points; 12 properties) and MUV (~14.7k points; 17 properties). The ToxCast [9] and Tox21 [10] benchmarks are collections of toxicity-prediction assays, while the MUV benchmark is an unbiased decoy set designed to test virtual screening methods [11]. For these benchmarks, we compute a ROC curve for each task and average the AUC across all tasks to present an aggregate result for the benchmark. In each case, AutoQSAR/DeepChem clearly outperforms the traditional method.
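The per-task AUC averaging described above can be sketched directly. The AUC computation below uses the Mann-Whitney interpretation of ROC AUC (the probability that a randomly chosen active outscores a randomly chosen inactive); the helper names and toy data are illustrative, not the benchmark pipeline itself.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (active, inactive) pairs in which the active scores higher
    (ties count as half a win)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def mean_auc(task_labels, task_scores):
    """Macro-average: one AUC per task, then the unweighted mean."""
    return float(np.mean([roc_auc(y, s)
                          for y, s in zip(task_labels, task_scores)]))

# Two toy tasks: one perfectly ranked (AUC 1.0), one mixed (AUC 0.5).
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
scores = [[0.1, 0.2, 0.8, 0.9], [0.6, 0.4, 0.5, 0.7]]
print(mean_auc(labels, scores))  # 0.75
```

Macro-averaging weights every task equally, so a benchmark with hundreds of assays (like ToxCast's 617 properties) is not dominated by its largest tasks.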

**Figure 4:** *Head-to-head comparison of AutoQSAR/Traditional and AutoQSAR/DeepChem on large datasets. Dataset sizes: Tox21 (79,573 points); ToxCast (1,533,411 points); MUV (14,700 points).*

**Equivalent to AutoQSAR on Smaller Datasets**

While AutoQSAR/DeepChem shines in big-data scenarios, it also performs on par with AutoQSAR/Traditional when using smaller training sets. Figure 5 shows the results of 13 tests taken from the original AutoQSAR paper, along with AutoQSAR/DeepChem's performance on the same data; each of these datasets has fewer than 5,000 data points. Although each method outperforms the other on specific tasks, on average the deep learning approach performs equivalently to AutoQSAR/Traditional: whether weighted by task or by data, AutoQSAR/Traditional and AutoQSAR/DeepChem have statistically indistinguishable performance (Table 1).

**Figure 5:** *Comparison of AutoQSAR/Traditional and AutoQSAR/DeepChem performance on test sets from the original AutoQSAR paper [1].*

**Table 1:** *Aggregate summary of Q² statistics from Figure 5.*

| | AutoQSAR/Traditional | AutoQSAR/DeepChem |
|---|---|---|
| Weighted by task | 0.61 ± 0.22 | 0.62 ± 0.20 |
| Weighted by data | 0.73 ± 0.20 | 0.75 ± 0.19 |
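The two aggregation schemes in Table 1 differ only in how each test set contributes to the mean. The following sketch uses hypothetical Q² values and dataset sizes (not the actual benchmark numbers) to illustrate the distinction.

```python
# Hypothetical Q^2 values and dataset sizes for three tasks,
# chosen only to illustrate the two aggregation schemes.
q2 = [0.50, 0.70, 0.80]
n_points = [200, 1000, 4000]

# Weighted by task: every task counts equally.
by_task = sum(q2) / len(q2)

# Weighted by data: each task counts in proportion to its dataset size,
# so large datasets dominate the aggregate.
by_data = sum(q * n for q, n in zip(q2, n_points)) / sum(n_points)

print(round(by_task, 3), round(by_data, 3))  # 0.667 0.769
```

Data-weighted averages tend to be higher when, as here, the larger datasets are also the better-modeled ones, which is consistent with the pattern in Table 1.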

**Conclusion**

AutoQSAR/DeepChem employs cutting-edge deep learning methods within the AutoQSAR framework, enabling non-expert practitioners to easily create high-performance QSAR models from much larger datasets than are practical with AutoQSAR/Traditional alone. Its performance is superior when large datasets are available and on par with AutoQSAR/Traditional when trained on smaller datasets (<5,000 molecules). Thus, AutoQSAR/DeepChem is the ideal solution for QSAR modeling in big-data scenarios.

**References**

1. Dixon, S.L. et al. “AutoQSAR: an automated machine learning tool for best-practice quantitative structure-activity relationship modeling.” Future Med. Chem. 2016, 8, 1825-1839.
2. Cherkasov, A. et al. “QSAR modeling: where have you been? Where are you going to?” J. Med. Chem. 2014, 57(12), 4977-5010.
3. LeCun, Y. et al. “Deep Learning.” Nature 2015, 521, 436-444.
4. DeepChem project. https://deepchem.io/
5. Wu, Z. et al. “MoleculeNet: A Benchmark for Molecular Machine Learning.” https://arxiv.org/abs/1703.00564.
6. Abadi, M. et al. “TensorFlow: Large-scale machine learning on heterogeneous systems.” 2015, https://tensorflow.org.
7. Altae-Tran, H. et al. “Low Data Drug Discovery with One-Shot Learning.” ACS Cent. Sci. 2017, 3(4), 283-293.
8. Duvenaud, D. et al. “Convolutional Networks on Graphs for Learning Molecular Fingerprints.” https://arxiv.org/abs/1509.09292.
9. EPA ToxCast data. https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data.
10. NIH Tox21 grand challenge. https://tripod.nih.gov/tox21/challenge/about.jsp.
11. Rohrer, S.G. et al. “Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.” J. Chem. Inf. Model. 2009, 49(2), 169-184.