Scalable, Intuitive, Deep-Learning QSAR Models for Big Data Applications


Note that DeepAutoQSAR is occasionally referred to as DeepChem/AutoQSAR or AutoQSAR/DeepChem.

Identifying Quantitative Structure-Activity Relationships (QSAR) has been a powerful technique in researchers’ computational arsenal for decades. It is widely used in lead optimization, ADME/Tox modeling, genotypic and phenotypic screening analysis, and many other applications. Schrödinger's AutoQSAR application democratizes the creation and application of QSAR models through automation: it follows a best-practice QSAR modeling workflow to create best-of-breed predictive models without requiring detailed knowledge of machine learning methods or QSAR modeling [1].

Ever-larger public and private datasets have become available in recent years [2], including repositories of pharmaceutical interest such as ChEMBL, PubChem and ChemSpider. AutoQSAR performs well at building predictive QSAR models from datasets of fewer than approximately 5,000 compounds, but the computational cost of its machine learning methods has made it infeasible to train AutoQSAR models on larger datasets of hundreds of thousands of molecules or more. Deep learning methods, by contrast, have recently emerged as a powerful approach for efficiently training predictive models on very large datasets; from image recognition to machine translation, deep learning has transformed the world of artificial intelligence [3].

The DeepChem project has been creating high-performance implementations of deep learning algorithms for drug discovery, quantum chemistry, materials science and biology [4,5]. Built on modern open-source deep learning frameworks such as Google’s TensorFlow [6], DeepChem is optimized for large datasets and can produce highly predictive QSAR models from datasets with tens or even hundreds of thousands of data points. Inspired by the DeepChem project, DeepAutoQSAR allows users to create and apply best-in-class deep learning QSAR models on datasets of hundreds of thousands of molecules or more. DeepAutoQSAR matches AutoQSAR's predictive accuracy on smaller datasets and dramatically outperforms it in big-data scenarios.


The DeepAutoQSAR approach to QSAR is to create a neural network, often a graph convolutional neural network. Such networks resemble the convolutional networks that have achieved tremendous recent success in image and video processing, except that the “convolution” is applied to atoms instead of pixels. For those unfamiliar with the terminology, a convolution slides a small “filter” matrix across the input data and, at each position, computes a weighted sum of the values beneath it; edge detection and blurring are examples of image convolutions that are intuitively familiar to anyone who has used photo-editing software (Figure 1). Convolutional neural networks work by “learning” a large number of these filters: starting from random values, the filter weights are fitted to an objective function by gradient-descent optimization. Convolutional neural networks are thus highly parameterized statistical models that can discover patterns in input data using foundational mathematical techniques.

Figure 1: Edge detection is an example of a convolution. The filter matrix (middle) is slid across the input image (left), and each pixel of the output image (right) is the weighted sum of the input pixels under the filter. Convolutional neural networks use analogous operations to “learn” features from input data.
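The Python sketch below illustrates both ideas on a toy example; it is not how DeepAutoQSAR is implemented. It applies a fixed 3×3 edge-detection filter (a common Laplacian-style kernel, chosen here for illustration) to a tiny synthetic image, then shows that the same filter can be recovered by starting from random values and running plain gradient descent on a mean-squared-error objective. The function names, the image, the learning rate and the step count are hypothetical choices.

```python
import random

def convolve2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image and take the
    weighted sum of the pixels under it at each position. (Like most CNN
    libraries, this is technically cross-correlation: the kernel is not flipped.)"""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Laplacian-style edge-detection filter: responds where intensity changes,
# and gives 0 over flat regions because its weights sum to zero.
EDGE_FILTER = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

# Toy 6x6 grayscale image: a bright 2x2 square on a dark background.
IMAGE = [[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 9, 9, 0, 0],
         [0, 0, 9, 9, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]]

def learn_filter(image, target, size=3, steps=500, lr=0.01, seed=0):
    """Fit a size x size kernel, starting from random values, by gradient
    descent on the mean squared error between convolve2d(image, kernel) and target."""
    rng = random.Random(seed)
    kernel = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]
    rows, cols = len(target), len(target[0])
    n = rows * cols
    for _ in range(steps):
        out = convolve2d(image, kernel)
        # dMSE/dk[a][b] = (2/n) * sum_ij (out[i][j] - target[i][j]) * image[i+a][j+b]
        grad = [[2.0 / n * sum((out[i][j] - target[i][j]) * image[i + a][j + b]
                               for i in range(rows) for j in range(cols))
                 for b in range(size)]
                for a in range(size)]
        kernel = [[kernel[a][b] - lr * grad[a][b] for b in range(size)]
                  for a in range(size)]
    return kernel

edges = convolve2d(IMAGE, EDGE_FILTER)
for row in edges:
    print(row)

learned = learn_filter(IMAGE, edges)
for row in learned:
    print([round(w, 3) for w in row])
```

A real convolutional network learns many such filters at once and fits them to a predictive objective, rather than to a known target image as in this toy.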

Moving from image processing to the chemical realm, DeepAutoQSAR treats a small molecule as a graph in which nodes are atoms and edges are bonds. Chemical features of each atom (atom type, valence, charge, etc.) are attached to the corresponding node. The convolution then applies filters to neighboring atoms rather than, as in images, to neighboring pixels (Figure 2).

Figure 2: Graph convolution on a small molecule. The feature set for an atom is a function of the sum of the feature sets for neighboring atoms [7].
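To make the graph analogy concrete, here is a minimal Python sketch of one simplified graph-convolution layer in the spirit of Figure 2: each atom's new feature vector is computed from its own features plus the sum of its neighbors' features, using the same weight matrices for every atom (much as one image filter is reused at every pixel). The one-hot atom featurization, the identity weights, the ReLU and the sum-pooling readout are illustrative assumptions, not DeepAutoQSAR's actual layers or learned weights; published variants such as [8] use, for example, separate weights for atoms of different degree.

```python
# Toy featurization (assumption): one-hot element type over a tiny vocabulary.
ELEMENTS = ["C", "N", "O"]

def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def graph_conv(features, neighbors, w_self, w_nbr):
    """One simplified graph-convolution layer:
    h_v' = ReLU(W_self @ h_v + W_nbr @ sum_{u in N(v)} h_u).
    The same weights are shared by every atom, whatever its neighbors."""
    width = len(features[0])
    updated = []
    for v, h_v in enumerate(features):
        nbr_sum = [sum(features[u][k] for u in neighbors[v]) for k in range(width)]
        pre = [s + t for s, t in zip(matvec(w_self, h_v), matvec(w_nbr, nbr_sum))]
        updated.append([max(0.0, x) for x in pre])
    return updated

def readout(features):
    """Sum-pool atom vectors into one molecule-level vector, independent of atom order."""
    return [sum(column) for column in zip(*features)]

# Ethanol heavy-atom graph C0-C1-O2 (hydrogens omitted): nodes are atoms, edges are bonds.
atoms = ["C", "C", "O"]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

h = graph_conv([one_hot(a) for a in atoms], neighbors, identity, identity)
print(h)
print(readout(h))
```

In a trained model the weight matrices would be learned by gradient descent, several such layers would be stacked, and the pooled vector would feed further layers that predict the property of interest.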

Superior Big-data Performance

DeepAutoQSAR’s principal advantage over AutoQSAR is its ability to fully exploit large datasets. Figure 3 shows the benefit of additional training data: DeepAutoQSAR is trained on the full MUV and HIV benchmarks (~14.7k and ~40.4k points, respectively), as well as on randomly selected 5,000-point subsets of each set (a size chosen to represent the approximate practical limit of AutoQSAR). AutoQSAR and DeepAutoQSAR achieve similar performance on the smaller training sets, but the deep learning method achieves dramatically better results when trained on the full dataset. DeepAutoQSAR reaches a mean AUC of 0.72 for MUV and 0.77 for HIV, versus 0.50 and 0.68 for the AutoQSAR approach (an AUC of 0.50 is no better than random ranking).
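The full-versus-subset comparison can be sketched as a data-splitting step. This is a hypothetical protocol, not the one used for Figure 3: the source does not describe its train/test splits, so the sketch assumes a single held-out test set shared by the full-data and subset models, so that any difference in AUC reflects training-set size rather than a different test set. The function name, the 20% test fraction and the seed are arbitrary choices.

```python
import random

def full_and_subset_splits(n_points, subset_size=5000, test_fraction=0.2, seed=42):
    """Return (train_full, train_subset, test) index lists.
    train_subset is a uniform random sample of train_full, and both
    training sets are evaluated against the same held-out test indices."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_test = int(n_points * test_fraction)
    test, train_full = indices[:n_test], indices[n_test:]
    train_subset = rng.sample(train_full, min(subset_size, len(train_full)))
    return train_full, train_subset, test

# Example using the size of the HIV benchmark (40,426 points).
train_full, train_subset, test = full_and_subset_splits(40426)
print(len(train_full), len(train_subset), len(test))
```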

Figure 3: DeepAutoQSAR results are enhanced by larger datasets. The full MUV data set is 14,700 points; the HIV set is 40,426 points. 

Figure 4 shows a head-to-head comparison of DeepAutoQSAR and AutoQSAR on three multitask learning benchmarks: ToxCast (~1.5M points; 617 properties), Tox21 (~79.5k points; 12 properties) and MUV (~14.7k points; 17 properties). ToxCast [9] and Tox21 [10] are toxicity-prediction benchmarks, while MUV is an unbiased decoy set designed to test virtual screening methods [11]. For each benchmark, we compute a ROC curve for every task and report the AUC averaged over all tasks. In every case, DeepAutoQSAR clearly outperforms AutoQSAR.
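The multitask scoring (one ROC AUC per task, then the mean over tasks) can be sketched in Python as follows. AUC is computed here through its rank interpretation: the probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counting one half, which equals the area under the ROC curve. The sketch assumes that unmeasured compound/task pairs, common in sparse multitask sets, are marked None and skipped per task; the source does not say how missing labels were handled, and the toy labels and scores are invented.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(P*N): fine for illustration, not for big sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mean_task_auc(label_rows, score_rows):
    """Compute one AUC per task (column), skipping None labels, and average them."""
    n_tasks = len(label_rows[0])
    aucs = []
    for t in range(n_tasks):
        pairs = [(ys[t], ss[t]) for ys, ss in zip(label_rows, score_rows)
                 if ys[t] is not None]
        aucs.append(roc_auc([y for y, _ in pairs], [s for _, s in pairs]))
    return sum(aucs) / n_tasks, aucs

# Toy data: 4 molecules x 2 tasks; molecule 1 was never measured on task 1.
labels = [[1, 0], [0, None], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.7], [0.4, 0.75]]
mean_auc, per_task = mean_task_auc(labels, scores)
print(per_task, mean_auc)
```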

Figure 4: Head-to-head comparison of AutoQSAR and DeepAutoQSAR on large datasets. Dataset sizes: Tox21 (79,573 points); ToxCast (1,533,411 points); MUV (14,700 points).

Equivalent to AutoQSAR on Smaller Datasets

While DeepAutoQSAR shines in big-data scenarios, it also performs on par with AutoQSAR on smaller training sets. Figure 5 shows the results of 13 tests taken from the original AutoQSAR paper, alongside DeepAutoQSAR performance on the same data. All of these datasets have fewer than 5,000 data points. Each method outperforms the other on specific tasks, but on average the two are equivalent: whether the results are weighted by task or by data, AutoQSAR and DeepAutoQSAR have statistically indistinguishable performance (Table 1).

Figure 5: Comparison of AutoQSAR and DeepAutoQSAR performance on test sets from the original AutoQSAR paper [1].


Table 1: Aggregate summary of Q² statistics from Figure 5.

                    AutoQSAR       DeepAutoQSAR
Weighted by task    0.61 ± 0.22    0.62 ± 0.20
Weighted by data    0.73 ± 0.20    0.75 ± 0.19
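For reference, the Q² statistic and the two aggregation schemes in Table 1 can be sketched as below. This is a hedged illustration: it uses one common definition of Q² (one minus the ratio of the residual sum of squares on test data to the total sum of squares about the test-set mean), assumes that "weighted by task" counts every dataset equally and "weighted by data" weights each dataset by its number of data points, and assumes the ± values are weighted standard deviations. None of these details are stated in the source, and the per-dataset values in the example are invented.

```python
import math

def q_squared(y_true, y_pred):
    """Q^2 = 1 - sum((y - y_hat)^2) / sum((y - y_mean)^2), computed on test data."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def weighted_mean_sd(values, weights):
    """Weighted mean and (population) standard deviation of per-dataset scores."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical per-dataset results: Q^2 values and dataset sizes.
q2_values = [0.5, 0.7, 0.9]
sizes = [100, 300, 600]

by_task = weighted_mean_sd(q2_values, [1] * len(q2_values))  # every dataset counts once
by_data = weighted_mean_sd(q2_values, sizes)                 # larger datasets count more
print(q_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
print(by_task, by_data)
```

Weighting by data lets the largest datasets dominate the average, which is one reason the two rows of Table 1 can differ even when the per-dataset comparisons are the same.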


DeepAutoQSAR employs cutting-edge deep learning methods, enabling non-expert practitioners to easily create high-performance QSAR models from much larger datasets than are practical with AutoQSAR alone. Its performance is superior when large datasets are available and on par with AutoQSAR when trained on smaller datasets (<5,000 molecules). Thus, DeepAutoQSAR is the ideal solution for QSAR modeling in big-data scenarios.


  1. Dixon, S.L. et al. “AutoQSAR: an automated machine learning tool for best-practice quantitative structure-activity relationship modeling.” Future Med. Chem. 2016, 8, 1825–1839.
  2. Cherkasov, A. et al. “QSAR modeling: where have you been? Where are you going to?” J. Med. Chem. 2014, 57(12), 4977–5010.
  3. LeCun, Y. et al. “Deep Learning.” Nature 2015, 521, 436–444.
  4. DeepChem project.
  5. Wu, Z. et al. “MoleculeNet: A Benchmark for Molecular Machine Learning.”
  6. Abadi, M. et al. “TensorFlow: Large-scale machine learning on heterogeneous systems.” 2015.
  7. Altae-Tran, H. et al. “Low Data Drug Discovery with One-Shot Learning.” ACS Central Science 2017, 3(4), 283–293.
  8. Duvenaud, D. et al. “Convolutional Networks on Graphs for Learning Molecular Fingerprints.”
  9. EPA ToxCast data.
  10. NIH Tox21 grand challenge.
  11. Rohrer, S.G. et al. “Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.” J. Chem. Inf. Model. 2009, 49(2), 169–184.