Automated creation and application of predictive QSAR models following best practices
The Advantage of QSAR
Identifying Quantitative Structure-Activity Relationships (QSAR) has been a powerful technique in researchers’ computational arsenal for decades. It’s widely used in lead optimization, ADME/Tox modeling, genotypic and phenotypic screening analysis, and many other applications. However, the creation of high-quality QSAR models has traditionally required significant QSAR expertise and can be labor intensive.
AutoQSAR democratizes the creation and application of QSAR models through automation, following a best-practices QSAR modeling workflow. With AutoQSAR, high-quality, predictive QSAR models can be created and employed with confidence by QSAR experts and non-experts alike. The workflow covers descriptor generation, feature selection, creation of a large number of QSAR models, and ranking of those models by performance. Models are built with several methods, including kernel-based partial least squares, naive Bayes, and ensemble-based recursive partitioning, each with different training/test set splits. Predictions can be made from a consensus of the best models or from a particular model.
Not only does AutoQSAR take the guesswork out of creating a QSAR model, but its estimate of the domain of applicability also provides a yes/no indication of whether to trust a model’s predictions. AutoQSAR can evolve and grow with a drug discovery project. It is easy to connect to existing cheminformatics platforms and facilitates refinement of models as projects are ongoing, leading to improved prediction accuracy as more data becomes available.
AutoQSAR takes 1D, 2D, or 3D structural data as input, together with a property to be modeled as either a continuous or a categorical variable. It automatically computes descriptors and fingerprints, creates QSAR models with multiple machine learning methods, and evaluates each QSAR model for predictive accuracy. Predictions can be made as a consensus of the best QSAR models or from a single QSAR model.
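The split, rank, and consensus steps described above can be illustrated with a small sketch. This is not AutoQSAR's implementation or API: the one-descriptor linear models stand in for the real learning methods, and every name here (fit_line, build_models, consensus_predict, n_splits, top_k) is hypothetical, chosen only to show the shape of the workflow.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit y = a*x + b on a single descriptor (toy stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def rmse(model, rows, ys, col):
    """Root-mean-square error of a one-descriptor model on held-out rows."""
    a, b = model
    return (sum((a * r[col] + b - y) ** 2 for r, y in zip(rows, ys)) / len(ys)) ** 0.5

def build_models(rows, ys, n_splits=5, test_fraction=0.25, seed=0):
    """Fit one model per descriptor per random split; return models sorted by test RMSE."""
    rng = random.Random(seed)
    n = len(rows)
    n_test = max(1, int(n * test_fraction))
    models = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        for col in range(len(rows[0])):
            model = fit_line([rows[i][col] for i in train], [ys[i] for i in train])
            score = rmse(model, [rows[i] for i in test], [ys[i] for i in test], col)
            models.append((score, col, model))
    models.sort(key=lambda m: m[0])  # lowest held-out error first
    return models

def consensus_predict(models, row, top_k=3):
    """Average the predictions of the top_k best-ranked models."""
    return statistics.fmean(a * row[col] + b for _, col, (a, b) in models[:top_k])
```

Scoring each model on data it was not trained on, and repeating this over several splits, is what keeps the ranking from rewarding overfitted models; averaging the top few models then reduces dependence on any single split.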
No descriptor limitations
Provide your own descriptors in CSV format to be used in addition to or instead of those generated by AutoQSAR. This opens a wide range of QSAR applications beyond small molecules. For example, AutoQSAR using protein descriptors has been used to predict properties such as protein solubility and viscosity.
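As a rough illustration of the "in addition to or instead of" choice, the sketch below reads a descriptor CSV and combines it with descriptors computed elsewhere. The CSV layout AutoQSAR expects is not specified here, so the "ID" column, the function names, and the replace flag are all assumptions made for the example.

```python
import csv
import io

def read_descriptor_csv(handle, id_column="ID"):
    """Read a CSV with one row per entity; all non-ID columns are numeric descriptors."""
    table = {}
    for row in csv.DictReader(handle):
        key = row.pop(id_column)
        table[key] = {name: float(value) for name, value in row.items()}
    return table

def combine(generated, custom, replace=False):
    """Use custom descriptors instead of (replace=True) or alongside generated ones."""
    if replace:
        return {key: dict(values) for key, values in custom.items()}
    # Keep only entities present in both tables so every row has all columns.
    return {key: {**generated[key], **custom[key]}
            for key in generated.keys() & custom.keys()}

# Hypothetical data: two proteins with user-supplied descriptors.
csv_text = "ID,charge,hydrophobicity\nprotA,-3.0,0.42\nprotB,1.5,0.18\n"
custom = read_descriptor_csv(io.StringIO(csv_text))
generated = {"protA": {"mol_weight": 14300.0}, "protB": {"mol_weight": 66400.0}}
merged = combine(generated, custom)
```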
Easily integrates into informatics platforms
AutoQSAR integrates on a process level into informatics platforms to facilitate the creation and application of QSAR models. As new data becomes available, QSAR models can be automatically regenerated, leading to improved accuracy and applicability.
Employs QSAR best practices
AutoQSAR embodies QSAR best practices from the literature, including the OECD recommendations. This minimizes the likelihood of overfitting or misrepresenting a model’s performance, while features such as consensus predictions help maximize predictive performance.
Applicability domain estimate
AutoQSAR estimates each QSAR model’s applicability domain based on structural similarity to the training set and returns, for each prediction, a yes/no indication of whether it lies inside or outside the applicability domain. This provides confidence when making predictions from QSAR models.
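One common way to turn structural similarity into a yes/no answer is a nearest-neighbor check on fingerprints, sketched below. This is a generic illustration, not necessarily the method AutoQSAR uses: fingerprints are modeled as sets of on-bit indices, and the 0.3 Tanimoto threshold is an arbitrary, hypothetical value.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def in_domain(query_fp, training_fps, threshold=0.3):
    """True if the query is at least `threshold` similar to some training structure."""
    best = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    return best >= threshold

# Hypothetical fingerprints for two training compounds and two queries.
training = [{1, 2, 3, 4}, {2, 3, 5}]
similar_query = {1, 2, 3}     # shares most bits with the first training compound
unrelated_query = {9, 10}     # shares no bits with any training compound
```

A prediction for similar_query would be flagged as inside the domain, while one for unrelated_query would be flagged as outside it and treated with caution.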
Results from AutoQSAR analyses may be visualized and analyzed in Maestro, enabling and encouraging further experiments. QSAR experts can quickly see from AutoQSAR what worked and what did not, which saves time when creating models manually.
Citations and Acknowledgements
Schrödinger Release 2016-4: AutoQSAR, Schrödinger, LLC, New York, NY, 2016.
Dixon, S.L.; Duan, J.; Smith, E.; Von Bargen, C.D.; Sherman, W.; Repasky, M.P., "AutoQSAR: an automated machine learning tool for best-practice QSAR modeling," Future Med. Chem., 2016, 8 (15), 1825-1839.