Machine Learning & QSPR for Materials


There is a pressing need for rapid, informatics-based predictive models that extend the scale of materials optimization and discovery, extract property limits and design rules, and complement costly experimental measurements and computationally intensive atomic-scale simulation. Statistical correlations and learned models (e.g. regression models, neural networks, Bayesian models) developed from structure-derived binary fingerprints and simulation-predicted properties or experimental data allow the rapid exploration of vast chemical space to guide experimental efforts. With these approaches, variations in structure and composition defining candidate materials libraries of tens (or even hundreds) of thousands of systems can be evaluated.1 Schrödinger’s Materials Science Suite includes a fully integrated, powerful cheminformatics platform, Canvas,2 which provides a wide range of physicochemical descriptors, 400+ topological and functional group descriptors, binary and proprietary dendritic fingerprints, and an extensive set of learned models, including kernel-based partial least squares (KPLS) regression.3

Many materials properties, such as phase transition behavior, the optical response of solids, and the dynamic properties of complex fluids, have complex physicochemical origins that pose methodological challenges or cannot be rapidly estimated by explicit in silico prediction. An illustrative example is the prediction of the glass transition temperature (Tg) of amorphous solids. Quantitative structure-property relationship (QSPR) models for Tg can be derived from simulated or experimental data, enabling nearly instantaneous property prediction for systems of interest.4,5


Figure 1. Scatter plot of KPLS model predictions of glass transition temperature (Tg).


A set of 250 organic electronic compounds with known experimental Tg was used to construct a KPLS predictive model. The model was trained on 200 randomly selected compounds (Figure 1, red data). To extract a chemical motif-based structure-property relationship, each compound was described by dendritic fingerprints along with other topological descriptors. As shown in Figure 1, the model shows high correlation between experimental and predicted Tg (R2 = 0.9), indicating a strong relationship between structure and Tg. The KPLS model was validated against the remaining fifty compounds, held out as an “unknown-to-model” test set (blue data in Figure 1), yielding high Q2 values (0.7-0.8).
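The random split and the two fit statistics quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Canvas implementation: the function names are hypothetical, and Q2 is assumed here to be the same 1 - SS_res/SS_tot statistic evaluated on the held-out compounds, with the training-set mean as the reference value (definitions of external Q2 vary in the literature).

```python
import random


def split_train_test(n_items, n_train, seed=0):
    """Randomly partition indices 0..n_items-1 into training and test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def r_squared(y_true, y_pred, reference_mean=None):
    """Return 1 - SS_res / SS_tot.

    With reference_mean=None the mean of y_true is used (training-set R2).
    Passing the training-set mean while scoring held-out data gives the
    external Q2 variant assumed in this sketch.
    """
    if reference_mean is None:
        reference_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - reference_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Same proportions as the Tg study: 250 compounds, 200 used for training.
train_idx, test_idx = split_train_test(250, 200, seed=42)
```

Given experimental and predicted Tg lists for each subset, R2 would come from `r_squared(tg_train, pred_train)` and Q2 from `r_squared(tg_test, pred_test, reference_mean=mean_of_tg_train)`; those variable names are placeholders for data not included here.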


The fast turnaround of model building and validation, assisted by the advanced GUI-based workflows in Canvas (usually finished within minutes), lets users test a wide variety of potential structure-property relationships without investing significant time and resources. For applications in which morphological stability is key to performance, cheminformatics-based models can greatly accelerate the evaluation of candidate materials.

Another illustrative use case is QSPR-based prediction of polymer properties. Traditionally, the approach has centered on group contribution methods and/or topological descriptors such as connectivity indices, used to build simple multilinear regression schemes.6 With the advent of hashed binary fingerprints for structural description, together with state-of-the-art regression methods proven in both materials science and life science, general users can now construct robust prediction models for a wide variety of known properties of macromolecular systems. Figure 2 illustrates the predictive capability of KPLS regression models for polymer properties, fit against the property tables in the pioneering reference book by Jozef Bicerano.6
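To make the idea of a hashed binary fingerprint concrete, the sketch below folds a list of substructure labels into a fixed-length set of "on" bits and compares two fingerprints with the Tanimoto coefficient, a common similarity measure for binary fingerprints. It rests on several assumptions: the fragment strings are hand-written, hypothetical stand-ins (a real dendritic fingerprint enumerates linear and branched fragments from the molecular graph), the hashing scheme is not the one used by Canvas, and the source does not state which similarity or kernel function its KPLS models use.

```python
import hashlib


def hashed_fingerprint(fragments, n_bits=1024):
    """Map fragment labels to a set of 'on' bit positions in [0, n_bits)."""
    on_bits = set()
    for fragment in fragments:
        digest = hashlib.sha1(fragment.encode("utf-8")).digest()
        # Fold the first 8 bytes of the hash into the fingerprint length.
        on_bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return frozenset(on_bits)


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return shared / union


# Hypothetical fragment lists for two repeat units (illustrative only).
unit_a = hashed_fingerprint(["C-C", "C=O", "C-O-C", "c1ccccc1"])
unit_b = hashed_fingerprint(["C-C", "C-O-C", "C-N"])
similarity = tanimoto(unit_a, unit_b)
```

Because distinct fragments can hash to the same bit, a shorter fingerprint trades resolution for compactness; similarity values like these could then feed a kernel-based regression method.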

Figure 2. Example dendritic fingerprint/KPLS regression models fit for various polymer properties using Canvas.


The combination of the cheminformatics techniques provided in Canvas with the atomic-scale simulation tools in Schrödinger’s Materials Science Suite enables the development of new, highly efficient and accurate hybrid approaches. In such a hybrid scheme, predictions from low-cost explicit materials simulation techniques are adjusted by a QSPR-based correction scheme to provide efficient and reliable property predictions.

Figure 4. Illustration of high-throughput screening based on QM-trained hybrid SQM-KPLS model for phosphorescent organic optoelectronic materials.


Figure 4 shows how various levels of quantum mechanical methods can be combined with machine learning to arrive at a new materials discovery framework for novel display technology solutions. By combining semiempirical quantum mechanics (SQM), first-principles density functional theory (DFT), and a fingerprint-based KPLS regression scheme, one can build a hybrid model that predicts the key optoelectronic properties of thousands of candidate light-emitting materials with the accuracy of DFT at the cost of SQM.
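The hybrid scheme can be read as a correction model: learn the difference between the DFT and SQM values on a training set, then add the predicted difference to the cheap SQM result for each new candidate. The sketch below illustrates only that structure. As a loudly labeled simplification, it replaces KPLS with a Tanimoto-weighted average of training residuals, and the fingerprints, function names, and numbers in the usage lines are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def fit_correction(train_fps, sqm_values, dft_values):
    """Store each training fingerprint with its residual (DFT minus SQM)."""
    residuals = [dft - sqm for sqm, dft in zip(sqm_values, dft_values)]
    return list(zip(train_fps, residuals))


def predict_hybrid(model, fingerprint, sqm_value):
    """Return the SQM value plus a similarity-weighted residual correction.

    Stand-in for the KPLS step: a weighted average of training residuals,
    with Tanimoto similarity to the query as the weight.
    """
    weights = [tanimoto(fingerprint, train_fp) for train_fp, _ in model]
    total = sum(weights)
    if total == 0.0:
        return sqm_value  # nothing similar in training: fall back to raw SQM
    correction = sum(w * res for w, (_, res) in zip(weights, model)) / total
    return sqm_value + correction


# Hypothetical data: two training compounds with SQM and DFT values.
model = fit_correction(
    [frozenset({1, 2}), frozenset({8, 9})],
    sqm_values=[2.0, 3.0],
    dft_values=[2.5, 2.8],
)
estimate = predict_hybrid(model, frozenset({1, 2, 3}), sqm_value=2.1)
```

Only the training set needs DFT calculations; screening a large library then costs one SQM calculation and one model evaluation per candidate, which is the source of the speed-up described above.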

Read More


  1. T. Le, V.C. Epa, F.R. Burden and D.A. Winkler, “Quantitative Structure–Property Relationship Modeling of Diverse Materials Properties”, Chem. Rev., 112, 2889 (2012).
  2. J. Duan, S.L. Dixon, J.F. Lowrie and W. Sherman, “Analysis and Comparison of 2D Fingerprints: Insights into Database Screening Performance Using Eight Fingerprint Methods”, J. Mol. Graph. Model., 29, 157 (2010).
  3. Y. An, W. Sherman and S.L. Dixon, “Kernel-based Partial Least Squares: Application to Fingerprint-based QSAR with Model Visualization”, J. Chem. Inf. Model., 53(9), 2312 (2013).
  4. S. Yin, Z. Shuai and Y. Wang, “A Quantitative Structure-Property Relationship Study of the Glass Transition Temperature of OLED Materials”, J. Chem. Inf. Comput. Sci., 43, 970 (2003).
  5. J. Xu and B. Chen, “Prediction of Glass Transition Temperatures of OLED Materials using Topological Indices”, J. Mol. Model., 12, 24 (2005).
  6. J. Bicerano, “Prediction of Polymer Properties”, 3rd ed., revised and expanded, Marcel Dekker, New York (2002).