Hit to development candidate in 10 months: Rapid discovery of SGR-1505, a novel, potent MALT1 inhibitor

Digital chemistry platform provides scale and accuracy to drive high precision molecular design

8.2 billion

compounds computationally evaluated

78

total compounds synthesized in lead series

10 months

to discovery of development candidate

Target
MALT1, protease
Program Type
Schrödinger proprietary program, small molecule
Indication
Relapsed or refractory B-cell lymphoma, chronic lymphocytic leukemia
Stage
Phase 1 clinical trial

“The ability to leverage the computational platform to rapidly identify not just one, but several novel, highly potent series with well-balanced properties is unique in my many years of experience in industry.”

Zhe Nie
Project Lead, Executive Director, Medicinal Chemistry,
Schrödinger Therapeutics Group

Design challenge

Mucosa-associated lymphoid tissue lymphoma translocation protein 1 (MALT1) is a genetically validated target for the treatment of diseases associated with lymphocyte regulation. MALT1 consists of three domains: a paracaspase protease domain, an Ig3 domain, and a linking helix. First generation MALT1 inhibitors consisted of large peptidomimetics targeting the protease domain; due to their poor drug-like properties, none made it into the clinic. Second generation MALT1 inhibitors targeting an allosteric region at the interface of the caspase-like and Ig3 domains have been more successful, resulting in a clinical stage compound.

Significant challenges exist in optimizing the properties of second generation MALT1 inhibitors, specifically permeability, efflux, and solubility, while maintaining on-target potency. The aim of this program was to discover a potent inhibitor with good overall drug-like properties to support combinations with standard of care agents for treatment of relapsed or refractory B-cell malignancies.

Scale and accuracy of digital assays drives efficient DMTA cycles

Finding a novel molecule with the right balance of on-target affinity and desired physicochemical properties is the essential challenge of every drug discovery program. In principle, increasing the number of rationally designed compounds assessed across these various properties increases the odds of success. Designing molecules in silico — with the speed and accuracy to traverse billions of molecules — is the guiding ethos of Schrödinger’s digital chemistry strategy. Specifically, this project combines rigorous physics-based modeling with machine learning (ML), predictive ADMET models, and data analytics to search and triage a chemical space consisting of more than 8B compounds. Ultimately, execution of this strategy enabled the identification of multiple novel series.

First, the team performed structure-activity relationship (SAR) analysis of existing chemical matter, followed by computational assessment of the allosteric binding site using WaterMap. As a result, the team identified a number of displaceable high-energy water molecules in regions of the binding site that provided an opportunity to gain potency while exploring different chemotypes.1 Schrödinger’s drug discovery team used this information to drive the evaluation of billions of compounds via a De Novo Design strategy for iterative large-scale design and scoring. This strategy included synthetically aware, reaction-based enumeration, crowdsourced medicinal chemistry ideation, and FEP+ for free energy perturbation modeling. The accuracy and utility of FEP+ as a computational assay for the prediction of relative binding energies of molecules has been validated extensively, generating predictions within one kcal/mol of experimental values on average.2 By combining FEP+ with high performance cloud computing and machine learning (Active Learning FEP+), over 1,700 molecules were evaluated in the first three months of the project. All ideas and corresponding modeled data crowdsourced by the team were captured and analyzed with LiveDesign, a best-in-class, modeling-enabled collaborative enterprise platform for real-time project ideation (Figure 1). In less than three months, with fewer than 50 total compounds synthesized, the team was able to identify two novel and distinct series of highly potent MALT1 inhibitors, affording progression to in vivo testing.
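The general shape of an active-learning loop of this kind can be sketched in a few lines: an expensive physics-based assay is reserved for the candidates a cheap, retrainable surrogate model ranks highest. Everything below is a hypothetical stand-in (a toy 1-D "library", a mock `expensive_assay` in place of FEP+, and a nearest-neighbor `cheap_model` in place of a real ML model); only the loop structure is the point.

```python
import random

random.seed(0)

def expensive_assay(x):          # stand-in for a costly physics-based assay
    return -(x - 0.7) ** 2       # best candidates sit near x = 0.7

library = [i / 999 for i in range(1000)]   # 1,000 candidate "compounds"
scored = {}                                 # compound -> assay result

for cycle in range(3):
    if scored:
        # Retrain the surrogate on everything scored so far
        # (here: 1-nearest-neighbor lookup on the scored points).
        known = sorted(scored)
        def cheap_model(x):
            return scored[min(known, key=lambda k: abs(k - x))]
    else:
        def cheap_model(x):      # no data yet: explore at random
            return random.random()
    # Promote the 20 most promising unscored candidates to the full assay.
    batch = sorted((x for x in library if x not in scored),
                   key=cheap_model, reverse=True)[:20]
    for x in batch:
        scored[x] = expensive_assay(x)

best = max(scored, key=scored.get)
```

Only 60 of the 1,000 candidates ever see the expensive assay, yet the loop homes in on the high-scoring region, which is the economy that makes triaging billion-scale spaces feasible.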

Figure 1: Modeling strategy and design-predict-make-test-analyze (DPMTA) cycle employed for MALT1 inhibitor program, in which development candidate SGR-1505 was discovered in 10 months.

Overcoming the MPO challenge by tuning potency, solubility, and permeability simultaneously

Once potent chemical series were identified, the team focused on tuning physicochemical properties to meet the target product profile (TPP). They employed a multiparameter optimization (MPO) scoring system to triage molecules rapidly based on their predicted ability to satisfy the TPP. Calculation of the MPO score was based on values derived from predictive models for solubility, permeability, and potency. Using this strategy the design team assessed over 5,000 ideas and identified 43 compounds that met the program’s criteria. A handful progressed to synthesis and experimental testing, reducing cost and time significantly.
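As a rough illustration of such a scoring system, each predicted property can be mapped onto a 0-to-1 desirability and the desirabilities combined into a single score. The property names and thresholds below are hypothetical placeholders, not the program's actual TPP criteria.

```python
import math

def desirability(value, ideal, limit):
    """1.0 inside the ideal range, decaying linearly to 0.0 at the hard limit."""
    lo, hi = ideal
    if lo <= value <= hi:
        return 1.0
    edge = lo if value < lo else hi
    bound = limit[0] if value < lo else limit[1]
    span = abs(bound - edge)
    return max(0.0, 1.0 - abs(value - edge) / span) if span else 0.0

def mpo_score(predictions):
    """Geometric mean of per-property desirabilities (0 = fail, 1 = ideal)."""
    # Illustrative criteria only: (ideal range, hard limits) per property.
    criteria = {
        "pIC50":    ((8.0, 12.0), (6.0, 12.0)),    # potency: want >= 8
        "log_sol":  ((-4.0, 0.0), (-6.0, 0.0)),    # solubility, log mol/L
        "log_papp": ((-5.0, -3.0), (-7.0, -3.0)),  # permeability, log cm/s
    }
    ds = [desirability(predictions[k], *criteria[k]) for k in criteria]
    return math.prod(ds) ** (1.0 / len(ds))

score = mpo_score({"pIC50": 8.5, "log_sol": -4.5, "log_papp": -5.5})
```

The geometric mean is a common choice here because a single failed property drags the whole score toward zero, which is exactly the behavior wanted when triaging against a target product profile.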

Within 10 months, and with a total of 78 compounds synthesized in the lead series (and 129 compounds program-wide), the project team identified SGR-1505, a potential best-in-class MALT1 inhibitor with balanced properties and on-target activity (Figure 2). In June 2025, SGR-1505 was reported to have a favorable safety profile and to be well tolerated in its ongoing Phase 1, open-label, dose-escalation study, with encouraging preliminary efficacy in patients with relapsed/refractory B-cell malignancies.4 Responses were observed across a broad range of B-cell malignancies, including monotherapy responses in patients with chronic lymphocytic leukemia (CLL) and Waldenström macroglobulinemia (Figure 3).5

Figure 2: Comparison of SGR-1505 with a competitor’s MALT1 inhibitor. *Structure of JNJ-6633 first disclosed by Tianbao Lu at the 2021 ACS Spring meeting. All competitor data was generated internally by contract research organizations. Yin et al., ASH 2023.
Figure 3: Initial results of the SGR-1505 Phase 1 study showing encouraging preliminary efficacy across a range of B-cell malignancies including chronic lymphocytic leukemia/small lymphocytic leukemia (CLL/SLL), marginal zone lymphoma (MZL), and Waldenström macroglobulinemia (WM).

Enabling digital technologies to drive discovery programs

FEP+

Digital assay for predicting protein-ligand binding across broad chemical space at an accuracy matching experimental methods.

De Novo Design Workflow

Ultra-large scale chemical space exploration combining multiple compound enumeration strategies with an advanced filtering cascade.

WaterMap

Calculation of the positions and energies of water sites in a protein binding pocket.

LiveDesign

Collaborative enterprise informatics platform for centralizing access to virtual and wet lab project data and powerful computational predictions.

References

  1. Calculating water thermodynamics in the binding site of proteins – Applications of WaterMap to drug discovery.

    Cappel et al. Curr. Top. Med. Chem. 2017, 17(23), 2586-2598.

  2. Advancing drug discovery through enhanced free energy calculations.

    Abel et al. Acc. Chem. Res. 2017, 50(7), 1625–1632.

  3. Characterization of potent paracaspase MALT1 inhibitors for hematological malignancies.

    Yin et al. ASH Presentation 2021.

  4. Schrödinger reports encouraging initial Phase 1 clinical data for SGR-1505 at EHA Annual Congress.

    Schrödinger. 2025.

  5. A Phase 1 study of SGR-1505, an oral, potent, MALT1 inhibitor for relapsed/refractory (R/R) B-cell malignancies, including chronic lymphocytic leukemia/small lymphocytic leukemia (CLL/SLL).

    Spurgeon et al. European Hematology Association (EHA) Annual Congress. 2025.

Software and services to meet your organizational needs

Industry-Leading Software Platform

Deploy digital drug discovery workflows using a comprehensive and user-friendly platform for molecular modeling, design, and collaboration.

Modeling Services

Leverage Schrödinger’s team of expert computational scientists to advance your projects through key stages in the drug discovery process.

Scientific and Technical Support

Access expert support, educational materials, and training resources designed for both novice and experienced users.

Schrödinger solutions for small molecule protonation state enumeration and pKa prediction


Executive Summary

The pKa of a drug is a key physicochemical property to consider in the drug discovery process, given its importance in determining the ionization state of a molecule at physiological pH. Schrödinger provides several solutions for predicting pKa values, protonation state distributions, and derived properties that can be applied across a range of drug discovery stages, from screening through lead optimization. Here we provide an overview of each technology solution and use case examples of how they can be applied in drug discovery.

 

Background

Small molecules can undergo ionization in solution, where they either lose or gain protons (H+) at different ionizing sites. The propensity of a site or molecule to ionize by the association/dissociation of one or more protons is quantified by a pKa value. If the pKa value refers to a particular ionizable site, it is a microscopic pKa (micro-pKa); if it refers to the entire molecule, it is a macroscopic pKa (macro-pKa). The specific arrangement of protons around the ionizing sites constitutes a protonation state, and different protonation states of the same charge level are called tautomers. Each protonation state is in thermodynamic equilibrium with the others and therefore has a free energy associated with its population within this collection of protonation states, which may be derived either from micro-pKa values through thermodynamic equations or obtained directly by comparing the free energies of the states. In drug design, understanding the different protonation states of a molecule is critical, since they drive properties including solubility, membrane permeability, and activity.

 

Challenges of pKa Prediction

Determining which states predominate at a given pH, and by how much, is a challenging task both experimentally and computationally, because the number of states in thermodynamic equilibrium grows roughly as 2^n with the number, n, of singly protonatable sites. Thus, molecules with many titratable sites can have a large number of different protonation states, all of which need to be enumerated and energetically scored.
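The combinatorial growth is easy to see in a short sketch: treating each singly protonatable site as either protonated or not yields 2^n candidate states, some of which share a net charge and are therefore tautomers of one another. The sites below are illustrative only.

```python
from itertools import product

# Hypothetical ionizable sites: an acid contributes charge -1 when
# deprotonated; a base contributes +1 when protonated.
sites = [("carboxylic acid", "acid"), ("amine", "base"), ("tetrazole", "acid")]

# Each singly protonatable site is either protonated (1) or not (0),
# so n sites give 2**n candidate protonation states.
states = []
for protons in product((0, 1), repeat=len(sites)):
    charge = sum((-1 if kind == "acid" and p == 0 else 0) +
                 (+1 if kind == "base" and p == 1 else 0)
                 for (_, kind), p in zip(sites, protons))
    states.append({"protons": protons, "charge": charge})

n_states = len(states)                                   # 2**3 = 8
neutral_tautomers = [s for s in states if s["charge"] == 0]
```

For these three sites there are eight states, three of which are neutral tautomers; every one of them must be enumerated and energetically scored before populations can be assigned.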

Computationally, Schrödinger uses two main approaches to score states: 1) through evaluating thermodynamic equilibrium equations with micro-pKa values, and 2) directly predicting the states’ relative free energies. Predicting pKa values is an important step to calculating state distributions, which in turn enables prediction of important related quantities that would otherwise be inaccessible.

Figure 1: Relationships between macro-pKa, micro-pKa, protonation states, and tautomers and the corresponding speciation diagram.

 

Overview of Schrödinger Solutions

Epik Classic

Epik Classic, previously known simply as Epik,1 is an expert system for rapidly and accurately predicting the micro-pKa values and the most populated protonation states for a ligand at a given pH. The underlying pKa prediction technology is the empirical Hammett-Taft linear free energy relationship (LFER), which identifies an ionizing group, takes its root pKa value, perturbs it by the bonded chemical fragments, and applies charge spreading to arrive at its effective micro-pKa value. Epik Classic then uses the predicted pKa values to enumerate a ligand’s protonation states, rank them by energy, and return the most populated states. Because Epik Classic uses SMARTS pattern-based rules, it is fast enough for high-throughput use, although at the expense of being unaware of both conformational and stereochemical effects.
 

Epik 7

Epik 7 is a complete redesign of Epik that leverages Schrödinger’s powerful machine learning (ML) technology for more accurate results across broader chemical space. Ionizing groups are initially identified by SMARTS patterns and are then used to enumerate the protonation states for a range of ionizations.2 The micro-pKa values of each site in each state are predicted with 3-layer atomic graph convolutional neural networks (GCNNs) extending out radially six bonds from the ionizing atom. The predicted pKa values for the states are then used to predict the relative energies of the states to both allow determination of the most populated states at a pH and calculation of macro-pKa values. The topological nature of the ML approach means that Epik 7, like previous versions, is rapid but agnostic to 3D geometry and stereochemistry.
 

Jaguar pKa

Jaguar pKa takes a third, more physics-based approach to predicting micro-pKa values for a ligand. This workflow calculates the pKa values at the user-defined ionizing sites in a query ligand by first generating the conjugate pair, on which conformational searches are then executed to locate the lowest energy structures,3 followed by density functional theory (DFT) based geometry optimizations and single-point energy evaluations. The resulting conformationally-averaged, “raw” micro-pKa values are then corrected using empirically-parametrized relationships to give accurate predictions. Jaguar pKa performs best on non-tautomerizable structures. Being physics-based, it does take into account geometric and stereochemical effects, but at the expense of speed.
 

Macro-pKa

Macro-pKa follows the same philosophy as Jaguar pKa by combining physics-based DFT calculations with empirical corrections, but extends its applicability to enable calculation of tautomerizable ligands. Macro-pKa automatically identifies ionizing sites, enumerates the protonation states, and calculates the micro-pKa values following a similar workflow to Jaguar pKa, but with an enhanced scheme for generating empirical corrections. Finally, the calculated micro-pKa values are used to rank the protonation states by energy, return the most populated states for a user-supplied pH, and determine the macro-pKa values for the ligand. The exhaustiveness of this approach comes at a larger time and resource cost than Jaguar pKa.

 

Use Cases

Here we outline several use cases for pKa prediction in the drug discovery workflow.
Note: Each use case example outlined below could be approached with any of the listed solutions within that section. The dataset presented highlights the applicability of just one of the possible solutions.

I. Querying microscopic pKa values

Applicable Solutions

  • Epik Classic
  • Epik 7
  • Jaguar pKa

When investigating the binding modes of a ligand, the micro-pKa value of an ionizing site is an indicator of the propensity for it to become ionized at a given pH. The ionization state of the ligand directly influences how it interacts with another molecule such as a protein, e.g., whether or not it can participate in a salt bridge.

Figure 2: Jaguar pKa micro-pKa predictions for a dataset of small molecules.

II. Querying apparent or macroscopic pKa values

Applicable Solutions

  • Epik 7
  • Macro-pKa

For monoprotic or polyprotic compounds with a single dominant tautomer at each charge level, micro-pKa values may very closely match the apparent or macro-pKa value that is most commonly obtained through titration experiments. However, for compounds or ionization states with multiple competitive tautomers, the micro-pKa value of a single tautomer may not fully reproduce the experimentally observed macroscopic value. To obtain this apparent value, all states must first be enumerated and evaluated so that all their micro-pKa values are considered in the macro-pKa calculation.
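The bookkeeping that links micro- and macro-pKa values can be written down directly for a single deprotonation step. The standard relationships are 1/Ka_macro = sum of 1/Ka_i when the competing tautomers sit on the protonated side, and Ka_macro = sum of Ka_j when they sit on the deprotonated side. A minimal sketch (the pKa values in the assertions below are invented, not measured data):

```python
from math import log10

def macro_pka_protonated_tautomers(micro_pkas):
    """Macro-pKa when the protonated charge level has several tautomers,
    each with its own micro-pKa to a common deprotonated form:
    1/Ka_macro = sum_i 1/Ka_i  =>  pKa_macro = log10(sum_i 10**pKa_i)."""
    return log10(sum(10 ** pka for pka in micro_pkas))

def macro_pka_deprotonated_tautomers(micro_pkas):
    """Macro-pKa when deprotonation can yield several tautomers:
    Ka_macro = sum_j Ka_j  =>  pKa_macro = -log10(sum_j 10**-pKa_j)."""
    return -log10(sum(10 ** -pka for pka in micro_pkas))
```

Note the sanity checks these formulas pass: with a single tautomer the macro-pKa collapses to the micro-pKa, and two equally populated tautomers shift the macroscopic value by log10(2) ≈ 0.3, in the direction you would expect (less acidic when tautomerism stabilizes the protonated side, more acidic when it stabilizes the deprotonated side).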

Figure 3: Macro-pKa macro-pKa predictions for a dataset of tautomeric molecules.

III. Ligand preparation and high-throughput screening

Applicable Solutions

  • Epik Classic
  • Epik 7

Physics-based simulations typically require specification of all atoms in the simulation system, including all hydrogen atoms. Thus, structure-based simulations including Glide docking, molecular dynamics, and free energy perturbation with FEP+ should be performed using an ensemble of the highly-populated protonation states of a ligand. Therefore, a crucial first step in any structure-based screen of a small molecule ligand library is to prepare the ligands by obtaining the most populated protonated states. Epik Classic and Epik 7 are integrated with our automated ligand preparation workflow, LigPrep, to allow preparation of large ligand libraries for high-throughput screening. Additionally, both Epik Classic and Epik 7 and their LigPrep implementations allow for the generation and scoring of additional states that may potentially bind to metal ions in the pocket.
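The "most populated states" selection can be sketched as a Boltzmann weighting over relative state energies followed by a population cutoff. The state names, energies, and cutoff below are hypothetical, not output from any Schrödinger tool.

```python
from math import exp

RT = 0.593  # kcal/mol at ~298 K

def populated_states(state_energies, cutoff=0.05):
    """Boltzmann-weight protonation states by relative free energy (kcal/mol)
    and keep those whose population exceeds the cutoff."""
    weights = {s: exp(-e / RT) for s, e in state_energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items() if w / z >= cutoff}

# Hypothetical relative state energies for one ligand:
kept = populated_states({"neutral": 0.0, "zwitterion": 0.4, "anion": 2.5})
```

Here the high-energy anionic state falls below the population cutoff and is dropped, while the neutral form and zwitterion are both retained for the downstream structure-based ensemble.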

Figure 4: Epik Classic micro-pKa predictions for a dataset of 152 drug molecules.

IV. Hit-to-lead optimization

Applicable Solutions

  • Epik Classic
  • Epik 7

Once hits are identified, a series of analogs is synthesized to explore the relevant chemical space in greater detail and arrive at compounds with improved properties. It is important to be able to screen potential candidates rapidly and accurately to assess which to optimize further. The < 0.5 log unit accuracy and sub-second calculation speed of Epik Classic and Epik 7 make them excellent tools for rapid idea generation and testing. In addition to pKa value and protonation state distribution prediction, they have been implemented in other ADMET or property predictors, such as those for membrane permeability and solvation energy.

Figure 5: Epik 7 macro-pKa predictions for a dataset of congeneric tricyclic thrombin inhibitors.

V. Early-stage lead optimization

Applicable Solutions

  • Epik 7
  • Jaguar pKa
  • Macro-pKa

Optimizing the many physical characteristics required can be laborious and costly, from ideation through synthesis and assay. In this environment, where high quality property predictions are required and time permits, Schrödinger’s physics-based predictors, Jaguar pKa and Macro-pKa, take into account more molecular characteristics, including conformational and stereochemical effects, to improve pKa prediction accuracy. Additionally, Macro-pKa and Epik 7 both offer detailed speciation reports for a queried ligand. These are especially helpful for understanding the distribution of tautomeric states across the pH spectrum.

Figure 6: A Macro-pKa report detailing the macro-pKa value and the distribution of protonation states across a pH range.

 

Feature Comparison Table

Table 1: Comparison of features of the small molecule protonation state enumeration and pKa prediction technologies. a) Easily adjustable; b) Strongly influenced by the number of conformers (and tautomers in Macro-pKa); c) Only by internal experts at this time.

References

  1. Epik: A Software Program for pKa Prediction and Protonation State Generation for Drug-like Molecules.

    Shelley, J. C. et al. J. Comput. Aided Mol. Des. 2007, 21 (12), 681–691

  2. Epik: pKa and Protonation State Prediction through Machine Learning.

    J. Chem. Theory Comput. 2023, 19 (8), 2380–2388

  3. Multiconformation, Density Functional Theory-Based pKa Prediction in Application to Large, Flexible Organic Molecules with Diverse Functional Groups.

    Bochevarov, A. D. J. Chem. Theory Comput. 2016, 12 (12), 6001–6019.

Software and services to meet your organizational needs

Software Platform

Deploy digital materials discovery workflows with a comprehensive and user-friendly platform grounded in physics-based molecular modeling, machine learning, and team collaboration.

Research Services

Leverage Schrödinger’s expert computational scientists to assist at key stages in your materials discovery and development process.

Support & Training

Access expert support, educational materials, and training resources designed for both novice and experienced users.

De novo design of hole-conducting molecules for organic electronics


Panasonic and Schrödinger scientists designed over 50 novel molecules with improved hole mobility by performing large-scale density functional theory (DFT) calculations and machine learning inverse design.

Executive Summary

Tremendous Time Saved and Cost Reduced

  • 3 de novo design methods developed and assessed
  • 14 million molecules enumerated and screened
  • 9,000 DFT calculations performed
  • Over 50 molecules identified with target performance profile

Performance Improved

Identified molecules with lower hole reorganization energy (up to 22% reduction) than the lowest value in the training dataset

Highly Predictive Machine Learning (ML) Models Developed

Leveraged data based on DFT calculations of 250,000 molecules

New Insights Proposed

High quality de novo design complements molecular enumeration and virtual screening

 

Charge carrier mobility is one of the most important characteristics of semiconductor materials.

Applications in printed electronics demand molecules with high mobility. Despite rapid progress toward the discovery of new molecules with improved mobility, challenges persist. For example, the impact of a molecule’s topological shape on the magnitude of its hole mobility is not well understood, complicating optimized molecular design, and it can be extremely costly and time-consuming to synthesize and assess every candidate molecule. Atomistic simulations and machine learning technologies can reveal novel insights that are inaccessible to experimental methods alone.
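One such simulation-accessible quantity is the hole reorganization energy screened in this work, which is commonly estimated from four DFT single-point energies via the standard four-point (Nelsen) scheme. The energies below are placeholders, not values from the Panasonic dataset.

```python
def hole_reorganization_energy(e_n_at_n, e_n_at_c, e_c_at_c, e_c_at_n):
    """Four-point (Nelsen) estimate of the hole reorganization energy.
    e_X_at_Y = energy of charge state X at the optimized geometry of Y
    (n = neutral, c = cation); all energies in the same units (e.g. eV)."""
    lambda_1 = e_c_at_n - e_c_at_c   # cation relaxing on the cation surface
    lambda_2 = e_n_at_c - e_n_at_n   # neutral distorted at the cation geometry
    return lambda_1 + lambda_2

# Placeholder energies in eV: lambda = (5.95 - 5.80) + (0.12 - 0.00) = 0.27 eV
lam = hole_reorganization_energy(0.00, 0.12, 5.80, 5.95)
```

Because a lower reorganization energy means the molecule distorts less upon gaining or losing a hole, minimizing this single scalar is a practical proxy for maximizing hopping-regime charge mobility, which is why it makes a good screening objective for a quarter-million-molecule DFT campaign.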

“With Schrödinger’s advanced simulation tools, we were able to explore millions of molecules and target tens of potential candidates within a short period of time, which is simply unfeasible with traditional approaches. This level of computational power changes the way we innovate. Both the scientific expertise and the excellence of technology Schrödinger brings to the table give us confidence in future collaborations.”
Nobuyuki N. Matsuzawa, General Manager, Panasonic Corporation

Approach

Scientists at Panasonic are challenged to develop novel organic semiconductor materials with higher efficiency. To drive innovation, Panasonic teamed up with Schrödinger for de novo design of new molecules, leveraging Schrödinger’s computational power and expertise in high-throughput DFT calculations, machine learning/deep learning model building, and chemical enumeration.

 


 

Results

Scientists from Panasonic and Schrödinger performed a thorough benchmark study of three de novo methods and identified molecular structures in the heteroacene family that may show improved carrier transport properties.1 Schrödinger demonstrated strong large-scale computing capabilities and in-house expertise in machine learning to develop de novo methods based on knowledge and literature reports, building Bayesian optimizers and reward engineering.

 

Conclusion

Scientists from Panasonic and Schrödinger have applied three major classes of de novo molecular design (inverse design) methods to the challenging problem of improving charge carrier mobility in materials science. They evaluated the performance of these methods via large-scale DFT calculation of hole reorganization energy. These methods present an attractive complement to molecular enumeration and virtual screening, and recent advances in deep learning for de novo design have yielded promising results for the design of novel materials.

  • Over 50 molecules were identified to have lower hole reorganization energy than the lowest value in the training set (up to 22% reduction). We expect significant enhancement in hole mobility from the reduction of the reorganization energy in the newly designed molecules.
  • The best scoring compound was found by the JTNN method, followed by REINVENT. However, on the whole, the REINVENT method generated the best top 1,000 molecules.
  • Based on the findings, the scientists propose that high-quality de novo methods should optimize for compounds that “fill holes” in the space of the enumeration, generating highly targeted molecules.

References

  1. De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen

    Gabriel Marques*, Karl Leswing, Tim Robertson, David Giesen, Mathew D. Halls, Alexander Goldberg, Kyle Marshall, Joshua Staker, Tsuguo Morisato, Hiroyuki Maeshima, Hideyuki Arai, Masaru Sasago, Eiji Fujii, and Nobuyuki N. Matsuzawa* J. Phys. Chem. A 2021, 125, 33, 7331–7343.


An automated workflow for rapid large-scale computational screening to meet the demands of modern catalyst development


Executive Summary

First-principles simulation has become a reliable tool for the prediction of structures, chemical mechanisms, and reaction energetics for the fundamental steps in homogeneous catalysis. Details of reaction coordinates for competing pathways reveal a fundamental understanding of observed catalytic activity, selectivity, and specificity. Such predictive capability raises the opportunity to accelerate computational discovery and design of new single-site catalysts with enhanced properties.

However, alongside rapid technology development and materials innovation, challenges persist:

  • The complexity of chemical reactions, and the associated need for computational research, has increased as demands for innovation grow
  • The traditional rate of catalyst discovery is limited and unable to keep pace with demands for improved catalysts
  • Existing computational frameworks are manually intensive, limited in scale, and require a high level of expertise and training
  • Cataloging and maintaining databases of novel catalysts is challenging and time-consuming

To democratize the fundamental understanding, design, and discovery of novel catalysts, Schrödinger developed an automated reaction workflow called AutoRW. AutoRW combines the elements of enumeration, mapping, organization, and output needed for high-throughput screening of catalysts, reagents, and substrates, requiring only a pre-built reaction coordinate, a novel chemical fragment, and any R-groups for enumeration. By automating processes and computing the reaction coordinates, rates, energies, transition states, structures, and properties for each reaction, AutoRW streamlines the process of large-scale computational catalyst screening.

 

Solution: AutoRW for automated large-scale catalyst screening

  • Simplified, customizable workflows that enhance reproducibility and predictability
  • Easy to use for both expert and non-expert computational users
  • Increased productivity for highly-complex problems and challenges
  • Enhanced coverage where conformers could be missed by manual methods
  • Improved organization of files and properties to save time and reduce errors
  • Dedicated scientific & technical support and vast learning resources

Case Studies: How AutoRW Accelerates Innovation in Catalysis and Reactivity

Understanding the Effects of Catalyst Selectivity on Polypropylene Tacticity

Production of olefin-based polymer products has surpassed 100 million tonnes. Of these, polypropylene is the second most produced polymer. Its physical properties are directly influenced by the regularity of adjacent stereocenters. This regularity, or tacticity, is determined by the catalyst’s kinetic selectivity, and controlling the incorporation of α-olefin monomer allows for fine-tuning of the polymer’s physical properties. In this project, scientists studied 13 isotactic catalysts using AutoRW to fundamentally understand the adjacent stereoselectivity of polypropylene. The results were in good agreement with the experimental selectivities (R = 0.8). This quick and accurate approach allows for optimized polypropylene design and synthesis with target structures and properties.
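The link between computed barriers and observed selectivity comes from transition-state theory: the ratio of rates for two competing pathways depends exponentially on the difference in their free-energy barriers. A small sketch with illustrative numbers only:

```python
from math import exp

R = 8.314462618e-3   # gas constant, kJ/(mol*K)

def selectivity_ratio(ddg_kj_mol, temp_k=298.15):
    """Rate ratio k1/k2 for two competing pathways from the difference in
    their free-energy barriers, ddG = G2_barrier - G1_barrier:
    k1/k2 = exp(ddG / RT)."""
    return exp(ddg_kj_mol / (R * temp_k))

# Illustrative: a ~5.7 kJ/mol barrier difference at 298 K corresponds to
# roughly a 10:1 preference for the lower-barrier (e.g. stereoretentive) path.
ratio = selectivity_ratio(5.7)
```

The exponential dependence is why even modest accuracy gains in computed barriers matter: an error of a few kJ/mol changes the predicted stereoselectivity by an order of magnitude.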

 

Screening Epoxy Amine Reactions for Efficient High-performance Polymer Design

Epoxy amine crosslinking reaction. The initial reaction occurs between a primary amine and a single epoxide. The resulting product contains a secondary amine that can further react with additional epoxide.

 

Thermoset polymers have gained interest in recent years due to their favorable thermomechanical properties for applications in aerospace, automotive, defense and high performance athletics equipment. While thermosets are very versatile, the cost to incorporate these polymers into new materials is high. Accelerating the development process pipeline would not only reduce costs, it would also decrease accumulation of thermoset waste which is difficult to recycle. Towards this end, the scientists studied a library of 12 amines and 21 epoxides to build a relative reaction barrier heat map. Each epoxy/amine combination was subjected to the Reaction Workflow to locate all stationary points in the reaction as well as compute energetic barriers for all reaction steps, enabling efficient design and synthesis of high-performance polymers.

 

Investigating Comonomer Selectivity for Optimized Block Copolymerization

Block copolymers have unique properties that include high strength, flexibility, and melting temperature. Their physical properties are directly influenced by the polymer block length and distribution, which are then determined by the kinetic comonomer selectivity and chain shuttling agent activity. The comonomer selectivity is influenced by both the catalyst and existing polymer product. In this project, scientists ran AutoRW to screen and test 35 catalyst derivatives with different polymer substrates to understand the effect of catalysts on comonomer selectivity for different copolymerization reactions. This approach enables quick screening of catalysts and substrates to design block copolymers with target structures and properties.

Empower advanced catalyst discovery across entire R&D project teams

As modern R&D processes evolve in a more collaborative and globalized manner, tremendous effort has been put into data storage and sharing, communication, and project management across teams and geographies. Enterprise-scale informatics platforms for R&D have been developed to break down silos and barriers, and have been adopted by many companies across several molecular design industries.

The benefits are clear: project teams can work across departments and sites, across geographies and time zones, and even across companies, with live sharing of all project data, including experiments, designs, processes, and simulations. Teams can share, analyze, and communicate data seamlessly and make rapid decisions, accelerating collaboration and project progress. Enterprise informatics platforms also simplify and optimize data management for organizations, eliminating the chaos of storing large datasets on individual computer drives and of transferring data between teams or during personnel changes.

Schrödinger’s LiveDesign is a powerful, web-based informatics and molecular design platform that enables teams to rapidly advance materials discovery projects by collaborating, designing, experimenting, analyzing, tracking, and reporting in a centralized platform. By incorporating AutoRW into LiveDesign, scientists can get the most benefits from both.

 

AutoRW in LiveDesign for enterprise-wide time and cost savings

  • Streamlined automated enterprise solution for catalysis and reactivity
  • Scalable to satisfy the needs of large global organizations
  • Integrated with advanced machine learning systems
  • End-to-end collaborative discovery between chemists, modelers and engineers on a single web-based platform
  • Live sharing of ideas and results for rapid decision-making
  • Intuitive cheminformatics: visualization, data and model analysis for experimental and computational data simultaneously

Using LiveDesign, a team can collaborate to virtually screen over 2,000 catalysts per year, whereas a single modeling user can screen only about 150 catalysts annually. Employing automated, enterprise-scale workflows leads to much higher cost efficiency and rates of success.

Conclusions

Scientists across industries are entering a new paradigm for catalysis research. Work historically based on purely experimental trial and error is moving to computationally driven workflows. Rapid technology development and evolving project collaborations demand simplified, automated workflows at enterprise scale. Schrödinger’s AutoRW and LiveDesign enable rational catalyst design in an automated, accelerated, and collaborative manner on a single web-based platform that is easy to use and deploy across teams and organizations.

The tools empower scientists to solve high-level challenges of even more complex reactions and catalysis systems with reduced time and cost while enhancing predictability and productivity.

Software and services to meet your organizational needs

Industry-Leading Software Platform

Deploy digital drug discovery workflows using a comprehensive and user-friendly platform for molecular modeling, design, and collaboration.

Research Enablement Services

Leverage Schrödinger’s team of expert computational scientists to advance your projects through key stages in the drug discovery process.

Scientific and Technical Support

Access expert support, educational materials, and training resources designed for both novice and experienced users.

Accelerating the Design of Asymmetric Catalysts with a Digital Chemistry Platform

APR 18, 2023


Speaker

Pavel A. Dub
Senior Principal Scientist

Abstract

Asymmetric catalysis became an integral part of the science-driven technological revolution of the second half of the 20th century, leading to decreased energy demands, sustainable chemical processes, and the realization of “impossible” transformations. Asymmetric catalysis based on chiral transition-metal complexes plays an important role in the synthesis of single-enantiomer drugs, perfumes, and agrochemicals. The importance of the field is recognized by two Nobel Prizes, in 2001 (transition-metal catalysis) and 2021 (organocatalysis).

Asymmetric catalysts are traditionally designed by experimental trial-and-error methods, which are resource-, time- and labor-consuming, and thus extremely expensive. Digital methods offer the opportunity to expedite catalyst design. Until recently, computational chemistry, typically quantum chemical studies, indirectly contributed to asymmetric catalyst design by providing rationalization for the mechanism of generation of chirality. With the development of more advanced methods, algorithms and an included layer of automation, computational catalysis is now providing the possibility for direct asymmetric catalyst design.

In this webinar, we will demonstrate how Schrödinger’s advanced digital chemistry platform can be used to accelerate the direct design and discovery of asymmetric catalysts.

Key Learning Objectives:

  • Learn how to design an asymmetric catalyst with computational chemistry
  • Learn how automated high-throughput simulation workflows enable rapid asymmetric catalyst design
  • Understand the intersection of physics-based and machine learning techniques in asymmetric catalyst design

Battery Tech – Leveraging Atomic Scale Modeling for Design and Discovery of Next-Generation Battery Materials

MAR 29, 2023


Speaker

Garvit Agarwal
Senior Scientist

Abstract

Rechargeable Li-ion batteries (LIBs) are revolutionizing electric vehicles and portable devices, but improvements are needed in areas such as power density, safety, reliability, and lifetime. Reliable atomic scale modeling enables rapid initial evaluation of large chemical and material design space, accelerating the development cycle of next-generation battery technologies.

Attend this webinar to learn about an advanced digital chemistry platform for developing next-generation battery materials with improved properties. The presentation will include use of physics-based and machine learning techniques for understanding structure-property relationships of different battery components. It will also outline an automated active learning framework for the development of neural network force fields to predict critical bulk properties of high-performance liquid electrolytes used in advanced batteries.

Attend this webinar and learn:

  • Predictive capabilities of physics-based modeling for battery materials
  • How automated high-throughput simulation workflows enable rapid screening of new material candidates
  • How advanced neural network force fields can be applied for accurate electrolyte property prediction

Expect Success: Modern Virtual Screening Technologies that Actually Deliver High-Quality, Developable Hits

MAR 14, 2023


Speakers

Jeremie Vendome
Steve Jerome

Abstract

For years, traditional virtual screening approaches have suffered from low hit rates, lack of novelty, and poor developability of the molecules identified. These shortcomings have been attributed both to the performance of screening technologies and to the size of the chemical libraries that could realistically be screened, and they have led to a lack of confidence in virtual screening as a reliable approach for hit discovery.

Today, Schrödinger is pioneering a new, modern virtual screening workflow that leverages game-changing technologies, including AI/ML-powered active learning and accurate Absolute Binding FEP+ (ABFEP), to screen and rescore ultra-large chemical libraries, including nearly comprehensive fragment-based libraries, in a cost-effective way. The performance of this workflow has been transformational.

In this webinar, we will describe several recent case studies from the Schrödinger Therapeutics Group where this modern large-scale virtual screening workflow resulted in double-digit hit rates across a diverse range of targets. We will also describe how to access these technologies today via Schrödinger’s Research Enablement Services.

Highlights

  • Hear case studies of successfully achieving double-digit hit rates from virtual screens across broad target classes
  • Learn about a new modern screening workflow that combines physics-based methods (docking and absolute binding free energy calculations) with machine learning for large scale screening and rescoring of whole ligands and fragments
  • Learn about how to easily access these technologies and expertise via Schrödinger’s Research Enablement Services for Hit Discovery
  • Ask questions to gain further insight from the speakers to apply to your work

DeepAutoQSAR hardware benchmark

Executive Summary

  • This benchmark evaluates the performance of DeepAutoQSAR on two datasets of different sizes using different hardware configurations and model training times.
  • Our general recommendations, based on the results and the hardware costs, are to use the NVIDIA T4 GPU hardware with the following training times: 2 hrs for datasets with less than 1,000 data points; 4 hrs for 1,000 to 10,000 data points; and 8 hrs for more than 10,000 data points.
  • While performance ultimately depends on the data, the intended purpose of this benchmark is to serve as a starting point for choosing the hardware to train the ML model(s) with and the specific model training time to use. Actual performance is highly dependent on the specific dataset and may require increasing the training time or choosing a different GPU to achieve the desired results.

 

Introduction

The application of machine learning (ML) to predict the molecular properties of drug candidates is an important area of research that has the potential to reduce drug development timelines and accelerate the creation of medicines for patients with serious unmet medical needs.

The successful application of ML relies on sufficient data quantity and quality, a suitable model architecture(s) for the given problem, proper hyperparameter choices (the parameters for a particular ML model architecture), and appropriate model training time for a chosen hardware configuration.

DeepAutoQSAR is a machine learning product that allows users to predict molecular properties based on chemical structure. The automated supervised learning pipeline enables both novice and experienced users to create and deploy best-in-class quantitative structure activity/property relationship (QSAR/QSPR) models.

The purpose of this benchmark, which builds on the work of an earlier whitepaper [1], is to characterize the performance of DeepAutoQSAR on two datasets of different sizes using different hardware configurations and model training times. While performance ultimately depends on the data, the intended purpose of this benchmark is to serve as a starting point for choosing the hardware to train the ML model(s) with and the specific model training time to use.

 

Datasets

The datasets used in the benchmark were obtained from the Therapeutics Data Commons (TDC). TDC provides ML-ready datasets that can be used for learning tasks that are valuable to pharmaceutical research and development and that cover different therapeutic modalities and stages of the drug development lifecycle [2].

We use two datasets that contain assay data for one Absorption, Distribution, Metabolism, and Excretion (ADME) property each:

  1. Caco2 (Human Epithelial Cell Effective Permeability)
  2. AqSolDB (Aqueous Solubility)

Performance is measured by the median accuracy of the ADME property prediction for a sample of train-test data splits; note that the specific train-test data splits used are different from the splits provided by TDC for its benchmark leaderboard.

Dataset Descriptions

Caco2 (Human Epithelial Cell Effective Permeability) [3]*

The human colon epithelial cancer cell line, Caco-2, is used as an in vitro model of human intestinal tissue. Experimental measurements of the rate at which a drug passes through Caco-2 cells approximate the rate at which it permeates human intestinal tissue.

This dataset contains numerical data for use in regression, and there are 906 compounds.

AqSolDB (Aqueous Solubility) [4]*

Aqueous solubility measures a drug’s ability to dissolve in water. Poor water solubility can lead to slow drug absorption, inadequate bioavailability, and even toxicity. More than 40% of new chemical entities are poorly soluble in water.

This dataset contains numeric, non-integer data for use in regression, and there are 9845 compounds.

*Note: The datasets have been modified from their original form to remove structural redundancies and experimental errors.

 

Hardware

The hardware used in the benchmark was provisioned from the Google Cloud Platform (GCP); therefore, the hardware configurations chosen were based on the machine types offered by Google.

These limitations on hardware configurations, dictated by the cloud provider, mean that only specific hardware pairings are available, such as a particular GPU platform that can only be used with a given CPU platform. For example, NVIDIA A100 GPUs can only be run on an A2 machine type, which only uses the Intel Cascade Lake CPU platform. Constrained by these limitations, every effort was made to keep hardware-specific options consistent across machine types, to provide hardware diversity when reasonable, and to use cost-effective high-performance computing hardware.

 

| Hardware Key | GCP Machine Type | CPU Platform | vCPUs* | RAM (GB) | GPU Platform | GPUs | Cost ($) per Hour+ |
| 2 vCPUs | n2-standard-2 | Intel Ice Lake | 2 | 8 | N/A | None | $0.10 |
| 4 vCPUs | n2-standard-4 | Intel Ice Lake | 4 | 16 | N/A | None | $0.19 |
| 8 vCPUs | n2-standard-8 | Intel Ice Lake | 8 | 32 | N/A | None | $0.39 |
| 16 vCPUs | n2-standard-16 | Intel Ice Lake | 16 | 64 | N/A | None | $0.78 |
| T4 GPU | n1-standard-4 | Intel Ice Lake** | 4 | 15 | Nvidia T4 | 1 | $0.54 |
| V100 GPU | n1-standard-4 | Intel Ice Lake** | 4 | 15 | Nvidia V100 | 1 | $2.67 |
| A100 GPU | a2-highgpu-1g | Intel Cascade Lake | 12 | 85 | Nvidia A100 | 1 | $3.67 |

* For these machine types, GCP defines vCPUs as the number of threads; 2 vCPUs (threads) per core.
** Up to the Intel Ice Lake generation; GCP auto-assigns the CPU platform on node pool creation.
+ Prices in November 2022. Includes sustained use discounts.
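As a quick sanity check on these rates (hourly prices taken from the table above; the helper function itself is hypothetical):

```python
# Hourly GCP prices from the table above (November 2022, sustained-use discounts included).
HOURLY_USD = {
    "2 vCPUs": 0.10, "4 vCPUs": 0.19, "8 vCPUs": 0.39, "16 vCPUs": 0.78,
    "T4 GPU": 0.54, "V100 GPU": 2.67, "A100 GPU": 3.67,
}

def run_cost(hardware: str, hours: float) -> float:
    """Cost in USD of a single training run on the given hardware."""
    return round(HOURLY_USD[hardware] * hours, 2)

# A 4-hour training run is several times cheaper on a T4 than on an A100.
print(run_cost("T4 GPU", 4), run_cost("A100 GPU", 4))
```

This cost gap is the main reason the recommendations later in this benchmark default to the T4 rather than the fastest available GPU.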

 

Benchmarking Methods & Results

Our benchmark is a two-stage process. In the first stage, DeepAutoQSAR models are trained to fit the TDC datasets using a standard cross-validation procedure that selects the top-performing ML models for the model ensemble and optimizes their hyperparameters; the end result of this stage is an ensemble of top-performing models whose predictions, under normal usage, are averaged to provide a mean prediction and an associated ensemble standard deviation. We detail the specific protocol in our white paper, Benchmark Study of DeepAutoQSAR, ChemProp, and DeepPurpose on the ADMET Subset of the Therapeutic Data Commons [1]. In the second stage, random train-test splits of the data are computed, and the previously determined ensemble of top ML model architectures, with their specific hyperparameter configurations, is retrained on the new training splits. Predictions are then generated for the corresponding test splits. These multi-split metrics provide a more robust estimate of model performance by reducing the potential bias introduced by any single train-test data split. Model performance in this hardware benchmark is reported as the median R2 coefficient of determination [5] across these random train-test splits for each hardware configuration and model training time.

In the first stage, the initial training procedure runs continuously for each training time allotment. Due to the stochastic nature of hyperparameter optimization and model architecture selection, each hardware and training time combination can explore a different number of model architectures and hyperparameter combinations each time a benchmark job is run. The model training times evaluated were 0.5, 1, 2, 4, 8, and 16 hours. As a general rule, more capable hardware running for longer training times on smaller datasets (e.g., a machine with an A100 GPU training for 16 hrs on the smaller Caco2 permeability dataset) will explore more hyperparameterizations than less capable hardware running for shorter training times on larger datasets (e.g., a two-core machine training for 2 hrs on the larger AqSolDB dataset).

Since model architecture selection and hyperparameter sampling are stochastic, we run each benchmark configuration (a particular hardware and training time combination) three times and report average performance; this is especially relevant when fewer hyperparameter combinations are explored, as model performance is then more sensitive to hyperparameter sampling. The output of the first stage is an ensemble of top models, determined by cross validation, with specific hyperparameter choices for each.

The second stage of our benchmark runs for half the training time of the first stage. Increasing training time leads to more robust statistics as the median performance converges to a split-independent value, but comes at the expense of increased computational cost; in practice, computational expense must be balanced against the need to train the ensemble model for a sufficiently long time. For performance reporting, we provide the median R2 coefficient of determination [5] computed over multiple train-test splits, which aims to reduce the potential bias introduced by any single train-test split. To compute this R2, we repeatedly split the data into training and testing sets via bootstrap sampling with replacement: we take N samples with replacement from the dataset of N total data points and remove any duplicates to form the training subset. The selected points are used to train the specific model architectures found in stage one, and the unselected points serve as the test holdout. We do this until the time limit is reached and report the median R2 across all resamplings.
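The stage-two resampling procedure can be sketched in plain Python; here a toy one-dimensional least-squares line stands in for the DeepAutoQSAR ensemble, and all data are synthetic:

```python
import random
import statistics

def r2(y_true, y_pred):
    """Coefficient of determination, as computed by sklearn.metrics.r2_score."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Toy stand-in for model training: 1D least-squares line (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def median_bootstrap_r2(xs, ys, n_splits=25, seed=0):
    """Bootstrap train-test splits: draw N indices with replacement, deduplicate
    to form the training set; the unselected points are the test holdout."""
    rng = random.Random(seed)
    n, scores = len(xs), []
    for _ in range(n_splits):
        train = {rng.randrange(n) for _ in range(n)}
        test = [i for i in range(n) if i not in train]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(r2([ys[i] for i in test], [a * xs[i] + b for i in test]))
    return statistics.median(scores)

# Synthetic regression data: y = 2x + 1 plus a little noise.
gen = random.Random(1)
xs = [i / 10 for i in range(200)]
ys = [2 * x + 1 + gen.gauss(0, 0.5) for x in xs]
print(round(median_bootstrap_r2(xs, ys), 3))
```

Because each bootstrap draw leaves out roughly a third of the points, every resampling yields a fresh holdout, and the median over many resamplings is far less sensitive to any one lucky or unlucky split.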

As both TDC datasets pose numerical regression problems, this metric is a reasonable measure of model performance; however, the choice of performance metric in real-world applications should always be determined by the use case of the ensemble model. Sometimes MAE or RMSE is more appropriate for assessing whether a model is sufficiently performant. The output of the second stage is a distribution of ensemble model performances over different train-test splits; the reported value is the median of that distribution.

We plot the benchmark results, which is the median R2 coefficient of determination from the second stage, below. Our first plot shows performance on the AqSolDB dataset, and the second plot shows performance on the Caco2 permeability dataset. For each of these datasets, we highlight the progression of performance over time grouped by hardware type, where hardware type is on the x-axis, training time in hours is the bar color, and median R2 score is on the y-axis. The data used to generate the plots are provided in the supplementary tables.

 

Figure 1: Grouping R2 score by hardware configurations on the AqSolDB regression dataset.

 

Figure 2: Grouping R2 score by hardware configurations on the Caco2 permeability regression dataset.

 

Based on these results and the hardware costs, our general recommendations are the following:

| Number of Data Points | Hardware | Training Time (hr) |
| <1,000 | Nvidia T4 GPU | 2 |
| 1,000 – 10,000 | Nvidia T4 GPU | 4 |
| >10,000 | Nvidia T4 GPU | 8 |
These recommendations are a starting point and a lower bound. Actual performance is highly dependent on the specific dataset, and you may need to increase the training time or choose a different GPU to achieve your desired results.
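Encoded as a small helper (hypothetical; it simply restates the recommendations above, with the NVIDIA T4 GPU assumed), the starting point becomes:

```python
def recommended_training_hours(n_points: int) -> int:
    """Starting-point training time in hours on an NVIDIA T4 GPU,
    per the benchmark recommendations; treat the result as a lower bound."""
    if n_points < 1_000:
        return 2
    if n_points <= 10_000:
        return 4
    return 8

# The two benchmark datasets: Caco2 (906 compounds) and AqSolDB (9,845 compounds).
print(recommended_training_hours(906), recommended_training_hours(9845))
```

Under these recommendations, the Caco2 dataset would start at 2 hours of training and AqSolDB at 4.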

Selected publications

  1. Kaplan, Z.; Ehrlich, S.; Leswing, K. Benchmark study of DeepAutoQSAR, ChemProp, and DeepPurpose on the ADMET subset of the Therapeutic Data Commons. Schrödinger, Inc., 2022.

    https://www.schrodinger.com/science-articles/benchmark-study-deepautoqsar-chemprop-and-deeppurpose-admet-subset-therapeutic-data (accessed 2022-11-29).

  2. Therapeutics Data Commons.

    https://tdcommons.ai/ (accessed 2022-06-15).

  3. ADME – TDC.

    https://tdcommons.ai/single_pred_tasks/adme/#caco-2-cell-effective-permeability-wang-et-al (accessed 2022-06-15).

  4. ADME – TDC.

    https://tdcommons.ai/single_pred_tasks/adme/#solubility-aqsoldb (accessed 2022-06-15).

  5. Sklearn.metrics.r2_score — scikit-learn 1.1.3 documentation.

    https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn-metrics-r2-score (accessed 2022-11-29).


The role of digital chemistry across the polymer supply chain


Molecular modeling and simulation tools have proven effective in materials development and are increasing in use throughout the polymer industry, from raw materials suppliers to end product manufacturers. Computational workflows also open new avenues for developing polymers with improved recyclability. Physics-based simulations offer reliable predictions of structures, morphologies, properties, and chemical reactivity for polymers. Recent advances in machine learning, deep learning, and enterprise informatics platforms have accelerated the speed, accuracy, and automation of novel materials and solutions discovery. A paradigm shift to computer-driven molecular design is occurring throughout the industry.

Vision of a digital chemistry catalyzed polymer supply chain:

Raw Materials Suppliers: Employ atomistic simulations to improve understanding and predict properties of downstream products; offer optimized raw materials to compounders.

Polymer Compounders: Use atomistic simulations to understand formulation chemistry and predict properties; provide detailed requirements to raw materials suppliers and offer optimized products to end product manufacturers.

End Product Manufacturers: Leverage atomistic simulations to predict the performance of final products and quickly identify causes of failure; give specific formulation requirements to compounders.

 


 

Solution Overview

Schrödinger’s Materials Science platform offers tailored solutions for research and business throughout the polymer supply chain, with differentiated model builders, efficient simulation engines accelerated by GPU computing power, automated thermophysical and mechanical response workflows, and accurate analysis tools.

  • Broad molecular simulation and property prediction tools for: Thermal Properties, Mechanical and Dielectric Properties, Reactivity and Kinetics, Aggregation in Polymer Production, Solvent Sensitivity, Gas, Ion, Additive Diffusivity, Phase Morphology and SAXS Scattering, Semi-crystalline Morphology
  • Applicable to all polymer types: thermoplastic homo- and copolymers, crosslinked polymers, elastomers, and dendrimers
  • Intuitive user interface with automated workflows for experts or non-experts
  • Dedicated scientific/technical support and vast learning resources

 

Digital Chemistry Value Across Polymer Supply Chain (Example: Transportation Industry)

 


 

1. Raw Materials Suppliers

Suppliers of petrochemical and chemical feedstocks, additives, and various monomers and resins

Design new chemistries from alternative sources and discover new applications through simulating downstream products properties

  • Predict polymer crosslinker performance in composite matrix resins such as epoxy-amine and cyanate esters
  • Simulate the interaction between thermoplastic styrene-butadiene and crosslinkers

Speed decision making for catalyst selection in raw materials production

  • Simulate and understand the catalysis mechanisms, selectivity, and reactivity of epoxy amine, urethane, and other reactions

Develop alternative greener raw materials that are more environmentally sustainable

  • Simulate the impact of degradation on modulus for a chemistry of focus

 

2. Polymer Compounders

Suppliers who prepare polymer formulations by mixing or/and blending polymers and additives into process-ready products

Predict the performance of alternative raw materials in formulations and end products

  • Predict glass transition, thermal stability, and thermal expansion with new polymers
  • Quantify the diffusion of additives in polymers
  • Understand water transport and morphological stability of polymer formulations

Efficiently optimize formulation properties

  • Predict and track water uptake in polymer composites
  • Predict curing kinetics and processing properties

Develop greener formulations that are more environmentally sustainable

  • Simulate and screen for optimal formulation with new bio-based chemistry

 

3. End Product Manufacturers

Processors of resins/formulations who make them into finished products on the market

Enable reliable decision-making through predictive modeling of end product properties

  • Predict tire materials performance with different additives and cross-linkers

Obtain best chemistry from upstream suppliers by targeted chemical design to properties critical to product and processing constraints

  • High-throughput screening of epoxy-amine reactions to identify the unique combinations for target properties

Accelerate the manufacturing process pipeline

  • Predict polymer gelling during manufacturing process

Quickly screen and identify potential causes and impacts of manufacturing and material source deviations

  • Predict sensitivity of matrix to cleaning solvents

Design greener products that are more environmentally sustainable

  • Simulate and predict properties of high-performance resins with bio-based materials and automate discovery of new biomaterials

 

4. Polymer Recycling

Research and design for recyclability throughout the polymer supply chain

Design polymers for recyclability

  • Predict selectivity of chemical recycling reaction

Expand use of recycled materials

  • Simulate impact of recycled polymers in packaging

Determine impact on product with use of recycled material

  • Screen for property changes with recycling driven microstructure changes

 

About Schrödinger

Schrödinger is transforming the way materials are discovered. Schrödinger has pioneered a physics-based software platform that enables discovery of high-quality, novel molecules for materials applications more rapidly and at lower cost compared to traditional methods.

Learn how digital chemistry is driving innovation across materials science industries


2022 Schrödinger Fall Chinese Life Science Webinar | Breaking New Ground for Structure-Based Drug Discovery with the Latest Physics-Based Computational Methods

NOV 24, 2022


Speaker

Dr. Jianxin Duan
Fellow

Abstract


The value of pursuing a structure-based drug discovery (SBDD) strategy has amplified in recent years as new highly-predictive, physics-based methods have evolved and demonstrated the ability to accelerate the discovery of novel clinical compounds. However, these approaches are limited by the availability of high-quality structural models of the target protein. Recent advances in structural biology such as cryo-EM and computationally-predicted protein models (using machine learning and physics-based methods) have the potential to open a new world of targets to pursue. In this webinar, you’ll learn how new advances in computational workflows are enabling structure-based drug discovery on these historically challenging targets and off-targets.

Key topics covered:

Overview of new computational approaches for building and validating high-quality protein structural models for use in SBDD in the absence of an experimental crystal structure (i.e. homology models or AlphaFold structures)

Case studies demonstrating the impact of these approaches to:

1) progress initial hits from high-throughput screens

2) dial out off-target liabilities

3) progress entire programs using homology models

Computational chemistry applications


An in-depth exploration of computational chemistry applications to solve real-life biological science, materials, and engineering problems.

Computational chemistry allows researchers to explore a large, diverse range of chemical space since it is much easier to draw a molecule on the computer than to synthesize, purify, and characterize a molecule in a lab.

When deployed appropriately, computational chemistry applications can effectively bring molecules to life on the computer by accurately simulating and predicting relevant properties. For instance, the binding affinity of a small-molecule ligand to a protein target can be calculated with a similar accuracy to that of wet lab assays.

Within computational chemistry, physics-based methods grounded in first-principles can enable prediction accuracy matching experimental accuracy and are broadly applicable, but they tend to be more computationally expensive than other methods. Alternatively, machine learning (ML) methods, which develop a model by training on a data set, are also being deployed for molecular design. These ML approaches can generate results much faster but are most effective when exploring chemical space that is related to the data set the machine learning model is built upon, thus limiting their domain of applicability.

Combining physics-based and ML approaches incorporates the strengths of both to speed up scientific advances in molecular design. For example, integrating active learning into physics-based molecular docking allows very large chemical libraries to be assessed efficiently while retaining a high level of performance. With active learning incorporated into docking algorithms, roughly 30,000 compounds can be evaluated per second, compared with roughly one compound per 30 seconds for typical non-ML methods, a speedup of nearly six orders of magnitude.
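To make the active-learning idea concrete, here is a deliberately minimal sketch: a one-dimensional toy "chemical space", a quadratic function standing in for the expensive docking oracle, and a nearest-neighbor lookup standing in for the ML surrogate. None of this reflects any actual docking implementation:

```python
import random

def expensive_dock(x):
    """Stand-in for a physics-based docking score (lower = better); invented form
    with its best candidate at x = 0.7."""
    return (x - 0.7) ** 2

def surrogate_rank(scored, pool):
    """Tiny 'ML surrogate': predict each pool member's score from its nearest
    already-scored neighbor, then rank best-first."""
    def predict(x):
        nearest = min(scored, key=lambda s: abs(s[0] - x))
        return nearest[1]
    return sorted(pool, key=predict)

def active_learning_screen(library, n_rounds=3, batch=50, seed=0):
    """Dock a small random seed batch, then repeatedly retrain the surrogate
    and dock only the most promising candidates it proposes."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    scored = [(x, expensive_dock(x)) for x in pool[:batch]]  # seed round: random batch
    pool = pool[batch:]
    for _ in range(n_rounds - 1):
        ranked = surrogate_rank(scored, pool)          # cheap surrogate pass over the pool
        picks, pool = ranked[:batch], ranked[batch:]   # dock only the most promising
        scored += [(x, expensive_dock(x)) for x in picks]
    return min(scored, key=lambda s: s[1])

library = [i / 1000 for i in range(1000)]
best_x, best_score = active_learning_screen(library)
print(round(best_x, 3))
```

Only 150 of the 1,000 candidates are ever "docked", yet the loop homes in on the optimum; this is the mechanism that lets active learning cover ultra-large libraries with a small budget of expensive calculations.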

Putting Computational Chemistry to Work

Many industries are using computational chemistry methods and molecular modeling to drive innovations in pharmaceutical drugs, packaging materials, batteries, and more. Some applications for computational chemistry include:

  • Drug design
  • Medicinal chemistry design
  • Chemoinformatics
  • Consumer packaged goods
  • Protein/antibody engineering
  • Enzyme design
  • Organic electronics
  • Pharmaceutical formulations
  • Catalysis design
  • Polymer design
  • Surface chemistry
  • Energy capture and storage
  • Lead optimization
  • Drug target validation
  • Semiconductors
  • Peptide design
  • Metals, alloys, and ceramics design

Benefits of Using Computational Chemistry

Computational chemistry aims to simulate and predict molecular structures and properties using different kinds of calculations based on quantum and classical physics. Advances in machine learning are also making computational chemistry more effective by increasing the speed at which calculations can be done.

Computational chemistry methods reduce the time, money, and reagent resources spent on synthesis, assays, and other experimental work. Machine learning applications can further enhance computational chemistry by increasing the speed of complex calculations, sometimes by several orders of magnitude. By carefully integrating machine learning with physics-based algorithms, digital chemical design can easily outpace wet lab design. This time savings directly translates into cost savings. Additionally, these methods allow for a broader expanse of chemical space to be explored, which can result in a greater likelihood of finding unexpected, novel molecules. In the fast-paced world of molecular design, where first-to-patent can mean the difference between success and the loss of a research program, the increase in the speed and breadth afforded by digital chemistry increases the chances of owning intellectual property.

Real-World Computational Chemistry Applications

Computational Chemistry Accelerates Drug Design

When used in drug discovery programs, computational tools allow chemical space to be explored at speeds and costs that cannot be matched by wet-lab experiments.

For example, the lead optimization process was recently accelerated by using a broad search algorithm and cloud computing to explore a huge chemical space, with more than 1 billion molecules computationally characterized, toward the goal of designing new inhibitors of D-amino acid oxidase (DAO), a target for the treatment of schizophrenia. This work shows how chemical enumeration, property filtering, machine learning, and rigorous free energy perturbation (FEP) calculations can be applied to design new small-molecule drugs and tackle the multiparameter optimization problem.
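The enumerate-filter-rank-shortlist funnel described above can be illustrated with a minimal sketch. The property values, cutoffs, and "ML model" below are invented placeholders; a real campaign would use computed descriptors and a trained activity model, with only the final shortlist sent on to expensive free energy calculations.

```python
# Illustrative multiparameter funnel: enumerate -> property filter -> ML rank
# -> shortlist for (expensive) free energy calculations. All values are toy
# placeholders, not real chemistry.
candidates = [
    {"id": f"cmpd-{i}", "mol_wt": 250 + 7 * i, "logp": -1.0 + 0.15 * i}
    for i in range(60)
]  # stands in for a large enumerated library

# Stage 1: cheap property filters (drug-likeness-style cutoffs).
filtered = [c for c in candidates
            if c["mol_wt"] <= 500 and 0.0 <= c["logp"] <= 5.0]

# Stage 2: rank with a mock ML activity model (here, a simple scoring stub
# that pretends the model prefers logP near 2.5).
def ml_score(c):
    return -abs(c["logp"] - 2.5)

ranked = sorted(filtered, key=ml_score, reverse=True)

# Stage 3: only the top of the ranked list goes on to rigorous FEP.
fep_shortlist = ranked[:5]
print([c["id"] for c in fep_shortlist])
```

The point of the funnel is the shrinking cost profile: cheap filters prune most of the billion-scale library, an ML model ranks the survivors, and only a handful of compounds incur the cost of rigorous physics-based calculations.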

R&D for Product Development in Consumer Packaged Goods

In the consumer packaged goods (CPG) industry, manufacturers need to consider cost, performance and sustainability when developing new products.

Computational chemistry models and simulations decrease the development timeline and costs by allowing for fast screening, design and testing of new materials. Reckitt, which produces health, hygiene and nutrition consumer products, uses quantum mechanics and molecular dynamics computational tools in their R&D process to speed innovation. They have described how they used digital chemistry in their efforts to design more sustainable materials and how this approach has sped up timelines by 10x on average compared to a solely experimental approach.

Physics-Based Simulations to Develop New Energy Solutions

Another exciting application of computational chemistry approaches is the use of atomic-scale materials modeling in the design of new battery and energy storage solutions.

Some behaviors of materials that have been studied include ion diffusion, electrochemical response in electrodes and electrolytes, dielectric properties, mechanical response, and more. This computational approach has been used to screen for Li-ion battery additives that form a stable solid electrolyte interphase.
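A common screening criterion for SEI-forming additives is that the additive should reduce before the bulk solvent, which in frontier-orbital terms means a LUMO below the solvent's. The sketch below applies that filter to a handful of candidates; the names and energy values are invented placeholders, not computed data.

```python
# Illustrative screen: keep additive candidates predicted to reduce before
# the bulk solvent (LUMO below the solvent's, i.e. higher reduction
# potential). All energies here are invented placeholders.
SOLVENT_LUMO_EV = -0.9  # hypothetical LUMO of the bulk electrolyte solvent

candidates = {
    "additive-A": -1.4,
    "additive-B": -0.5,
    "additive-C": -1.1,
    "additive-D": -0.8,
}

# Keep additives whose LUMO lies below the solvent's, most easily
# reduced first.
hits = sorted(
    (name for name, lumo in candidates.items() if lumo < SOLVENT_LUMO_EV),
    key=lambda name: candidates[name],
)
print(hits)  # ['additive-A', 'additive-C']
```

In a production setting the placeholder energies would come from quantum-mechanical calculations on each candidate, but the ranking logic of the screen stays the same.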

Driving R&D with Schrödinger’s Pioneering Computational Platform

At Schrödinger, our physics-based computational platform allows companies worldwide to harness the capabilities of computational chemistry methods and apply these to their R&D programs quickly and with ease. Over the last 30 years, Schrödinger’s modeling software and services have enabled the discovery of high-quality, novel molecules and materials across industries, as illustrated by some of the examples described above.

Molecules come to life in Maestro, the streamlined portal for structural visualization and access to cutting-edge predictive computational modeling and machine learning workflows. And researchers can bring their digital and experimental data side-by-side within LiveDesign, Schrödinger’s enterprise informatics platform for collaborative analysis, molecular design, and program management.

As the predictive and analytical capabilities of physics-based modeling continue to advance and are enhanced by the addition of new ML models, the myriad applications that are impacted by computational chemistry will continue to grow.

References

  1. Advancing Drug Discovery through Enhanced Free Energy Calculations

    2017. Abel R, Wang R, Harder ED, Berne BJ, and Friesner RA. Accounts of Chemical Research. 50(7):1625-1632. DOI: 10.1021/acs.accounts.7b00083

  2. Docking and scoring in virtual screening for drug discovery: methods and applications

    2004. Kitchen D, Decornez H, Furr J, et al. Nature Reviews Drug Discovery. 3:935-949. DOI: 10.1038/nrd1549

  3. Efficient Exploration of Chemical Space with Docking and Deep Learning

    2021. Yang Y, Yao K, Repasky MP, Leswing K, Abel R, Shoichet BK, and Jerome SV. Journal of Chemical Theory and Computation. 17(11):7106-7119. DOI: 10.1021/acs.jctc.1c00810

Leveraging Atomic Scale Modeling for Design and Discovery of Next-Generation Battery Materials

SEPT 22, 2022

Speaker

Garvit Agarwal
Senior Scientist

Abstract

The development of rechargeable Li-ion batteries (LIBs) has revolutionized electric vehicles and portable electronic devices. Further advancements are needed to improve the power density, safety, reliability, and lifetime of LIBs. Over the past few decades, atomistic modeling of battery materials has complemented experimental characterization techniques and has become an integral part of the development of new technologies. Reliable atomic-scale modeling enables rapid initial evaluation of a large chemical and material design space, accelerating the development cycle of next-generation battery technologies.

In this webinar, we will demonstrate how Schrödinger’s advanced digital chemistry platform can be leveraged to accelerate the design and discovery of next-generation battery materials with improved properties. We will discuss the application of both physics-based and machine learning techniques for understanding structure-property relationships of different components of batteries, including electrodes, electrolytes, and electrode-electrolyte interfaces. We will also discuss the automated active-learning framework for the development of state-of-the-art neural network force fields for modeling liquid electrolytes. The framework allows the force field to be trained on highly accurate range-separated hybrid density functional theory data, which enables accurate prediction of critical bulk properties of high-performance liquid electrolytes for application in advanced batteries.

Key Learning Objectives:

  • Understand predictive capabilities of physics-based modeling for battery materials
  • Learn how automated high throughput simulation workflows enable rapid screening of new battery material candidates
  • Learn how advanced neural network force fields enable accurate electrolyte property prediction