Webinar Archives
Taking Hit Identification to the Next Level by Screening Billions of Compounds Efficiently and Cost Effectively with Machine Learning Enabled DNA Encoded Libraries and Virtual Screening
Dr. Steven Jerome
Product Manager, Hit Discovery
The chemical space available to drug discovery is vast, estimated conservatively at 10^20 to 10^24 compounds, yet traditional structure-based experimental and virtual methods such as high-throughput screening and docking have been limited to around ten million compounds per screening campaign. Examining only a tiny fraction of available chemical space limits chemotype diversity and decoration, available IP space, and the scores and affinities obtained from virtual and experimental screening, respectively. To cost-effectively screen much larger chemical spaces, in the billions of compounds, two machine learning-enabled approaches have been developed. First, DNA-encoded libraries (DEL) enable screening billions of synthesized compounds but are limited by high rates of experimental false positives and negatives. Employing machine learning, we describe an approach using experimental DEL results that identifies false negatives and byproducts in a more favorable property space. Second, the advent of on-demand, synthesizable libraries has made multi-billion-compound chemical spaces experimentally and virtually accessible. However, brute-force examination of such chemical libraries incurs significant experimental and computational costs, promoting the use of less accurate virtual screening techniques. To enable efficient chemical space exploration using an accurate scoring function, we have developed an active learning-based method employing AutoQSAR/DC machine learning and Glide SP docking as the learner. Results from Active Learning Glide screening of 100-million- to billion-compound libraries show increased chemical diversity and improved GlideScore of hits relative to brute-force screening of subsets of the libraries. Results and costs from these two new methods suggest that billion-compound library screens should replace the smaller (1-10 million compound), traditional screens commonly employed today.
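The active learning idea in the abstract can be illustrated with a small, generic sketch. It is not Schrödinger's implementation and uses no Schrödinger API: `expensive_score` is a hypothetical toy function standing in for a docking score (lower is better), `surrogate_predict` is a simple nearest-neighbor model swapped in for the machine learning step, and the library is random feature vectors rather than real compounds. All names, sizes, and parameters are assumptions chosen for illustration.

```python
import random


def make_library(n_compounds, n_features=4, seed=0):
    """Hypothetical library: each 'compound' is a random feature vector."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_features))
            for _ in range(n_compounds)]


def expensive_score(features):
    """Toy stand-in for a costly docking score (lower is better)."""
    return sum((f - 0.7) ** 2 for f in features) - 1.0


def surrogate_predict(features, training_set, k=3):
    """Cheap surrogate: mean score of the k nearest already-scored compounds."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(features, item[0]))
    nearest = sorted(training_set, key=sq_dist)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(library, n_rounds=4, batch_size=25, seed=0):
    """Score only n_rounds * batch_size compounds with the expensive function.

    Round 1 scores a random batch. Each later round ranks the remaining
    compounds with the surrogate and scores the best-predicted batch.
    Returns a dict mapping compound index to its expensive score.
    """
    rng = random.Random(seed)
    unlabeled = set(range(len(library)))
    scored = {}

    # Seed round: a random batch gives the surrogate its first training data.
    batch = rng.sample(sorted(unlabeled), batch_size)
    for _ in range(n_rounds):
        for i in batch:
            scored[i] = expensive_score(library[i])
        unlabeled -= set(batch)
        # Refit (here: rebuild the neighbor table) and pick the next batch.
        training_set = [(library[i], s) for i, s in scored.items()]
        ranked = sorted(unlabeled,
                        key=lambda i: surrogate_predict(library[i], training_set))
        batch = ranked[:batch_size]
    return scored


if __name__ == "__main__":
    library = make_library(1000)
    scored = active_learning_screen(library)
    print(f"scored {len(scored)} of {len(library)} compounds")
    print(f"best score found: {min(scored.values()):.3f}")
```

The design point is that the expensive function is called on only a small fraction of the library, while the cheap surrogate ranks everything else. In a real campaign the surrogate, batch sizes, and selection rule would all need tuning and validation against a brute-force baseline.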