schrodinger.application.combinatorial_explorer.route_screener module

This module contains classes that allow fingerprint similarity screens within the combinatorial space of one or more synthetic routes.

Copyright Schrodinger LLC, All Rights Reserved.

schrodinger.application.combinatorial_explorer.route_screener.pass_prop_filter(prop_value, sd, prop_filter)

Given a property value, an estimate of the standard deviation in that property value, and a property filter, this function returns the probability that the property filter is passed.

Parameters
  • prop_value (float) – The property value

  • sd (float) – An estimate of the standard deviation in prop_value. If > 0, prop_value is treated not as a single value, but as a normal distribution with a mean of prop_value and a standard deviation of sd, which we refer to as smearing of the property value. In the limit of sd -> 0, this distribution becomes a Dirac delta function, and the probability is 1 or 0, depending on whether prop_value lies within the filter limits. For larger sd, the probability of passing a filter tends to decrease, which effectively penalizes routes that yield inaccurate models.

  • prop_filter (diversity_selector.PropertyFilter) – Property filter

Returns

Probability in the range 0 to 1 of passing the filter

Return type

float
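The smearing idea above can be sketched with the normal cumulative distribution function: the pass probability is the mass of N(prop_value, sd) that falls between the filter limits. This is a minimal sketch, not the module's implementation. The attributes of diversity_selector.PropertyFilter are not documented here, so the `SimplePropertyFilter` class and its `min_value`/`max_value` fields are hypothetical stand-ins, and an open limit is assumed to be represented by None.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimplePropertyFilter:
    """Hypothetical stand-in for diversity_selector.PropertyFilter.

    A limit of None is assumed to mean that side of the range is open.
    """
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def _normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of N(mean, sd) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def pass_prop_filter_sketch(prop_value: float, sd: float,
                            prop_filter: SimplePropertyFilter) -> float:
    """Probability that N(prop_value, sd) lies within the filter limits."""
    lo, hi = prop_filter.min_value, prop_filter.max_value
    if sd <= 0.0:
        # Dirac delta limit: the value is either inside the limits or not.
        inside = (lo is None or prop_value >= lo) and (hi is None or prop_value <= hi)
        return 1.0 if inside else 0.0
    # Probability mass below each limit; open limits contribute 0 or 1.
    lower_mass = 0.0 if lo is None else _normal_cdf(lo, prop_value, sd)
    upper_mass = 1.0 if hi is None else _normal_cdf(hi, prop_value, sd)
    return max(0.0, upper_mass - lower_mass)
```

For example, a value sitting exactly on an upper limit (with no lower limit) passes with probability 0.5 for any sd > 0, and widening sd around a value centered in the filter range lowers its pass probability.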

class schrodinger.application.combinatorial_explorer.route_screener.RouteScreener(db_path, query_smiles, property_filters=None, filter_weight=None, min_products=1000, max_combos=1000000, rand_seed=0)

Bases: object

Uses an empirical algorithm to select the most promising reactant combinations for a fingerprint similarity screen in the combinatorial space of one or more synthetic reaction routes. The basic approach is to sort each set of reactants by decreasing Tversky similarity to a query structure, choose relatively small numbers of high-ranking reactants, and perform systematic enumeration until the desired number of products is obtained. The Tversky similarities are weighted to favor reactants that are substructures (or near substructures) of the query, which tends to yield products that resemble the query much more closely than products obtained by random enumeration.
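The asymmetric weighting can be illustrated with the standard Tversky index on fingerprint bit sets, |A∩B| / (|A∩B| + α|A−B| + β|B−A|), where A is the reactant and B is the query. Penalizing reactant-only bits (α) more heavily than query-only bits (β) lets a small reactant that is a substructure of the query score highly. This is a sketch only: the α = 0.9 and β = 0.1 defaults and the set-of-bits representation are assumptions, not the weights used by RouteScreener.

```python
from typing import Dict, FrozenSet, List, Tuple


def tversky(reactant_bits: FrozenSet[int], query_bits: FrozenSet[int],
            alpha: float = 0.9, beta: float = 0.1) -> float:
    """Tversky similarity of a reactant fingerprint to a query fingerprint.

    alpha weights bits found only in the reactant; beta weights bits found
    only in the query. The defaults are illustrative assumptions.
    """
    common = len(reactant_bits & query_bits)
    reactant_only = len(reactant_bits - query_bits)
    query_only = len(query_bits - reactant_bits)
    denom = common + alpha * reactant_only + beta * query_only
    return common / denom if denom > 0 else 0.0


def rank_reactants(reactants: Dict[str, FrozenSet[int]],
                   query_bits: FrozenSet[int]) -> List[Tuple[str, float]]:
    """Sort reactants by decreasing Tversky similarity to the query."""
    scored = [(name, tversky(bits, query_bits)) for name, bits in reactants.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a 10-bit query, a reactant whose 4 bits are all in the query scores 4 / 4.6 ≈ 0.87, whereas its Tanimoto similarity would be only 4 / 10 = 0.4.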

__init__(db_path, query_smiles, property_filters=None, filter_weight=None, min_products=1000, max_combos=1000000, rand_seed=0)
Parameters
  • db_path (str) – Reactant fingerprint database file created by RfpDatabase (.rfpdb)

  • query_smiles (str) – SMILES string for query

  • property_filters (list[diversity_selector.PropertyFilter]) – List of property filters that products should preferentially satisfy. If supplied, the empirical algorithm biases the selection of reactants to maximize both similarity to the query and the number of property filters satisfied. Both short and long property names are supported (see REACTANT_PROPS_SHORT, REACTANT_PROPS_LONG).

  • filter_weight – Weight assigned to the property filter score in the calculation of the total score. By default, similarity and filter score are both assigned a weight of 1.

  • min_products (int) – The desired minimum number of products per reaction route. Enumeration stops when this number of products has been met or exceeded, or when the maximum number of reactant combinations has been reached.

  • max_combos (int) – Maximum number of reactant combinations to consider when attempting to enumerate the desired minimum number of products

  • rand_seed (int) – If a non-zero value is provided, reactants are selected randomly, rather than according to the empirical algorithm. This provides a means of comparing the algorithm to random enumeration. Note that any property filters are ignored if rand_seed is non-zero.
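The role of filter_weight can be sketched as a weighted sum, where similarity always has weight 1 and None falls back to the documented default of 1. How the filter score itself is computed is not stated in this reference; the sketch assumes, hypothetically, that it is the mean of the per-filter pass probabilities (the expected fraction of filters satisfied), which keeps it on the same 0 to 1 scale as similarity. `filter_score` and `total_score` are illustrative names, not module functions.

```python
from typing import Optional, Sequence


def filter_score(pass_probabilities: Sequence[float]) -> float:
    """Hypothetical filter score: expected fraction of filters satisfied.

    Each entry is the probability of passing one filter, e.g. as returned by
    pass_prop_filter(). Returns 0.0 when there are no filters.
    """
    if not pass_probabilities:
        return 0.0
    return sum(pass_probabilities) / len(pass_probabilities)


def total_score(similarity: float, pass_probabilities: Sequence[float],
                filter_weight: Optional[float] = None) -> float:
    """Weighted sum of similarity and filter score (None means weight 1)."""
    weight = 1.0 if filter_weight is None else filter_weight
    return similarity + weight * filter_score(pass_probabilities)
```

For instance, a product with similarity 0.5 that surely passes one of two filters and surely fails the other scores 1.0 with the default weight and 1.5 with filter_weight=2.0.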

screen(route_file, logger=None)

Given a JSON route file, this function yields unique products that tend to exhibit higher-than-average fingerprint similarities to the query and, if property filters have been supplied, higher-than-average property filter scores. Products are yielded until the requested minimum number of products has been generated or the maximum number of combinations has been considered.
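The stopping behavior, combined with the "small numbers of high-ranking reactants" strategy from the class description, can be sketched as a generator that widens a cutoff k one step at a time: at each k only the top k reactants of each class are combined, and only combinations not tried at a smaller cutoff are visited. This is an assumed scheme for illustration, not the actual RouteScreener algorithm; the `react` callback (returning a product key such as a canonical SMILES, or None on failure) is hypothetical.

```python
import itertools
from typing import Callable, Iterator, Optional, Sequence, Tuple


def enumerate_top_ranked(
        ranked_lists: Sequence[Sequence[str]],
        react: Callable[[Tuple[str, ...]], Optional[str]],
        min_products: int = 1000,
        max_combos: int = 1000000) -> Iterator[str]:
    """Yield unique products, widening the pool of top-ranked reactants.

    ranked_lists holds one list of reactants per reactant class, best first.
    react is a hypothetical callback returning a product key, or None when
    the combination does not yield a product.
    """
    if not ranked_lists or any(len(lst) == 0 for lst in ranked_lists):
        return
    seen = set()
    n_combos = 0
    for k in range(1, max(len(lst) for lst in ranked_lists) + 1):
        pools = [range(min(k, len(lst))) for lst in ranked_lists]
        for idx in itertools.product(*pools):
            if max(idx) != k - 1:
                continue  # combination was already tried at a smaller cutoff
            if n_combos >= max_combos:
                return  # combination budget exhausted
            n_combos += 1
            combo = tuple(lst[i] for lst, i in zip(ranked_lists, idx))
            product = react(combo)
            if product is not None and product not in seen:
                seen.add(product)
                yield product
                if len(seen) >= min_products:
                    return  # requested number of unique products reached
```

With classes ["a1", "a2", "a3"] and ["b1", "b2"], combinations are visited as a1+b1, then a1+b2, a2+b1, a2+b2, then a3+b1, a3+b2, so the best-ranked reactants are always exhausted together before lower-ranked ones are brought in.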

Parameters
  • route_file (str) – Synthetic route file with reagent sources of the form <class>.pfx, where <class> is a reactant class within the reactant fingerprint database supplied to the constructor

  • logger (logging.Logger) – Logger to which progress messages should be written

Yield

The next enumerated product, with the similarity to the query stored in SIM_PROP, the property filter score stored in FILTER_SCORE_PROP, a weighted sum of those properties stored in TOTAL_SCORE_PROP, and all filter properties stored according to their long names (e.g., r_explorer_AlogP).

Yield type

rdkit.Chem.rdchem.Mol

Raise

RuntimeError if the route contains any reactant classes that are not present in the database
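The class check behind this RuntimeError can be sketched as follows: strip the .pfx extension from each reagent source to recover <class>, then compare against the reactant classes available in the database. The layout of the JSON route file is not described here, so the sketch takes an already extracted list of source strings; `required_reactant_classes`, `check_route_classes`, and the error message are hypothetical.

```python
import os
from typing import Iterable, List, Set


def required_reactant_classes(reagent_sources: Iterable[str]) -> List[str]:
    """Return <class> for each reagent source of the form <class>.pfx."""
    classes = []
    for source in reagent_sources:
        stem, ext = os.path.splitext(os.path.basename(source))
        if ext == ".pfx":
            classes.append(stem)
    return classes


def check_route_classes(reagent_sources: Iterable[str], db_classes: Set[str]) -> None:
    """Raise RuntimeError if any required class is missing from the database."""
    missing = sorted(set(required_reactant_classes(reagent_sources)) - db_classes)
    if missing:
        raise RuntimeError("Reactant classes not in database: " + ", ".join(missing))
```

Checking all classes up front, before any enumeration starts, means a misconfigured route fails immediately rather than after part of the screen has run.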