schrodinger.application.combinatorial_screen.combinatorial_screener module

This module contains the CombinatorialScreener class, which employs a heuristic approach to identify subsets of combinatorial reactants that are most likely to yield enumerated products with the highest dendritic fingerprint similarities to a query.

Basic Algorithm

For the example reaction A + B + C -> ABC, the reactants in each of the 3 groups are ranked by decreasing Tversky similarity to the query, where the Tversky weight for the reactant R is 1 and the Tversky weight for the query Q is 0. In other words,

rank_score(R, Q) = ON(R & Q) / ON(R)

Where:

ON(R & Q) = Number of 'on' bits shared by reactant and query
ON(R) = Number of 'on' bits in reactant

This quantity is maximized when R is a substructure of Q.
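
As a minimal sketch of the ranking score, assume each fingerprint is represented as a Python set holding the indices of its 'on' bits. That representation, the function name rank_score, and the toy reactants are illustrative assumptions and are not part of this module's API.

```python
def rank_score(reactant_bits, query_bits):
    """Tversky similarity with weight 1 on the reactant and 0 on the query.

    Both arguments are sets of 'on' bit indices (an assumed representation).
    """
    reactant_bits = set(reactant_bits)
    if not reactant_bits:
        return 0.0  # avoid dividing by zero for an empty fingerprint
    shared = reactant_bits & set(query_bits)
    return len(shared) / len(reactant_bits)


# Toy data: bit indices are arbitrary and chosen only for illustration.
query = {1, 4, 7, 9, 12}
reactants = {
    "r_subset": {4, 7},            # every bit is also set in the query
    "r_partial": {1, 4, 20, 31},   # half of its bits are set in the query
    "r_disjoint": {2, 3},          # shares no bits with the query
}

# Rank reactant names by decreasing score, as described above.
ranked = sorted(
    reactants,
    key=lambda name: rank_score(reactants[name], query),
    reverse=True,
)
```

A reactant whose bits are all present in the query scores 1, which is the fingerprint-level counterpart of R being a substructure of Q.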

Once the reactants have been ranked, limits NA, NB, NC on the ranked lists are assigned to yield the subset S(NA, NB, NC) = A[0:NA] x B[0:NB] x C[0:NC], where [0:N] is a Python slice.

If the minimum number of enumerated products desired is min_products, the limits must be chosen such that NA * NB * NC >= min_products.
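
The subset S(NA, NB, NC) and the size constraint can be illustrated with plain Python lists. This is a sketch only: the reactant labels and the helper name subset are hypothetical, and the ranked lists are assumed to be already sorted by decreasing rank score.

```python
from itertools import product
from math import prod

# Hypothetical ranked reactant lists (best-ranked first).
A = ["a0", "a1", "a2"]
B = ["b0", "b1", "b2", "b3"]
C = ["c0", "c1"]


def subset(ranked_groups, limits):
    """Return the Cartesian product of the first N entries of each group."""
    sliced = (group[:n] for group, n in zip(ranked_groups, limits))
    return list(product(*sliced))


limits = (2, 3, 1)           # (NA, NB, NC)
S = subset([A, B, C], limits)

min_products = 6
# The limits are acceptable only if the product of the limits
# reaches the requested minimum number of products.
enough = prod(limits) >= min_products
```

Here S holds 2 * 3 * 1 = 6 reactant combinations, so these limits just satisfy min_products = 6.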

NA, NB, NC are arrived at by setting them to 1 and then performing a systematic exploration of larger values with the goal of identifying combinations of reactants whose logical OR fingerprints yield the highest similarities to the query. In the case of dendritic fingerprints, the logical OR similarities correlate strongly with similarities computed from the enumerated products, so this is a good approximation that avoids enumeration of S(NA, NB, NC) as the limits are varied.

A rough outline of the procedure is as follows:

1. Set NA = NB = NC = 1
2. if NA * NB * NC >= min_products, we are done
3. sim_best = 0, R_best = None
4. for R in (A, B, C):
       NR += 1  # Temporarily add new reactant R_new
       for each (a, b, c) in S(NA, NB, NC), where R_new is in (a, b, c):
           FP_abc = FP(a) | FP(b) | FP(c)  # Logical OR fingerprint
           sim = Tanimoto(FP_abc, FP_query)
           if sim > sim_best:
               sim_best = sim
               R_best = R
       NR -= 1  # Remove new reactant
5. NR_best += 1  # Expand limit to include best new reactant
6. Go to step 2
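
The outline above can be sketched in runnable Python under a few stated assumptions: fingerprints are sets of 'on' bit indices, the logical OR is a set union, and each group is already ranked. The names tanimoto and choose_limits are hypothetical and are not the module's implementation. The sketch also departs from the outline in two small ways: it starts sim_best below zero so that a group is always chosen, and it stops if every group is exhausted.

```python
from itertools import product
from math import prod


def tanimoto(fp1, fp2):
    """Tanimoto similarity between two sets of 'on' bit indices."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 0.0


def choose_limits(groups, query_fp, min_products):
    """Greedily grow per-group limits until enough products are covered.

    groups: list of ranked lists of fingerprints (sets), one list per
    reactant class. Returns a tuple of limits, one per group.
    """
    limits = [1] * len(groups)                      # step 1
    while prod(limits) < min_products:              # step 2
        sim_best, g_best = -1.0, None               # step 3 (assumed start)
        for g, group in enumerate(groups):          # step 4
            if limits[g] >= len(group):
                continue                            # no reactant left to add
            new_fp = group[limits[g]]               # candidate R_new
            others = [grp[:limits[i]]
                      for i, grp in enumerate(groups) if i != g]
            # Only combinations that contain R_new need to be scored.
            for combo in product(*others):
                fp_or = new_fp.union(*combo)        # logical OR fingerprint
                sim = tanimoto(fp_or, query_fp)
                if sim > sim_best:
                    sim_best, g_best = sim, g
        if g_best is None:
            break                                   # every group is exhausted
        limits[g_best] += 1                         # step 5
    return tuple(limits)


# Toy fingerprints chosen only for illustration.
query_fp = {1, 2, 3, 4, 5, 6}
group_a = [{1, 2}, {1, 9}]
group_b = [{3, 4}, {5, 6}, {8, 9}]
limits = choose_limits([group_a, group_b], query_fp, min_products=3)
```

In this toy case the first expansion goes to the second group, because adding {5, 6} gives the best OR similarity, and the second expansion goes to the first group, giving limits of (2, 2).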

This approach is superior to assigning equal limits, such as (10, 10, 10) if 1000 products are desired. In many cases, the algorithm finds limits that are quite ragged, such as (2, 50, 10), and the enumerated compound with the highest similarity to the query is found at some non-obvious position, such as (1, 45, 7).

class schrodinger.application.combinatorial_screen.combinatorial_screener.CombinatorialScreener(reactant_fp_files, query_smiles, max_reactants=None, reactant_classes=None)

Bases: object

Identifies subsets of combinatorial reactants that are most likely to yield products with the highest dendritic Tanimoto similarities to a query.

__init__(reactant_fp_files, query_smiles, max_reactants=None, reactant_classes=None)

Constructor taking the names of dendritic fingerprint files for one or more classes of reactants, the SMILES string for a query, and an optional cap on the reactant subset size. If supplying a multi-reactant fingerprint file created by RfpDatabase, it also takes the names of the reactant classes within that file to utilize.

If supplying a separate Canvas fingerprint file for each reactant class, those files must contain a single extra data column that holds the SMILES of the reactants. If the same fingerprint file is supplied more than once, each instance is treated as a separate reactant class, but the file is read only once, and all screens will yield identical reactant subsets for all instances of that reactant class.

If supplying a single fingerprint file created by RfpDatabase, the format is guaranteed to be compatible with this class, and the reactant class names play a role that’s analogous to the different Canvas fingerprint file names in the previous use case.

Example usage with multiple Canvas fingerprint files:

    reactant_fp_files = ['alcohols.fp', 'halides-aryl.fp', 'thiols.fp']
    query_smiles = 'c1ccccc1Nc(nc(c23)[nH]cn3)nc2-c4cc(ccc4)-c5ccccc5'
    screener = CombinatorialScreener(reactant_fp_files, query_smiles)

Example usage with a single RfpDatabase fingerprint file:

    reactant_fp_file = 'all_reactants.rfpdb'
    query_smiles = 'c1ccccc1Nc(nc(c23)[nH]cn3)nc2-c4cc(ccc4)-c5ccccc5'
    reactant_classes = ['alcohols', 'halides-aryl', 'thiols']
    screener = CombinatorialScreener([reactant_fp_file],
                                     query_smiles,
                                     reactant_classes=reactant_classes)

Parameters

reactant_fp_files (list[str]) – List of reactant fingerprint files.
query_smiles (str) – SMILES string for query.
max_reactants (int) – Maximum allowed size of each reactant subset when a query is screened. The default is MAX_COMBOS ** (1/N), i.e. the N-th root of MAX_COMBOS, where N is the number of reactant groups.
reactant_classes (list[str]) – Names of reactant classes if using a single RfpDatabase fingerprint file.
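
The default cap for max_reactants is most naturally read as the N-th root of MAX_COMBOS, so that N groups of that size multiply out to about MAX_COMBOS combinations. The sketch below shows the arithmetic and why the parentheses matter in Python. The value assigned to MAX_COMBOS here is a hypothetical placeholder, not the library's constant, and whether the real implementation rounds the result is not stated in this documentation.

```python
MAX_COMBOS = 1_000_000  # hypothetical placeholder, NOT the library's value


def default_max_reactants(num_groups, max_combos=MAX_COMBOS):
    """N-th root of max_combos, where N is the number of reactant groups."""
    return max_combos ** (1 / num_groups)


# Three groups: roughly 100 reactants per group (a float, subject to rounding).
cap = default_max_reactants(3)

# Without parentheses, ** binds tighter than /, so this is MAX_COMBOS / 3.
naive = MAX_COMBOS ** 1 / 3
```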

screen(min_products)

Performs a similarity screen against the query to determine the number of reactants in each sorted group that are required to make the minimum number of enumerated products. These reactant counts are stored in self.reactant_limits.

Parameters

min_products (int) – Minimum number of theoretical products.