Source code for schrodinger.application.vss.dise

'''
DIrected Sphere Exclusion - like selection of ordered compounds. Typically
used to enhance diversity in virtual screen results. An initial list of
the top cpds is selected without regard to diversity; then subsequent
members are compared, in input order, against the members in the list.
If the subsequent cpd is distinct with respect to fingerprint similarity,
it is added to the list. The ordered input is traversed until the
desired number of cpds is reached or the input is exhausted.

Generally speaking, virtual screen scoring functions are aimed at
enrichment, and don't accurately rank order activities. Thus, a modest
difference in score values may not be meaningful. Furthermore, the input
cpds to a virtual screen often contain congeneric series with relatively
low diversity. The aim of this tool is to balance the tension between
'best scores' and 'chemical diversity'. The premise is that a 'novel'
candidate cpd is potentially more valuable than a 'degenerate' candidate
of roughly equal score.

In practice, 2-4 seed/threshold parameter sets are explored in
independent exercises. The result sets are compared with respect
to score distributions and nearest-neighbor similarity metrics,
leading to the selection of the set that presents an acceptable balance
of score and diversity.
'''

from schrodinger.application.canvas import fingerprint as canvas_fp
from schrodinger.application.canvas import similarity as canvas_sim
from schrodinger.infra.canvas import ChmMol
from schrodinger.utils import log

DEFAULT_SIMILARITY_METRIC = 'Tanimoto'
DEFAULT_FINGERPRINT_TYPE = 'MolPrint2D'
DEFAULT_SIMILARITY_THRESHOLD = 0.6

logger = log.get_output_logger('vss')


def get_dise(num_seeds,
             *,
             similarity_threshold=DEFAULT_SIMILARITY_THRESHOLD,
             fingerprint_type=DEFAULT_FINGERPRINT_TYPE,
             similarity_metric=DEFAULT_SIMILARITY_METRIC):
    '''
    Returns a closure that implements DiSE filtering.

    >>> selector = get_dise(
    ...     1, similarity_threshold=0.1, fingerprint_type='Radial')
    >>> selector(ChmMol.fromSMILES('C'))
    (True, None, None)
    >>> selector(ChmMol.fromSMILES('CC'))
    (True, 0.0, 1)
    >>> selector(ChmMol.fromSMILES('CCC'))
    (False, 0.2, 2)

    :param num_seeds: Number of "seed" compounds.
    :type num_seeds: int

    :param similarity_threshold: Similarity threshold.
    :type similarity_threshold: float

    :param fingerprint_type: (Canvas) fingerprint type. See
        `schrodinger.application.canvas.fingerprint.CanvasFingerprintGenerator`
        for possible values.
    :type fingerprint_type: str

    :param similarity_metric: (Canvas) similarity metric. See
        `schrodinger.application.canvas.similarity.CanvasFingerprintSimilarity`
        for possible values.
    :type similarity_metric: str

    :return: Callable that accepts `schrodinger.structure.Structure` or
        `schrodinger.infra.canvas.ChmMol` and returns a
        (distinct, similarity, index) tuple, where `distinct` is of type
        bool, and `similarity` and `index` are of type float and int
        respectively (or None for the first `num_seeds` structures).
        The `index` is the 1-based index of the most similar distinct
        structure already kept.
    :rtype: Callable.
    '''

    fp_gen = canvas_fp.CanvasFingerprintGenerator(logger=logger)
    fp_gen.setType(fingerprint_type)
    fp_gen.setAtomBondTyping(fp_gen.getDefaultAtomTypingScheme())

    fp_sim = canvas_sim.CanvasFingerprintSimilarity(logger=logger)
    fp_sim.setMetric(similarity_metric)

    # Fingerprints of the structures accepted so far, in acceptance order.
    kept_fingerprints = []

    def dise(st):
        fp = fp_gen.generate(st, chmmol=isinstance(st, ChmMol))
        if len(kept_fingerprints) < num_seeds or not kept_fingerprints:
            # Seeds are accepted without any diversity check.
            kept_fingerprints.append(fp)
            return (True, None, None)
        else:
            # Highest similarity to any kept fingerprint, with its 1-based index.
            sim, idx = max((fp_sim.calculateSimilarity(fp, seen_fp), i)
                           for (i, seen_fp) in enumerate(kept_fingerprints, 1))
            distinct = sim < similarity_threshold
            if distinct:
                kept_fingerprints.append(fp)
            return (distinct, sim, idx)

    return dise
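
# Illustrative sketch (not part of the module): the traversal described in the
# module docstring, written without any Schrodinger dependency so that it can
# be run anywhere. Everything here is an assumption made for illustration:
# fingerprints are plain Python sets of "on" bits, similarity is a hand-written
# Tanimoto coefficient, and the names `tanimoto`, `make_toy_dise` and
# `select_diverse` are hypothetical. The closure mirrors the
# (distinct, similarity, index) return convention of `get_dise`.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_toy_dise(num_seeds, similarity_threshold=0.6):
    """Toy stand-in for a DiSE selector, operating on sets instead of molecules."""
    kept = []  # fingerprints accepted so far, in acceptance order

    def dise(fp):
        if len(kept) < num_seeds or not kept:
            # Seeds are accepted without a diversity check.
            kept.append(fp)
            return (True, None, None)
        # Highest similarity to any kept fingerprint, with its 1-based index.
        sim, idx = max((tanimoto(fp, seen), i)
                       for i, seen in enumerate(kept, 1))
        distinct = sim < similarity_threshold
        if distinct:
            kept.append(fp)
        return (distinct, sim, idx)

    return dise


def select_diverse(ranked_fps, num_wanted, num_seeds, similarity_threshold=0.6):
    """Walk fingerprints in rank order; return 0-based positions of the picks.

    Stops once `num_wanted` entries are picked or the input is exhausted.
    """
    dise = make_toy_dise(num_seeds, similarity_threshold)
    picked = []
    for pos, fp in enumerate(ranked_fps):
        if len(picked) >= num_wanted:
            break
        distinct, _sim, _idx = dise(fp)
        if distinct:
            picked.append(pos)
    return picked


# Toy input, already sorted best score first.
ranked = [
    frozenset({1, 2, 3, 4}),     # rank 0: seed
    frozenset({1, 2, 3, 5}),     # rank 1: close analogue of rank 0
    frozenset({7, 8, 9}),        # rank 2: novel
    frozenset({1, 2, 3, 4, 6}),  # rank 3: close analogue of rank 0
    frozenset({10, 11}),         # rank 4: novel
]
selection = select_diverse(ranked, num_wanted=3, num_seeds=1,
                           similarity_threshold=0.5)
print(selection)
```

# Rerunning `select_diverse` with a few different `num_seeds` and
# `similarity_threshold` values and comparing the resulting picks is one
# simple way to explore the score/diversity trade-off on toy data.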