schrodinger.application.phase.packages.shape_diversity module

Provides functionality for selecting diverse structures from shape screen hits.

Copyright Schrodinger LLC, All Rights Reserved.

schrodinger.application.phase.packages.shape_diversity.generate_shape_gpu_fingerprints(hits_file, fp_file_prefix, logger=None, progress_interval=10000)

Generates molprint2D fingerprints for shape_screen_gpu hits provided in Maestro or SD format. A separate fingerprint file <fp_file_prefix>_<i>.fp is generated from the hits produced by each shape query, where <i> runs from 1 to the number of shape queries. Query structures themselves are skipped, and the name stored for each row in a given fingerprint file corresponds to the 0-based position of the structure in the hits file.

Parameters
  • hits_file (str) – Name of shape_screen_gpu hits file

  • fp_file_prefix (str) – Prefix of output fingerprint files

  • logger (logging.Logger or NoneType) – Logger for info level progress messages

  • progress_interval (int) – Interval between progress messages

Returns

List of (fp_name, fp_count) tuples

Return type

list((str, int))
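
The file naming scheme described above can be illustrated with a short sketch. The helper name expected_fp_files is hypothetical and not part of this module; it only reproduces the documented <fp_file_prefix>_<i>.fp pattern, with <i> starting at 1.

```python
def expected_fp_files(fp_file_prefix, num_queries):
    """Hypothetical helper: list the fingerprint file names documented above."""
    # <i> is 1-based and runs up to the number of shape queries.
    return [f"{fp_file_prefix}_{i}.fp" for i in range(1, num_queries + 1)]


print(expected_fp_files("hits_fp", 3))
```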

schrodinger.application.phase.packages.shape_diversity.get_min_pop(diverse_fraction)

Given the diverse fraction of hits to select, this function determines an appropriate minimum population for each chunk of hit space. Returns driver_utils.DEFAULT_MIN_POP if diverse_fraction is 0.025 or smaller. Otherwise, the minimum population is halved until the maximum number of diverse structures per chunk would not exceed TARGET_DIVERSE_PER_CHUNK. The returned value corresponds to the combinatorial_diversity -min_pop parameter.

Parameters

diverse_fraction (float) – Diverse fraction of hits to select

Returns

Minimum number of hits per chunk

Return type

int

Raises

ValueError if diverse_fraction is outside the legal range
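
The halving rule can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the two constant values are placeholders (the real ones are defined by driver_utils and this module), the legal range is assumed to be 0 < diverse_fraction <= 1, and the maximum chunk population is assumed to be twice the minimum population.

```python
# Placeholder values, chosen only for illustration.
DEFAULT_MIN_POP = 1000          # stands in for driver_utils.DEFAULT_MIN_POP
TARGET_DIVERSE_PER_CHUNK = 50   # stands in for TARGET_DIVERSE_PER_CHUNK


def get_min_pop_sketch(diverse_fraction):
    """Sketch of the documented halving rule (assumptions noted above)."""
    # Assumed legal range; the real function may use different bounds.
    if not 0.0 < diverse_fraction <= 1.0:
        raise ValueError(f"Illegal diverse fraction: {diverse_fraction}")
    min_pop = DEFAULT_MIN_POP
    if diverse_fraction <= 0.025:
        return min_pop
    # Assumption: a chunk holds at most 2 * min_pop hits, so it yields at
    # most diverse_fraction * 2 * min_pop diverse structures.
    while min_pop > 1 and diverse_fraction * 2 * min_pop > TARGET_DIVERSE_PER_CHUNK:
        min_pop //= 2
    return min_pop


for fraction in (0.01, 0.05, 0.5):
    print(fraction, get_min_pop_sketch(fraction))
```

Under these placeholder values, larger diverse fractions produce smaller chunks, which keeps the number of diverse structures selected from any one chunk bounded.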

schrodinger.application.phase.packages.shape_diversity.get_num_probes(num_hits, min_pop)

Determines an appropriate number of hit space probes for the specified number of hits and minimum population per hit space chunk. The number of probes will be driver_utils.DEFAULT_NUM_PROBES unless additional probes are needed to ensure that min_pop * 2**(num_probes - 1) is at least num_hits. The returned value corresponds to the combinatorial_diversity -ndim parameter.

Parameters
  • num_hits (int) – Total number of hits

  • min_pop (int) – Minimum number of hits per chunk

Returns

Number of probes

Return type

int
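
The condition on min_pop * 2**(num_probes - 1) can be sketched as a simple loop. The value used for DEFAULT_NUM_PROBES below is a placeholder rather than the real driver_utils constant, and the function name is hypothetical.

```python
DEFAULT_NUM_PROBES = 10  # placeholder for driver_utils.DEFAULT_NUM_PROBES


def get_num_probes_sketch(num_hits, min_pop):
    """Sketch: add probes until min_pop * 2**(num_probes - 1) >= num_hits."""
    if min_pop < 1:
        raise ValueError("min_pop must be positive")
    num_probes = DEFAULT_NUM_PROBES
    while min_pop * 2 ** (num_probes - 1) < num_hits:
        num_probes += 1
    return num_probes


print(get_num_probes_sketch(100_000, 1000))
print(get_num_probes_sketch(1_000_000, 1000))
```

With the placeholder default of 10 probes and min_pop of 1000, the default suffices up to 512,000 hits; beyond that, each additional probe doubles the capacity.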

schrodinger.application.phase.packages.shape_diversity.get_shape_gpu_hits_positions(fp_file, fp_positions)

Given a fingerprint file created by generate_shape_gpu_fingerprints and a list of 0-based positions in that file, this function returns the corresponding 0-based positions in the hits file from which the fingerprint file was created. The mapping accounts for the presence of shape queries in the hits file, the grouping of hits by shape query, and any hits for which fingerprint generation failed.

Parameters
  • fp_file (str) – Name of fingerprint file

  • fp_positions (Any iterable of int values) – 0-based positions in the fingerprint file

Returns

0-based positions into the hits file

Return type

list(int)
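
One way such a mapping could work follows from the row-name convention documented for generate_shape_gpu_fingerprints, where each row name is the 0-based hits-file position. The sketch below operates on a plain list of row names instead of a real fingerprint file; the function name is hypothetical, and whether the actual function reads row names this way is an assumption.

```python
def hits_positions_from_row_names(row_names, fp_positions):
    """Hypothetical sketch: map fingerprint rows to hits-file positions.

    row_names[i] is assumed to be the name stored for row i of a
    fingerprint file, i.e. the 0-based hits-file position as a string.
    """
    return [int(row_names[pos]) for pos in fp_positions]


# Hits file layout assumed for illustration:
#   0: query 1, 1-3: hits for query 1 (fingerprint generation failed for 2),
#   4: query 2, 5-6: hits for query 2
row_names_query1 = ["1", "3"]
row_names_query2 = ["5", "6"]
print(hits_positions_from_row_names(row_names_query1, [1]))
print(hits_positions_from_row_names(row_names_query2, [0, 1]))
```

Because each row carries its own hits-file position, skipped queries and failed fingerprints need no separate bookkeeping in this sketch.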

schrodinger.application.phase.packages.shape_diversity.select_shape_gpu_hits(hits_file_in, diverse_fraction, hits_file_out, fp_file_prefix, logger=None, progress_interval=10000)

Selects a specified fraction of structurally diverse hits from a shape_screen_gpu hits file and writes a new hits file containing the shape queries and only the diverse hits for each shape query.

Parameters
  • hits_file_in (str) – Name of input shape_screen_gpu hits file in Maestro or SD format

  • diverse_fraction (float) – Diverse fraction of hits to select

  • hits_file_out (str) – Name of output hits file in Maestro or SD format

  • fp_file_prefix (str) – Prefix of temporary fingerprint files that will be generated from the hits for each shape query

  • logger (logging.Logger or NoneType) – Logger for info level progress messages

  • progress_interval (int) – Interval between progress messages for fingerprint generation

Returns

Number of diverse hits written for each shape query

Return type

list(int)

Raises

ValueError if diverse_fraction is outside the legal range
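
A minimal usage sketch, based only on the signature documented above. The wrapper name run_diverse_selection, its default values, and the logger name are hypothetical. The module is imported inside the function so the sketch can be loaded without a Schrodinger installation, although calling it requires one.

```python
import logging


def run_diverse_selection(hits_in, hits_out, diverse_fraction=0.05,
                          fp_file_prefix="diverse_tmp"):
    """Hypothetical wrapper around select_shape_gpu_hits."""
    # Lazy import: only needed when the function is actually called.
    from schrodinger.application.phase.packages import shape_diversity

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("shape_diversity_example")
    counts = shape_diversity.select_shape_gpu_hits(
        hits_in, diverse_fraction, hits_out, fp_file_prefix, logger=logger)
    # One count is returned per shape query, in query order.
    for query_number, count in enumerate(counts, start=1):
        logger.info("Query %d: %d diverse hits written", query_number, count)
    return counts
```

A call might look like run_diverse_selection("shape_hits.maegz", "diverse_hits.maegz"), where both file names are placeholders.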

schrodinger.application.phase.packages.shape_diversity.split_hits(fp_file, diverse_fraction)

Logically splits the hits in the supplied fingerprint file into approximately equal-sized chunks that occupy non-overlapping regions of fingerprint space. Returns lists of 0-based fingerprint row numbers that define the various chunks.

Parameters
  • fp_file (str) – Name of fingerprint file of hits

  • diverse_fraction (float) – Diverse fraction of hits to select

Returns

Lists of 0-based fingerprint row numbers of the chunks

Return type

list(list(int))

Raises

ValueError if diverse_fraction is outside the legal range
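
The shape of the return value can be illustrated with a small consistency check. The helper check_chunks is hypothetical, and it assumes that every fingerprint row belongs to exactly one chunk, which the description above suggests but does not state explicitly.

```python
def check_chunks(chunks, num_rows):
    """Hypothetical check that chunks partition rows 0..num_rows-1."""
    seen = [row for chunk in chunks for row in chunk]
    # Equal length plus equal sets rules out both gaps and duplicates.
    return len(seen) == num_rows and set(seen) == set(range(num_rows))


# Made-up example of a list(list(int)) result for a six-row fingerprint file.
example_chunks = [[0, 3, 4], [1, 2, 5]]
print(check_chunks(example_chunks, 6))
```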