Package schrodinger :: Package structutils :: Module sort

Module sort

A module for sorting structure files by Structure-level property values. The module supports multi-key sorting, 'block' sorting, and file merging.

'sort_criteria' and 'intra_block_sort_criteria' are lists of tuples. Each tuple pairs a ct-level property dataname with an ascending/descending directive for that dataname (the module constants ASCENDING = 1 and DESCENDING = -1). If a structure does not have a particular property, it is assigned a None value. Python 2 orders None before any defined value (Python 3 raises a TypeError when comparing None with other types), which is the opposite of common table sort behaviors such as those of Excel or Maestro's Project table. The module overrides this behavior with the NONE_IS_LAST constant: if NONE_IS_LAST evaluates as True, None values appear after defined values when sorted in ascending order.
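The comparison rules above can be sketched as a multi-key, three-way comparator. This is a minimal illustration, not the module's implementation: structures are modeled as plain property dictionaries, the helper names (compare_values, compare_props, sort_props) are hypothetical, and it assumes DESCENDING simply reverses the ascending order, so None values come first in a descending sort.

```python
from functools import cmp_to_key

ASCENDING = 1
DESCENDING = -1
NONE_IS_LAST = True


def compare_values(a, b, none_is_last=NONE_IS_LAST):
    """Three-way comparison of two property values that tolerates None."""
    if a is None and b is None:
        return 0
    if a is None:
        # None ranks after every defined value when none_is_last is set.
        return 1 if none_is_last else -1
    if b is None:
        return -1 if none_is_last else 1
    return (a > b) - (a < b)


def compare_props(props_a, props_b, sort_criteria, none_is_last=NONE_IS_LAST):
    """Compare two property dicts key by key; the first non-tied key decides."""
    for dataname, direction in sort_criteria:
        result = compare_values(props_a.get(dataname), props_b.get(dataname), none_is_last)
        if result:
            # DESCENDING (-1) flips the ascending result, including the None rank.
            return result * direction
    return 0


def sort_props(props_list, sort_criteria, none_is_last=NONE_IS_LAST):
    """Return a new list ordered by sort_criteria (stable for full ties)."""
    key = cmp_to_key(lambda a, b: compare_props(a, b, sort_criteria, none_is_last))
    return sorted(props_list, key=key)


poses = [
    {"s_m_title": "lig2", "r_i_docking_score": -7.1},
    {"s_m_title": "lig1"},  # no docking score, so its value is None
    {"s_m_title": "lig3", "r_i_docking_score": -9.4},
]
print([p["s_m_title"] for p in sort_props(poses, [("r_i_docking_score", ASCENDING)])])
# ['lig3', 'lig2', 'lig1']
```

functools.cmp_to_key is used because a plain key tuple cannot negate strings for DESCENDING or rank None consistently across mixed types.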

'Block sorting' is possible by using the auxiliary 'intra_block_sort_criteria' sort keys. Block sorting organizes structures into groups using the 'intra_block_sort_criteria' keys, then orders those groups by the 'sort_criteria' values of each group's leading member. Put another way, 'intra_block_sort_criteria' specifies how to organize structures *within* a block, and 'sort_criteria' specifies how to organize the blocks. If 'intra_block_sort_criteria' is None, a simple multi-key sort is performed using 'sort_criteria'. For example, given a pose file with multiple poses for each ligand title, a useful global order places all poses with the same title in a contiguous block ordered by Emodel value, with the title blocks ordered by the Glide score of the first member of each block.
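The block-sorting steps can be sketched on plain property dictionaries. This is an illustration under a stated assumption, not the module's code: a block is taken to be the run of structures sharing the value of the *first* intra-block key (as with 's_m_title' in GLIDE_SP_KEY_2), and the names _cmp and block_sort are hypothetical.

```python
from functools import cmp_to_key
from itertools import groupby

ASCENDING = 1
DESCENDING = -1


def _cmp(a, b, criteria):
    """Multi-key three-way comparison of property dicts; None ranks last."""
    for name, direction in criteria:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result = (x is None) - (y is None)
        else:
            result = (x > y) - (x < y)
        if result:
            return result * direction
    return 0


def block_sort(props_list, sort_criteria, intra_block_sort_criteria):
    # Assumption: blocks are defined by the first intra-block key.
    block_key = intra_block_sort_criteria[0][0]
    intra_key = cmp_to_key(lambda a, b: _cmp(a, b, intra_block_sort_criteria))
    ordered = sorted(props_list, key=intra_key)
    blocks = [list(group) for _, group in groupby(ordered, key=lambda p: p.get(block_key))]
    # Order whole blocks by the sort_criteria values of their leading member.
    blocks.sort(key=cmp_to_key(lambda a, b: _cmp(a[0], b[0], sort_criteria)))
    return [props for block in blocks for props in block]


poses = [
    {"s_m_title": "A", "r_i_glide_emodel": -50.0, "r_i_docking_score": -6.0},
    {"s_m_title": "B", "r_i_glide_emodel": -70.0, "r_i_docking_score": -8.5},
    {"s_m_title": "A", "r_i_glide_emodel": -60.0, "r_i_docking_score": -6.5},
    {"s_m_title": "B", "r_i_glide_emodel": -40.0, "r_i_docking_score": -5.0},
]
result = block_sort(
    poses,
    sort_criteria=[("r_i_docking_score", ASCENDING)],
    intra_block_sort_criteria=[("s_m_title", ASCENDING), ("r_i_glide_emodel", ASCENDING)],
)
print([(p["s_m_title"], p["r_i_glide_emodel"]) for p in result])
# [('B', -70.0), ('B', -40.0), ('A', -60.0), ('A', -50.0)]
```

Block B comes first because its best-Emodel pose has the lower docking score (-8.5 versus -6.5 for block A's leader).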

Copyright Schrodinger, LLC. All rights reserved

Classes
  StructureFileSorter
A class to sort structure files by ct-level property values.
  DsuList
A class to sort a list with special behaviors.
Functions
 
_in_memory_sort_ok(file_name, chunk_size=2500000)
Returns True if file_name is small enough to sort in memory, otherwise False.
 
sort_file(file_name, sort_criteria, out_file_name=None, intra_block_sort_criteria=None, no_split=False)
Sort structure file by the values of ct-level properties within the file.
 
split_file(file_name, max_count=10000, dir=None)
Splits the structures in file_name into smaller files and returns a list of the new file names.
 
merge_files(file_list, sort_criteria, out_file_name, remove_file_list=True, sort_file_list=False, dir=None)
Combines pre-ordered structure files by their property values.
 
merge_pv_files(file_list, sort_criteria, out_file_name)
Combines pre-ordered pose viewer structure files by their property values.
 
merge_st_iters(structure_iters, sort_criteria, output_handle)
Combines pre-ordered structure iterators by their property values.
 
sort_file_in_memory(file_name, sort_criteria, out_file_name=None, intra_block_sort_criteria=None)
Orders the structures in file_name, keeping structures in memory during the sort operation.
 
_get_temp_file_name(dir=None, suffix='.mae')
Returns the path to a new temporary file that is safe to append structures to.
Variables
  _version = '$Revision: 1.19 $'
  ASCENDING = 1
  DESCENDING = -1
  CHUNK_SIZE = 2500000
  MKTMPSUFFIX = '.mae'
  NONE_IS_LAST = True
  GLIDE_SP_KEY_1 = [('b_glide_receptor', 1), ('r_i_docking_score...
  GLIDE_SP_KEY_2 = [('s_m_title', 1), ('r_i_glide_emodel', 1)]
  GLIDE_XP_KEY_1 = [('b_glide_receptor', 1), ('r_i_docking_score...
  GLIDE_XP_KEY_2 = [('s_m_title', 1), ('i_glide_XP_PoseRank', 1)]
  GLIDE_HTVS_KEY_1 = [('b_glide_receptor', 1), ('r_i_docking_sco...
  GLIDE_HTVS_KEY_2 = [('s_m_title', 1), ('r_i_glide_emodel', 1)]
  logger = log.get_output_logger(__file__)
  __package__ = 'schrodinger.structutils'
Function Details

_in_memory_sort_ok(file_name, chunk_size=2500000)

 

Returns True if file_name is small enough to sort in memory,
otherwise False.  

The test is based on the size, in bytes, of the first 100 structures
in file_name and on the number of structures in file_name.  If the
first 100 structures occupy fewer than chunk_size bytes, the file is
assumed to contain ligand-sized structures; otherwise it is assumed to
contain receptor-sized structures.  The structure type sets the limit
on how many structures can be sorted in memory: 1,000 receptor-sized
structures or 10,000 ligand-sized structures (hard-coded values).  If
the number of structures in file_name is below that limit, the file
should be sortable in memory.

file_name (string)
    Path to the structure file on which to operate.

chunk_size (int)
    The size, in bytes, used to estimate the scale of structures
    in the file.  Default is the module constant CHUNK_SIZE.
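The size heuristic described above reduces to a few comparisons. The sketch below is hypothetical: fits_in_memory takes a list of per-structure byte sizes instead of reading file_name, so that the decision logic can be shown on its own.

```python
CHUNK_SIZE = 2500000
RECEPTOR_LIMIT = 1000   # 1,000 receptor-sized structures
LIGAND_LIMIT = 10000    # 10,000 ligand-sized structures


def fits_in_memory(structure_sizes, chunk_size=CHUNK_SIZE):
    """Apply the documented heuristic to per-structure byte sizes.

    structure_sizes is a hypothetical input (sizes in file order); the
    documented _in_memory_sort_ok() reads file_name itself.
    """
    sizes = list(structure_sizes)
    # Small first-100 total: assume ligands; otherwise assume receptors.
    ligand_sized = sum(sizes[:100]) < chunk_size
    limit = LIGAND_LIMIT if ligand_sized else RECEPTOR_LIMIT
    return len(sizes) < limit


print(fits_in_memory([2000] * 5000))   # True: ligand-sized, 5,000 < 10,000
print(fits_in_memory([50000] * 1500))  # False: receptor-sized, 1,500 >= 1,000
```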

sort_file(file_name, sort_criteria, out_file_name=None, intra_block_sort_criteria=None, no_split=False)

 

Sort structure file by the values of ct-level properties within
the file.

This is the central API.  It contains logic that picks a reasonable
trade-off between disk I/O and memory use, based on the size of the
file.

file_name (string)
    Path to file upon which to operate.

sort_criteria (list of tuples)
    List of (m2io dataname, module constant) tuples.  These are
    the primary, secondary, ..., keys for sorting the structures,
    *or* blocks if intra_block_sort_criteria is defined, and
    optional ascending/descending constants.  e.g.: 
    [('s_m_title', sort.ASCENDING), ('r_i_glide_docking_score',
    sort.ASCENDING)]

out_file_name (string)
    Output structure file containing the sorted structures.
    If out_file_name is None, then the input file is clobbered with
    the results of the sort.  Default is to replace input file_name
    with sorted results.

intra_block_sort_criteria (list of tuples)
    Optional list of (m2io dataname, module constant) tuples for block
    sorting.  These are the primary, secondary, ..., keys for sorting
    the structures *within* blocks, and optional ascending/descending
    order constants.  Default is None (no block sorting).

no_split (bool)
    Deprecated option.  This option is currently ignored.  
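One common way to trade memory for disk I/O is an external sort: sort fixed-size chunks independently, then stream a k-way merge of the sorted chunks. The sketch below assumes that sort_file follows this general shape (suggested by split_file, merge_files, and sort_file_in_memory in this module) but does not reproduce its actual logic. sort_records, max_in_memory, and chunk_size are hypothetical names, and the chunks stay in memory here only to keep the example self-contained.

```python
import heapq
from functools import cmp_to_key

ASCENDING = 1
DESCENDING = -1


def _cmp(a, b, criteria):
    """Multi-key three-way comparison of property dicts; None ranks last."""
    for name, direction in criteria:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result = (x is None) - (y is None)
        else:
            result = (x > y) - (x < y)
        if result:
            return result * direction
    return 0


def sort_records(records, sort_criteria, max_in_memory=10000, chunk_size=10000):
    key = cmp_to_key(lambda a, b: _cmp(a, b, sort_criteria))
    if len(records) < max_in_memory:
        # Small input: one in-memory sort.
        return sorted(records, key=key)
    # Large input: sort each chunk separately (each would be a temporary
    # file in a real external sort), then k-way merge the sorted chunks.
    chunks = [sorted(records[i:i + chunk_size], key=key)
              for i in range(0, len(records), chunk_size)]
    return list(heapq.merge(*chunks, key=key))
```

Both paths must produce the same order; the chunked path only bounds how much has to be sorted at once.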

split_file(file_name, max_count=10000, dir=None)

 

Splits the structures in file_name into smaller files and returns a
list of the new file names.

file_name (string)
    Path to the structure file upon which to operate.

max_count (int)
    Maximum number of structures per sub-file.

dir (string)
    Path to the directory where the sub-files are written.  The
    default is the current working directory at run time.  The
    directory needs enough free space to hold what is effectively a
    copy of file_name; for very large files, /tmp is a poor choice on
    most hosts.
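The splitting behavior can be sketched with a hypothetical one-record-per-line text format standing in for a structure file; real Maestro files span many lines per structure, so this shows only the chunking and file-naming pattern. split_records_file is a hypothetical name.

```python
import os
import tempfile


def split_records_file(file_name, max_count=10000, dir=None):
    """Write at most max_count lines per sub-file; return paths in file order."""
    if dir is None:
        dir = os.getcwd()  # mirrors the documented default location
    paths, out = [], None
    try:
        with open(file_name) as source:
            for index, line in enumerate(source):
                if index % max_count == 0:
                    # Start a new uniquely named sub-file every max_count records.
                    if out is not None:
                        out.close()
                    fd, path = tempfile.mkstemp(suffix=".txt", dir=dir)
                    out = os.fdopen(fd, "w")
                    paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```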

merge_files(file_list, sort_criteria, out_file_name, remove_file_list=True, sort_file_list=False, dir=None)

 

Combines pre-ordered structure files by their property values. Input files are assumed to be sorted by default. Optionally the files can be sorted by the sort_criteria prior to merging by setting sort_file_list=True.

Parameters:
  • file_list (list) - List of paths for the structure files that will be merged.
  • sort_criteria (list) - List of (m2io dataname, module constant) tuples, which are the primary keys for sorting the structures.
  • out_file_name (string) - Path to the structure output file containing all the merged structures.
  • remove_file_list (boolean) - If True, the files in file_list are deleted from disk after merging. Default is True.
  • sort_file_list (boolean) - If True, then prior to merging, sort the files by 'sort_criteria'. Default is False, assume the file_list members are already sorted.
  • dir - Unused parameter.

Note: This function is not suited for handling pose viewer files because all receptors will be included in the output. See merge_pv_files.
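The merge-then-remove flow can be sketched with a hypothetical JSON-lines format (one property dictionary per line) standing in for structure files. merge_record_files is a hypothetical name, and the unused dir parameter is omitted. Pre-sorted inputs are streamed lazily through heapq.merge, so only one record per file needs to be held at a time.

```python
import heapq
import json
import os
from functools import cmp_to_key

ASCENDING = 1
DESCENDING = -1


def _cmp(a, b, criteria):
    """Multi-key three-way comparison of property dicts; None ranks last."""
    for name, direction in criteria:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result = (x is None) - (y is None)
        else:
            result = (x > y) - (x < y)
        if result:
            return result * direction
    return 0


def _iter_records(path):
    """Lazily yield one property dict per non-blank line."""
    with open(path) as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def merge_record_files(file_list, sort_criteria, out_file_name,
                       remove_file_list=True, sort_file_list=False):
    key = cmp_to_key(lambda a, b: _cmp(a, b, sort_criteria))
    inputs = [_iter_records(path) for path in file_list]
    if sort_file_list:
        # Unsorted inputs must be fully loaded and sorted before merging.
        inputs = [sorted(records, key=key) for records in inputs]
    with open(out_file_name, "w") as out:
        for record in heapq.merge(*inputs, key=key):
            out.write(json.dumps(record) + "\n")
    if remove_file_list:
        # Every input has been read to the end (and closed) by this point.
        for path in file_list:
            os.remove(path)
```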

merge_pv_files(file_list, sort_criteria, out_file_name)

 

Combines pre-ordered pose viewer structure files by their property
values.  Input files are assumed to be ordered.  Only the receptor
from the first pose viewer file is retained.

file_list (list)
    List of paths for the pose viewer files that will be merged.

sort_criteria (list of tuples)
    List of (m2io dataname, module constant) tuples, which are the
    primary keys for sorting the ligand structures.

out_file_name (string)
    Path to the structure output file containing all the merged
    structures.
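The receptor handling can be sketched on in-memory lists. This assumes each pose viewer input holds the receptor as its first entry followed by ligand poses already ordered by sort_criteria; merge_pv_records is a hypothetical name, and the sketch is not the module's implementation.

```python
import heapq
from functools import cmp_to_key

ASCENDING = 1
DESCENDING = -1


def _cmp(a, b, criteria):
    """Multi-key three-way comparison of property dicts; None ranks last."""
    for name, direction in criteria:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result = (x is None) - (y is None)
        else:
            result = (x > y) - (x < y)
        if result:
            return result * direction
    return 0


def merge_pv_records(pv_inputs, sort_criteria):
    """Keep the first input's receptor, then merge all ligand poses."""
    key = cmp_to_key(lambda a, b: _cmp(a, b, sort_criteria))
    receptor = pv_inputs[0][0]
    # Drop every input's leading receptor before merging the ligands.
    ligand_streams = [pv[1:] for pv in pv_inputs]
    return [receptor] + list(heapq.merge(*ligand_streams, key=key))
```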

merge_st_iters(structure_iters, sort_criteria, output_handle)

 

Combines pre-ordered structure iterators by their property values.

Parameters:
  • structure_iters - List of iterables that emit structures. Each emitted item can be a full structure, a MaestroText structure, or any other object with a property dictionary.
  • sort_criteria (list) - List of (m2io dataname, module constant) tuples, which are the primary keys for sorting the structures.
  • output_handle (An object with an append() method.) - Output stream to which the sorted structures are appended.

sort_file_in_memory(file_name, sort_criteria, out_file_name=None, intra_block_sort_criteria=None)

 

Orders the structures in file_name, keeping structures in memory
during the sort operation.

file_name (string)
    Path to file upon which to operate.

sort_criteria (list of tuples)
    List of (m2io dataname, module constant) tuples, which are the
    primary keys for sorting the structures and optional sort
    order constants.

out_file_name (string)
    Output structure file containing the sorted structures.
    If out_file_name is None then the input file_name is clobbered
    with the sorted results.

intra_block_sort_criteria (list of tuples)
    List of (m2io dataname, module constant) tuples, which are the
    properties for sorting the structures within groups, and optional
    sort order constants.
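The in-memory path, including the "clobber the input when out_file_name is None" behavior, can be sketched with a hypothetical JSON-lines format (one property dictionary per line). sort_records_file_in_memory is a hypothetical name, intra_block_sort_criteria is omitted for brevity, and writing to a temporary file before renaming it over the target is a design choice of this sketch, not necessarily the module's.

```python
import json
import os
import tempfile
from functools import cmp_to_key

ASCENDING = 1
DESCENDING = -1


def _cmp(a, b, criteria):
    """Multi-key three-way comparison of property dicts; None ranks last."""
    for name, direction in criteria:
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            result = (x is None) - (y is None)
        else:
            result = (x > y) - (x < y)
        if result:
            return result * direction
    return 0


def sort_records_file_in_memory(file_name, sort_criteria, out_file_name=None):
    key = cmp_to_key(lambda a, b: _cmp(a, b, sort_criteria))
    with open(file_name) as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    records.sort(key=key)
    target = out_file_name or file_name  # None means overwrite the input
    # Write a sibling temp file, then rename it over the target, so a failed
    # write never leaves the target half-written.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(target)))
    with os.fdopen(fd, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
    os.replace(tmp_path, target)
```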

_get_temp_file_name(dir=None, suffix='.mae')

 

Returns the path to a new temporary file that is safe to append
structures to.

dir (string)
    Path to a directory with write permissions, where temporary
    files can be created.  Default is None, which uses the tempfile
    module's default directory (tempfile.gettempdir(); typically /tmp
    on Unix unless TMPDIR, TEMP, or TMP is set).

suffix (string)
    Optional suffix for temporary files.  Default is module constant
    MKTMPSUFFIX.
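A file that is "safe to append structures to" can be obtained from the standard library's tempfile.mkstemp, which creates the file atomically under a unique name. The sketch below is an assumption about how such a helper could work, not the module's code; make_temp_file is a hypothetical name.

```python
import os
import tempfile

MKTMPSUFFIX = ".mae"


def make_temp_file(dir=None, suffix=MKTMPSUFFIX):
    """Create an empty, uniquely named file and return its path."""
    # mkstemp creates the file itself (no name race) and returns an open
    # descriptor; close it so callers can reopen the path in append mode.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=dir)
    os.close(fd)
    return path
```

Returning a path to an already-created empty file, rather than just a name, prevents another process from claiming the same name before the caller first writes to it.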


Variables Details

GLIDE_SP_KEY_1

Value:
[('b_glide_receptor', 1), ('r_i_docking_score', 1)]

GLIDE_XP_KEY_1

Value:
[('b_glide_receptor', 1), ('r_i_docking_score', 1)]

GLIDE_HTVS_KEY_1

Value:
[('b_glide_receptor', 1), ('r_i_docking_score', 1)]