Package schrodinger :: Package application :: Package glide :: Module ensemble_selection :: Class EnsembleSelection

Class EnsembleSelection

object --+
         |
        EnsembleSelection

Objects of this class select the "best"(*) ensembles of a specified size
when given as input the results from an exhaustive cross-docking
calculation on a set of N complexes (i.e., the NxN matrix of GlideScores).

The input can come from a CSV file or can be provided as a dict. The file
reading methods are provided as a convenience and are useful for testing, but
they have their limitations. Hence it is recommended that the data be
provided as dicts when possible.

(*) Two definitions of "best" are available: 1) RMSD vs. experimental
DeltaG (see the 'best_ensembles_by_rmsd' method); 2) the number of ligands
that can be docked "properly" by at least one receptor in the ensemble (see
'best_ensembles_by_count'). "Properly" means that the score is
lower than 'tol' plus the self-docking score for the ligand.

A couple of object attributes are available (should be considered read-only
outside this class):
    * titles
    * N = len(titles)

Instance Methods
 
__init__(self, gscores=None, exp_dg=None, fname=None, exp_dg_fname=None, initial_seed=42, max_exhaustive=1000000, n_random_comb=100000, tol=0.5, docking_failure_penalty=10.0)
Constructor optionally takes a few parameters that determine the behavior of the selection algorithm or that provide the input data.
 
rmsd_comb(self, comb)
Calculate the RMS deviation between experimental DeltaG and the lowest GScore obtained for each ligand using a given combination 'comb' of receptors.
 
count_good_ligs(self, comb)
Return the count of ligands that can be docked "properly" by at least one receptor in the given combination 'comb' of receptors.
 
sample_combinations(self, n)
Return an iterator that produces a random sample of n-element combinations from the list of receptors contained in self.
 
combinations(self, n)
Return an iterator that produces n-element combinations from the list of receptors contained in self.
 
best_ensembles_by_count(self, n, nmax=15)
Return the 'nmax' best n-member ensembles by count of "properly docked ligands" as a list of Ensemble objects.
 
best_ensembles_by_rmsd(self, n, nmax=15)
Return the 'nmax' best n-member ensembles by RMSD of computed gscore vs exp DeltaG as a list of Ensemble objects.
 
self_docking_rmsd(self)
Return the self-docking RMS deviation of GScore vs. experimental DeltaG.
 
count_combinations(self, n)
Return the number of n-member combinations out of the list of N receptors held by the object (i.e., N!/(n! * (N-n)!)).
 
count_singletons(self)
Return the number of ligands that get a "good" score with only one receptor.
 
set_tol(self, tol)
Set the tolerance used for determining whether a ligand is docked "properly" into a receptor.
 
_compute_ssets(self)
 
set_gscores(self, gscores)
Set the GlideScore matrix.
 
set_exp_dg(self, exp_dg)
Set the experimental data used for computing RMSDs.
 
read_csv(self, fname)
Import data from a CSV file, which is expected to contain an N*N matrix of GlideScores, plus the titles on the header and on the first column.
 
read_exp_dg(self, fname)
Read a file with the experimental data.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, gscores=None, exp_dg=None, fname=None, exp_dg_fname=None, initial_seed=42, max_exhaustive=1000000, n_random_comb=100000, tol=0.5, docking_failure_penalty=10.0)
(Constructor)

 
Constructor optionally takes a few parameters that determine the
behavior of the selection algorithm or that provide the input data.
The input data may be provided either as a dict or as a filename to
be parsed.
    * initial_seed: random seed used whenever the random sampling
        method is used. It may be set to None if non-reproducible
        results are desired.
    * max_exhaustive: the maximum number of combinations for a
        systematic (exhaustive) search of the available combinations.
        When the number of combinations exceeds this number, random
        sampling is used instead.
    * n_random_comb: the number of iterations for random sampling.
    * tol: the tolerance used to determine if a ligand is docked
        "properly".
    * gscores: a dict of GlideScores to be passed on to the
        set_gscores method.
    * fname: a filename for a csv file to be passed to the read_csv
        method.
    * exp_dg: a dict of experimental DeltaGs to be passed on to the
        set_exp_dg method.
    * exp_dg_fname: a filename to be passed on to the read_exp_dg method.
    * docking_failure_penalty: the assumed deviation between experimental
        DeltaG and GlideScore when the ligand fails to dock into the ensemble
        under consideration. Used by the rmsd_comb method.

Overrides: object.__init__

rmsd_comb(self, comb)

 

Calculate the RMS deviation between experimental DeltaG and the lowest GScore obtained for each ligand using a given combination 'comb' of receptors. 'comb' is a tuple of titles. NOTE: a ligand that fails to dock into every receptor in 'comb' (i.e., has no score from any of them) counts as a deviation of 'docking_failure_penalty', as given to the constructor (10.0 kcal/mol by default).
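
The calculation can be illustrated with a minimal sketch. This is not the library's implementation: the function name rmsd_for_comb, the use of None to mark a failed docking, averaging over every ligand in exp_dg, and the sample data are all assumptions made for illustration.

```python
import math


def rmsd_for_comb(gscores, exp_dg, comb, docking_failure_penalty=10.0):
    """RMS deviation between experimental DeltaG and the lowest GlideScore
    that each ligand obtains from the receptors in 'comb'."""
    sq_sum = 0.0
    for lig, dg in exp_dg.items():
        # gscores[prot][lig]; None marks a failed docking (assumption).
        scores = [gscores[prot].get(lig) for prot in comb]
        scores = [s for s in scores if s is not None]
        if scores:
            dev = min(scores) - dg
        else:
            # No receptor in 'comb' produced a pose for this ligand.
            dev = docking_failure_penalty
        sq_sum += dev * dev
    return math.sqrt(sq_sum / len(exp_dg))


# Hypothetical 3x3 cross-docking matrix and experimental values.
gscores = {
    "A": {"A": -9.0, "B": -6.0, "C": None},
    "B": {"A": -7.0, "B": -8.0, "C": None},
    "C": {"A": -5.0, "B": -4.0, "C": -10.0},
}
exp_dg = {"A": -9.0, "B": -8.0, "C": -10.0}

rmsd_ab = rmsd_for_comb(gscores, exp_dg, ("A", "B"))
rmsd_abc = rmsd_for_comb(gscores, exp_dg, ("A", "B", "C"))
```

In this made-up data, ligand C does not dock into receptors A or B, so the ensemble ("A", "B") is charged the penalty for that ligand.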

count_good_ligs(self, comb)

 

Return the count of ligands that can be docked "properly" by at least one receptor in the given combination 'comb' of receptors. 'comb' is a tuple of titles.
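
A minimal sketch of the "properly docked" criterion, assuming the rule stated in the class description (score lower than 'tol' plus the ligand's self-docking score). The function name count_properly_docked, the None marker for failed dockings, the handling of a missing self-docking score, and the sample data are assumptions and not part of the documented API.

```python
def count_properly_docked(gscores, comb, tol=0.5):
    """Count ligands for which at least one receptor in 'comb' gives a
    score below (self-docking score + tol)."""
    count = 0
    for lig in gscores:  # ligand titles are the same set as receptor titles
        self_score = gscores[lig].get(lig)
        if self_score is None:
            # Assumption: without a self-docking reference the ligand
            # is simply not counted.
            continue
        cutoff = self_score + tol
        for prot in comb:
            score = gscores[prot].get(lig)
            if score is not None and score < cutoff:
                count += 1
                break  # one good receptor is enough
    return count


# Hypothetical matrix, indexed as gscores[prot][lig].
gscores = {
    "A": {"A": -9.0, "B": -7.8, "C": None},
    "B": {"A": -7.0, "B": -8.0, "C": -6.0},
    "C": {"A": -5.0, "B": -4.0, "C": -10.0},
}

good_a = count_properly_docked(gscores, ("A",))
good_c = count_properly_docked(gscores, ("C",))
good_ac = count_properly_docked(gscores, ("A", "C"))
```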

sample_combinations(self, n)

 

Return an iterator that produces a random sample of n-element combinations from the list of receptors contained in self. The iterator will try to produce self.n_random_comb combinations, but it will skip duplicates so the actual number of combinations returned is likely to be smaller.
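
The duplicate-skipping behavior can be sketched as follows. This is an illustration only: the function name sample_unique_combinations, the use of random.Random with a fixed seed, and sorting each sample to detect duplicates are assumptions about one way to get the described behavior.

```python
import random


def sample_unique_combinations(titles, n, n_random_comb=100000, seed=42):
    """Yield random n-element combinations of 'titles', skipping repeats.

    Makes n_random_comb draws, so fewer combinations may be yielded.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_random_comb):
        # Sort so that the same set of receptors always maps to one tuple.
        comb = tuple(sorted(rng.sample(titles, n)))
        if comb not in seen:
            seen.add(comb)
            yield comb


titles = ["A", "B", "C", "D", "E"]
combos = list(sample_unique_combinations(titles, 2, n_random_comb=50))
```

With only 10 possible pairs and 50 draws, most draws are duplicates, so the iterator yields at most 10 combinations.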

combinations(self, n)

 

Return an iterator that produces n-element combinations from the list of receptors contained in self. If the number of combinations is less than self.max_exhaustive, a complete list of combinations is produced. If it is larger, a random sample is produced by deferring to the sample_combinations method.
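
A minimal sketch of the exhaustive-versus-sampled switch, using only the standard library. The function name iter_combinations is hypothetical, and treating a count exactly equal to max_exhaustive as exhaustive is an assumption, since the text above does not specify the boundary case.

```python
import itertools
import math
import random


def iter_combinations(titles, n, max_exhaustive=1000000,
                      n_random_comb=100000, seed=42):
    """Yield all n-element combinations when there are few enough of them,
    otherwise a random sample without duplicates."""
    total = math.comb(len(titles), n)  # N! / (n! * (N-n)!)
    if total <= max_exhaustive:
        yield from itertools.combinations(titles, n)
        return
    # Too many combinations: fall back to random sampling.
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_random_comb):
        comb = tuple(sorted(rng.sample(titles, n)))
        if comb not in seen:
            seen.add(comb)
            yield comb


titles = ["A", "B", "C", "D", "E"]
exhaustive = list(iter_combinations(titles, 2))
sampled = list(iter_combinations(titles, 2, max_exhaustive=5, n_random_comb=20))
```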

set_gscores(self, gscores)

 

Set the GlideScore matrix. 'gscores' must be a dict of dicts, where each value gscores[prot][lig] = gs is the GlideScore from docking lig into prot. Both lig and prot are titles.

Sets the following public attributes:
    * titles
    * N = len(titles)
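
For illustration, a dict of dicts in the expected shape might look like the sketch below. The titles and scores are made up, and deriving 'titles' by sorting the outer keys is an assumption; the ordering actually used by the class is not documented here.

```python
# gscores[prot][lig] is the GlideScore from docking ligand 'lig' into
# receptor 'prot'. Titles and values are hypothetical.
gscores = {
    "1abc": {"1abc": -9.1, "2xyz": -6.4},
    "2xyz": {"1abc": -7.2, "2xyz": -8.8},
}

titles = sorted(gscores)  # assumption: ordering is not specified
N = len(titles)

# Sanity check: every receptor has a score entry for every ligand title,
# i.e. the data form an N x N matrix.
is_square = all(set(gscores[prot]) == set(titles) for prot in titles)
```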

set_exp_dg(self, exp_dg)

 

Set the experimental data used for computing RMSDs. 'exp_dg' should be a dict where the key is a ligand title and the value is the DeltaG.

read_csv(self, fname)

 

Import data from a CSV file, which is expected to contain an N*N matrix of GlideScores, plus the titles on the header and on the first column. Each column corresponds to a receptor and each row corresponds to a ligand. The set of ligand titles must be the same as the set of receptor titles. Ligands that failed to dock can be represented in the file either as a blank field or by the string "NA".
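
The layout described above can be parsed with the standard csv module, as in the following sketch. It is not the method's actual code: the function name parse_gscore_csv, the "title" label in the first header cell, and storing failed dockings as None are assumptions.

```python
import csv
import io


def parse_gscore_csv(fileobj):
    """Parse a ligand-by-receptor CSV into gscores[prot][lig]."""
    reader = csv.reader(fileobj)
    header = next(reader)
    receptors = header[1:]  # first header cell labels the title column
    gscores = {prot: {} for prot in receptors}
    for row in reader:
        if not row:
            continue  # skip empty lines
        lig = row[0]
        for prot, field in zip(receptors, row[1:]):
            field = field.strip()
            # A blank field or "NA" marks a ligand that failed to dock.
            failed = field in ("", "NA")
            gscores[prot][lig] = None if failed else float(field)
    return gscores


# Hypothetical 2x2 file: columns are receptors, rows are ligands.
sample = io.StringIO(
    "title,A,B\n"
    "A,-9.0,NA\n"
    "B,,-8.0\n"
)
parsed = parse_gscore_csv(sample)
```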

read_exp_dg(self, fname)

 
Read a file with the experimental data. The text file has two
whitespace-separated columns:
    1) title
    2) DeltaG
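
A minimal sketch of reading this two-column format. The function name parse_exp_dg and the choice to skip blank or incomplete lines are assumptions. Note that whitespace splitting of this kind cannot handle titles that contain spaces.

```python
def parse_exp_dg(lines):
    """Build {title: DeltaG} from whitespace-separated 'title DeltaG' lines."""
    exp_dg = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: ignore blank or incomplete lines
        exp_dg[fields[0]] = float(fields[1])
    return exp_dg


# Any iterable of lines works, including an open text file.
exp_dg = parse_exp_dg(["A  -9.0", "B\t-8.0", ""])
```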