schrodinger.application.matsci.mlearn.features module

Classes and functions to deal with ML features.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)

Bases: tuple

__contains__(key, /)

Return key in self.

__len__()

Return len(self).

components

Alias for field number 1

count(value, /)

Return number of occurrences of value.

flag

Alias for field number 0

header

Alias for field number 2

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

units

Alias for field number 3
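Because MomentData is a tuple subclass with four named fields, it behaves like a standard collections.namedtuple. The sketch below uses a stand-in type with the same field order to show name and index access; the field values are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented class; this is
# not the Schrodinger type itself.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, for illustration only.
moment = MomentData(
    flag="-dipole",
    components=("x", "y", "z"),
    header="Dipole",
    units="debye",
)

# Fields are reachable by name or by position (field numbers 0 to 3).
first_by_name = moment.flag
first_by_index = moment[0]

# Ordinary tuple unpacking also works.
flag, components, header, units = moment
```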

schrodinger.application.matsci.mlearn.features.DescriptorUtility

alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity

schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)

Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.

Parameters
Return type

schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC

Returns

Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.

Raises

ValueError if struct is missing PBCs

schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)
schrodinger.application.matsci.mlearn.features.get_anion(struct)

Get the most electronegative element in the structure (anion).

Parameters

struct (schrodinger.structure.Structure) – Input structure

Return type

str, float, int

Returns

Element, its electronegativity, number of anions in the cell
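The selection described above can be sketched without a Structure object. This is a minimal illustration, not the library's implementation: it assumes the cell contents are given as a plain list of element symbols and uses a short excerpt of Pauling electronegativities. The helper name most_electronegative is hypothetical.

```python
from collections import Counter

# Short excerpt of Pauling electronegativities.
PAULING_ENEG = {
    "Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98,
}


def most_electronegative(symbols):
    """Return (element, electronegativity, count) for a list of symbols."""
    counts = Counter(symbols)
    anion = max(counts, key=lambda element: PAULING_ENEG[element])
    return anion, PAULING_ENEG[anion], counts[anion]


# Hypothetical cell contents with the composition Li3PS4.
cell = ["Li"] * 3 + ["P"] + ["S"] * 4
result = most_electronegative(cell)
```

The returned triple mirrors the documented return type of str, float, int.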

class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate lattice-based features.

FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
__init__(features, element='Li', cutoff=4.0)

Initialize the object.

runFeature(feature)

Get result from a feature.

Parameters

feature (str) – One of the features listed in FEATURES.

Return type

int or float

Returns

Feature value
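The keys of FEATURES match the names of the feature methods documented below, so one plausible way to resolve a feature is a name-based lookup. The toy class below sketches that pattern under this assumption; it is not the library's code. ToyFeaturizer and neighbor_counts are hypothetical, and the use of the population standard deviation is an arbitrary choice.

```python
import statistics


class ToyFeaturizer:
    # Keys double as method names, mirroring how the FEATURES keys match
    # the method names documented for LatticeFeatures.
    FEATURES = {
        "avgNeighborCount": "Average neighbor count",
        "stdNeighborCount": "Standard deviation of neighbor count",
    }

    def __init__(self, neighbor_counts):
        # Hypothetical input: number of neighbors found for each atom.
        self.neighbor_counts = neighbor_counts

    def runFeature(self, feature):
        """Look up the method named by the feature key and call it."""
        if feature not in self.FEATURES:
            raise KeyError(f"Unknown feature: {feature}")
        return getattr(self, feature)()

    def avgNeighborCount(self):
        return statistics.mean(self.neighbor_counts)

    def stdNeighborCount(self):
        # Population standard deviation; an assumption of this sketch.
        return statistics.pstdev(self.neighbor_counts)


toy = ToyFeaturizer([4, 4, 6, 6])
average = toy.runFeature("avgNeighborCount")
spread = toy.runFeature("stdNeighborCount")
```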

transform(structs)

Get numerical features from structures. Also sets features names in self.labels. See parent class for more documentation.

Parameters

structs (list(schrodinger.structure.Structure)) – List of structures to be featurized

Return type

numpy array of shape [n_samples, n_features]

Returns

Transformed array

avgAtomicVol()

Get average atomic volume.

Parameters

struct (schrodinger.structure.Structure) – Structure to be used for feature calculation

Return type

float

Returns

Average atomic volume (Angstrom^3)

avgNeighborCount()

Get average neighbor count.

Return type

float

Returns

Average neighbor count

stdNeighborCount()

Get standard deviation of neighbor count.

Return type

float

Returns

Standard deviation of neighbor count

avgSublatticeEneg()

Get average sublattice electronegativity.

Return type

float

Returns

Average sublattice electronegativity

avgSublatticeNeighborCount()

Get average sublattice neighbor count.

Return type

float

Returns

Average sublattice neighbor count

avgNeighborIon()

Get average neighbor ionicity.

Return type

float

Returns

Average neighbor ionicity

stdNeighborIon()

Get standard deviation of neighbor ionicity.

Return type

float

Returns

Standard deviation of neighbor ionicity

avgSublatticeNeighborIon()

Get average sublattice neighbor ionicity.

Return type

float

Returns

Average sublattice neighbor ionicity

volPerAnion()

Get volume per anion.

Return type

float

Returns

Volume per anion

packingFraction(skip_element=None)

Get packing fraction of the crystal.

Parameters

skip_element (str) – Element to skip

Return type

float

Returns

Packing fraction
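A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below applies that textbook definition. How the library chooses radii (see effectiveRadius) and how it treats skip_element are not documented here, so the function name and inputs are assumptions.

```python
import math


def packing_fraction(radii, cell_volume):
    """Fraction of the cell volume occupied by spheres with the given radii."""
    occupied = sum(4.0 / 3.0 * math.pi * radius ** 3 for radius in radii)
    return occupied / cell_volume


# Sanity check on a simple cubic cell: one sphere of radius a/2 in a cube
# of edge a, whose packing fraction is pi/6.
edge = 2.0
simple_cubic = packing_fraction([edge / 2.0], edge ** 3)
```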

effectiveRadius(atom)

Get atom effective radius.

Parameters

atom (schrodinger.structure._StructureAtom) – Atom

Return type

float

Returns

Effective radius

sublatticePackingFraction()

Get packing fraction of the sublattice crystal.

Return type

float

Returns

Packing fraction

avgElementNeighborCount()

Get average element neighbor count.

Return type

float

Returns

Average number of bonds per element

avgAnionAnionShortDistance()

Get average anion anion shortest distance.

Return type

float

Returns

Average anion anion shortest distance

avgElementAnionShortDistance()

Get average element anion shortest distance.

Return type

float

Returns

Average element anion shortest distance

avgShortDistance()

Get average element element shortest distance.

Return type

float

Returns

Average element element shortest distance

anionFrameCoordination()

Get anion framework coordination.

Return type

float

Returns

Anion framework coordination

pathWidth(eval_eneg=False)

Evaluate average straight line path width. See the reference in the constructor for more info.

Parameters

eval_eneg (bool) – If True, return average over electronegativity, instead of distance

Return type

float

Returns

Average path width, or average electronegativity if eval_eneg is True

pathWidthEneg()

Evaluate average straight line path electronegativity.

Return type

float

Returns

Average electronegativity along the path

ratioIonicity()

Get ratio ionicity.

Return type

float

Returns

Ratio ionicity

ratioCount()

Get ratio neighbor count.

Return type

float

Returns

Ratio neighbor count

fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties to be used in the transform method.

Parameters
  • data (numpy array of shape [n_samples, n_features]) – Training set

  • data_y (numpy array of shape [n_samples]) – Target values

Return type

BaseFeaturizer

Returns

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Return type

ndarray of shape (n_samples, n_features_new)

Returns

X_new, the transformed array.
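The fit, transform and fit_transform contract can be sketched with a toy transformer. This illustrates the calling convention only and is not BaseFeaturizer itself: ToyCenterer is hypothetical and works on nested lists rather than numpy arrays.

```python
class ToyCenterer:
    """Toy transformer that centers each column on the mean seen in fit."""

    def fit(self, data, data_y=None, **fit_params):
        n_samples = len(data)
        # Store per-column means so that transform can reuse them.
        self.means_ = [sum(column) / n_samples for column in zip(*data)]
        return self  # returning self allows call chaining

    def transform(self, data):
        return [
            [value - mean for value, mean in zip(row, self.means_)]
            for row in data
        ]

    def fit_transform(self, X, y=None, **fit_params):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(X, y, **fit_params).transform(X)


samples = [[1.0, 10.0], [3.0, 30.0]]
centered = ToyCenterer().fit_transform(samples)
```

Because fit returns self, ToyCenterer().fit(samples).transform(samples) gives the same result as fit_transform.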

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Return type

dict

Returns

params, the parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See the scikit-learn example plot_set_output.py (examples/miscellaneous) for an example of how to use the API.

Parameters

transform ({"default", "pandas"}, default=None) – Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • None: Transform configuration is unchanged

Return type

estimator instance

Returns

self, the estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Return type

estimator instance

Returns

self, the estimator instance.
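The <component>__<parameter> naming used by set_params can be illustrated by splitting keys on the first double underscore. This is a simplified sketch of the convention, not scikit-learn's implementation. The function split_nested_params and the parameter names scaler and with_mean are hypothetical.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component.

    Keys without a double underscore belong to the estimator itself and
    are collected under the empty-string key.
    """
    grouped = {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params({"cutoff": 5.0, "scaler__with_mean": False})
```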

class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)

Bases: object

Manage a ligand.

__init__(st, metal_atom, new_to_old, coordination_idxs)

Create an instance.

Parameters
getVec(point)

Return a vector pointing from the metal atom to the given point.

Parameters

point (numpy.array) – the point in Ang.

Return type

numpy.array

Returns

the vector in Ang.

getCentroid(st, idxs)

Return the centroid vector of the given coordination atom indices.

Parameters
Return type

numpy.array

Returns

the centroid vector in Ang.

getCoordinationVec(st, idxs)

Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.

Parameters
Return type

numpy.array

Returns

the coordination vector in Ang.
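The geometry behind getVec, getCentroid and getCoordinationVec can be sketched with plain coordinate lists. The helpers below are hypothetical stand-ins that take coordinates directly instead of a structure and atom indices. They assume the centroid is the unweighted mean of the coordinating atom positions.

```python
def centroid(points):
    """Unweighted mean position of a list of (x, y, z) points."""
    count = len(points)
    return [sum(axis) / count for axis in zip(*points)]


def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return [p - o for o, p in zip(origin, point)]


# Hypothetical coordinates in Angstrom: a metal at the origin and two
# coordinating atoms.
metal = [0.0, 0.0, 0.0]
donors = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

coordination_vec = vector_from(metal, centroid(donors))
```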

getStoichiometry()

Return the stoichiometry.

Return type

str

Returns

the stoichiometry

getDenticity()

Return the denticity.

Return type

int

Returns

the denticity

getHapticity()

Return the hapticity.

Return type

int

Returns

the hapticity

getHapticCharacter()

Return the haptic character.

Return type

int

Returns

the haptic character

getBiteAngle()

Return the bite angle in degrees.

Return type

float or None

Returns

the bite angle in degrees
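A bite angle is conventionally the donor-metal-donor angle of a chelating ligand. The sketch below computes that angle from two metal-to-donor vectors. It is an illustration under that definition, not the library's code. The documented None return presumably covers ligands without two donor atoms, which is an assumption and is not handled here.

```python
import math


def angle_degrees(vec_a, vec_b):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


# Hypothetical metal-to-donor vectors in Angstrom.
bite = angle_degrees([2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```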

getAtomConeAngle(atom)

Return the cone angle for the given atom in degrees.

Parameters

atom (schrodinger.structure._StructureAtom) – the atom

Return type

float

Returns

the cone angle for the given atom in degrees

getConeAngle()

Return the cone angle in degrees.

Return type

float

Returns

the cone angle in degrees

getBondLength()

Return the bond length in Ang.

Return type

float

Returns

the bond length in Ang.

getDescriptors()

Return descriptors.

Return type

dict

Returns

(label, data) pairs

class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None, nonmetallic_centers=())

Bases: object

Manage a complex.

BURIED_VOLUME_VDW_SCALE = 1.17
CONTOURS_DIR = 'contours'
__init__(st, logger=None, nonmetallic_centers=())

Create an instance.

Parameters
  • st (schrodinger.structure.Structure) – the structure

  • logger (logging.Logger or None) – output logger or None if there isn’t one

  • nonmetallic_centers (tuple) – Tuple of nonmetallic elements to also consider when looking for center atom

setMetalAtom()

Set the metal atom.

setLigands()

Set the ligands.

getBondAngle()

Return the bond angle in degrees.

Return type

float

Returns

the bond angle in degrees

getVDWSurfaceArea()

Return the VDW surface area in Angstrom^2.

Return type

float

Returns

the VDW surface area in Angstrom^2

getVDWVolume(vdw_scale=1, buffer_len=2)

Return the VDW volume in Angstrom^3.

Parameters
  • vdw_scale (float) – the VDW scale

  • buffer_len (float) – the shape buffer length in Angstrom

Return type

float

Returns

the VDW volume in Angstrom^3

getBuriedVolumeStructure(only_largest_ligands=False)

Return a copy of the structure without the metal atom. If only_largest_ligands is True, it will only contain the largest ligand or multiple copies thereof if it is symmetric.

Parameters

only_largest_ligands (bool) – Whether small ligands should be deleted

Return type

schrodinger.structure.Structure

Returns

the structure containing some or all ligands

getBuriedVDWVolumePct(struct, vdw_scale=1.17)

Return the buried VDW volume percent.

Parameters
  • struct (structure.Structure) – The structure to get buried volume for

  • vdw_scale (float) – the VDW scale

Return type

float

Returns

the buried VDW volume percent

getFreeVolumeVector()

Return a unit vector pointing from the metal atom of the complex in the direction of free volume.

Return type

numpy.array

Returns

the free volume unit vector

getRotatedComplex()

Return a copy of the complex that is rotated so that the free volume vector points along the positive z-axis.

Return type

structure.Structure

Returns

A rotated copy of the input structure
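Rotating coordinates so that a chosen direction points along +z can be done with Rodrigues' rotation formula. The sketch below is one way to do it with plain lists. It is not necessarily how getRotatedComplex works, and the function rotate_to_z is hypothetical.

```python
import math


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate_to_z(direction, points):
    """Rotate points so that direction ends up along the positive z-axis."""
    norm = math.sqrt(dot(direction, direction))
    unit = [component / norm for component in direction]
    z_axis = [0.0, 0.0, 1.0]
    axis = cross(unit, z_axis)
    sin_t = math.sqrt(dot(axis, axis))
    cos_t = dot(unit, z_axis)
    if sin_t < 1e-12:
        if cos_t > 0:
            return [list(p) for p in points]  # already along +z
        # Antiparallel to z: rotate by 180 degrees about the x-axis.
        return [[p[0], -p[1], -p[2]] for p in points]
    k = [component / sin_t for component in axis]  # unit rotation axis
    rotated = []
    for p in points:
        k_cross_p = cross(k, p)
        k_dot_p = dot(k, p)
        # Rodrigues: p cos(t) + (k x p) sin(t) + k (k . p) (1 - cos(t))
        rotated.append([
            p[i] * cos_t + k_cross_p[i] * sin_t + k[i] * k_dot_p * (1.0 - cos_t)
            for i in range(3)
        ])
    return rotated


rotated = rotate_to_z([1.0, 1.0, 1.0], [[1.0, 1.0, 1.0]])
```

The rotation preserves lengths, so the rotated copy of the direction vector keeps its norm and has only a z component.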

exportBuriedVolumeContour(sphere_radius=3.5, vdw_scale=1.17, num_bins=30, seed=1234)

Export the buried volume contour for the complex

Parameters
  • sphere_radius (float) – The radius for the sphere to sample points in

  • vdw_scale (float) – The VdW scale factor to apply to VdW radii when checking to see if a point is “inside” an atom

  • num_bins (int) – The number of bins in x and y direction to put the points in

  • seed (int) – Seed for random number generation

Return type

str, str

Returns

The paths to contour png and csv files
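The sphere_radius, vdw_scale and seed parameters suggest a sampling scheme in which points inside a sphere around the metal are tested against scaled VdW spheres. The sketch below estimates a buried volume percent that way with rejection sampling. It is an assumption about the general technique, not the library's algorithm. The function name, the num_points argument and the input format are hypothetical.

```python
import random


def buried_volume_pct(atoms, sphere_radius=3.5, vdw_scale=1.17,
                      num_points=20000, seed=1234):
    """Estimate the percent of a sphere at the origin covered by atoms.

    atoms is a list of ((x, y, z), vdw_radius) pairs in Angstrom, with the
    metal center assumed to sit at the origin.
    """
    rng = random.Random(seed)
    accepted = 0
    buried = 0
    while accepted < num_points:
        point = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
        if sum(c * c for c in point) > sphere_radius ** 2:
            continue  # rejection sampling: keep only points in the sphere
        accepted += 1
        for center, radius in atoms:
            dist_sq = sum((p - c) ** 2 for p, c in zip(point, center))
            if dist_sq <= (vdw_scale * radius) ** 2:
                buried += 1  # the point lies inside a scaled VdW sphere
                break
    return 100.0 * buried / accepted


# One hypothetical atom at the origin whose scaled radius is half the
# sampling radius, so it should cover about one eighth of the sphere.
estimate = buried_volume_pct([((0.0, 0.0, 0.0), 1.75 / 1.17)])
```

Fixing the seed makes the estimate reproducible between runs, which is presumably why the documented method exposes one.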

plotContour(points)

Plot a contour for the passed points. matplotlib uses triangulation to create a grid for the contour.

Parameters

points (numpy.array) – The x, y, z values of points

getVectorizedDescriptors(jaguar_out_file)

Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.

Parameters

jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one

Return type

dict

Returns

(label, data) pairs

getDescriptors(no_organometallic=False)

Return descriptors.

Parameters

no_organometallic (bool) – Whether organometallic descriptors should be skipped

Return type

dict

Returns

(label, data) pairs

schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)

Return a list of unique titles for the given structures.

Parameters

sts (list) – contains schrodinger.structure.Structure

Return type

list

Returns

the unique titles
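Unique titles matter because several methods in this module return dictionaries keyed by structure title. The sketch below shows one way to de-duplicate a list of titles. The numeric suffix scheme is an assumption made for illustration and may differ from what get_unique_titles does. The function make_unique is hypothetical and works on plain strings.

```python
def make_unique(titles):
    """Return titles in order, adding a numeric suffix to repeated ones."""
    used = set()
    unique = []
    for title in titles:
        candidate = title
        suffix = 1
        # Keep bumping the suffix until the candidate has not been used.
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        unique.append(candidate)
    return unique


unique_titles = make_unique(["a", "b", "a", "a"])
```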

class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate features for metal complexes.

__init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, tpp=1, ligfilter=False, no_organometallic=False, nonmetallic_centers=(), canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Create an instance.

Parameters
  • jaguar (bool) – specify whether to calculate Jaguar features

  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here

  • tpp (int) – the number of threads for any Jaguar jobs

  • ligfilter (bool) – specify whether to calculate Ligfilter features

  • no_organometallic (bool) – Whether organometallic descriptors should be skipped

  • canvas (bool) – specify whether to calculate Canvas features

  • moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors

  • include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.

  • save_files (bool) – Whether to save subjob files or not

  • logger (logging.Logger or None) – output logger or None if there isn’t one

runJaguar()

Run Jaguar on the given structures.

Return type

list

Returns

contains Jaguar *.out file names

getFeatures(structs, jaguar_out_files=None)

Return features dictionary for the given structures

Parameters
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized

  • jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures

verifyJaguarOutfiles()

Run Jaguar and collect the *.out files if they have not been provided.

getComplexDescriptors()

Create a Complex object for each structure and get their descriptors

Return type

dict

Returns

The descriptors from Complex for each structure

getJaguarDescriptors()

Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.

Return type

dict

Returns

The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
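Several methods of this class return dictionaries keyed by structure title whose values are descriptor dictionaries. A sketch of how such dictionaries could be merged and flattened into rows is shown below. The helper names, titles and descriptor labels are hypothetical, and the library's own merging logic is not documented here.

```python
def merge_descriptors(*sources):
    """Merge {title: {label: value}} dictionaries, title by title."""
    merged = {}
    for source in sources:
        for title, descriptors in source.items():
            merged.setdefault(title, {}).update(descriptors)
    return merged


def to_rows(merged, titles):
    """Flatten merged descriptors into sorted labels and one row per title."""
    labels = sorted({label for d in merged.values() for label in d})
    # Missing descriptors are filled with None.
    rows = [[merged[title].get(label) for label in labels] for title in titles]
    return labels, rows


# Hypothetical descriptor dictionaries from two sources.
complex_desc = {"mol_1": {"bond_length": 2.1}, "mol_2": {"bond_length": 1.9}}
jaguar_desc = {"mol_1": {"homo": -0.21}}

merged = merge_descriptors(complex_desc, jaguar_desc)
labels, rows = to_rows(merged, ["mol_1", "mol_2"])
```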

getUtilityDescriptors()

Get the requested utility descriptors for all structures

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getDescriptorUtilityJob(descriptor_utility)

Get the job to run to generate the descriptors using the passed descriptor_utility for all structures

Parameters

descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors

Return type

jobutils.RobustSubmissionJob

Returns

The job to run to generate the descriptors

getExtraMolecularDescriptorsProps(st, descriptor_utility)

Return any extra structure properties computed using the output from molecular descriptors.

Parameters
  • st (schrodinger.structure.Structure) – the structure output from molecular descriptors which has all output properties defined

  • descriptor_utility (DescriptorUtility) – the molecular descriptor utility containing the original job parameters

Return type

dict

Returns

(property name, value) pairs

processUtilityDescriptorOutputs(jobs_dict)

Read the descriptors for all descriptor utilities that were run, and return them

Parameters

jobs_dict (dict) – Dictionary with DescriptorUtility as keys and jobs as values

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getMolecularDescriptorsJob()

Get the job to run to generate molecular descriptors for all structures

Return type

jobutils.RobustSubmissionJob

Returns

The job to run to generate the descriptors

static writeFingerprintFiles(structs)

Write fingerprint files for the given structures.

Parameters

structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted

Return type

list

Returns

the fingerprint file names

log(msg, **kwargs)

Add a message to the log file

Parameters

msg (str) – The message to log

Additional keyword arguments are passed to the textlogger.log_msg function

fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties to be used in the transform method.

Parameters
  • data (numpy array of shape [n_samples, n_features]) – Training set

  • data_y (numpy array of shape [n_samples]) – Target values

Return type

BaseFeaturizer

Returns

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Return type

ndarray of shape (n_samples, n_features_new)

Returns

X_new, the transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Return type

dict

Returns

params, the parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See the scikit-learn example plot_set_output.py (examples/miscellaneous) for an example of how to use the API.

Parameters

transform ({"default", "pandas"}, default=None) – Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • None: Transform configuration is unchanged

Return type

estimator instance

Returns

self, the estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Return type

estimator instance

Returns

self, the estimator instance.

transform(data)

Get numerical features. Must be implemented by a child class.

Parameters

data (numpy array of shape [n_samples, n_features]) – Training set

Return type

numpy array of shape [n_samples, n_features_new]

Returns

Transformed array

class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')

Bases: object

Calculates CrystalNN structure fingerprints as implemented in pymatgen

OPS_PRESET = 'ops'
CN_PRESET = 'cn'
__init__(preset='ops')

Create a structure featurizer

Parameters

preset (str) – One of OPS_PRESET or CN_PRESET class constants

featurize(struct)

Get CrystalNN fingerprints for the passed structure

Parameters

struct (structure.Structure) – The structure to get features for

Return type

list

Returns

List of CrystalNN fingerprints for the structure