schrodinger.livedesign.biologics.registration module

This module is the primary entry point for LiveDesign biologics registration. In particular, it provides an API for: (1) converting user input data into a canonical data structure for storage and database manipulation, and (2) computing a default set of properties and descriptors.

class schrodinger.livedesign.biologics.registration.BioPolymer(polymer_id: 'str', monomer_string: 'str', reg_data: 'RegistrationData', connections: "Set['BioPolymer']" = <factory>, neighbors: "Set['BioPolymer']" = <factory>)

Bases: object

polymer_id: str
monomer_string: str
reg_data: schrodinger.livedesign.registration.RegistrationData
connections: Set[BioPolymer]
neighbors: Set[BioPolymer]
addConnection(neighbor: BioPolymer)

Instead of tracking inter-polymer connections directly, maintain a connection list to find non-self neighbors.

property bio_class
property deduplication_hash
__init__(polymer_id: str, monomer_string: str, reg_data: schrodinger.livedesign.registration.RegistrationData, connections: Set[BioPolymer] = <factory>, neighbors: Set[BioPolymer] = <factory>) None
schrodinger.livedesign.biologics.registration.get_data_blocks(data: str, input_format: schrodinger.rdkit_extensions.Format, options: schrodinger.livedesign.registration.RegistrationOptions)

Iterates over serialized input, yielding one data block at a time. FASTA input is parsed as a single block if a mapping is provided; otherwise each sequence is treated as a separate block. HELM input is always returned as a single block.

Parameters
  • data – input text string

  • input_format – input format of the data

  • options – registration options

Returns

an iterator of data blocks
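
The FASTA blocking rule described above can be illustrated with a minimal, standalone sketch. It does not use the Schrodinger API: iter_fasta_blocks and its has_mapping flag are hypothetical stand-ins for get_data_blocks and for however the mapping is actually supplied through the registration options.

```python
from typing import Iterator, List


def iter_fasta_blocks(data: str, has_mapping: bool) -> Iterator[str]:
    """Yield FASTA text one block at a time (hypothetical helper)."""
    if has_mapping:
        # Assumption: a mapping ties the sequences together, so the
        # whole input is kept as a single block.
        yield data.strip()
        return

    record: List[str] = []
    for line in data.strip().splitlines():
        # A new header line closes the record collected so far.
        if line.startswith(">") and record:
            yield "\n".join(record)
            record = []
        record.append(line)
    if record:
        yield "\n".join(record)


fasta = ">heavy\nEVQLVESGG\n>light\nDIQMTQSPS\n"
print(len(list(iter_fasta_blocks(fasta, has_mapping=False))))  # 2
print(len(list(iter_fasta_blocks(fasta, has_mapping=True))))   # 1
```

Yielding blocks lazily means a large multi-record input never has to be split into a full list before registration starts.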

schrodinger.livedesign.biologics.registration.get_registration_data(data: str, input_format: schrodinger.rdkit_extensions.Format, options: Optional[schrodinger.livedesign.registration.RegistrationOptions] = None) Iterator[schrodinger.livedesign.registration.RegistrationData]

Given an input in the form of FASTA or HELM text, yields either one RegistrationData per sequence or, if given a specially formatted single-entity FASTA, processes the sequences hierarchically and yields data for the subunits and their larger construct.

Notably, this function uses the BioLuminate antibody detection modules to detect antibody sequences and classify connected protein chains into their respective antibody class, if any, and then combines the antibody chains into well-defined larger constructs.

Parameters
  • data – input text string to be deserialized into RDKit CG mols

  • input_format – input format of the data

  • options – registration options

Returns

an iterator over the hierarchy of biologic entities in increasing complexity. For example, an antibody-drug conjugate would return:

  1. antibody heavy and light chains
  2. a small molecule
  3. the arms of the antibody
  4. the antibody
  5. the antibody-drug conjugate

schrodinger.livedesign.biologics.registration.get_entity_class(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) schrodinger.livedesign.entity_type.EntityClass

Gets the overall classification of the given helm model.

schrodinger.livedesign.biologics.registration.extract_biopolymer_graph(helm_model)
schrodinger.livedesign.biologics.registration.create_registration_data(helm_model: schrodinger.protein.helm._helm_parser.HelmModel, bio_class: schrodinger.livedesign.entity_type.EntityClass, registered_children: List[schrodinger.livedesign.registration.RegistrationData]) schrodinger.livedesign.registration.RegistrationData

Package the relevant information from the input helm model and return a RegistrationData object.

Parameters
  • helm_model – input helm model from which all properties are derived

  • bio_class – the entity class of the input model

  • registered_children – the list of child registration data

Returns

RegistrationData with the relevant fields populated.

schrodinger.livedesign.biologics.registration.count_residues_in_model(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) int

Simple function to count residues in a helm model.

schrodinger.livedesign.biologics.registration.extract_registration_data(polymer: schrodinger.protein.helm._helm_parser.HelmPolymer, helm_model: schrodinger.protein.helm._helm_parser.HelmModel) schrodinger.livedesign.registration.RegistrationData

Helper function to extract the connectivity metadata and the RegistrationData for a particular polymer from the given helm model.

Parameters
  • polymer – the helm polymer to extract, in RegistrationData format

  • helm_model – the helm model to extract the RegistrationData from

Returns

the RegistrationData instance for the polymer

schrodinger.livedesign.biologics.registration.build_polymer_graph(connections: Iterable[schrodinger.protein.helm._helm_parser.HelmConnection], polymer_dict: Dict[str, schrodinger.livedesign.biologics.registration.BioPolymer]) None

Simple function to store neighbors as a list in each BioPolymer instead of storing connections, in order to explore polymer neighborhoods efficiently.

Parameters
  • connections – list of HelmConnections defining the BioPolymer graph

  • polymer_dict – the dictionary of BioPolymers for easy BioPolymer management
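
As a rough illustration of the neighbor-based graph described above, the following standalone sketch mirrors the idea with plain Python objects. Polymer, add_connection, and build_graph are hypothetical stand-ins for BioPolymer, addConnection, and build_polymer_graph, and connections are simplified to pairs of polymer IDs rather than HelmConnection objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(eq=False)  # keep identity-based hashing so instances fit in sets
class Polymer:
    polymer_id: str
    neighbors: Set["Polymer"] = field(default_factory=set, repr=False)

    def add_connection(self, other: "Polymer") -> None:
        # Only non-self neighbors are recorded; intra-polymer
        # connections do not change the neighborhood.
        if other is not self:
            self.neighbors.add(other)


def build_graph(connections: Iterable[Tuple[str, str]],
                polymer_dict: Dict[str, Polymer]) -> None:
    """Populate each polymer's neighbor set in place."""
    for source_id, target_id in connections:
        source = polymer_dict[source_id]
        target = polymer_dict[target_id]
        source.add_connection(target)
        target.add_connection(source)


polymers = {pid: Polymer(pid) for pid in ("PEPTIDE1", "PEPTIDE2")}
build_graph([("PEPTIDE1", "PEPTIDE2"), ("PEPTIDE2", "PEPTIDE2")], polymers)
print(sorted(p.polymer_id for p in polymers["PEPTIDE2"].neighbors))
```

Storing neighbors on each node makes a later traversal a matter of following references, with no repeated scan of the connection list.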

schrodinger.livedesign.biologics.registration.find_antibodies(polymers: Iterable[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) List[schrodinger.livedesign.registration.RegistrationData]

Find and classify all contiguous antibody polymer combinations if connectivity is provided, otherwise just assemble them into one large antibody.

Parameters
  • polymers – list of BioPolymers to search for antibodies

  • helm_model – source HelmModel, used to generate new HelmModels

Returns

a list of RegistrationData for each found antibody construct

schrodinger.livedesign.biologics.registration.is_antibody_subunit(polymer: schrodinger.livedesign.biologics.registration.BioPolymer) bool

Utility function to increase readability.

schrodinger.livedesign.biologics.registration.find_abs_by_connectivity(polymer_graph: Iterable[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) List[schrodinger.livedesign.registration.RegistrationData]

Depth-first search approach to finding all directly connected antibody components in a helm polymer network.

Parameters
  • helm_model – source helm model to query for connectivity and extract HelmPolymers from for output subunits

  • polymer_graph – the list of all BioPolymers

Returns

a list of RegistrationData for each found (and recognized) antibody construct

schrodinger.livedesign.biologics.registration.find_ab_chain(biopolymer: schrodinger.livedesign.biologics.registration.BioPolymer, used_polymers: Set[schrodinger.livedesign.biologics.registration.BioPolymer]) List[schrodinger.livedesign.biologics.registration.BioPolymer]

Recursive function to grow an antibody chain from one polymer to reach all reachable antibody fragments.

Parameters

used_polymers – the set of “seen” polymers to avoid infinite recursion
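
The recursive growth described above, together with the outer search over all polymers, can be sketched as a depth-first traversal with a shared "seen" set. This is a simplified illustration, not the module's implementation: polymers are plain string IDs, the graph is a dictionary of neighbor sets, and grow_chain and find_chains are hypothetical names. It also assumes that traversal only passes through antibody subunits.

```python
from typing import Dict, List, Set


def grow_chain(node: str, graph: Dict[str, Set[str]],
               is_ab: Dict[str, bool], used: Set[str]) -> List[str]:
    """Collect every antibody subunit reachable from node."""
    if node in used or not is_ab[node]:
        return []
    used.add(node)  # mark before recursing to avoid infinite recursion
    chain = [node]
    for neighbor in sorted(graph[node]):
        chain.extend(grow_chain(neighbor, graph, is_ab, used))
    return chain


def find_chains(graph: Dict[str, Set[str]],
                is_ab: Dict[str, bool]) -> List[List[str]]:
    """Return one chain per connected group of antibody subunits."""
    used: Set[str] = set()
    chains = []
    for node in sorted(graph):
        chain = grow_chain(node, graph, is_ab, used)
        if chain:
            chains.append(chain)
    return chains


graph = {
    "L1": {"H1"}, "H1": {"L1", "H2"}, "H2": {"H1", "L2", "X"},
    "L2": {"H2"}, "X": {"H2", "L3"}, "L3": {"X"},
}
is_ab = {"L1": True, "H1": True, "H2": True, "L2": True, "X": False, "L3": True}
print(find_chains(graph, is_ab))
```

Because the "seen" set is shared across calls, each polymer is visited at most once, and a subunit reachable only through a non-antibody polymer (L3 here) ends up in its own chain.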

schrodinger.livedesign.biologics.registration.create_combined_ab_data(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) schrodinger.livedesign.registration.RegistrationData

Given a set of BioPolymers, extract them from the given helm_model and create a new RegistrationData object consisting only of the set.

Parameters
  • ab_polymer_chain – the collection of antibody Biopolymers to combine

  • helm_model – the source helm_model containing connections and HelmPolymer objects to extract

Returns

the RegistrationData of the final combined object

schrodinger.livedesign.biologics.registration.classify_ab_assembly(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer]) schrodinger.livedesign.entity_type.EntityClass

Given a chain of connected antibody subunits, find the matching recognized antibody construct and return its class.

Parameters

ab_polymer_chain – the chain of connected antibody subunits

Returns

the overall antibody class the collective subunits create, if any

schrodinger.livedesign.biologics.registration.get_arms(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) Iterable[schrodinger.livedesign.registration.RegistrationData]

Given a list of connected BioPolymers, find the arms (light/heavy pairs) and return the registration data for each arm.

Parameters
  • ab_polymer_chain – the full set of connected antibody chains

  • helm_model – source helm model to extract helm strings and connections from.

Returns

generator over the available arms in the antibody polymer chain.

schrodinger.livedesign.biologics.registration.get_arms_by_connectivity(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer]) Iterable[schrodinger.livedesign.biologics.registration.BioPolymer]

Extract heavy/light arm pairs via HELM model connections. Each arm must be from a F(ab’)2, monospecific antibody, or bispecific antibody. An arm thus consists of a light Fab domain, connected to a heavy full or heavy Fab’ chain. The heavy chain can connect to other heavy chains, but two light chains sharing a heavy chain would not fall into any of the currently supported classes.

Parameters
  • ab_polymer_chain – the full set of connected antibody chains

  • helm_model – source helm model to extract helm strings and connections from.

Returns

generator over the available arms in the antibody polymer chain.
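
The pairing rule described above can be sketched with plain dictionaries. This is an assumption-laden illustration rather than the module's logic: iter_arms is a hypothetical name, chains are string IDs labeled "light" or "heavy", and raising ValueError for a heavy chain shared by two light chains is a choice made for this sketch, since the documentation only states that such a layout is unsupported.

```python
from typing import Dict, Iterator, Set, Tuple


def iter_arms(kinds: Dict[str, str],
              neighbors: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (light, heavy) pairs found through direct connections."""
    claimed: Set[str] = set()
    for chain in sorted(kinds):
        if kinds[chain] != "light":
            continue
        heavy_partners = sorted(
            n for n in neighbors[chain] if kinds[n] == "heavy")
        if len(heavy_partners) != 1:
            continue  # not a simple light/heavy arm
        heavy = heavy_partners[0]
        if heavy in claimed:
            # Assumption: treat a shared heavy chain as an error.
            raise ValueError(f"heavy chain {heavy} has two light chains")
        claimed.add(heavy)
        yield chain, heavy


kinds = {"L1": "light", "H1": "heavy", "H2": "heavy", "L2": "light"}
neighbors = {
    "L1": {"H1"}, "H1": {"L1", "H2"},
    "H2": {"H1", "L2"}, "L2": {"H2"},
}
print(list(iter_arms(kinds, neighbors)))  # [('L1', 'H1'), ('L2', 'H2')]
```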

schrodinger.livedesign.biologics.registration.get_arms_by_annotation(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) Iterable[schrodinger.livedesign.biologics.registration.BioPolymer]

Extract heavy/light arm pairs via HELM annotation data instead of HELM model connections.

Parameters
  • ab_polymer_chain – the full set of connected antibody chains

  • helm_model – source helm model to extract helm strings and connections from.

schrodinger.livedesign.biologics.registration.replace_base_chains(subunit_reg_data: schrodinger.livedesign.registration.RegistrationData, children: List[schrodinger.livedesign.registration.RegistrationData], hash_dict: Dict[str, schrodinger.livedesign.registration.RegistrationData]) None

Helper function to recursively traverse the set of returned RegistrationData and replace base chain data with their combined subunit in the top-level child hash list.

Parameters
  • child_hashes – top-level hash list

  • hash_dict – all returned child hashes so far

  • parent – the current subunit to insert into child_hashes

schrodinger.livedesign.biologics.registration.find_na_entities(base_biopolymers, model: schrodinger.protein.helm._helm_parser.HelmModel)

Generate registration data for nucleic acid entities, if any. Note that until nucleic acid entities more complex than single- and double-stranded DNA/RNA are supported, this function will yield a single entity.

Parameters
  • base_biopolymers – list of BioPolymers to search for NA entities

  • model – source HELM model

Returns

registration data for each NA entity

schrodinger.livedesign.biologics.registration.determine_na_type(base_biopolymers)
schrodinger.livedesign.biologics.registration.get_display_string(canonical_helm_model: schrodinger.protein.helm._helm_parser.HelmModel) str

Externally available serialization API for HELM serialization.

Parameters

canonical_helm_model – canonicalized helm model

Returns

a HELM string

schrodinger.livedesign.biologics.registration.clear_annotations(helm_model: schrodinger.protein.helm._helm_parser.HelmModel)

Parameters

helm_model – a HelmModel

schrodinger.livedesign.biologics.registration.get_deduplication_hash(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) str

Unified HELM model hasher for biologics registration, which discards all annotations before generating a hash.

Parameters

helm_model – a HelmModel to hash

Returns

SHA-1 hash of the HELM string.
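
For orientation, the hashing step alone can be sketched with the standard library. The helper name sha1_of_helm is hypothetical, the example HELM string is illustrative, and the input is assumed to be a canonical HELM string from which annotations have already been removed. How the module canonicalizes and strips annotations is not shown here.

```python
import hashlib


def sha1_of_helm(helm_string: str) -> str:
    """Return the hex SHA-1 digest of an annotation-free HELM string."""
    return hashlib.sha1(helm_string.encode("utf-8")).hexdigest()


digest = sha1_of_helm("PEPTIDE1{A.C.D}$$$$V2.0")
print(len(digest))  # 40 hexadecimal characters
```

Since identical strings always produce identical digests, deduplication depends entirely on the canonicalization performed before hashing.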

schrodinger.livedesign.biologics.registration.combine_helm(biopolymers: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) schrodinger.protein.helm._helm_parser.HelmModel

Function to ensure recombined HELM strings always canonicalize the same way. Simple polymers that are indistinguishable except for their connections can sometimes get swapped, which changes the connection section and the resulting final hash.

Parameters

biopolymers – BioPolymer instances whose corresponding polymers will be extracted

Returns

a HelmModel consisting only of the specified biopolymers

schrodinger.livedesign.biologics.registration.light_heavy_heavy_light(light_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer], heavy_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer]) bool

Ensure that the connectivity is light-heavy-heavy-light, which corresponds to a full antibody.

Parameters
  • light_chains – the list of connected light chains

  • heavy_chains – the list of connected heavy chains
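
One way to picture the light-heavy-heavy-light check is as a small topology test over a neighbor map. The sketch below is hypothetical throughout: chains are string IDs, the neighbor map is passed explicitly (the documented function takes only the two chain lists), and the exact conditions the module enforces may differ.

```python
from typing import Dict, List, Set


def is_light_heavy_heavy_light(light: List[str], heavy: List[str],
                               neighbors: Dict[str, Set[str]]) -> bool:
    """Check for two heavy chains joined together, each with one light chain."""
    if len(light) != 2 or len(heavy) != 2:
        return False
    first_heavy, second_heavy = heavy
    if second_heavy not in neighbors[first_heavy]:
        return False  # the heavy chains must be connected to each other
    partners = []
    for chain in light:
        heavy_partners = neighbors[chain] & set(heavy)
        if len(heavy_partners) != 1:
            return False  # each light chain pairs with exactly one heavy chain
        partners.append(next(iter(heavy_partners)))
    # The two light chains must sit on different heavy chains.
    return partners[0] != partners[1]


neighbors = {
    "L1": {"H1"}, "H1": {"L1", "H2"},
    "H2": {"H1", "L2"}, "L2": {"H2"},
}
print(is_light_heavy_heavy_light(["L1", "L2"], ["H1", "H2"], neighbors))  # True
```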

schrodinger.livedesign.biologics.registration.is_deoxyribose(helm_monomer: schrodinger.protein.helm._helm_parser.HelmMonomer) bool
schrodinger.livedesign.biologics.registration.is_standard_ribose(helm_monomer: schrodinger.protein.helm._helm_parser.HelmMonomer) bool
schrodinger.livedesign.biologics.registration.classify_polymer(helm_polymer: schrodinger.protein.helm._helm_parser.HelmPolymer) schrodinger.livedesign.entity_type.EntityClass

Given a HelmPolymer (a polymer consisting of only one type), infer its type.

Parameters

helm_polymer – helm_polymer to determine class of

schrodinger.livedesign.biologics.registration.classify_nucleotide_polymer(helm_polymer) schrodinger.livedesign.entity_type.EntityClass
schrodinger.livedesign.biologics.registration.get_seqtype(fasta_sequence: str, scheme: schrodinger.infra.util.AntibodyCDRScheme = AntibodyCDRScheme.Kabat) schrodinger.application.prime.packages.antibody.SeqType

Cheap cache wrapper around antibody.SeqType to reduce the cost of calling SeqType twice (once for classification and once for annotation).
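
The caching pattern described above can be sketched with functools.lru_cache. Whether the module uses lru_cache is not stated, so treat this as one possible approach. expensive_analysis is a hypothetical stand-in for the costly antibody.SeqType construction, and the scheme is simplified to a string.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

calls: List[Tuple[str, str]] = []  # records how often the slow path runs


def expensive_analysis(sequence: str, scheme: str) -> Dict[str, object]:
    """Hypothetical stand-in for a costly sequence analysis."""
    calls.append((sequence, scheme))
    return {"length": len(sequence), "scheme": scheme}


@lru_cache(maxsize=128)
def cached_analysis(sequence: str, scheme: str = "Kabat") -> Dict[str, object]:
    # Arguments must be hashable; strings and enum members both qualify.
    return expensive_analysis(sequence, scheme)


first = cached_analysis("EVQLVESGG")   # computed once, for classification
second = cached_analysis("EVQLVESGG")  # served from the cache, for annotation
print(len(calls))  # 1
```

Note that lru_cache keys on the arguments as passed, so calling with and without an explicit default scheme creates two separate cache entries.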

schrodinger.livedesign.biologics.registration.classify_antibody(ab_classification) schrodinger.livedesign.entity_type.EntityClass
schrodinger.livedesign.biologics.registration.classify_protein(helm_polymer) schrodinger.livedesign.entity_type.EntityClass
schrodinger.livedesign.biologics.registration.classify_overall_molecule(polymer_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer]) schrodinger.livedesign.entity_type.EntityClass

Given a collection of polymers, return the corresponding EntityClass.

Parameters

polymer_chains – the set of polymers to classify

schrodinger.livedesign.biologics.registration.helm_mol_to_binary(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) str

Converts given Helm model to rdmol binary, and adds sequence annotation data for any antibodies detected.

Parameters

helm_model – input helm model to convert to rdmol binary

Returns

serialized binary with sequence annotations embedded as serialized json in the “antibody_regions” property.