schrodinger.livedesign.biologics.sequence module

class schrodinger.livedesign.biologics.sequence.AlignedSequence(sequence: str, identity: float = None, similarity: float = None)

Bases: object

sequence: str
identity: float = None
similarity: float = None
fromProteinSequence(ref_seq: Optional[schrodinger.protein.sequence.ProteinSequence] = None)
__init__(sequence: str, identity: Optional[float] = None, similarity: Optional[float] = None) → None
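AlignedSequence pairs an aligned sequence string with optional identity and similarity scores. The sketch below is a plain-Python stand-in, not the real class. It assumes that identity is a percentage computed over alignment columns where neither sequence has a gap ('-'). The names `AlignedSequenceSketch` and `percent_identity` are hypothetical, and the library may define identity differently (for example, as a 0 to 1 fraction).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignedSequenceSketch:
    # Hypothetical stand-in mirroring the three documented fields.
    sequence: str
    identity: Optional[float] = None
    similarity: Optional[float] = None


def percent_identity(aligned: str, reference: str) -> float:
    """Percent of gap-free columns where both aligned residues are equal.

    Assumes both strings are already aligned to the same length and use '-' for gaps.
    """
    if len(aligned) != len(reference):
        raise ValueError("aligned strings must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, r) for a, r in zip(aligned, reference) if a != "-" and r != "-"]
    if not columns:
        return 0.0
    matches = sum(a == r for a, r in columns)
    return 100.0 * matches / len(columns)


ref = "ACDE-GH"
seq = "ACNEFGH"
record = AlignedSequenceSketch(seq, identity=percent_identity(seq, ref))
print(record)
```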
schrodinger.livedesign.biologics.sequence.subsequence_matches(match_mol: schrodinger.protein.helm._helm_parser.HelmModel, query_mol: schrodinger.protein.helm._helm_parser.HelmModel) → Iterator[str]

Return matches of query_mol in match_mol. Extracts the single HelmPolymer in query_mol and uses it for a naive string search.

Parameters
  • match_mol – molecule to search over for matches

  • query_mol – molecule to find matches of

Returns

matches of query_mol found in match_mol
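The function requires the query model to hold exactly one polymer, which is then searched for in the match model. In the sketch below, a HelmModel is replaced by a plain dict mapping polymer ids to one-letter sequences. It also assumes that each yielded string is the id of a matching chain. Both choices are hypothetical, and the real function's yielded values may differ.

```python
from typing import Dict, Iterator

# Hypothetical stand-in for a HelmModel: polymer id -> one-letter sequence.
Model = Dict[str, str]


def subsequence_matches_sketch(match_mol: Model, query_mol: Model) -> Iterator[str]:
    """Yield ids of polymers in match_mol that contain query_mol's lone polymer."""
    if len(query_mol) != 1:
        raise ValueError("query_mol must contain exactly one polymer")
    (query_seq,) = query_mol.values()  # split out the lone polymer
    for polymer_id, seq in match_mol.items():
        if query_seq in seq:  # plain substring test
            yield polymer_id


antibody = {"PEPTIDE1": "EVQLVESGGG", "PEPTIDE2": "DIQMTQSPSS"}
print(list(subsequence_matches_sketch(antibody, {"PEPTIDE1": "VESG"})))
```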

schrodinger.livedesign.biologics.sequence.subsequence_matches_polymer(match_mol: schrodinger.protein.helm._helm_parser.HelmModel, query_polymer: schrodinger.protein.helm._helm_parser.HelmPolymer) → Iterator[str]

Return matches of query_polymer in match_mol.

Parameters
  • match_mol – molecule to search over for matches

  • query_polymer – polymer (one chain) to find matches of

Returns

matches of query_polymer found in match_mol
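A naive string search checks every start offset of each chain for the query, which costs O(n·m) per chain. The sketch below shows that idea with chains as plain strings. The "polymer_id:offset" output format and all names are hypothetical and serve only to show that overlapping hits are found.

```python
from typing import Dict, Iterator, List


def naive_find_all(text: str, pattern: str) -> List[int]:
    """Return every start offset where pattern occurs in text (overlaps included)."""
    if not pattern:
        return []
    hits = []
    for start in range(len(text) - len(pattern) + 1):
        if text[start:start + len(pattern)] == pattern:
            hits.append(start)
    return hits


def subsequence_matches_polymer_sketch(
    match_mol: Dict[str, str], query_seq: str
) -> Iterator[str]:
    """Yield 'polymer_id:offset' for each occurrence (hypothetical output format)."""
    for polymer_id, seq in match_mol.items():
        for offset in naive_find_all(seq, query_seq):
            yield f"{polymer_id}:{offset}"


chains = {"PEPTIDE1": "GSGSGS", "PEPTIDE2": "AGSA"}
print(list(subsequence_matches_polymer_sketch(chains, "GSG")))
```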

schrodinger.livedesign.biologics.sequence.get_annotations_for_helm_model(model: schrodinger.protein.helm._helm_parser.HelmModel) → Dict[str, Dict[str, Union[Tuple[int, int], List[str]]]]

HelmModels reorder polymer chains to canonicalize input, so the same polymer can receive different polymer ids in two models if those models contain different peptide polymers. This function walks back through a HELM model and computes the mapping between each antibody chain and its region annotations.

Parameters

model – HelmModel to extract annotations for

Returns

a map from polymer id to a dictionary mapping antibody regions to monomer indices in the corresponding simple polymer.
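Canonical reordering means the same chain may have different polymer ids in different models. One way to handle this is to compute annotations from each chain's sequence and then key the results by that model's own ids. The sketch below does this with a dict standing in for a HelmModel and an injected `annotate` callable. `toy_annotate` is a placeholder that marks one hypothetical region, not a real antibody numbering scheme.

```python
from typing import Callable, Dict, List, Tuple, Union

Annotation = Dict[str, Union[Tuple[int, int], List[str]]]


def annotations_by_polymer_id(
    model: Dict[str, str], annotate: Callable[[str], Annotation]
) -> Dict[str, Annotation]:
    """Annotate each chain by its sequence, then key results by this model's ids."""
    by_sequence: Dict[str, Annotation] = {}  # reuse work for repeated chains
    result: Dict[str, Annotation] = {}
    for polymer_id, seq in model.items():
        if seq not in by_sequence:
            by_sequence[seq] = annotate(seq)
        result[polymer_id] = by_sequence[seq]
    return result


def toy_annotate(seq: str) -> Annotation:
    # Placeholder annotator: one hypothetical region spanning the whole chain.
    return {"whole_chain": (0, len(seq) - 1)}


model_a = {"PEPTIDE1": "QVQLQ", "PEPTIDE2": "DIQMT"}
model_b = {"PEPTIDE1": "AAAAAAA", "PEPTIDE2": "QVQLQ"}  # same chain, different id
print(annotations_by_polymer_id(model_b, toy_annotate)["PEPTIDE2"])
```

Because annotations depend only on the sequence, the chain "QVQLQ" gets the same annotation whether it is PEPTIDE1 or PEPTIDE2.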

schrodinger.livedesign.biologics.sequence.get_ab_chain_annotations(polymer: schrodinger.protein.helm._helm_parser.HelmPolymer) → Dict[str, Union[Tuple[int, int], List[str]]]

Returns the antibody sequence annotations for a HelmPolymer encoding a protein.

schrodinger.livedesign.biologics.sequence.get_annotations(fasta_sequence: str) → Dict[str, Union[Tuple[int, int], List[str]]]

Cheap cache wrapper around antibody.SeqType that reduces the cost of computing annotations for each RegistrationData object.

schrodinger.livedesign.biologics.sequence.get_seqtype(fasta_sequence: str) → schrodinger.application.prime.packages.antibody.SeqType

Cheap cache wrapper around antibody.SeqType to reduce the cost of calling SeqType twice (once for classification and once for annotation).
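Caching by sequence string means the expensive classifier runs once per unique sequence, even when the sequence is needed for both classification and annotation. The sketch below shows this pattern with `functools.lru_cache`. It assumes the real wrappers memoize on the FASTA string. `ToySeqType` is a hypothetical stand-in for antibody.SeqType that only counts how often it is constructed.

```python
from functools import lru_cache
from typing import Dict, Tuple

CALLS = {"classify": 0}


class ToySeqType:
    """Hypothetical stand-in for an expensive classifier such as antibody.SeqType."""

    def __init__(self, sequence: str):
        CALLS["classify"] += 1  # count the expensive constructions
        self.sequence = sequence


@lru_cache(maxsize=None)
def get_seqtype_sketch(fasta_sequence: str) -> ToySeqType:
    return ToySeqType(fasta_sequence)


@lru_cache(maxsize=None)
def get_annotations_sketch(fasta_sequence: str) -> Dict[str, Tuple[int, int]]:
    seqtype = get_seqtype_sketch(fasta_sequence)  # reuses the cached object
    # Placeholder annotation: one hypothetical region covering the chain.
    return {"whole_chain": (0, len(seqtype.sequence) - 1)}


seq = "EVQLVESGGG"
get_seqtype_sketch(seq)      # classification builds the object once
get_annotations_sketch(seq)  # annotation reuses it
get_annotations_sketch(seq)  # served straight from the cache
print(CALLS["classify"])
```

A cached function returns the same dict object to every caller, so callers should treat the result as read-only.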

schrodinger.livedesign.biologics.sequence.align_sequences(sequences: List[str], ref_seq_index: Optional[int] = None) → List[schrodinger.livedesign.biologics.sequence.AlignedSequence]

Returns the aligned sequences as a list of AlignedSequence objects.

Parameters
  • sequences – sequences to align

  • ref_seq_index – if not None, all sequences are pairwise aligned using the sequence at ref_seq_index as a reference sequence

Returns

list of AlignedSequence objects for the aligned sequences
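When ref_seq_index is given, each sequence is aligned pairwise against the reference. The sketch below substitutes a plain Needleman–Wunsch global alignment (match +1, mismatch −1, linear gap −1) for the library's aligner, so its gap placement may differ from the real function. `AlignedRecord` is a hypothetical stand-in for AlignedSequence, and identity is assumed to be a percentage over gap-free columns.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlignedRecord:  # hypothetical, mirrors part of AlignedSequence
    sequence: str
    identity: Optional[float] = None


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def align_to_reference(sequences: List[str], ref_seq_index: int) -> List[AlignedRecord]:
    ref = sequences[ref_seq_index]
    records = []
    for seq in sequences:
        aligned_seq, aligned_ref = global_align(seq, ref)
        pairs = [(s, r) for s, r in zip(aligned_seq, aligned_ref) if s != "-" and r != "-"]
        identity = 100.0 * sum(s == r for s, r in pairs) / len(pairs) if pairs else 0.0
        records.append(AlignedRecord(aligned_seq, identity))
    return records


print(align_to_reference(["ACDEFG", "ACEFG"], 0))
```

Each pair is aligned independently, so gaps that different pairs insert into the reference are not reconciled here. A shared alignment object would handle that bookkeeping.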

schrodinger.livedesign.biologics.sequence.align_all_to_reference(aln: schrodinger.protein.alignment.ProteinAlignment, ref_seq_index: int) → None

Aligns a given ProteinAlignment pairwise with respect to the specified reference sequence. Because of how alignments are implemented (see protein.alignment.BaseAlignment), the reference sequence must already be in the alignment; it is identified by its index, ref_seq_index. The input ProteinAlignment is modified in place and nothing is returned.

Parameters
  • aln – the alignment to be aligned

  • ref_seq_index – index of the reference sequence in aln. The reference sequence must already be in the alignment.
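This function has an in-place contract: the reference is looked up by index inside the existing alignment, every other entry is rewritten, and nothing is returned. The sketch below shows only that contract, using a list of gapped strings as a hypothetical stand-in for a ProteinAlignment and an injected pairwise aligner. `pad_aligner` is a toy that right-pads with gaps and does not perform real alignment.

```python
from typing import Callable, List, Tuple

PairwiseAligner = Callable[[str, str], Tuple[str, str]]


def align_all_to_reference_sketch(aln: List[str], ref_seq_index: int,
                                  pairwise: PairwiseAligner) -> None:
    """Re-align every entry of aln against aln[ref_seq_index], in place."""
    if not 0 <= ref_seq_index < len(aln):
        raise IndexError("reference sequence must already be in the alignment")
    ref = aln[ref_seq_index].replace("-", "")  # reference residues without old gaps
    for i, entry in enumerate(aln):
        if i == ref_seq_index:
            continue
        aligned_entry, _ = pairwise(entry.replace("-", ""), ref)
        aln[i] = aligned_entry  # mutate the caller's list in place
    aln[ref_seq_index] = ref


def pad_aligner(seq: str, ref: str) -> Tuple[str, str]:
    # Toy aligner: right-pads the shorter string with gaps.
    width = max(len(seq), len(ref))
    return seq.ljust(width, "-"), ref.ljust(width, "-")


aln = ["ACDE", "A-C"]
result = align_all_to_reference_sketch(aln, 0, pad_aligner)
print(result, aln)
```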

schrodinger.livedesign.biologics.sequence.multiple_align(aln: schrodinger.protein.alignment.ProteinAlignment) → None

Aligns a given ProteinAlignment via multiple sequence alignment. The input ProteinAlignment is modified and not returned.

Parameters

aln – the alignment to be aligned