schrodinger.protein.getpdb module

Module for downloading PDB files from the web.

The data is retrieved from the RCSB. Current download URLs are documented at https://www.rcsb.org/docs/programmatic-access/file-download-services

Running this module is no different from using a web browser to access the site; it is just a different type of web client. It should therefore cause no problems for the maintainers of that site and should fall within its terms and conditions of use.

Note that certain assumptions are made about the layout of the web site; future changes there may cause this script to stop working.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.protein.getpdb.download_file(filename)

Download the given file from RCSB and save it under the same name in either the CWD or a temporary directory. The path to the written file is returned.

Parameters

filename (str) – File to download from the RCSB web site.

Raises

requests.HTTPError – if error in connection to RCSB.

schrodinger.protein.getpdb.download_sf(pdb_code)

Downloads the ENT file for the given PDB ID, converts it to CNS format, and returns the CNS file name. Raises a RuntimeError if either the download or the conversion fails.

Not every PDB entry has structure factor files deposited, and not every structure factor file will convert perfectly.

schrodinger.protein.getpdb.download_fasta(pdb_code)

Attempts to download the FASTA file for the given PDB ID and chain.

Parameters

pdb_code (str) – PDB ID of the file to download

schrodinger.protein.getpdb.download_em_map(emdb_code)

Attempts to download the EM map file for the given EMDB ID.

Parameters

emdb_code (str) – EMDB ID of the map file to download

schrodinger.protein.getpdb.get_pdb(pdbid, source=0, caps_asis=False)

Attempts to get the specified PDB file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

pdbid - string of 4 characters

source - one of: AUTO, DATABASE, WEB.

Parameters

caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.

Returns

Path to the PDB file that was written (*.pdb or *.cif)

Return type

str

Raises
  • requests.HTTPError – if error in connection to RCSB

  • RuntimeError – for other errors retrieving the file
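The AUTO behaviour described above (database first, then the web) can be sketched as follows. This is an illustration only, not the module's implementation: the function name fetch_with_fallback, the injected callables, and the numeric values of DATABASE and WEB are assumptions; only AUTO = 0 is implied by the default source=0.

```python
from typing import Callable

# Hypothetical constants: only AUTO = 0 is implied by the documented default
# `source=0`; the DATABASE and WEB values are assumptions for this sketch.
AUTO, DATABASE, WEB = 0, 1, 2


def fetch_with_fallback(pdbid: str,
                        from_database: Callable[[str], str],
                        from_web: Callable[[str], str],
                        source: int = AUTO,
                        caps_asis: bool = False) -> str:
    """Return a file path from the database, the web, or both in turn."""
    if not caps_asis:
        pdbid = pdbid.lower()
    if source == DATABASE:
        return from_database(pdbid)
    if source == WEB:
        return from_web(pdbid)
    if source != AUTO:
        raise ValueError(f"unknown source: {source!r}")
    # AUTO: try the database first, then fall back to the web.
    try:
        return from_database(pdbid)
    except RuntimeError:
        return from_web(pdbid)
```

Passing the two lookups in as callables keeps the sketch independent of any particular database layout or download URL.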

schrodinger.protein.getpdb.retrieve_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Attempt to retrieve the PDB from the local repository

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Parameters
  • pdbid (str) – the PDB code of the desired file

  • local_repos (list of str) – the paths to the parent directories of each local repository.

  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.

Return type

str

Returns

the name of the pdb file or None if a failure occurs
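The file name pattern pdbXXXX.ent.Y described above can be illustrated with a small helper. The name candidate_file_names is hypothetical, and the ordering (gz before Z) is an assumption taken from the order in which the endings are listed.

```python
from typing import List


def candidate_file_names(pdbid: str, caps_asis: bool = False) -> List[str]:
    """Build the pdbXXXX.ent.Y names for one PDB code (Y is gz or Z)."""
    # Lowercase the code unless the caller asks to preserve capitalization.
    code = pdbid if caps_asis else pdbid.lower()
    return [f"pdb{code}.ent.{ext}" for ext in ("gz", "Z")]


print(candidate_file_names("1ABC"))
```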

schrodinger.protein.getpdb.find_local_repository(verbose=False)

Determine a directory list for local repositories.

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB

  • SCHRODINGER_THIRDPARTY/database/pdb

  • SCHRODINGER/thirdparty/database/pdb (the default)

Parameters

verbose (bool) – True if debugging messages should be printed to the screen

Return type

list of str

Returns

the paths to the parent directories of each local repository. Returns an empty list if the local repository cannot be determined.
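The precedence order among the environment variables can be sketched as below. This is a minimal illustration under assumptions: the helper name local_pdb_dirs is hypothetical, it returns at most one directory, and it does not check that the directory exists, all of which may differ from the real function.

```python
import os
from typing import List, Mapping, Optional


def local_pdb_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Resolve the PDB directory from the environment, highest precedence first."""
    env = os.environ if env is None else env
    if env.get("SCHRODINGER_PDB"):
        return [env["SCHRODINGER_PDB"]]
    if env.get("SCHRODINGER_THIRDPARTY"):
        return [os.path.join(env["SCHRODINGER_THIRDPARTY"], "database", "pdb")]
    if env.get("SCHRODINGER"):
        # The default location, relative to the installation directory.
        return [os.path.join(env["SCHRODINGER"], "thirdparty", "database", "pdb")]
    # Nothing is set, so the local repository cannot be determined.
    return []
```

Accepting a mapping instead of reading os.environ directly makes the precedence easy to check without changing the process environment.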

schrodinger.protein.getpdb.find_local_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Check a series of local directories and filenames for the PDB files.

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB

  • SCHRODINGER_THIRDPARTY

  • SCHRODINGER/thirdparty (the default)

Parameters
  • pdbid (str) – the PDB code of the desired file

  • local_repos (list of str) – the paths to the parent directories of each local repository.

  • verbose (bool) – True if debug messages should be printed out

  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.

Return type

str

Returns

the path to an existing file with the desired PDB code
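A search over several local directories might look like the following sketch. It assumes a flat layout in which the compressed files sit directly inside each directory; the real repository layout (for example, separate locations for current and obsolete entries) is not described here, and the name first_existing_pdb is hypothetical.

```python
import os
from typing import Iterable, Optional


def first_existing_pdb(pdbid: str,
                       local_repos: Iterable[str],
                       caps_asis: bool = False) -> Optional[str]:
    """Return the first pdbXXXX.ent.gz or pdbXXXX.ent.Z file found, or None."""
    code = pdbid if caps_asis else pdbid.lower()
    for repo in local_repos:
        # Check the .gz ending before the .Z ending in each directory.
        for ext in ("gz", "Z"):
            path = os.path.join(repo, f"pdb{code}.ent.{ext}")
            if os.path.isfile(path):
                return path
    return None
```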

schrodinger.protein.getpdb.download_pdb(pdb_code, biological_unit=False, try_as_cif=True)

Download the PDB record from www.rcsb.org into the CWD. If the PDB is too large to be downloaded as a *.pdb file, it will be saved as *.cif.

Parameters
  • pdb_code (str) – Four character alphanumeric string for the PDB id.

  • biological_unit (bool) – If True, and the file needs to be downloaded, then download the file at the biological unit URL, otherwise use the typical record URL. Default is False (get the typical record). Note: this option is no longer used by PrepWizard, but is still used by getpdb_utility.py ($SCHRODINGER/utilities/getpdb).

  • try_as_cif (bool) – Whether to try downloading the file as CIF format if the structure is too large to be represented in PDB format.

Returns

Path to the downloaded file.

Return type

str

Raises
  • requests.HTTPError – if error in connection to RCSB or pdb ID does not exist

  • RuntimeError – for other errors retrieving the file
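Because a nonexistent or malformed ID results in an HTTP error, it can be convenient to check the "four character alphanumeric" shape of pdb_code before requesting a download. The helper below is a hypothetical pre-check, not part of this module, and it only tests the shape stated above.

```python
import re

# Exactly four ASCII letters or digits, as described for pdb_code.
_PDB_CODE_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def looks_like_pdb_code(code: str) -> bool:
    """Return True if `code` is a four character alphanumeric string."""
    return _PDB_CODE_PATTERN.fullmatch(code) is not None
```

A code that passes this check can still be absent from RCSB, so the documented exceptions still need handling.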

schrodinger.protein.getpdb.download_cif(pdb_code)

Download the *.cif file from the web for a given PDB code.

Parameters

pdb_code (str) – Four character alphanumeric string for the PDB id.

Returns

Path to the downloaded file.

Return type

str

Raises

requests.HTTPError – if error in connection to RCSB or pdb ID does not exist

schrodinger.protein.getpdb.requests_retry_session(max_retries=3, backoff_factor=0.3, status_forcelist=(500, 502, 503, 504), session=None)

Return a session for connecting to a web URL. In case of network failures, the session retries the connection; the number of re-attempts allowed is specified by max_retries.

Parameters
  • max_retries (int) – Total number of retries allowed

  • backoff_factor (float) – Backoff factor to apply between attempts after the second try. urllib3 will sleep for: {backoff factor} * (2 ** ({number of total retries} - 1)) seconds before making next attempt.

  • status_forcelist (iterable of int) – HTTP error status codes for which a retry will happen

  • session (requests.Session) – A session object

Returns

A session object

Return type

requests.Session
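To make the backoff formula concrete, the sketch below evaluates backoff_factor * (2 ** (n - 1)) for each retry count n, using the documented defaults. The helper name backoff_delays is hypothetical, and the values are only what the quoted formula yields; which attempts actually sleep is decided by urllib3 (the description above says the delay applies after the second try).

```python
from typing import List


def backoff_delays(max_retries: int = 3, backoff_factor: float = 0.3) -> List[float]:
    """Evaluate the quoted backoff formula for retry counts 1..max_retries."""
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, max_retries + 1)]


# With the defaults, the delays double from one retry to the next.
for retry_number, delay in enumerate(backoff_delays(), start=1):
    print(f"retry {retry_number}: {delay:.1f} s")
```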

schrodinger.protein.getpdb.retrieve_ent(pdbid)

Retrieves the ENT file for the specified PDB ID from the third-party database and copies it to the CWD. File path is returned.

Raises RuntimeError on error.

schrodinger.protein.getpdb.download_ent(pdbid)

Downloads the ENT file for the specified PDB ID from the RCSB web site, and saves it to the CWD. File path is returned.

Raises
  • requests.HTTPError – if error in connection to RCSB

  • RuntimeError – for other errors retrieving the file

schrodinger.protein.getpdb.get_ent(pdbid, source=0)

Attempts to get the specified ENT file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

pdbid - string of 4 characters

source - one of: AUTO, DATABASE, WEB.

Raises
  • requests.HTTPError – if error in connection to RCSB

  • RuntimeError – for other errors retrieving the file

schrodinger.protein.getpdb.open_filename(filename, mode, encoding=None)

Opens the given file, or a temporary file if filename is not writable. The name may therefore change; it is accessible via the name attribute on the returned file object.
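The fallback to a temporary file can be sketched with the standard library as follows. This is an illustration of the idea rather than the module's code: the name open_with_fallback and the choice to reuse the base name as a suffix are assumptions.

```python
import os
import tempfile


def open_with_fallback(filename, mode="w", encoding=None):
    """Open `filename`; if that fails, open a temporary file instead."""
    try:
        return open(filename, mode, encoding=encoding)
    except OSError:
        # Not writable (or the directory is missing): fall back to a temp file.
        # Keeping the base name as a suffix makes the file easier to recognize.
        return tempfile.NamedTemporaryFile(
            mode=mode,
            encoding=encoding,
            suffix="_" + os.path.basename(filename),
            delete=False,
        )
```

Callers should read the name attribute of the returned object instead of assuming the requested path was used.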

schrodinger.protein.getpdb.download_reflection_data(pdbid)

Attempt to download reflection data.

Parameters

pdbid (str) – PDB ID