Active Learning Glide Hardware Requirements

Active Learning Glide runs on Linux only. To print more information about the command-line syntax, run:
$SCHRODINGER/run -FROM glide glide_active_learning.py -h

Driver Requirements

The driver (master job) must run uninterrupted for the complete duration of the job. This means the compute resource on which it runs cannot be a spot or preemptible cloud instance. Such instances can be preempted (terminated), and if that happens the whole job is lost.

The -DRIVERHOST argument determines where the driver runs. Select a host entry for an on-demand (i.e., not preemptible) node type.

If sufficient licenses and computational resources are available to run multiple AL-Glide jobs simultaneously, it is recommended to configure the driver host entry so that it requests an entire node. This prevents multiple drivers from sharing the same node and scratch filesystem, which would double (or more) the space requirement.

Scratch Space

The amount of scratch space required on the driver host is related to the size of the input ligand file. Specifically, the driver host must have sufficient scratch space to accommodate the files described below.

Scratch requirements for an example screen are listed below. All parameters are consistent with our recommendations for an ultra-large screen with Active Learning Glide.

Example Screen Parameters, based on 1 billion input ligands
  • 1 billion drug-like ligands in SMILES format (100 GB)
  • 3 iterations of active learning (-iter 3)
  • batch training size of 50 000 ligands (-train_size 50000)
  • retention of the top 100 million ligands after each iteration (-keep 100000000)
  • rescoring of the top 1 million ligands with Glide SP (-num_rescore_ligands 1000000)
  • write output poses in Maestro format for the rescored ligands (-write_pose)

Scratch Space Breakdown
  • a copy of the input file: 100 GB
  • the input file split into individual subjob input batches: 100 GB
  • CSV files containing the predictions of the top 10% of each batch (sorted by uncertainty). They are used to select input ligands for each iteration of training: 30 GB
  • CSV files containing the ligand_ml predictions for the ligands in all the batches: 100 GB × number of iterations (300 GB for 3 iterations)
  • an output file for each iteration of training containing the predictions for the top-scoring compounds, the number of which is set by the -keep command-line argument: 30 GB × number of iterations (90 GB for 3 iterations)
  • Optional: If -num_rescore_ligand is specified, a single CSV file containing the top compounds rescored with Glide SP, the number of which is set by -num_rescore_ligand: 200 MB
  • Optional: If -write_pose is provided, a Maestro file containing the poses of the rescored ligands: 2 GB
  • Total space for this example: 622.2 GB (3 Iterations)
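
The total above can be reproduced with a short calculation. The following is a minimal Python sketch, not part of the Schrödinger software: the function names are hypothetical, and the per-file sizes are simply the figures quoted for this 1-billion-ligand example. How those sizes scale with other inputs or other -keep values is not stated here, so the defaults should be treated as assumptions that hold for this example only. The sketch also assumes 1 GB = 10^9 bytes when comparing against free disk space.

```python
import shutil

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def estimate_driver_scratch_gb(
    input_gb,
    iterations,
    top10_csv_gb=30.0,      # top-10% prediction CSVs (example figure)
    predictions_gb=100.0,   # ligand_ml prediction CSVs, per iteration (example figure)
    keep_output_gb=30.0,    # -keep output file, per iteration (example figure)
    rescore_csv_gb=0.2,     # optional rescoring CSV, 200 MB (example figure)
    pose_file_gb=2.0,       # optional Maestro pose file (example figure)
    rescore=True,
    write_pose=True,
):
    """Sum the scratch-space items listed in the breakdown above."""
    total = input_gb                      # copy of the input file
    total += input_gb                     # input split into subjob batches
    total += top10_csv_gb
    total += predictions_gb * iterations
    total += keep_output_gb * iterations
    if rescore:
        total += rescore_csv_gb
    if write_pose:
        total += pose_file_gb
    return total


def has_enough_scratch(path, required_gb):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    needed = estimate_driver_scratch_gb(input_gb=100.0, iterations=3)
    print(f"Estimated driver scratch: {needed:.1f} GB")
```

Calling `has_enough_scratch` with the scratch directory of the driver host and the estimated value gives a quick check before a job is launched.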

Memory

For a typical run of Active Learning Glide, 64 GB of RAM on the driver host is recommended.

This is based on the example workflow described above.

Subjob Requirements

Requirements for memory, disk space, and recommended Google Cloud Platform (GCP) instance type are listed below.

All values are based on the example workflow described above.

ML Training

Nvidia T4 GPUs are recommended.

  • scratch space: 600 GB
  • memory: 64 GB (8 GB per CPU core)
  • compatible with preemptible nodes: no
  • recommended GCP node type: n1-highmem-8

ML Evaluation

  • scratch space: 100 GB
  • memory: 32 GB (4 GB per CPU core)
  • compatible with preemptible nodes: yes
  • recommended GCP node type: n2-standard-8

Glide Docking

  • scratch space: 100 GB
  • memory: 32 GB (4 GB per CPU core)
  • compatible with preemptible nodes: yes
  • recommended GCP node type: n2-standard-8
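
The subjob requirements above can be captured in a small lookup table, which is convenient when checking a candidate node against them. The following Python sketch is illustrative only: the dictionary keys and function names are hypothetical, and the values are copied from the lists above, so they apply to the example workflow rather than to every screen.

```python
# Requirements copied from the lists above (example workflow only).
SUBJOB_REQUIREMENTS = {
    "ml_training": {
        "scratch_gb": 600, "memory_gb": 64,
        "memory_per_core_gb": 8, "preemptible_ok": False,
    },
    "ml_evaluation": {
        "scratch_gb": 100, "memory_gb": 32,
        "memory_per_core_gb": 4, "preemptible_ok": True,
    },
    "glide_docking": {
        "scratch_gb": 100, "memory_gb": 32,
        "memory_per_core_gb": 4, "preemptible_ok": True,
    },
}


def implied_cores(subjob):
    """CPU cores implied by total memory divided by memory per core."""
    req = SUBJOB_REQUIREMENTS[subjob]
    return req["memory_gb"] // req["memory_per_core_gb"]


def node_satisfies(subjob, scratch_gb, memory_gb, preemptible):
    """Check a candidate node against the listed requirements."""
    req = SUBJOB_REQUIREMENTS[subjob]
    if preemptible and not req["preemptible_ok"]:
        return False
    return scratch_gb >= req["scratch_gb"] and memory_gb >= req["memory_gb"]
```

For every subjob type, the memory figures imply 8 CPU cores, which matches the 8-vCPU node types recommended above.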

To ask a question or get help, please submit a support ticket or email us at help@schrodinger.com.
