Knowledge Base
Active Learning Glide Hardware Requirements
To list the available command-line options, run:
$SCHRODINGER/run -FROM glide glide_active_learning.py -h
Driver Requirements
The driver (master job) must run uninterrupted for the complete duration of the job, so the compute resource on which it runs cannot be a spot or preemptible cloud instance. Such nodes can be preempted (terminated), and if that happens the whole job is lost.
The -DRIVERHOST argument determines where the driver runs. Select a host entry for an on-demand (i.e., not preemptible) node type.
If sufficient licenses and computational resources are available to run multiple AL-Glide jobs simultaneously, it is recommended to configure the driver host entry so that it requests an entire node, to avoid multiple drivers potentially using the same node and scratch filesystem, and thereby doubling (or more) the space requirement.
Scratch Space
The amount of scratch space required on the driver host is related to the size of the input ligand file. Specifically, the driver host must have sufficient scratch space to accommodate the files described below.
Scratch requirements for an example screen are given with each item in the breakdown below. All parameters are consistent with our recommendations for an ultra-large screen with Active Learning Glide.
Example Screen Parameters, based on 1 billion input ligands
- 1 billion drug-like ligands in SMILES format (100 GB)
- 3 iterations of active learning (-iter 3)
- batch training size of 50 000 ligands (-train_size 50000)
- the number of top ligands retained after each iteration is 100 million (-keep 100000000)
- rescoring of the top 1 million ligands with Glide SP (-num_rescore_ligands 1000000)
- write output poses in Maestro format for the rescored ligands (-write_pose)
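The example parameters can be collected into an option list, as in the minimal Python sketch below. It uses only the flags and values named in the list above plus -DRIVERHOST; the function name and the host entry name are hypothetical, and any required inputs (ligand file, receptor grid, and so on) are deliberately left out because they are not described here. Confirm exact flag spellings with the -h output before use.

```python
def example_option_flags(driver_host):
    """Return the option tokens for the example screen (not a complete command)."""
    return [
        "-iter", "3",                        # 3 iterations of active learning
        "-train_size", "50000",              # batch training size
        "-keep", "100000000",                # top ligands retained per iteration
        "-num_rescore_ligands", "1000000",   # spelled as in the parameter list; verify with -h
        "-write_pose",                       # write Maestro poses for rescored ligands
        "-DRIVERHOST", driver_host,          # must be an on-demand host entry
    ]


# "my-on-demand-host" is a placeholder host entry name.
flags = example_option_flags("my-on-demand-host")
print(" ".join(flags))
```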
Scratch Space Breakdown
- a copy of the input file: 100 GB
- the input file split into individual subjob input batches: 100 GB
- CSV files containing the predictions of the top 10% of each batch (sorted by uncertainty). They are used to select input ligands for each iteration of training: 30 GB
- CSV files containing the ligand_ml predictions for the ligands in all the batches: 100 GB × num_iteration
- an output file for each iteration of training containing the predictions for the top-scoring compounds, the number of which is specified by the -keep command-line argument: 30 GB × num_iteration
- Optional: If -num_rescore_ligand is specified, a single CSV file containing the top compounds rescored with Glide SP, as many as specified by -num_rescore_ligand: 200 MB
- Optional: If -write_pose is provided, a Maestro file containing the poses of the rescored ligands: 2 GB
Total space for this example (3 iterations): 622.2 GB
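The total can be reproduced by summing the items in the breakdown. The Python sketch below is only an arithmetic check: the function and parameter names are hypothetical, the default sizes are the figures quoted for the 1-billion-ligand example, and how those sizes scale for other inputs is not specified here, so substitute your own estimates.

```python
def driver_scratch_gb(
    input_gb=100.0,                 # size of the input ligand file
    top_fraction_csv_gb=30.0,       # top-10% prediction CSVs used for training selection
    predictions_per_iter_gb=100.0,  # ligand_ml predictions for all batches, per iteration
    keep_per_iter_gb=30.0,          # -keep output file, per iteration
    num_iteration=3,
    rescore_csv_gb=0.2,             # optional: rescored-ligand CSV (200 MB)
    pose_file_gb=2.0,               # optional: Maestro pose file
):
    """Sum the driver scratch-space items from the breakdown, in GB."""
    return (
        input_gb                                   # copy of the input file
        + input_gb                                 # input split into subjob batches
        + top_fraction_csv_gb
        + predictions_per_iter_gb * num_iteration
        + keep_per_iter_gb * num_iteration
        + rescore_csv_gb
        + pose_file_gb
    )


print(f"{driver_scratch_gb():.1f} GB")
```

Setting rescore_csv_gb and pose_file_gb to 0 gives the requirement when the optional rescoring and pose output are not requested.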
Memory
For a typical run of Active Learning Glide, 64 GB of RAM on the driver host is recommended.
This is based on the example workflow described above.
Subjob Requirements
Requirements for memory, disk space, and recommended Google Cloud Platform (GCP) instance type are listed below.
All values based on the example workflow described above.
ML Training
Nvidia T4 GPUs are recommended.
- scratch space: 600 GB
- memory: 64 GB (8 GB per CPU core)
- compatible with preemptible nodes: no
- recommended GCP node type: n1-highmem-8
ML Evaluation
- scratch space: 100 GB
- memory: 32 GB (4 GB per CPU core)
- compatible with preemptible nodes: yes
- recommended GCP node type: n2-standard-8
Glide Docking
- scratch space: 100 GB
- memory: 32 GB (4 GB per CPU core)
- compatible with preemptible nodes: yes
- recommended GCP node type: n2-standard-8
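The subjob requirements above can be encoded as a small lookup to sanity-check a candidate node before assigning it to a stage. This is a sketch under stated assumptions: the stage keys, class, and function names are hypothetical, and the numbers are the values listed for the example workflow, not general limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjobRequirement:
    scratch_gb: int
    memory_gb: int
    preemptible_ok: bool


# Values copied from the ML Training, ML Evaluation, and Glide Docking lists.
REQUIREMENTS = {
    "ml_training": SubjobRequirement(scratch_gb=600, memory_gb=64, preemptible_ok=False),
    "ml_evaluation": SubjobRequirement(scratch_gb=100, memory_gb=32, preemptible_ok=True),
    "glide_docking": SubjobRequirement(scratch_gb=100, memory_gb=32, preemptible_ok=True),
}


def node_is_suitable(stage, scratch_gb, memory_gb, preemptible):
    """Return True if a node meets the listed requirements for the given stage."""
    req = REQUIREMENTS[stage]
    if preemptible and not req.preemptible_ok:
        return False  # ML training is listed as incompatible with preemptible nodes
    return scratch_gb >= req.scratch_gb and memory_gb >= req.memory_gb


print(node_is_suitable("ml_training", scratch_gb=600, memory_gb=64, preemptible=True))
print(node_is_suitable("glide_docking", scratch_gb=100, memory_gb=32, preemptible=True))
```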
To ask a question or get help, please submit a support ticket or email us at help@schrodinger.com.