Active Learning Glide

System Requirements

Supported Operating Systems
Hardware Requirements
Scratch space example
Subjob Requirements
GPGPU Requirements

Supported Operating Systems

Linux

RedHat Enterprise Linux (RHEL) 7.8-7.9, 8.6, 8.8, 9.0, 9.2

Please make sure the listed packages are installed:

Required packages

sudo yum/dnf install <lib>

Rocky Linux 8.8, 9.2

Please make sure the listed packages are installed:

Required packages

sudo yum/dnf install <lib>

CentOS 7.8 - 7.9

Please make sure the listed packages are installed:

Required packages

sudo yum install <lib>

Ubuntu 20.04 LTS and 22.04 LTS

Please make sure the listed packages are installed:

Required packages

sudo apt-get install <lib>
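You can confirm that a host runs one of the distributions listed above by reading its /etc/os-release file. The following is a minimal Python sketch, not a Schrödinger tool. The SUPPORTED table is a hand transcription of this page. The ID values (rhel, rocky, centos, ubuntu) are the ones these distributions normally ship, but verify them on your own hosts. CentOS 7 reports only VERSION_ID="7", so the CentOS check is coarse, and you must confirm the minor release from /etc/centos-release.

```python
"""Sketch: compare /etc/os-release with the supported-OS list on this page."""
from pathlib import Path

# Hypothetical hand transcription of the supported versions listed above.
SUPPORTED = {
    "rhel": {"7.8", "7.9", "8.6", "8.8", "9.0", "9.2"},
    "rocky": {"8.8", "9.2"},
    # CentOS 7 reports VERSION_ID="7" only; this is a coarse match, so
    # confirm the minor release (7.8 or 7.9) in /etc/centos-release.
    "centos": {"7"},
    "ubuntu": {"20.04", "22.04"},
}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines of os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def is_supported(info: dict) -> bool:
    """Return True if the distribution ID and VERSION_ID appear in SUPPORTED."""
    distro = info.get("ID", "").lower()
    version = info.get("VERSION_ID", "")
    return version in SUPPORTED.get(distro, set())


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        status = "supported" if is_supported(info) else "NOT in the supported list"
        print(f"{info.get('PRETTY_NAME', 'unknown OS')}: {status}")
```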

NOTES:

All supported distributions have a glibc of 2.12 or greater.

If using NFS, file locking must be enabled.
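Both notes above can be spot-checked from Python. This sketch assumes Python 3 on Linux. LOCK_TEST_DIR is a hypothetical environment variable name used here to point the lock test at your NFS directory. A successful lock shows only that locking calls work from this client. An NFS mount using the nolock option locks locally only, so the sketch also lists NFS mounts that carry that option.

```python
"""Sketch: spot-check the glibc and NFS locking notes (Linux only)."""
import fcntl
import os
import platform
import tempfile


def glibc_at_least(minimum=(2, 12)) -> bool:
    """Return True if the running C library is glibc at version >= minimum."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return False
    return tuple(int(part) for part in version.split(".")[:2]) >= minimum


def can_lock(directory: str) -> bool:
    """Try to take and release an exclusive POSIX lock on a temp file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            fcntl.lockf(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(handle.fileno(), fcntl.LOCK_UN)
        return True
    except OSError:
        # For example ENOLCK (no lock manager reachable) or an unwritable directory.
        return False


def nfs_mounts_without_locking(mounts_text: str) -> list:
    """Return mount points of NFS filesystems whose options include 'nolock'."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2].startswith("nfs") and "nolock" in fields[3].split(","):
            flagged.append(fields[1])
    return flagged


if __name__ == "__main__":
    # LOCK_TEST_DIR is a hypothetical variable name; point it at your NFS directory.
    target = os.environ.get("LOCK_TEST_DIR", os.getcwd())
    print("glibc >= 2.12:", glibc_at_least())
    print(f"POSIX lock in {target}:", can_lock(target))
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as mounts:
            print("NFS mounts using nolock:", nfs_mounts_without_locking(mounts.read()))
```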

Timeline

We aim to support new operating system versions within 6 months of their public release.

Support cannot be provided once an OS platform version has reached "end of life" (EOL). Check with your platform provider for EOL information.

Upcoming Changes

24-2: Drop support for SUSE 12/15

Due to limited use by customers and the short support lifecycles of SUSE service packs, we aim to deprecate SUSE support and concentrate our testing, support, and development resources on our other supported Linux operating systems.

24-3: Drop support for RHEL/CentOS 7

RHEL/CentOS 7 will reach end of life on June 30, 2024, and will no longer be supported after release 24-2.

To view a list of upcoming infrastructure changes that may require action from your IT team, click here.

Hardware Requirements

	Required	Considerations
Driver	The driver (master job) must run uninterrupted for the complete duration of the job, so the computing resource on which it runs cannot be a spot or preemptible cloud instance. Such nodes can be preempted (terminated) at any time, and if that happens the whole job is lost. The -DRIVERHOST argument determines where the driver runs; select a host entry for an on-demand (i.e., non-preemptible) node type.	If sufficient licenses and computational resources are available to run multiple Active Learning Glide jobs simultaneously, we recommend configuring the driver host entry to request an entire node. This prevents multiple drivers from sharing the same node and scratch filesystem, which would double (or more) the scratch space requirement.
Processor (CPU)	x86_64-compatible processor	For large jobs, we recommend computing on a cluster with a queueing system and the following hardware components: a highly capable file server for the external network; shared storage for the intra-cluster network, to reduce traffic to and from the external network; and fast processors, large memory, and high-quality motherboards and network interfaces, especially on the management nodes.
System memory (RAM)		The amount of system memory required depends on the size of the input ligand file. See the examples below.
Disk space		The amount of scratch space required on the DRIVERHOST depends on the size of the input ligand file. See the example below.

Scratch space example

Scratch requirements for an example job are given below. All parameters are consistent with our recommendations for an ultra-large screen with AL-Glide.

For larger input files, substitute the size of your input file to estimate the requirements for your jobs.

Example of requirements based on inputs

Inputs for example:

1 billion drug-like ligands in SMILES format (100GB)

3 iterations of active-learning (-iter 3)

Training batch size of 50,000 ligands (-train_size 50000)

Top 100M ligands retained after each iteration

Rescoring of the top 1M ligands with Glide SP (-num_rescore_ligands 1000000)

Write output poses in Maestro format for the rescored ligands (-write_pose)

	Required	Optional
A single copy of the input file	100 GB
The input ligand file, split into individual subjob input batches	100 GB
Series of CSV files containing the predictions for the top 10% of each batch (sorted by uncertainty), used to select input ligands for each iteration of training	30 GB
Series of CSV files containing the ligand_ml predictions for the ligands in all batches	100 GB × number of iterations
An output file for each iteration of training, containing the predictions for the number of top-scoring compounds specified by the -keep command-line argument	30 GB
If -num_rescore_ligands is specified, a single CSV file containing the top compounds rescored with Glide SP (as many as specified by -num_rescore_ligands)		200 MB
If -write_pose is specified, a Maestro file containing the poses of the rescored ligands		2 GB
Total disk space required (3 iterations)	620 GB (100 + 100 + 130 × 3 + 30)	≈ 622 GB (620 GB + 200 MB + 2 GB)
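The arithmetic behind this table can be scaled to other job sizes. This Python sketch assumes that each term grows linearly from the worked example:

- The input copy, split batches, and per-iteration ligand_ml predictions grow with the input size.
- The top-10% CSVs grow with the input size and the iteration count.
- The -keep output grows with the -keep count.
- The optional rescoring files grow with -num_rescore_ligands.

These scaling rules are assumptions, not published formulas. The drivers_sharing_node parameter models the Driver consideration in the hardware table, where several drivers on one node multiply the scratch requirement.

```python
"""Sketch: scale the scratch-space example to other job sizes (assumed linear)."""

GB_PER_MB = 1 / 1000


def estimate_scratch_gb(
    input_gb: float,
    iterations: int = 3,
    keep: int = 100_000_000,
    num_rescore_ligands: int = 0,
    write_pose: bool = False,
    drivers_sharing_node: int = 1,
) -> float:
    """Estimate driver-host scratch space in GB, scaled from the 1B-ligand example."""
    scale = input_gb / 100.0                 # example input file: 100 GB
    input_copy = input_gb                    # single copy of the input file
    split_batches = input_gb                 # input split into subjob batches
    top10_csv = 30.0 * scale                 # top-10% predictions, per iteration
    predictions = input_gb                   # ligand_ml predictions, per iteration
    keep_output = 30.0 * keep / 100_000_000  # 30 GB for a -keep of 100M
    total = input_copy + split_batches + iterations * (top10_csv + predictions) + keep_output

    rescore_scale = num_rescore_ligands / 1_000_000
    total += 200 * GB_PER_MB * rescore_scale  # rescored CSV: 200 MB per 1M ligands
    if write_pose:
        total += 2.0 * rescore_scale          # Maestro poses: 2 GB per 1M ligands
    # Several drivers on one node share its scratch filesystem.
    return total * drivers_sharing_node


if __name__ == "__main__":
    print(estimate_scratch_gb(100))
    print(round(estimate_scratch_gb(100, num_rescore_ligands=1_000_000, write_pose=True), 1))
```

For the worked example, estimate_scratch_gb(100) returns 620.0. Adding num_rescore_ligands=1_000_000 with write_pose=True gives about 622.2 GB.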

Subjob Requirements

Memory and disk space requirements, and recommended Google Cloud instance types, are listed below.

All values are based on the example workflow described above.

	ML Training*	ML Evaluation	Glide Docking
Scratch Space	600GB	100GB	100GB
Memory	64GB (8 GB/CPU core)	32GB (4 GB/CPU core)	32GB (4 GB/CPU core)
Compatible with Preemptible Nodes	No	Yes	Yes
Recommended GCP Node Type	n1-highmem-8	n2-standard-8	n2-standard-8

* NVIDIA Tesla T4 GPUs are recommended.
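When choosing node types for each subjob, you can turn the table into a simple check. This Python sketch transcribes the example-workflow values into a hypothetical REQUIREMENTS mapping. It then reports which requirements a candidate node misses. The Node fields are illustrative, and the sketch does not check GPU availability for ML training.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubjobRequirement:
    scratch_gb: float
    memory_gb: float
    gb_per_core: float
    preemptible_ok: bool


# Hypothetical transcription of the subjob table (example workflow only).
REQUIREMENTS = {
    "ml_training": SubjobRequirement(600, 64, 8, preemptible_ok=False),
    "ml_evaluation": SubjobRequirement(100, 32, 4, preemptible_ok=True),
    "glide_docking": SubjobRequirement(100, 32, 4, preemptible_ok=True),
}


@dataclass(frozen=True)
class Node:
    """Illustrative description of a candidate node type."""
    cores: int
    memory_gb: float
    scratch_gb: float
    preemptible: bool


def shortfalls(subjob: str, node: Node) -> list[str]:
    """List the requirements in REQUIREMENTS[subjob] that `node` does not meet."""
    req = REQUIREMENTS[subjob]
    issues = []
    if node.scratch_gb < req.scratch_gb:
        issues.append(f"scratch {node.scratch_gb} GB < {req.scratch_gb} GB")
    if node.memory_gb < req.memory_gb:
        issues.append(f"memory {node.memory_gb} GB < {req.memory_gb} GB")
    if node.memory_gb / node.cores < req.gb_per_core:
        issues.append(f"{node.memory_gb / node.cores:.1f} GB/core < {req.gb_per_core} GB/core")
    if node.preemptible and not req.preemptible_ok:
        issues.append("preemptible nodes are not suitable for this subjob")
    return issues


if __name__ == "__main__":
    candidate = Node(cores=8, memory_gb=32, scratch_gb=100, preemptible=True)
    for name in REQUIREMENTS:
        print(name, shortfalls(name, candidate) or "OK")
```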

GPGPU Requirements

(General-purpose computing on graphics processing units)

We support the following NVIDIA solutions:

Architecture	Server / HPC	Workstation
Maxwell	Tesla M40, Tesla M60
Pascal	Tesla P40, Tesla P100	Quadro P5000
Volta	Tesla V100
Turing	Tesla T4	Quadro RTX 5000
Ampere	Tesla A100	RTX A4000, RTX A5000
Ada Lovelace	L4
Hopper	H100
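To test whether a card reported by the driver (for example, by nvidia-smi --query-gpu=name --format=csv,noheader) appears in the table, you must normalize the names. The driver may report a name such as "NVIDIA A100-SXM4-40GB" where the table lists "Tesla A100". This Python sketch assumes that dropping the NVIDIA and Tesla prefixes, then matching the remaining model name at the start of the string, is sufficient. Unusual product names may need manual review.

```python
import re

# Hypothetical transcription of the supported-GPU table above.
SUPPORTED_GPUS = {
    "Maxwell": ["Tesla M40", "Tesla M60"],
    "Pascal": ["Tesla P40", "Tesla P100", "Quadro P5000"],
    "Volta": ["Tesla V100"],
    "Turing": ["Tesla T4", "Quadro RTX 5000"],
    "Ampere": ["Tesla A100", "RTX A4000", "RTX A5000"],
    "Ada Lovelace": ["L4"],
    "Hopper": ["H100"],
}


def _normalize(name: str) -> str:
    """Drop vendor/brand words so 'Tesla A100' and 'NVIDIA A100-...' compare equal."""
    return " ".join(w for w in name.split() if w.upper() not in {"NVIDIA", "TESLA"})


def match_supported(gpu_name: str):
    """Return (architecture, listed model) for a reported GPU name, or None."""
    reported = _normalize(gpu_name)
    for arch, models in SUPPORTED_GPUS.items():
        for model in models:
            # The model must not be followed by another letter or digit, so L4 != L40S.
            pattern = rf"{re.escape(_normalize(model))}(?![0-9A-Za-z])"
            if re.match(pattern, reported, flags=re.IGNORECASE):
                return arch, model
    return None


if __name__ == "__main__":
    for name in ["Tesla T4", "NVIDIA A100-SXM4-40GB", "NVIDIA GeForce RTX 3090"]:
        print(name, "->", match_supported(name))
```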

Deprecated

Support for the Tesla K20, Tesla K40, and Tesla K80 cards is deprecated. Although we still expect our GPGPU code to run on these cards, NVIDIA deprecated support for them in the CUDA 11.2 toolkit.

Notes

We support only NVIDIA's recommended/certified production-branch Linux drivers for these cards, with a minimum CUDA version of 12.0.

For information on pre-configured Schrödinger compatible GPU boxes see MD Compatible Systems and FEP+ Compatible Systems.

Standard support does not cover consumer-level GPU cards such as GeForce GTX cards.
If you already have another NVIDIA GPGPU and would like to know whether we have experience with it, please contact our support team at help@schrodinger.com.
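The nvidia-smi header line includes a "CUDA Version" field. It gives the highest CUDA version the installed driver supports, which you can compare with the 12.0 minimum. This Python sketch parses that field. It does not confirm that the driver comes from a production branch; you still have to check that against NVIDIA's driver listings.

```python
import re
import shutil
import subprocess

MIN_CUDA = (12, 0)


def cuda_version_from_smi(output: str):
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output as a tuple."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_meets_minimum(output: str, minimum=MIN_CUDA) -> bool:
    """True if the driver reports a CUDA version at or above `minimum`."""
    version = cuda_version_from_smi(output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False).stdout
        print("CUDA version reported by driver:", cuda_version_from_smi(out))
        print("Meets CUDA >= 12.0:", driver_meets_minimum(out))
```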

