schrodinger.active_learning.al_driver module
Implementation of large-library screening with an active learning scheme.
Active learning scheme:
1. Select N ligands from the library.
2. Dock the selected portion of the library.
3. Train a ligand_ml model with the scores.
4. Evaluate the whole library with the generated ligand_ml model.
5. Pick N from the top M best ligands predicted by the ligand_ml model.
6. Dock the ligands picked in step 5 and repeat from step 3 until num_iter iterations have been run.
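The scheme above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the module's implementation: ligands are plain integers, `dock` and `train_model` are hypothetical stand-ins for docking and for ligand_ml training, and the exclusion of already docked ligands in step 5 is an assumption the description does not state.

```python
import random


def dock(ligand):
    """Hypothetical stand-in for docking: returns a toy score (lower is better)."""
    return abs(ligand - 500) / 10.0


def train_model(scored):
    """Hypothetical stand-in for ligand_ml: nearest-scored-neighbour predictor."""
    known = list(scored.items())

    def predict(ligand):
        nearest = min(known, key=lambda item: abs(item[0] - ligand))
        return nearest[1]

    return predict


def sketch_active_learning(library, n, m, num_iter, seed=0):
    rng = random.Random(seed)
    # Step 1: select N ligands from the library.
    selected = rng.sample(library, n)
    scored = {}
    for _ in range(num_iter):
        # Steps 2 and 6: dock the currently selected ligands.
        for ligand in selected:
            scored[ligand] = dock(ligand)
        # Step 3: train a model on all scores collected so far.
        model = train_model(scored)
        # Step 4: evaluate the whole library with the model.
        predictions = {ligand: model(ligand) for ligand in library}
        # Step 5: pick N from the top M predicted ligands.
        # Assumption: ligands that were already docked are skipped.
        candidates = [lig for lig in library if lig not in scored]
        top_m = sorted(candidates, key=predictions.get)[:m]
        selected = rng.sample(top_m, min(n, len(top_m)))
    return scored


scored = sketch_active_learning(list(range(1000)), n=10, m=50, num_iter=3)
```

With these toy settings each iteration docks N new ligands, so the docking cost grows with N times num_iter rather than with the library size.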
Copyright Schrodinger Inc, All Rights Reserved.
- class schrodinger.active_learning.al_driver.Option(*names, dest=None, help=None, type=<class 'str'>, metavar=None, default=None, action=None, nargs=None, choices=None, required=False)
Bases: object
A class to represent “options” which may be translated into argparse command-line arguments or an InputConfig spec for parsing input files. This is used to support the behavior of the legacy SiteMap driver, where every option could be specified in an input file or on the command line, with the latter taking precedence.
- __init__(*names, dest=None, help=None, type=<class 'str'>, metavar=None, default=None, action=None, nargs=None, choices=None, required=False)
The arguments all have the same meaning as for argparse.ArgumentParser.add_argument(), except min and max, which are only used by ConfigObj and limit the range of allowed values for numeric types.
- toArgparse(parser)
Add an option to an argument parser.
- Parameters
parser (argparse.ArgumentParser) – argument parser
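To make the option-to-argparse translation concrete, here is a minimal sketch. `SketchOption` is a hypothetical stand-in and not the actual Option class. The `-num_iter` flag and the `file_values` dictionary are invented for illustration, and modelling input-file values with `set_defaults` is an assumption about how command-line precedence could be achieved.

```python
import argparse


class SketchOption:
    """Hypothetical stand-in for Option; stores add_argument() keywords."""

    def __init__(self, *names, **kwargs):
        self.names = names
        # Drop unset keywords so argparse applies its own defaults.
        self.kwargs = {k: v for k, v in kwargs.items() if v is not None}

    def toArgparse(self, parser):
        """Add this option to an argparse.ArgumentParser."""
        parser.add_argument(*self.names, **self.kwargs)


parser = argparse.ArgumentParser()
option = SketchOption("-num_iter", dest="num_iter", type=int, default=3,
                      help="number of iterations")
option.toArgparse(parser)

# Assumption: values read from an input file are installed as parser
# defaults, so anything given on the command line overrides them.
file_values = {"num_iter": 7}
parser.set_defaults(**file_values)

from_file = parser.parse_args([])                    # uses the file value
from_cli = parser.parse_args(["-num_iter", "5"])     # command line wins
```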
- schrodinger.active_learning.al_driver.get_workflow_node_names(task, num_iter, use_known_score, run_rescore_ligand, al_node_supplier)
Return a list of the stages needed to complete the workflow, based on the task type, the number of iterations, whether the score is known, and whether to run the rescore stage.
- Parameters
task (str in [SCREEN_TASK, PILOT_TASK or EVAL_TASK]) – workflow task type
num_iter (int) – number of iterations
use_known_score (bool) – Use known scores in score_file to obtain the score.
run_rescore_ligand (bool) – run rescore stage for ligand.
al_node_supplier (ActiveLearningNodeSupplier) – Supplier of active learning nodes
- Returns
list of names of stages needed to complete the workflow
- Return type
list(str)
- schrodinger.active_learning.al_driver.validate_stop_after(stop_after_node, task, num_iter, use_known_score, run_rescore_ligand, restart_file, al_node_supplier)
Check whether the node name the user specified in -stop_after is valid.
- Parameters
stop_after_node (str in [ActiveLearningNode name] or 'FinishAll') – name of the node after which the workflow exits once that node has finished.
task (str in [SCREEN_TASK, PILOT_TASK or EVAL_TASK]) – workflow task type
num_iter (int) – number of iterations
use_known_score (bool) – Use known scores in score_file to obtain the score.
run_rescore_ligand (bool) – run rescore stage for ligands.
al_node_supplier (ActiveLearningNodeSupplier) – Supplier of active learning nodes
- Returns
error message if validation failed; None if it passed
- Return type
str or None
- schrodinger.active_learning.al_driver.validate_input_files(input_files, remote_input_ligands=False, allowed_format=None)
Check the existence and format of input files. Return an error message if validation failed, otherwise return None.
- Parameters
input_files (list(str)) – paths of input files.
remote_input_ligands (bool) – Whether the input ligand files are stored at a remote location.
allowed_format (list or None) – allowed input file formats.
- Returns
error message if validation failed; None if it passed
- Return type
str or None
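The "error message or None" convention used by the validators can be sketched as follows. This is an illustrative assumption of how such a check might look, not the module's code: `sketch_validate_input_files` is a hypothetical name, formats are assumed to be matched by file extension, and the remote_input_ligands case is left out.

```python
import os


def sketch_validate_input_files(input_files, allowed_format=None):
    """Return an error message for the first bad file, or None if all pass."""
    for path in input_files:
        if not os.path.isfile(path):
            return f"Input file does not exist: {path}"
        # Assumption: formats are given as extensions such as ".csv".
        if allowed_format is not None and not path.endswith(tuple(allowed_format)):
            return f"Input file {path} is not one of: {', '.join(allowed_format)}"
    return None
```

A caller would typically test the result with `if error is not None:` and report the message, which keeps validation free of exceptions.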
- schrodinger.active_learning.al_driver.validate_input_smiles(input_files, smi_index, name_index, with_header=True, max_check=10)
Validate SMILES in input files.
- Parameters
input_files (list(str)) – paths of input files.
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
with_header (bool) – Whether the file has a header in its first line.
max_check (int) – maximum number of SMILES to validate.
- Returns
error message if validation failed; None if it passed
- Return type
str or None
- schrodinger.active_learning.al_driver.count_ligands(file_list, with_header=True)
Count the number of ligands in all the files by counting the total number of lines. We assume each line contains a SMILES string.
- Parameters
file_list (list(str)) – list of input file paths.
with_header (bool) – Whether the input files have a header.
- Returns
Number of ligands in all the input files.
- Return type
int
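A line-counting approach like the one described could look like this sketch. `sketch_count_ligands` is a hypothetical name, and subtracting one header line per file (rather than once overall) is an assumption. Compressed inputs are not handled here.

```python
def sketch_count_ligands(file_list, with_header=True):
    """Count ligands as the total number of lines, one SMILES per line."""
    total = 0
    for path in file_list:
        with open(path) as handle:
            n_lines = sum(1 for _ in handle)
        # Assumption: every file carries its own header line.
        if with_header and n_lines > 0:
            n_lines -= 1
        total += n_lines
    return total
```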
- class schrodinger.active_learning.al_driver.ActiveLearningJob(args, al_node_supplier)
Bases: object
- __init__(args, al_node_supplier)
Initialize the ActiveLearningJob from the command-line arguments.
- Parameters
args (argparse.Namespace) – argument namespace with command line options
- static LoadPreviousNodes(restart_file)
Load nodes that were finished in a previous job.
- Parameters
restart_file (str) – filename of the AL .pkl restart file
- Returns
Nodes that were finished in a previous job.
- Return type
OrderedDict that maps node name to node instance.
- static getNodeClasses(use_known_score, al_node_supplier)
Return a list of node classes to run based on the job type.
- Parameters
use_known_score (bool) – Use known scores in score_file to obtain the score.
al_node_supplier (ActiveLearningNodeSupplier) – Supplier of active learning nodes
- Returns
a list of ActiveLearningNode subclasses
- Return type
list
- LoadOptionalRestartFiles()
Load the restart files that allow the running node to be restarted.
- Returns
list of filenames
- Return type
list(str) or None
- property scored_csv_file_list
Get all the .csv files that contain scored ligands from ScoreProviderNode.
- Returns
list of .csv files that contain scored ligands.
- Return type
list(str)
- property restart_files
Get all the necessary files for restarting the workflow from finished nodes.
- Returns
a set of files for restarting.
- Return type
set(str)
- getPilotScoreFile()
Reorder the columns in the pilot ligand score file so that it can be used as training input for the machine learning model.
- Returns
name of the reordered .csv file.
- Return type
str
- getRestartNode()
Get the node for restarting the workflow.
- Returns
last finished node
- Return type
- schrodinger.active_learning.al_driver.read_paths_listed_in_file(old_paths, paths_list_file)
Add the paths specified in the paths_list_file to old_paths.
- Parameters
old_paths (list) – None or list of original paths
paths_list_file (str) – path of the file that contains paths to be added
- Returns
list of paths
- Return type
list(str)
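As an illustration of merging listed paths into an existing list, consider the sketch below. `sketch_read_paths` is a hypothetical name. The file layout (one path per line, blank lines ignored) and returning a new list instead of modifying old_paths in place are both assumptions.

```python
def sketch_read_paths(old_paths, paths_list_file):
    """Return old_paths plus the paths listed in paths_list_file."""
    # old_paths may be None, as noted in the parameter description.
    paths = list(old_paths) if old_paths else []
    with open(paths_list_file) as handle:
        for line in handle:
            entry = line.strip()
            if entry:  # assumption: blank lines are skipped
                paths.append(entry)
    return paths
```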
- schrodinger.active_learning.al_driver.restart_args_handler(args)
Load the previous arguments stored in args.restart_file.
- Parameters
args (argparse.Namespace) – argument namespace with command line options
- Returns
updated argument namespace, argument namespace of previous job or None
- Return type
argument namespace, argument namespace or None
- schrodinger.active_learning.al_driver.common_parse_args(args)
Parses command-line arguments.
- Parameters
args (argparse.Namespace) – argument namespace with command line options
- Returns
argument namespace with command line options
- Return type
argparse.Namespace