schrodinger.active_learning.al_utils module
- schrodinger.active_learning.al_utils.positive_int(s)
ArgumentParser function that checks whether the input string can be converted to a positive integer.
- Parameters
s (str) – input string
- Returns
integer value of input string
- Return type
int
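A checker of this kind is usually handed to `argparse` through the `type=` argument. The sketch below is a stand-alone illustration of that pattern, not the Schrödinger source: the name `positive_int_sketch`, the `--block_size` option, and the error messages are all assumptions.

```python
import argparse


def positive_int_sketch(s):
    """Hypothetical stand-in for positive_int: return int(s) if it is > 0."""
    try:
        value = int(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{s!r} is not an integer") from None
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{s!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser()
# "--block_size" is an illustrative option name, not taken from the module.
parser.add_argument("--block_size", type=positive_int_sketch, default=100000)
args = parser.parse_args(["--block_size", "500"])
```

When the callable raises `argparse.ArgumentTypeError`, `argparse` reports the message as a usage error instead of showing a traceback.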
- schrodinger.active_learning.al_utils.split_smi_line(line)
Split a line from a .smi file into a SMILES pattern and a title. Return an empty list if the line is empty.
- Parameters
line (str) – line from .smi file
- Returns
SMILES pattern, title
- Return type
[str, str] or []
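The behavior described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the module's code: the name `split_smi_line_sketch` is hypothetical, the SMILES and title are assumed to be separated by whitespace, and a line without a title is assumed to yield an empty title string.

```python
def split_smi_line_sketch(line):
    """Hypothetical stand-in: split '<SMILES> <title>' at the first whitespace run."""
    stripped = line.strip()
    if not stripped:
        return []
    # maxsplit=1 keeps titles that contain spaces intact.
    parts = stripped.split(None, 1)
    if len(parts) == 1:
        # Assumption: a missing title becomes an empty string.
        parts.append("")
    return parts
```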
- schrodinger.active_learning.al_utils.get_smi_header()
Create the header for a .smi input file. The SMILES is assumed to be in the first column and the title in the second column.
- Returns
header list, header index for reordering SMILES and title
- Return type
list(str), list(int)
- schrodinger.active_learning.al_utils.get_csv_header(filename, smi_index, name_index, delimiter=',', with_header=True)
Create the header for a .csv input file. The reordered index puts SMILES in the first column and the title in the second column.
- Parameters
filename (str) – .csv input file
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
delimiter (str) – delimiter of input csv files
with_header (bool) – whether the file has a header in its first line
- Returns
header list, header index for reordering SMILES and title
- Return type
list(str), list(int)
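The reordering index can be pictured as a list of column positions with the SMILES column first, the name column second, and the remaining columns in their original order. The sketch below illustrates only that idea; `reorder_index_sketch` is a hypothetical helper, and the exact form of the index returned by get_csv_header may differ.

```python
def reorder_index_sketch(num_columns, smi_index, name_index):
    """Return column indices with SMILES first, name second, the rest in order."""
    rest = [i for i in range(num_columns) if i not in (smi_index, name_index)]
    return [smi_index, name_index] + rest


# Illustrative header in which the name precedes the SMILES.
header = ["name", "smiles", "score"]
index = reorder_index_sketch(len(header), smi_index=1, name_index=0)
reordered = [header[i] for i in index]
```

Applying the same index to every data row keeps the columns aligned with the reordered header.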
- schrodinger.active_learning.al_utils.my_csv_reader(filename)
Yield a csv reader that skips the first line.
- Parameters
filename (str) – .csv file name
- Returns
csv.reader that skips first line of the file.
- Return type
iterator
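One common way to skip the first line is to advance a `csv.reader` once before handing out rows. The sketch below assumes a generator that yields rows directly; the name `skip_first_line_reader` is hypothetical, and the actual function may yield the reader object itself instead.

```python
import csv


def skip_first_line_reader(filename):
    """Hypothetical stand-in: yield every csv row except the first line."""
    with open(filename, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # discard the first line; tolerate an empty file
        yield from reader
```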
- schrodinger.active_learning.al_utils.read_score(score_file)
Read known ligand scores from score_file.
- Returns
a dictionary that maps ligand title to ligand score.
- Return type
dict
- schrodinger.active_learning.al_utils.random_split(file_list, num_ligands, prefix='splited', block_size=100000, name_index=0, smi_index=1, random_seed=None, delimiter=',', with_header=True)
Combine the input files, shuffle the lines, and split them into files with block_size lines per file. Reorder the columns so that SMILES and name are in the first and second columns, respectively.
- Parameters
file_list (list) – paths of input files.
num_ligands (int) – total number of ligands in all the input files.
prefix (str) – prefix of split files
block_size (int) – number of ligands in each sub .csv file.
name_index (int) – column index of molecule name
smi_index (int) – column index of molecule SMILES
random_seed (int or None) – random seed number for shuffling the ligands
delimiter (str) – delimiter of input csv files
with_header (bool) – whether the input file(s) have a header in the first line.
- Returns
list of split files, reordered csv header
- Return type
list, list
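The combine, shuffle, and split steps can be sketched in memory as shown below. This is an illustration under several assumptions, not the module's implementation: `random_split_sketch` is a hypothetical name, it omits num_ligands and the returned header, it loads all rows into memory, and the `<prefix>_<n>.csv` output naming is invented for the example.

```python
import csv
import random


def random_split_sketch(file_list, prefix="splited", block_size=100000,
                        name_index=0, smi_index=1, random_seed=None,
                        delimiter=",", with_header=True):
    """Hypothetical in-memory version; returns the list of files written."""
    rows = []
    for path in file_list:
        with open(path, newline="") as fh:
            reader = csv.reader(fh, delimiter=delimiter)
            if with_header:
                next(reader, None)  # drop each file's header line
            for row in reader:
                if not row:
                    continue
                rest = [v for i, v in enumerate(row)
                        if i not in (smi_index, name_index)]
                # SMILES first, name second, remaining columns unchanged.
                rows.append([row[smi_index], row[name_index]] + rest)

    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(random_seed).shuffle(rows)

    split_files = []
    for start in range(0, len(rows), block_size):
        # The "<prefix>_<n>.csv" naming scheme is an assumption.
        out_path = f"{prefix}_{len(split_files)}.csv"
        with open(out_path, "w", newline="") as fh:
            csv.writer(fh).writerows(rows[start:start + block_size])
        split_files.append(out_path)
    return split_files
```

For libraries too large to hold in memory, a streaming approach would be needed, which is presumably why the documented function takes num_ligands up front.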
- schrodinger.active_learning.al_utils.merge_ligand_ml_models(sub_model_name_list, final_model, job_directory)
Merge multiple .tar.gz ligand_ml models into a single zipped deepautoqsar model.
- Parameters
sub_model_name_list ([str]) – list of .tar.gz ligand_ml model names.
final_model (str) – full path of the final zipped deepautoqsar model.
job_directory (str) – directory of the .tar.gz ligand_ml models.
- schrodinger.active_learning.al_utils.convert_ligand_ml_model_format(qzip_model)
Convert a .qzip deepautoqsar model to a .tar.gz ligand_ml model.
- Parameters
qzip_model (str) – .qzip deepautoqsar model filename.
- Returns
.tar.gz ligand_ml model filename
- Return type
str
- schrodinger.active_learning.al_utils.get_file_ext(filename)
Get the extension of the file name. If the file is gz-compressed, skip the ‘gz’ suffix and return the extension that precedes it.
- Parameters
filename (str) – name of the file.
- Returns
extension of the file, excluding ‘gz’.
- Return type
str
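A minimal sketch of this lookup, built on `os.path.splitext`, is shown below. The name `get_file_ext_sketch` is hypothetical, and returning the extension with its leading dot is an assumption; the documented function may return it in a different form.

```python
import os


def get_file_ext_sketch(filename):
    """Hypothetical stand-in: return the extension, looking past a trailing .gz."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Compressed file: report the extension that precedes ".gz".
        ext = os.path.splitext(root)[1]
    return ext
```

With this sketch, `ligands.csv.gz` and `ligands.csv` both resolve to `.csv`, so callers can choose a parser without caring about compression.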
- schrodinger.active_learning.al_utils.check_driver_disk_space(active_learning_job)
Estimate the driver disk usage of an active learning job using some assumed parameters. Print a warning if the available driver disk space is smaller than the estimated space.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver.
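The comparison step can be illustrated with `shutil.disk_usage` from the standard library. The sketch below covers only the check against free space: how the module derives its estimate from the job is not documented here, so `estimated_bytes` is passed in directly, and the name `warn_if_low_disk_sketch` is hypothetical.

```python
import shutil


def warn_if_low_disk_sketch(path, estimated_bytes):
    """Hypothetical check: warn when free space at `path` is below the estimate.

    Returns True if a warning was printed.
    """
    free_bytes = shutil.disk_usage(path).free
    if free_bytes < estimated_bytes:
        print(f"WARNING: {free_bytes} bytes free at {path!r}, "
              f"but an estimated {estimated_bytes} bytes are needed.")
        return True
    return False
```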
- schrodinger.active_learning.al_utils.node_run_timer(func)
Decorator that measures the running time of the runNode method in ActiveLearningNode.
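A timing decorator for a method generally follows the pattern below. This is a generic sketch, not the module's code: `run_timer_sketch`, `DemoNode`, and the `elapsed_seconds` attribute are hypothetical, and where the real decorator records or reports the time is not stated in this reference.

```python
import functools
import time


def run_timer_sketch(func):
    """Hypothetical timing decorator for a node's runNode method."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return func(self, *args, **kwargs)
        finally:
            # Assumption: store the elapsed time on the instance.
            self.elapsed_seconds = time.perf_counter() - start
    return wrapper


class DemoNode:
    """Illustrative stand-in for an ActiveLearningNode subclass."""

    @run_timer_sketch
    def runNode(self):
        return "done"


node = DemoNode()
result = node.runNode()
```

Recording the time in a `finally` clause means the duration is captured even when the wrapped method raises.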
- schrodinger.active_learning.al_utils.add_output_file(*output_files, incorporate=False)
Add files to jobcontrol output files.
- Parameters
output_files (str) – files to be transferred.
incorporate (bool) – whether to mark the files for incorporation by Maestro.
- schrodinger.active_learning.al_utils.add_input_file(jsb, *input_files)
Check that the input file(s) exist. Add each existing file as a jobcontrol input file; otherwise, exit with an error.
- Parameters
jsb (launchapi.JobSpecificationArgsBuilder) – job specification builder
input_files (str) – input file(s) to be added.
- schrodinger.active_learning.al_utils.concatenate_logs(combined_logfile, subjob_logfile_list, logger=None)
Combine subjob log files into a single combined log file.
- Parameters
combined_logfile (str) – combined log file name
subjob_logfile_list (list(str)) – list of subjob logfile names to be combined.
logger (Logger or None) – logger for receiving the info and error messages.
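Concatenating log files with optional logging can be sketched as follows. The name `concatenate_logs_sketch` is hypothetical, and both the message wording and the choice to skip missing files rather than stop are assumptions about behavior this reference does not specify.

```python
import logging
import os


def concatenate_logs_sketch(combined_logfile, subjob_logfile_list, logger=None):
    """Hypothetical stand-in: append each subjob log to one combined file."""
    logger = logger or logging.getLogger(__name__)
    with open(combined_logfile, "w") as out:
        for path in subjob_logfile_list:
            if not os.path.isfile(path):
                # Assumption: a missing log is reported and skipped.
                logger.error("Log file not found: %s", path)
                continue
            with open(path) as fh:
                out.write(fh.read())
            logger.info("Appended %s", path)
```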