schrodinger.active_learning.al_utils module

schrodinger.active_learning.al_utils.positive_int(s)[source]

ArgumentParser function that checks whether the input can be converted to a positive integer.

Parameters

s (str) – input string

Returns

integer value of input string

Return type

int
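
The check described above can be sketched as an argparse type function. This is a minimal stand-alone illustration, not the module's source: positive_int_sketch and the --num_iter option are hypothetical names, and rejecting zero and raising argparse.ArgumentTypeError are assumptions.

```python
import argparse


def positive_int_sketch(s):
    """Convert s to an int and reject values that are not strictly positive."""
    try:
        value = int(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{s!r} is not an integer") from None
    if value <= 0:
        # Assumption: zero is treated as invalid.
        raise argparse.ArgumentTypeError(f"{s!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser()
# "--num_iter" is a hypothetical option used only for illustration.
parser.add_argument("--num_iter", type=positive_int_sketch, default=1)
args = parser.parse_args(["--num_iter", "3"])
print(args.num_iter)
```

Passing the function as type= lets argparse report a bad value as a usage error instead of a traceback.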

schrodinger.active_learning.al_utils.split_smi_line(line)[source]

Split a line from a .smi file into a SMILES pattern and a title. Return an empty list if the line is empty.

Parameters

line (str) – line from .smi file

Returns

SMILES pattern, title

Return type

[str, str] or []
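
A minimal sketch of this kind of split, assuming the SMILES and title are separated by whitespace and that a line without a title yields an empty title string. split_smi_line_sketch is a hypothetical stand-in, not the module's implementation.

```python
def split_smi_line_sketch(line):
    """Return [smiles, title] for a .smi line, or [] for an empty line."""
    # Split on the first run of whitespace only, so titles may contain spaces.
    parts = line.strip().split(None, 1)
    if not parts:
        return []
    if len(parts) == 1:
        # Assumption: a missing title is represented by an empty string.
        parts.append("")
    return parts


print(split_smi_line_sketch("CCO ethanol\n"))
print(split_smi_line_sketch("\n"))
```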

schrodinger.active_learning.al_utils.get_smi_header()[source]

Create the header for a .smi input file. We assume the SMILES is in the first column and the title is in the second column.

Returns

header list, header index for reordering SMILES and title

Return type

list(str), list(int)

schrodinger.active_learning.al_utils.get_csv_header(filename, smi_index, name_index, delimiter=',', with_header=True)[source]

Create the header for a .csv input file. The reordered index puts SMILES in the first column and the title in the second column.

Parameters
  • filename (str) – .csv input file

  • smi_index (int) – column index of molecule SMILES

  • name_index (int) – column index of molecule name

  • delimiter (str) – delimiter of input csv files

  • with_header (bool) – whether the file has a header in its first line

Returns

header list, header index for reordering SMILES and title

Return type

list(str), list(int)
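
The reordering index can be pictured as a column permutation that places the SMILES column first and the name column second. The sketch below assumes the remaining columns keep their original order. reorder_index_sketch is a hypothetical helper, and the in-memory CSV text stands in for a real input file.

```python
import csv
import io


def reorder_index_sketch(num_columns, smi_index, name_index):
    """Return column indices with SMILES first, name second, the rest in order."""
    rest = [i for i in range(num_columns) if i not in (smi_index, name_index)]
    return [smi_index, name_index] + rest


# Illustrative data only; a real call would read from the .csv file.
text = "id,smiles,score\nlig1,CCO,-7.2\n"
rows = list(csv.reader(io.StringIO(text)))
order = reorder_index_sketch(len(rows[0]), smi_index=1, name_index=0)
header = [rows[0][i] for i in order]
print(header, order)
```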

schrodinger.active_learning.al_utils.my_csv_reader(filename)[source]

Yield a csv reader that skips the first line.

Parameters

filename (str) – .csv file name

Returns

csv.reader that skips first line of the file.

Return type

iterator
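
A generator with this behavior might look like the following sketch, which assumes a comma-delimited file. skip_first_line_reader_sketch is a hypothetical name, and the temporary file exists only to keep the example self-contained.

```python
import csv
import os
import tempfile


def skip_first_line_reader_sketch(filename):
    """Yield csv rows from filename, skipping the first line."""
    with open(filename, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # discard the first line; tolerate an empty file
        yield from reader


# Write a small throwaway file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ligands.csv")
    with open(path, "w", newline="") as handle:
        handle.write("SMILES,title\nCCO,ethanol\nCC,ethane\n")
    rows = list(skip_first_line_reader_sketch(path))

print(rows)
```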

schrodinger.active_learning.al_utils.read_score(score_file)[source]

Read known scores of ligands from score_file.

Returns

a dictionary that maps ligand title to ligand score.

Return type

dict

schrodinger.active_learning.al_utils.random_split(file_list, num_ligands, prefix='splited', block_size=100000, name_index=0, smi_index=1, random_seed=None, delimiter=',', with_header=True)[source]

Combine the input files, shuffle the lines, and split them into files with block_size lines per file. Reorder the columns so that SMILES and name are in the first and second columns, respectively.

Parameters
  • file_list (list) – paths of input files.

  • num_ligands (int) – total number of ligands in all the input files.

  • prefix (str) – prefix of split files

  • block_size (int) – number of ligands in each sub .csv file.

  • name_index (int) – column index of molecule name

  • smi_index (int) – column index of molecule SMILES

  • random_seed (int or None) – random seed number for shuffling the ligands

  • delimiter (str) – delimiter of input csv files

  • with_header (bool) – whether the input file(s) have a header in the first line.

Returns

list of split files, reordered csv header

Return type

list, list
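
The shuffle-and-split step can be sketched in memory as below. This only illustrates the idea: shuffle_and_split_sketch is a hypothetical helper that works on a list of rows, whereas the documented function reads input files and writes the blocks to disk.

```python
import random


def shuffle_and_split_sketch(rows, block_size, random_seed=None):
    """Shuffle rows and cut them into consecutive blocks of at most block_size."""
    rng = random.Random(random_seed)  # a private generator keeps the seed local
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [
        shuffled[start:start + block_size]
        for start in range(0, len(shuffled), block_size)
    ]


# Seven illustrative [SMILES, name] rows split into blocks of three.
rows = [["C" * n, f"lig{n}"] for n in range(1, 8)]
blocks = shuffle_and_split_sketch(rows, block_size=3, random_seed=42)
print([len(block) for block in blocks])
```

With a fixed seed the split is reproducible. The last block holds the remainder when the row count is not a multiple of block_size.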

schrodinger.active_learning.al_utils.merge_ligand_ml_models(sub_model_name_list, final_model, job_directory)[source]

Merge multiple .tar.gz ligand_ml models into a single zipped deepautoqsar model.

Parameters
  • sub_model_name_list ([str]) – list of .tar.gz ligand_ml model names.

  • final_model (str) – full path of the final zipped deepautoqsar model.

  • job_directory (str) – directory of the .tar.gz ligand_ml models.

schrodinger.active_learning.al_utils.convert_ligand_ml_model_format(qzip_model)[source]

Convert a .qzip deepautoqsar model to a .tar.gz ligand_ml model.

Parameters

qzip_model (str) – .qzip deepautoqsar model filename.

Returns

.tar.gz ligand_ml model filename

Return type

str

schrodinger.active_learning.al_utils.get_file_ext(filename)[source]

Get the extension of the file name. Skip ‘gz’ if the file is gz-compressed.

Parameters

filename (str) – name of the file.

Returns

extension of the file, excluding ‘gz’.

Return type

str
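
One way to obtain this behavior is to strip a trailing .gz before taking the extension. In the sketch below, get_file_ext_sketch is a hypothetical stand-in. Whether the real function returns the extension with or without the leading dot is not stated here, so keeping the dot is an assumption.

```python
import os


def get_file_ext_sketch(filename):
    """Return the extension of filename, ignoring a trailing '.gz'."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Look one level deeper for gz-compressed files.
        root, ext = os.path.splitext(root)
    return ext  # assumption: the leading dot is kept


print(get_file_ext_sketch("ligands.csv.gz"))
print(get_file_ext_sketch("ligands.smi"))
```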

schrodinger.active_learning.al_utils.check_driver_disk_space(active_learning_job)[source]

Estimate the driver disk usage of an active learning job with some assumed parameters. Print a warning if the available driver disk space is smaller than the estimated space.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver.

schrodinger.active_learning.al_utils.node_run_timer(func)[source]

Decorator for timing the runNode method in ActiveLearningNode.
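
A timing decorator of this kind generally follows the pattern sketched below. run_timer_sketch and DemoNode are hypothetical names. Printing the elapsed time is an assumption, since how the real decorator reports it is not documented here.

```python
import functools
import time


def run_timer_sketch(func):
    """Wrap func so that its wall-clock running time is reported."""
    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report the time even if func raises.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.2f} s")
    return wrapper


class DemoNode:
    """Hypothetical stand-in for a node class with a runNode method."""

    @run_timer_sketch
    def runNode(self):
        return "done"


result = DemoNode().runNode()
```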

schrodinger.active_learning.al_utils.add_output_file(*output_files, incorporate=False)[source]

Add files to jobcontrol output files.

Parameters
  • output_files (str) – files to be transferred.

  • incorporate (bool) – mark files for incorporation by Maestro.

schrodinger.active_learning.al_utils.add_input_file(jsb, *input_files)[source]

Check that the input file(s) exist. Add each one as a jobcontrol input file if it exists; otherwise exit with an error.

schrodinger.active_learning.al_utils.concatenate_logs(combined_logfile, subjob_logfile_list, logger=None)[source]

Combine subjob logfiles into a single combined logfile.

Parameters
  • combined_logfile (str) – combined log file name

  • subjob_logfile_list (list(str)) – list of subjob logfile names to be combined.

  • logger (Logger or None) – logger for receiving the info and error messages.
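
The concatenation can be sketched with the standard library alone. concatenate_logs_sketch is a hypothetical stand-in that mirrors the documented signature. Skipping missing subjob logs and reporting them through logger.error is an assumption about the error handling.

```python
import logging
import os
import tempfile


def concatenate_logs_sketch(combined_logfile, subjob_logfile_list, logger=None):
    """Append the contents of each subjob log to combined_logfile, in order."""
    with open(combined_logfile, "w") as combined:
        for logfile in subjob_logfile_list:
            if not os.path.isfile(logfile):
                # Assumption: a missing log is reported and skipped.
                if logger is not None:
                    logger.error("Subjob log file not found: %s", logfile)
                continue
            with open(logfile) as source:
                combined.write(source.read())
            if logger is not None:
                logger.info("Appended %s", logfile)


# Build two throwaway subjob logs and combine them.
with tempfile.TemporaryDirectory() as tmp_dir:
    sub_logs = []
    for i, text in enumerate(["first subjob\n", "second subjob\n"]):
        path = os.path.join(tmp_dir, f"subjob_{i}.log")
        with open(path, "w") as handle:
            handle.write(text)
        sub_logs.append(path)
    combined_path = os.path.join(tmp_dir, "combined.log")
    concatenate_logs_sketch(combined_path, sub_logs, logger=logging.getLogger("al_demo"))
    with open(combined_path) as handle:
        combined_text = handle.read()

print(combined_text, end="")
```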

schrodinger.active_learning.al_utils.get_host_ncpu()[source]

Return the host and number of CPUs that should be used to submit subjobs. This function works both under job control and outside of it.

Return type

str, int