schrodinger.application.desmond.queue module

class schrodinger.application.desmond.queue.Queue(hosts: str, max_job: int, max_retries: int, periodic_callback=None)

Bases: object

__init__(hosts: str, max_job: int, max_retries: int, periodic_callback=None)
Parameters
  • hosts – String passed to the -HOST option.

  • max_job – Maximum number of jobs to run simultaneously.

  • max_retries – Maximum number of times to retry a failed job.

  • periodic_callback – Function to call periodically as the jobs run. This can be used to handle the halt message for stopping a running workflow.

run()

Run jobs for all multisim stages.

Starts a separate JobDJ for each multisim stage:

queue.push(jobs)
queue.run()
    while jobs:  <---------------|
        jobdj.run()              |
        multisim_jobs.finish()   |
          stage.capture()        |
          next_stage.push()      |
          next_stage.release()   |
          queue.push(next_jobs) --
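
The loop in the diagram above can be illustrated with a small, self-contained sketch. It does not use the Schrodinger API: ToyJob, run_batch, and run_queue are hypothetical stand-ins, and the capture/push/release steps of the real stages are collapsed into a single finish() call that returns the jobs for the next stage.

```python
from typing import List


class ToyJob:
    """Hypothetical stand-in for a multisim job; not part of the real API."""

    def __init__(self, name: str, stages_left: int):
        self.name = name
        self.stages_left = stages_left

    def finish(self) -> List["ToyJob"]:
        """Return the jobs for the next stage (empty when no stage is left)."""
        if self.stages_left <= 1:
            return []
        return [ToyJob(self.name, self.stages_left - 1)]


def run_batch(jobs: List[ToyJob], log: List[str]) -> None:
    """Stand-in for one JobDJ run over the current batch of jobs."""
    for job in jobs:
        log.append(f"{job.name}: {job.stages_left} stage(s) left")


def run_queue(jobs: List[ToyJob]) -> List[str]:
    """Run batches until finishing jobs no longer produces new ones."""
    log: List[str] = []
    while jobs:
        run_batch(jobs, log)
        next_jobs: List[ToyJob] = []
        for job in jobs:
            next_jobs.extend(job.finish())
        jobs = next_jobs  # plays the role of queue.push(next_jobs)
    return log
```

In this sketch each pass through the while loop corresponds to one stage, so a job with two stages left is run in two consecutive batches.
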
stop() → int

Attempt to stop the subjobs, but kill them if they do not stop in time.

Returns

Number of subjobs killed due to a failure to stop.
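
A minimal sketch of the "stop, then kill" pattern described here, assuming a polling loop with a fixed poll budget. ToySubjob, stop_all, max_polls, and poll_interval are hypothetical names introduced for illustration; how the real method waits and what timeout it uses is not stated in this reference.

```python
import time
from typing import List, Optional


class ToySubjob:
    """Hypothetical subjob that stops after a given number of polls."""

    def __init__(self, stops_after: Optional[int]):
        self.stops_after = stops_after  # None means it never stops
        self.polls = 0
        self.killed = False

    def request_stop(self) -> None:
        pass  # a real job would be sent a stop message here

    def has_stopped(self) -> bool:
        self.polls += 1
        return self.stops_after is not None and self.polls >= self.stops_after

    def kill(self) -> None:
        self.killed = True


def stop_all(subjobs: List[ToySubjob], max_polls: int = 3,
             poll_interval: float = 0.0) -> int:
    """Ask every subjob to stop; kill those still running after max_polls."""
    for job in subjobs:
        job.request_stop()
    pending = list(subjobs)
    for _ in range(max_polls):
        pending = [job for job in pending if not job.has_stopped()]
        if not pending:
            break
        time.sleep(poll_interval)
    for job in pending:
        job.kill()
    return len(pending)  # number of subjobs that had to be killed
```
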

push(jobs: List[cmj.Job])

property running_jobs: List[schrodinger.application.desmond.queue.JobAdapter]

class schrodinger.application.desmond.queue.JobAdapter(*args, multisim_job=None, **kwargs)

Bases: schrodinger.job.queue.JobControlJob

__init__(*args, multisim_job=None, **kwargs)

Job constructor.

Parameters
  • command – The command that runs the job.

  • command_dir – The directory from which to run the command.

  • name – The name of the job.

  • max_retries – Number of allowed retries for this job. If this is set, it is never overridden by the SCHRODINGER_MAX_RETRIES environment variable. If it is not set, the value of max_retries defined in JobDJ is used, and SCHRODINGER_MAX_RETRIES can be used to override this value at runtime. To prevent this job from being restarted altogether, set max_retries to zero.

  • timeout – Timeout (in seconds) after which the job will be killed. If None, the job is allowed to run indefinitely.

  • launch_timeout – Timeout (in seconds) for the job launch process to complete. If None, a default timeout is used for jobserver and old jobcontrol jobs (see get_default_timeout()), unless a value is passed for the job timeout parameter and that value is not greater than the default timeout.

  • launch_env_variables – A dictionary of environment variables to add when the jobcontrol job is launched. Each key of the dict is the name of a variable to set, and the corresponding dict value is the value of that variable. These variables are added to any environment variables already present, and are removed after the job has been launched.

  • kwargs – Additional keyword arguments. Provided for consistency of interface in subclasses.

  • resource_requirement – Whether the job will require special compute resources, such as GPU.

  • license_requirement – List of license tokens required by the job, used for license checking when the SMART_LICENSE_CHECK feature flag is turned on. This is useful for license-checking the first job of the smart distribution, which is launched directly to the localhost without canceling from the queue. The license requirements are not known until the job is launched. Each license token has the form ‘TOKEN’ or ‘TOKEN:n’, where TOKEN is the name of the license and n is the number of tokens.
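
The ‘TOKEN’ / ‘TOKEN:n’ form described for license_requirement can be parsed as in the sketch below. The helper parse_license_token is hypothetical and not part of the module, EXAMPLE_TOKEN is a made-up license name, and the default count of 1 for a bare ‘TOKEN’ is an assumption, since the reference does not state it.

```python
from typing import Tuple


def parse_license_token(token: str) -> Tuple[str, int]:
    """Split 'TOKEN' or 'TOKEN:n' into (name, count).

    Assumption: a bare 'TOKEN' means a count of 1.
    """
    name, sep, count = token.partition(":")
    if not name:
        raise ValueError(f"missing license name in {token!r}")
    # int() raises ValueError for an empty or non-numeric count.
    return name, int(count) if sep else 1


# Hypothetical token names, for illustration only.
requirements = ["EXAMPLE_TOKEN", "EXAMPLE_TOKEN:4"]
parsed = [parse_license_token(t) for t in requirements]
```
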

getCommand() → List[str]

Return the command used to run this job.

maxFailuresReached(**kwargs)

Print an error summary, including the last 20 lines from each log file in the LogFiles list of the job record.

acquireLicenseForSmartDistribution() → bool

Acquire and hold licenses for a smart distribution job. This makes sure the job won’t fail due to unavailable licenses.

Returns True if the licenses registered for the job are acquired, and False if they are not. If no licenses are registered, it always returns True so that jobs are not prevented from using the smart distribution feature. For legacy jobcontrol, the license check is not performed and True is always returned. We want to use this feature as a pitch to move users to JOB_SERVER.

addFinalizer(function: Callable[[schrodinger.job.queue.BaseJob], None], run_dir: Optional[str] = None)

Add a function to be invoked when the job completes successfully.

See also the add_multi_job_finalizer function.

addGroupPrereq(job: schrodinger.job.queue.BaseJob)

Make all jobs connected to job prerequisites of all jobs connected to this Job.

addLaunchEnv(key: str, val: str)

Adds the given environment key and value to the launch environment.

Parameters
  • key – environment key to add to the launch environment.

  • val – environment value associated with the key to add to the launch environment.

addPrereq(job: schrodinger.job.queue.BaseJob)

Add a job that is an immediate prerequisite for this one.

cancel()

Send a kill request to the jobcontrol-managed job. This method is intended to eventually replace JobControlJob.kill.

cancelSubmitted(do_license_check: bool = False) → schrodinger.job.queue.CancelSubmittedStatus

If the job is still in the ‘submitted’ state, cancel it, purge the jobrecord and set the job handle to None. This tries to acquire licenses for the job before canceling from the queue if do_license_check is turned on.

Parameters

do_license_check – Acquire licenses for the job before canceling from the queue.

Returns one of the CancelSubmittedStatus values.

doCommand(host: str, local: bool = False)

Launch job on specified host using jobcontrol.launch_job().

Parameters
  • host – Host on which the job will be executed.

  • local – Removed in JOB_SERVER.

finalize()

Clean up after a job successfully runs.

genAllJobs(seen: Optional[Set[schrodinger.job.queue.BaseJob]] = None) → Generator[schrodinger.job.queue.BaseJob, None, None]

A generator that yields all jobs connected to this one.

genAllPrereqs(seen=None) → Generator[schrodinger.job.queue.BaseJob, None, None]

A generator that yields all jobs that are prerequisites of this one.
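
As an illustration of how a seen set keeps such a generator from yielding a shared prerequisite twice, here is a minimal sketch. ToyJob is a hypothetical stand-in that only borrows the method names from this page; the real implementation may traverse the graph differently.

```python
from typing import Generator, Optional, Set


class ToyJob:
    """Hypothetical stand-in for a job with prerequisites."""

    def __init__(self, name: str):
        self.name = name
        self._prereqs: Set["ToyJob"] = set()

    def addPrereq(self, job: "ToyJob") -> None:
        """Record an immediate prerequisite of this job."""
        self._prereqs.add(job)

    def getPrereqs(self) -> Set["ToyJob"]:
        """Return a copy of the set of immediate prerequisites."""
        return set(self._prereqs)

    def genAllPrereqs(
        self, seen: Optional[Set["ToyJob"]] = None
    ) -> Generator["ToyJob", None, None]:
        """Yield direct and indirect prerequisites, each exactly once."""
        if seen is None:
            seen = set()
        for job in self._prereqs:
            if job in seen:
                continue
            seen.add(job)
            yield job
            yield from job.genAllPrereqs(seen)


# Diamond-shaped graph: d needs b and c, and both of those need a.
a, b, c, d = (ToyJob(n) for n in "abcd")
b.addPrereq(a)
c.addPrereq(a)
d.addPrereq(b)
d.addPrereq(c)
all_prereq_names = sorted(job.name for job in d.genAllPrereqs())
```
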

getCommandDir() → str

Return the launch/command directory name. If None is returned, the job will be launched in the current directory.

getDuration() → Optional[int]

Return the duration of the Job as recorded by the job server. The duration does not include queue wait time.

If the job is running or has not launched, returns None.

Note that this method makes a blocking call to the job server.

getJob() → Optional[schrodinger.job.jobcontrol.Job]

Return the job record as a schrodinger.job.jobcontrol.Job instance.

Returns None if the job hasn’t been launched.

getJobDJ() → schrodinger.job.queue.JobDJ

Return the JobDJ instance that this job has been added to.

getPrereqs()

Return a set of all immediate prerequisites for this job.

getStatusStrings() → Tuple[str, str, str]

Return a tuple of status strings for printing by JobDJ.

The strings returned are (status, jobid, host).

hasExited() → bool

Returns True if this job finished, successfully or not.

hasStarted() → bool

Returns True if this job has started (is no longer waiting).

init_count = 0
isComplete() → bool

Returns True if this job finished successfully.

kill()

Send a kill request to the jobcontrol-managed job.

postCommand()

A method to restore things to the pre-command state.

preCommand()

A method to make pre-command changes, like cd’ing to the correct directory to run the command in.

retryFailure(max_retries: int = 0) → bool

This method will be called when the job has failed, and JobDJ needs to know whether the job should be retried or not.

JobDJ’s value for the max_retries parameter is passed in, to be used when the job doesn’t have its own max_retries value.

Return True if this job should be retried, otherwise False.
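
The precedence between a job's own max_retries, JobDJ's value, and the SCHRODINGER_MAX_RETRIES environment variable (as described for the max_retries constructor parameter) can be sketched as follows. The functions effective_max_retries and should_retry are hypothetical, and two details are assumptions: that the environment override is applied at this point, and that a job is retried while its failure count does not exceed the limit.

```python
import os
from typing import Mapping, Optional


def effective_max_retries(job_max_retries: Optional[int],
                          jobdj_max_retries: int,
                          env: Mapping[str, str] = os.environ) -> int:
    """Pick the retry limit that applies to one job."""
    if job_max_retries is not None:
        # A per-job value is never overridden by the environment.
        return job_max_retries
    override = env.get("SCHRODINGER_MAX_RETRIES")
    if override is not None:
        return int(override)
    return jobdj_max_retries


def should_retry(num_failures: int,
                 job_max_retries: Optional[int],
                 jobdj_max_retries: int = 0,
                 env: Mapping[str, str] = os.environ) -> bool:
    """Assumed rule: retry while failures do not exceed the limit."""
    limit = effective_max_retries(job_max_retries, jobdj_max_retries, env)
    return num_failures <= limit
```

With a per-job value of zero, should_retry returns False on the first failure, which matches the documented way to prevent a job from being restarted.
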

run(*args, **kwargs)

Run the job.

The steps taken are as follows:
  1. Execute the preCommand method for things like changing the working directory.

  2. Call the doCommand method to do the actual work of computation or job launching.

  3. Call the postCommand method to undo the changes from the preCommand that need to be undone.
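
The three steps above follow a template-method pattern, sketched here with a hypothetical ToyJob class that changes into a command directory and back. Wrapping doCommand in try/finally is a design choice of this sketch, not something the reference states about the real run method.

```python
import os
from typing import List, Optional


class ToyJob:
    """Hypothetical job that runs its command from a given directory."""

    def __init__(self, command_dir: str):
        self.command_dir = command_dir
        self.calls: List[str] = []
        self.ran_in: Optional[str] = None
        self._orig_dir: Optional[str] = None

    def preCommand(self) -> None:
        # Step 1: remember where we are, then change directory.
        self._orig_dir = os.getcwd()
        os.chdir(self.command_dir)
        self.calls.append("preCommand")

    def doCommand(self) -> None:
        # Step 2: the actual work; here it only records the directory.
        self.ran_in = os.getcwd()
        self.calls.append("doCommand")

    def postCommand(self) -> None:
        # Step 3: undo the directory change made in preCommand.
        os.chdir(self._orig_dir)
        self.calls.append("postCommand")

    def run(self) -> None:
        self.preCommand()
        try:
            self.doCommand()
        finally:
            # Assumption: always restore state, even if doCommand raises.
            self.postCommand()
```
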

runsLocally() → bool

Return True if the job runs on the JobDJ control host, False if not. Jobs that run locally don’t need hosts.

There is no limit on the number of locally run jobs.

setup()

A method to do initial setup; executed after preCommand, just before doCommand.

property state: schrodinger.job.queue.JobState

Return the current state of the job.

Note that this method can be overridden by subclasses that wish to provide for restartability at a higher level than unpickling BaseJob instances. For example, by examining some external condition (e.g. presence of output files) the state JobState.DONE could be returned immediately and the job would not run.
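
A minimal sketch of the kind of override described above, assuming that the presence of an output file means the job is done. ToyState, ToyJob, and RestartableJob are hypothetical stand-ins for JobState, BaseJob, and a user subclass; only the DONE state name comes from this reference.

```python
import enum
import os


class ToyState(enum.Enum):
    """Hypothetical stand-in for JobState."""
    WAITING = "waiting"
    DONE = "done"


class ToyJob:
    """Hypothetical base class with a stored state."""

    def __init__(self) -> None:
        self._state = ToyState.WAITING

    @property
    def state(self) -> ToyState:
        return self._state


class RestartableJob(ToyJob):
    """Reports DONE as soon as its output file already exists."""

    def __init__(self, output_file: str) -> None:
        super().__init__()
        self.output_file = output_file

    @property
    def state(self) -> ToyState:
        # External condition checked first: existing output means no rerun.
        if os.path.exists(self.output_file):
            return ToyState.DONE
        return super().state
```

A scheduler that consults state before launching would then skip such a job on restart without relying on unpickled job instances.
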

update()

Checks for changes in job status, and updates the object appropriately (marks for restart, etc).

Raises

RuntimeError – if an unknown Job Status or ExitStatus is encountered.

usesJobServer() → bool

Detect, by looking at the jobId, whether this job uses a job server.