Article ID: 1589 - Last Modified: December 14, 2011
I submitted a Glide docking job to our cluster a couple of weeks ago, and we just had a system failure that caused our NFS disks to unmount and the queuing system on our cluster to shut down. Is there a way to restart the jobs?
Distributed Glide jobs can be restarted at a coarse level: that is, incomplete subjobs have to be started from the beginning, but any completed subjobs do not have to be rerun. For a given Glide job with multiple subjobs, first check to see what state Job Control thinks the job is in:
$SCHRODINGER/jobcontrol -list -c JobId
where JobId is the Schrodinger Job ID of the Glide job, shown in the Monitor panel or near the top of the jobname.log file. You may find that some subjobs are in 'stranded' status, which happens when Job Control on the launch machine loses track of the superintending Job Control processes for the backends on the compute nodes. If some subjobs are listed as 'running' but you know they are not actually running, try
$SCHRODINGER/jobcontrol -ping -c JobId
to have Job Control refresh their statuses. Next, try
$SCHRODINGER/jobcontrol -recover -c JobId
to see if Job Control can recover any files from the compute nodes. It could be that the Glide backends on the compute nodes continued running and were able to produce pv or lib files.
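The cleanup steps above can be collected into a small script. This is a minimal sketch in Python, assuming the SCHRODINGER environment variable points at your installation. The function names cleanup_commands and run_cleanup are hypothetical, and the script uses only the jobcontrol options quoted in this article. It defaults to a dry run that just prints the commands, and it does not parse their output, so you still need to read the status listing yourself before deciding to restart.

```python
import os
import subprocess
from typing import List


def cleanup_commands(job_id: str, schrodinger: str, ping: bool = True) -> List[List[str]]:
    """Build the Job Control cleanup commands from this article, in order.

    Hypothetical helper: it only assembles argument lists; it does not run them.
    """
    jobcontrol = os.path.join(schrodinger, "jobcontrol")
    cmds = [[jobcontrol, "-list", "-c", job_id]]  # show job and subjob states
    if ping:
        # Only needed when subjobs show 'running' but are known not to be running.
        cmds.append([jobcontrol, "-ping", "-c", job_id])
    # Try to pull back any files the backends produced on the compute nodes.
    cmds.append([jobcontrol, "-recover", "-c", job_id])
    return cmds


def run_cleanup(job_id: str, ping: bool = True, dry_run: bool = True) -> List[List[str]]:
    """Print each command and, unless dry_run is set, execute it."""
    schrodinger = os.environ.get("SCHRODINGER")
    if not schrodinger:
        if not dry_run:
            raise RuntimeError("SCHRODINGER is not set")
        schrodinger = "$SCHRODINGER"  # placeholder so the dry-run output is readable
    cmds = cleanup_commands(job_id, schrodinger, ping)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # Exit codes are not checked; inspect the -list output yourself.
            subprocess.run(cmd, check=False)
    return cmds


if __name__ == "__main__":
    run_cleanup("JobId")  # replace "JobId" with your Glide job's ID
```

Pass ping=False if no subjobs are wrongly listed as 'running', and dry_run=False once the printed commands look right.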
Once the job has been cleaned up from Job Control's perspective and the main Glide driver job is in 'died' status, you can try to restart it:
$SCHRODINGER/glide -RESTART jobname.in
Glide will rerun any 'died' or 'killed' subjobs, plus any subjobs that didn't get run in the original job, and then combine all the old and new results together.
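The restart rule in the previous paragraph amounts to a simple partition of the subjobs. The Python sketch below models it under stated assumptions: subjob names and the 'completed' label are hypothetical, a status of None stands for a subjob that never ran in the original job, and any subjob still marked 'running' or 'stranded' is rejected because the job should be cleaned up first. It illustrates the documented behavior only; it is not how Glide implements -RESTART, and it does not combine any pv or lib files.

```python
from typing import Dict, List, Optional, Tuple

RERUN_STATUSES = {"died", "killed"}            # statuses a restart reruns
UNRESOLVED_STATUSES = {"running", "stranded"}  # must be cleaned up before restarting


def plan_restart(subjobs: Dict[str, Optional[str]]) -> Tuple[List[str], List[str]]:
    """Split subjobs into (rerun, keep) following the documented restart rule.

    `subjobs` maps a hypothetical subjob name to its last Job Control status,
    or None if the subjob never started. This is a model, not Glide's code.
    """
    rerun: List[str] = []
    keep: List[str] = []
    for name, status in sorted(subjobs.items()):
        if status in UNRESOLVED_STATUSES:
            raise ValueError(f"{name} is still '{status}'; clean up with -ping/-recover first")
        if status is None or status in RERUN_STATUSES:
            rerun.append(name)  # restarted from the beginning
        else:
            keep.append(name)   # assumed completed; its results are reused
    return rerun, keep


rerun, keep = plan_restart({"sub1": "completed", "sub2": "died", "sub3": None, "sub4": "killed"})
print("rerun:", rerun)  # rerun: ['sub2', 'sub3', 'sub4']
print("keep:", keep)    # keep: ['sub1']
```

Only the subjobs in the rerun list are started again; the kept subjobs' existing results are then combined with the new ones.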
Keywords: Glide, restart
#656: I started a job to screen a million molecules, and it failed before finishing. How can I recover my results and finis...
#1054: How can I restart a Glide docking calculation for which some subjobs failed?
#1634: My serial Glide job was interrupted. Can I resume the job without losing the results of ligands already docked in the...