Article ID: 964 - Last Modified: June 6, 2013
Desmond and Jaguar parallel jobs fail when sent to an LSF queueing system. What can I do?
MPI parallel jobs may fail when using the LSF queuing system, giving an error in the log file similar to:
installation/mmshare-v19103/lib/Linux-x86_64/openmpi/bin/orterun: symbol lookup error: installation/mmshare-v19103/lib/Linux-x86_64/openmpi/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init
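To check whether a failed job hit this particular problem, the log can be searched for the error signature. The following is a minimal Python sketch, not part of the Schrödinger distribution; the function name `has_lsf_symbol_error` is hypothetical, and the pattern assumes the message has the form shown above.

```python
import re

# Signature of the failure quoted above: orterun cannot resolve lsb_init
# while loading the LSF launcher plugin (mca_plm_lsf.so).
LSF_SYMBOL_ERROR = re.compile(
    r"symbol lookup error: .*mca_plm_lsf\.so: undefined symbol: lsb_init"
)


def has_lsf_symbol_error(log_text: str) -> bool:
    """Return True if any line of the log text contains the lsb_init failure."""
    return any(LSF_SYMBOL_ERROR.search(line) for line in log_text.splitlines())
```

Read the job log into a string and pass it to the function; a True result suggests that the workaround described in this article applies.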
This failure is due to a bug in Open MPI that causes problems for tight integration with LSF. The problem was fixed in Suite 2010 with a patch to the version of Open MPI in the Schrödinger software distribution. If you are using another version of Open MPI and would like to recompile it with the patch included, send email to firstname.lastname@example.org.
Otherwise, the simplest workaround is to disable the tight integration with the LSF queue. Your jobs should still run, but it will be harder for LSF to clean up certain types of job failures (which should be rare).
To disable the tight integration, move these files
mca_ess_lsf.la mca_ess_lsf.so mca_plm_lsf.la mca_plm_lsf.so mca_ras_lsf.la mca_ras_lsf.so
out of the following directories:
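The move can be scripted. The sketch below is an illustration only: the function name `disable_lsf_plugins` and both directory arguments are hypothetical, and you must supply the actual Open MPI plugin directory for your installation. Moving the files to a backup directory rather than deleting them makes it easy to restore the tight integration later.

```python
import shutil
from pathlib import Path

# The six LSF plugin files named in this article.
LSF_PLUGIN_FILES = [
    "mca_ess_lsf.la", "mca_ess_lsf.so",
    "mca_plm_lsf.la", "mca_plm_lsf.so",
    "mca_ras_lsf.la", "mca_ras_lsf.so",
]


def disable_lsf_plugins(plugin_dir, backup_dir):
    """Move the LSF plugin files from plugin_dir into backup_dir.

    Both paths are supplied by the caller (assumption: plugin_dir is an
    Open MPI plugin directory). Returns the names of the files moved.
    """
    plugin_dir = Path(plugin_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in LSF_PLUGIN_FILES:
        source = plugin_dir / name
        if source.exists():  # skip files that are absent or already moved
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved
```

Run the function once per plugin directory, with write permission on the installation. Moving the files back restores the original behavior.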
Without the tight integration, Open MPI uses passwordless ssh to start processes. You must therefore configure passwordless ssh between execution nodes in the cluster, for all users of Schrödinger software.
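One way to confirm that passwordless ssh is working is to attempt a connection that is not allowed to prompt. The sketch below assumes an OpenSSH client, whose `BatchMode=yes` option makes ssh fail rather than ask for a password; the function names and the timeout values are hypothetical choices for illustration.

```python
import subprocess


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def passwordless_ssh_works(host):
    """Return True if ssh to host succeeds without any interactive prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0
```

Run the check from each execution node against every other node, as each user who runs Schrödinger jobs. Any host for which it returns False still needs key-based login configured.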