Knowledge Base

Article ID: 964 - Last Modified:

Desmond and Jaguar parallel jobs fail when sent to an LSF queueing system. What can I do?

MPI parallel jobs may fail when submitted to an LSF queueing system, with an error in the log file similar to:

installation/mmshare-v19103/lib/Linux-x86_64/openmpi/bin/orterun: symbol lookup error:
installation/mmshare-v19103/lib/Linux-x86_64/openmpi/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init

This failure is due to a bug in Open MPI that breaks its tight integration with LSF: as the message shows, the LSF launcher plugin (mca_plm_lsf.so) cannot resolve the symbol lsb_init.
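
To confirm that a failed job hit this particular problem, you can search its log for the error signature. The following is a minimal Python sketch, not part of the Schrödinger distribution; the function name find_lsf_plugin_errors is hypothetical, and it assumes the plugin name and the undefined symbol appear on the same log line, as in the message above.

```python
def find_lsf_plugin_errors(log_text):
    """Return the log lines that show the LSF plugin failing to resolve lsb_init."""
    return [
        line.strip()
        for line in log_text.splitlines()
        # Both markers come from the error message quoted in this article.
        if "mca_plm_lsf.so" in line and "undefined symbol: lsb_init" in line
    ]
```

An empty list means the signature was not found, in which case the job probably failed for a different reason.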

This problem has been fixed in Suite 2010, with a patch to the version of Open MPI in the Schrödinger software distribution. If you are using another version of Open MPI and would like to recompile it with the patch included, send email to help@schrodinger.com.

Otherwise, the simplest workaround is to disable the tight integration with the LSF queue. Your jobs should still run, but LSF will have a harder time cleaning up after certain types of job failure, which should be rare.

To disable the tight integration, move these files

mca_ess_lsf.la
mca_ess_lsf.so
mca_plm_lsf.la
mca_plm_lsf.so
mca_ras_lsf.la
mca_ras_lsf.so

out of the following directories:

installation/mmshare-v19103/lib/Linux-x86/openmpi/lib/openmpi
installation/mmshare-v19103/lib/Linux-x86_64/openmpi/lib/openmpi
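
The move described above can be scripted. The following is a minimal Python sketch under stated assumptions: the function names and the backup location are hypothetical, "installation" stands for your Schrödinger installation directory, and the mmshare directory name is taken from the paths above and may differ in your release. Files are moved rather than deleted, into one backup subdirectory per architecture, so the step can be reversed by moving them back.

```python
import shutil
from pathlib import Path

# The six LSF plugin files named in this article.
LSF_PLUGIN_FILES = [
    "mca_ess_lsf.la",
    "mca_ess_lsf.so",
    "mca_plm_lsf.la",
    "mca_plm_lsf.so",
    "mca_ras_lsf.la",
    "mca_ras_lsf.so",
]

# The two architecture directories listed in this article.
ARCH_DIRS = ["Linux-x86", "Linux-x86_64"]


def move_lsf_plugins(plugin_dir, backup_dir):
    """Move any LSF plugin files found in plugin_dir into backup_dir."""
    moved = []
    for name in LSF_PLUGIN_FILES:
        src = Path(plugin_dir) / name
        if src.is_file():
            # Create the backup directory only when there is something to move.
            Path(backup_dir).mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(Path(backup_dir) / name))
            moved.append(name)
    return moved


def disable_tight_integration(installation, backup_root, mmshare="mmshare-v19103"):
    """Move the LSF plugins out of both architecture directories.

    backup_root is a hypothetical location of your choosing; it should be
    outside the Open MPI plugin directories.
    """
    result = {}
    for arch in ARCH_DIRS:
        plugin_dir = Path(installation) / mmshare / "lib" / arch / "openmpi" / "lib" / "openmpi"
        result[arch] = move_lsf_plugins(plugin_dir, Path(backup_root) / arch)
    return result
```

Calling disable_tight_integration with your installation directory and a backup directory returns, for each architecture, the list of files that were moved. An architecture directory that does not exist in your installation is skipped and reported with an empty list.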

Without the tight integration, Open MPI uses passwordless ssh to start processes. You must therefore configure passwordless ssh between the execution nodes in the cluster for all users of Schrödinger software.
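
One way to verify the ssh setup is to attempt a non-interactive login from one execution node to another. The following is a minimal Python sketch, assuming the OpenSSH client is installed; the function names are hypothetical. The BatchMode=yes option makes ssh fail instead of prompting for a password, so a zero exit status indicates that passwordless login works for the user running the check.

```python
import subprocess


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never prompt for a password
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on unreachable hosts
        host,
        "true",                                # trivial remote command
    ]


def passwordless_ssh_works(host, timeout_s=5):
    """Return True if ssh to host succeeds without any interactive prompt."""
    result = subprocess.run(
        ssh_check_command(host, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run the check as each user who submits jobs, and from each execution node to the others, since the keys and host settings are per user and per host.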

To ask a question or get help, please submit a support ticket or email us at help@schrodinger.com.