When I run Desmond, Jaguar, or QSite jobs on a cluster with an InfiniBand network, the jobs fail with the following message in the log file. How do I fix this?
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations.
The jobs are failing because the locked-memory limit (RLIMIT_MEMLOCK) set in your shell is too low. Have a look at the Open MPI FAQs #14 and #15 listed here:
Note in particular that shell limits can be set in many places (limits.conf, PAM, and even resource manager daemons such as SGE or LSF). Simply logging into a computer and checking the limits with ulimit -a (or the equivalent) may therefore not show the limits that a cluster job actually runs under.
To truly test the shell limits in the environment of a cluster job, submit a simple batch script to the queue that prints out ulimit -a.
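A test job along these lines might look like the following minimal sketch. It assumes bash is available on the compute nodes. The function name report_limits is purely illustrative. Scheduler directives and the submission command are left out because they depend on your queueing system and site configuration.

```bash
#!/bin/bash
# Minimal test job: print the limits seen inside the batch environment.
# report_limits is a hypothetical helper name used only for this sketch.
report_limits() {
    # Record which node the job actually ran on.
    echo "Host: $(uname -n)"
    # bash reports the locked-memory limit in kilobytes (or "unlimited").
    echo "Max locked memory: $(ulimit -l)"
    # Full list of limits for comparison with an interactive login.
    ulimit -a
}

report_limits
```

Note that bash reports this limit in kilobytes, so the 32768 bytes from the warning above would appear as 32. If the job output still shows a small value after you have changed your startup files, the limit is probably being imposed elsewhere, for example by the resource manager daemon.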
You can increase the limit for RLIMIT_MEMLOCK by adding the appropriate command below to your shell startup file:
bash: ulimit -l unlimited
csh/tcsh: limit memorylocked unlimited
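An unprivileged user can raise the soft limit only as far as the hard limit, so the commands above fail if the hard limit is lower than unlimited. The following sketch for a bash startup file is one possible alternative, not a prescribed fix: it raises the soft limit to whatever the hard limit currently allows. The variable name hard_memlock is illustrative.

```bash
# Sketch for a bash startup file: raise the soft locked-memory limit
# to the current hard limit instead of requesting "unlimited" outright.
hard_memlock=$(ulimit -H -l)      # hard limit, in kilobytes or "unlimited"
ulimit -S -l "$hard_memlock"      # soft limit may always be set up to the hard limit
echo "Soft locked-memory limit is now: $(ulimit -S -l)"
```

If the resulting value is still too small, the hard limit itself has to be raised by an administrator in one of the places mentioned above, such as limits.conf or the resource manager configuration.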