Article ID: 122 - Last Modified: November 7, 2011
When I run Desmond or Jaguar jobs on a cluster with an Infiniband network, the jobs fail with the following in the log file:
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
How can I get around this?
The jobs are failing because of limits that are set in your shell. Have a look at the OpenMPI FAQs #14 and #15 listed here:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Of particular importance is the fact that shell limits can be set in many places: limits.conf, PAM, and even the resource manager daemons (e.g. SGE, LSF). Simply logging into a computer and testing the limits via 'ulimit -a' (or equivalent) may therefore report limits that differ from those a cluster job actually runs under.
To truly test the shell limits in the environment of a cluster job, submit a simple batch script to the queue that prints out 'ulimit -a'.
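The test job described above could look something like the following minimal sketch. The function name report_limits is only illustrative, and the script assumes bash is available on the compute nodes. In bash, 'ulimit -l' reports the locked-memory limit in kilobytes, so the 32768-byte limit from the warning above would appear as 32.

```shell
#!/bin/bash
# Illustrative test job: print the limits that a batch job actually sees.
# report_limits is a hypothetical helper name, not part of any scheduler.
report_limits() {
    # Node on which the job ran
    echo "host: $(uname -n)"
    # Locked-memory limit (the value behind RLIMIT_MEMLOCK), in kilobytes
    echo "max locked memory (ulimit -l): $(ulimit -l)"
    # Full list of limits for comparison with an interactive login
    echo "--- full limits (ulimit -a) ---"
    ulimit -a
}

report_limits
```

How the script is submitted depends on the queueing system, for example 'qsub' for SGE or 'bsub' for LSF. Compare the job's output file with what 'ulimit -a' prints in an interactive login on the same node; if the locked-memory value differs, the limit is being set somewhere on the job's launch path rather than in your login shell.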
Keywords: RLIMIT_MEMLOCK Infiniband cluster ulimit
Related Articles:
#1481: My parallel Jaguar calculation failed with an "out of memory" error. What is the problem?