Node allocation for parallel jobs

The batch system uses your job's characteristics to choose appropriate nodes to run your job. For parallel jobs in particular, it takes into account the number of requested cores, which MPI library is used, and whether threading is used. Until recently, most nodes that can run parallel jobs had 24 cores. With the addition of 36-core nodes, we have made some changes to allow jobs to run efficiently on all types of nodes.

The major change is that parallel jobs are scheduled in multiples of 12 rather than 24 cores.

Generally, a job will run as described below:

Shared-memory jobs

  • Jobs requesting up to 24 cores will run on a single node.
  • Jobs requesting up to 36 cores and 185000 MB of total memory will run on a single node.

MPI and other distributed memory jobs

  • Most MPI jobs are split into blocks of 12 cores, with 1, 2, or 3 of these blocks (12, 24, or 36 cores, respectively) running on a node. If the number of cores is not divisible by 12, the job is split into blocks of 6, 5, or 4 cores. If the number of cores is not a multiple of any of these either, the job will run in blocks of 24 cores.
  • Hybrid (e.g., MPI + OpenMP) jobs continue to run in multiples of 24 cores for now.
  • Jobs requesting more than 192 cores will run on as few 24- or 36-core nodes as possible.
  • Jobs requesting between 72 and 191 cores will preferably run on as few 24- or 36-core nodes as possible, but may be spread over more nodes.
  • Otherwise, a job will run in blocks of 24 cores.

Some examples

  • A 576-core MPI job will run either on 16 36-core nodes or 24 24-core nodes, filling them completely.
  • A 144-core MPI job will run in blocks of 12 cores. It may run on 4 or more 36-core nodes or 6 or more 24-core nodes. In the extreme case, it may run on 12 cores each on 12 different nodes.
  • A 1024-core MPI job may run on 28 full 36-core nodes and 16 cores of a 29th node or, alternatively, 42 full 24-core nodes and 16 cores of a 43rd node.
  • A 48-core MPI job may run on 2 24-core nodes, but it may also run on 12 cores each of 4 different nodes.
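
The node counts in these examples follow from integer division. The following is only a sketch of that arithmetic, not of what the batch system actually does; the function name node_fill is hypothetical and chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the batch system): print how many nodes
# a job fills completely and how many cores spill over onto one more node.
# Usage: node_fill TOTAL_CORES CORES_PER_NODE
node_fill() {
    local total=$1 per_node=$2
    local full=$(( total / per_node ))   # completely filled nodes
    local rest=$(( total % per_node ))   # cores on a partially used node
    echo "${full} full ${per_node}-core nodes, ${rest} cores left over"
}

node_fill 576 36     # 16 full 36-core nodes, 0 cores left over
node_fill 576 24     # 24 full 24-core nodes, 0 cores left over
node_fill 1024 36    # 28 full 36-core nodes, 16 cores left over
node_fill 1024 24    # 42 full 24-core nodes, 16 cores left over
```

A non-zero remainder corresponds to the partially filled 29th or 43rd node in the 1024-core example.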

Controlling node allocation

Since the defaults described above may not be suitable for your job, you can control how your job will run.

Run only on dedicated nodes

To run only on dedicated nodes, such as 72 cores on three 24-core nodes or two 36-core nodes, request the fullnode bsub option:

bsub -n 72 -R fullnode mpirun ./my_program

The requested number of cores must be a multiple of 24 or 36.
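
If you generate submissions from a script, you may want to check this requirement before submitting. The following is a minimal sketch; the function name is_fullnode_count is hypothetical, and the snippet only prints the bsub command instead of running it.

```shell
#!/bin/bash
# Hypothetical helper: succeed if N cores can fill whole 24-core nodes
# or whole 36-core nodes, i.e. N is a positive multiple of 24 or 36.
is_fullnode_count() {
    local n=$1
    [ "$n" -gt 0 ] && { [ $(( n % 24 )) -eq 0 ] || [ $(( n % 36 )) -eq 0 ]; }
}

n=72
if is_fullnode_count "$n"; then
    # Printed only; remove the echo to actually submit the job.
    echo "bsub -n $n -R fullnode mpirun ./my_program"
else
    echo "$n is not a multiple of 24 or 36" >&2
fi
```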

Run with an equal number of cores on every node

To run with the same number of cores on every node, use the -R "span[ptile=X]" bsub option. For example,

bsub -n 72 -R "span[ptile=24]" mpirun ./my_program

will run this program on three different nodes using 24 cores on each. This may be 3 dedicated 24-core nodes or 3 non-dedicated 36-core nodes.
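
The number of nodes a ptile request uses can be estimated by dividing the requested cores by X. The sketch below uses a hypothetical helper, nodes_for_ptile. Rounding up when the core count is not a multiple of X is an assumption about how the remainder is placed, not something this page states.

```shell
#!/bin/bash
# Hypothetical helper: number of nodes used when N cores are placed
# PTILE cores per node. Rounding up for a remainder is an assumption.
nodes_for_ptile() {
    local n=$1 ptile=$2
    echo $(( (n + ptile - 1) / ptile ))
}

nodes_for_ptile 72 24    # 3
nodes_for_ptile 72 36    # 2
nodes_for_ptile 72 12    # 6
```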

Run with blocks of cores

To run your program in blocks of X cores on nodes, use the -R "span[block=X]" bsub option. The total number of requested cores must be a multiple of X. For example, to run your program in blocks of 4 cores, specify:

bsub -n 72 -R "span[block=4]" mpirun ./my_program

This can be especially useful if you need much more RAM than the default of 1 GB per core or if you want a short test job to be scheduled quickly.
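
The requirement that the total number of cores be a multiple of X can be checked before submitting. The following is a sketch only; block_cmd is a hypothetical helper that prints the bsub command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print a bsub command using span[block=X], or fail
# if the total number of cores is not a multiple of the block size.
block_cmd() {
    local n=$1 block=$2
    if [ $(( n % block )) -ne 0 ]; then
        echo "error: $n is not a multiple of $block" >&2
        return 1
    fi
    echo "bsub -n $n -R \"span[block=$block]\" mpirun ./my_program"
}

block_cmd 72 4    # bsub -n 72 -R "span[block=4]" mpirun ./my_program
```

With 72 cores and blocks of 4, the job consists of 18 blocks.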