Euler IV Beta Testing

From ScientificComputing
Revision as of 13:30, 26 March 2018 by Urbanb (talk | contribs) (MVAPICH2 clarification)


The new Euler IV nodes are generally available for beta testing. Each node has 36 cores and 192 GB of memory, and the nodes are connected by a 100 Gb/s EDR InfiniBand fabric.

You are encouraged to test your jobs on the Euler IV nodes during this open beta testing phase.

Select or avoid Euler IV nodes

You can force your jobs to run on the Euler IV nodes or prevent them from running there. To force your job to run on these nodes, request the “-R beta” bsub option:

bsub -R beta [other bsub options] ./my_command

To prevent your job from running on these nodes, request the “-R stable” bsub option:

bsub -R stable [other bsub options] ./my_command

If you encounter any problem with running your jobs on the new Euler IV nodes, then please report it to cluster support.

Changes in behavior

Since the new nodes have 36 cores compared to the 24 cores in the old nodes, there is a change in how a parallel job is split across nodes. During the beta testing phase this change only affects jobs submitted with the “-R beta” option. As before, any span[] options you specify are honored; otherwise, the defaults described below apply.
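As an illustration of overriding the defaults with span[], the sketch below prints (rather than submits) a bsub command line that combines the beta selection with an explicit per-node core count. It assumes that “-R beta” can be written as a select[beta] section within a single resource string, which is the usual LSF reading of a resource string without a section name. The helper beta_span_cmd and the core counts are hypothetical and not part of LSF or of the cluster setup.

```shell
# Hypothetical helper: print a bsub command that pins a job to the Euler IV
# nodes and fixes the number of cores per node with span[ptile=...].
# Usage: beta_span_cmd TOTAL_CORES CORES_PER_NODE COMMAND...
beta_span_cmd() {
    cores=$1
    per_node=$2
    shift 2
    # Only prints the command line; nothing is submitted to LSF.
    printf 'bsub -n %d -R "select[beta] span[ptile=%d]" %s\n' \
        "$cores" "$per_node" "$*"
}

# Example values (assumptions): 72 cores placed as 36 cores per node.
beta_span_cmd 72 36 ./my_command
```

Check the printed line first, then copy it to the shell to submit the job.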

Threaded jobs

If a job is submitted and OMP_NUM_THREADS is set to a value other than 1, then the job will run on multiples of 24 cores by default. This behavior is unchanged.

Non-threaded jobs

For non-threaded jobs (most pure MPI jobs),

  • Jobs requesting up to 36 cores will run on a single node (previously: up to 24 cores).
  • Jobs requesting 192 or more cores (and less than 5200 MB memory per core) will run on multiple, full 36-core nodes (previously: these ran on full 24-core nodes).
  • Jobs requesting between 37 and 191 cores and less than 5200 MB memory per core will run in blocks of 12, 6, or 4 cores (this is new).
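The three rules above can be sketched as a small shell function. This is only an illustration of the rules as listed here, not the scheduler's actual logic. The name euler4_layout is hypothetical. Jobs with more than 36 cores and 5200 MB or more per core are reported as not covered, since the list does not describe them.

```shell
# Hypothetical helper: classify a non-threaded job submitted with -R beta
# according to the three rules listed above. Not an LSF command.
# Usage: euler4_layout CORES MEM_PER_CORE_MB
euler4_layout() {
    cores=$1
    mem_mb=$2
    if [ "$cores" -le 36 ]; then
        echo "single node"
    elif [ "$mem_mb" -ge 5200 ]; then
        # The listed rules only cover jobs below 5200 MB per core.
        echo "not covered by the rules above"
    elif [ "$cores" -ge 192 ]; then
        echo "full 36-core nodes"
    else
        # 37 to 191 cores; how the block size is chosen is not specified.
        echo "blocks of 12, 6, or 4 cores"
    fi
}

# Example values (assumptions): core counts with 1024 MB per core.
euler4_layout 24 1024
euler4_layout 72 1024
euler4_layout 216 1024
```

The function deliberately does not guess which block size (12, 6, or 4) is used for a given job, because the rules do not say how that choice is made.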

Known issues

See the Troubleshooting section below for solutions to issues you may encounter.

InfiniBand and MPI

MVAPICH2 versions up to and including 2.2 are known not to work on these nodes; only MVAPICH2 2.3rc1 works. LSF will reject any job that it determines requires both MVAPICH2 ≤ 2.2 and the Euler IV nodes.

Troubleshooting