Euler VI Testing

The new Euler VI nodes are available for beta testing. They have 128 cores, 512 GB of memory and are connected in a 200 Gbps EDR Infiniband fabric.

Roadmap

Euler VI will be put into regular production through the following phases. The current phase is highlighted in bold.

  • Closed beta testing: the new nodes are tested by the HPC group and interested users who contact us and agree to be beta-testers.
  • Open beta testing: the new nodes can be tested by anyone who is interested.
  • Gradual opening: an increasing portion of regular jobs is allowed to run on the new nodes, starting with a minimal set and gradually expanding.
  • Production: the new nodes are treated like any other node in the cluster.


Select or avoid Euler VI nodes

During the testing and transition period you can force your job to use or avoid these nodes.

To force your job to run on these nodes, use the “-R beta” or “-R "select[model==EPYC_7742]"” bsub option:

bsub -R beta [other bsub options] ./my_command
bsub -R "select[model==EPYC_7742]" [other bsub options] ./my_command

To prevent your job from running on these nodes, use the “-R stable” bsub option:

bsub -R stable [other bsub options] ./my_command

If you encounter any problem with running your jobs on the new Euler VI nodes, then please report it to cluster support.

After the nodes are put into production, any jobs submitted with the “stable” option may run on the new nodes, too. Some time after that, jobs submitted with the “beta” option will no longer be able to run.

Changes in behavior

If you request Euler VI nodes, then the batch system will run jobs requesting up to 128 cores on a single node.

Threaded jobs
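
As an illustrative sketch only, a threaded job (for example an OpenMP program) could be kept on a single Euler VI node by requesting up to 128 cores together with the standard LSF “span[hosts=1]” requirement; the 128-core count and ./my_threaded_command are placeholders, not options documented on this page:

bsub -n 128 -R beta -R "span[hosts=1]" [other bsub options] ./my_threaded_command

For an OpenMP program, you would typically also set OMP_NUM_THREADS to match the requested number of cores (here 128).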

Non-threaded (multi-node MPI) jobs

You should use the “-R "span[ptile=128]"” bsub option (or another appropriate value instead of 128) if you intend to run multi-node jobs.
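
For illustration, assuming an MPI program launched with mpirun (the launcher and ./my_mpi_command are placeholders, not prescribed by this page), a job spanning two full Euler VI nodes could be submitted as:

bsub -n 256 -R beta -R "span[ptile=128]" [other bsub options] mpirun ./my_mpi_command

Here “span[ptile=128]” asks the batch system to place 128 of the 256 requested cores on each node.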

Known issues

Troubleshooting