Difference between revisions of "Euler VI Testing"

Revision as of 12:31, 31 January 2020

The new Euler VI nodes are available for beta testing. Each node has 128 cores and 512 GB of memory, and the nodes are connected by a 200 Gbps EDR InfiniBand fabric.

Roadmap

Euler VI will be put into regular production through the following phases. The current phase is highlighted in bold.

  • Closed beta testing: the new nodes are tested by the HPC group and interested users who contact us and agree to be beta-testers.
  • Open beta testing: the new nodes can be tested by anyone who is interested.
  • Gradual easement: a portion of all jobs may be able to run on the system, starting with a minimal set and gradually increasing.
  • Production: the new nodes are treated as any other node in the cluster.


Select or avoid Euler VI nodes

During the testing and transition period you can force your job to use or avoid these nodes.

To force your job to run on these nodes, pass either the “-R beta” or the “-R "select[model==EPYC_7742]"” option to bsub:

bsub -R beta [other bsub options] ./my_command
bsub -R "select[model==EPYC_7742]" [other bsub options] ./my_command

To prevent your job from running on these nodes, request the “-R stable” bsub option:

bsub -R stable [other bsub options] ./my_command

If you encounter any problem with running your jobs on the new Euler VI nodes, then please report it to cluster support.

After the nodes are put into production, any jobs submitted with the “stable” option may run on the new nodes, too. Some time after that, jobs submitted with the “beta” option will no longer be able to run.

Changes in behavior

If you request Euler VI nodes, then the batch system will run jobs requesting up to 128 cores on a single node.

Threaded jobs

Non-threaded (multi-node MPI) jobs

If you intend to run multi-node jobs, use the “-R "span[ptile=128]"” option (or another appropriate value in place of 128).
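
As a rough illustration of the ptile setting, the following dry-run sketch builds a bsub command for a multi-node MPI job and prints it without submitting anything. The core count of 256, the use of mpirun, and the program name ./my_mpi_program are assumptions chosen for the example, not values prescribed for Euler VI. The select and span sections are combined in a single -R string, which is standard LSF resource-requirement syntax.

```shell
#!/bin/sh
# Dry-run sketch: build (but do not submit) a bsub command for a multi-node MPI job.
# total_cores, the mpirun launcher and ./my_mpi_program are illustrative assumptions.
total_cores=256   # total number of cores (MPI ranks) requested with -n
ptile=128         # cores per node; 128 fills one Euler VI node

# Number of nodes the job will span, rounded up.
nodes=$(( (total_cores + ptile - 1) / ptile ))

# select[...] pins the job to the Euler VI node model; span[ptile=...] sets cores per node.
cmd="bsub -n $total_cores -R \"select[model==EPYC_7742] span[ptile=$ptile]\" mpirun ./my_mpi_program"

echo "Nodes needed: $nodes"
echo "$cmd"
```

With these values the job would span two full nodes. A smaller ptile value spreads the same number of cores over more nodes, which may suit jobs that need more memory per core.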

Known issues

Troubleshooting