Difference between revisions of "Euler VI Testing"

(Initial skeleton.)
 
Line 7:
  During the testing and transition period you can '''force''' your job to '''use''' or '''avoid''' these nodes.

- To '''force''' your job to run on these nodes, request the “-R beta” bsub option:
+ To '''force''' your job to run on these nodes, request the “-R beta” or “<tt>-R "select[model==EPYC_7742]"</tt>” bsub option:

    bsub -R beta [other bsub options] ./my_command
+   bsub -R "select[model==EPYC_7742]" [other bsub options] ./my_command

  To '''prevent''' your job from running on these nodes, request the “-R stable” bsub option:

    bsub -R stable [other bsub options] ./my_command

  If you encounter any problem with running your jobs on the new Euler VI nodes, then please report it to {{Cluster_support}}.
+
+ While you can always use the <tt>-R stable</tt> option, the <tt>-R beta</tt> option will not work after the Euler VI nodes are put into production.

  == Changes in behavior ==
+
+ If you request Euler VI nodes, then the batch system will run jobs requesting up to 128 cores on a single node.

  === Threaded jobs ===

- === Non-threaded jobs ===
+ === Non-threaded (multi-node MPI) jobs ===
+
+ You should use the “-R "span[ptile=128]"” (or other appropriate value instead of 128) if you intend to run multi-node jobs.

  == Known issues ==
+
  == Troubleshooting ==

Revision as of 12:12, 31 January 2020

The new Euler VI nodes are available for beta testing. Each node has 128 cores and 512 GB of memory, and the nodes are connected by a 200 Gbps EDR InfiniBand fabric.


Select or avoid Euler VI nodes

During the testing and transition period you can force your job to use or avoid these nodes.

To force your job to run on these nodes, request the “-R beta” or “-R "select[model==EPYC_7742]"” bsub option:

bsub -R beta [other bsub options] ./my_command
bsub -R "select[model==EPYC_7742]" [other bsub options] ./my_command

To prevent your job from running on these nodes, request the “-R stable” bsub option:

bsub -R stable [other bsub options] ./my_command

If you encounter any problems running your jobs on the new Euler VI nodes, please report them to cluster support.

While you can always use the -R stable option, the -R beta option will not work after the Euler VI nodes are put into production.
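
The three resource requests described in this section can be wrapped in a small shell helper. The following is only a sketch: the function name euler_bsub_cmd and the keywords use, model and avoid are made up for illustration, and the function prints the command line instead of submitting anything, so it can be tried without access to the cluster.

```shell
# Hypothetical helper (not part of LSF or of the cluster setup):
# print the bsub command line for a given node choice without submitting it.
#   use   -> -R beta                          (force Euler VI nodes, testing period only)
#   model -> -R "select[model==EPYC_7742]"    (force Euler VI nodes by CPU model)
#   avoid -> -R stable                        (keep the job off Euler VI nodes)
euler_bsub_cmd() {
    choice="${1:-}"
    [ "$#" -gt 0 ] && shift
    case "$choice" in
        use)   resource='beta' ;;
        model) resource='select[model==EPYC_7742]' ;;
        avoid) resource='stable' ;;
        *)
            echo "usage: euler_bsub_cmd use|model|avoid command [args...]" >&2
            return 1
            ;;
    esac
    # The resource string is quoted so the shell does not treat the
    # square brackets as a glob pattern when the command is pasted.
    printf 'bsub -R "%s" %s\n' "$resource" "$*"
}

euler_bsub_cmd use ./my_command
euler_bsub_cmd model ./my_command
euler_bsub_cmd avoid ./my_command
```

Because -R beta is expected to stop working once the nodes are in production, scripts meant to outlive the testing period would probably use the model form instead.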

Changes in behavior

If you request Euler VI nodes, then the batch system will run jobs requesting up to 128 cores on a single node.

Threaded jobs

Non-threaded (multi-node MPI) jobs

If you intend to run multi-node jobs, use the “-R "span[ptile=128]"” option (or another appropriate value in place of 128).
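
To pick a ptile value it helps to know how many nodes a job will occupy. With span[ptile=P] the batch system places at most P cores of the job on each node, so a job requesting N cores spans N/P nodes, rounded up. The sketch below only does this arithmetic; the function name nodes_for_job is made up, and the core counts are example values, not recommendations.

```shell
# Hypothetical helper: number of nodes a job occupies when it requests
# "cores" cores in total and at most "ptile" cores per node.
# ptile defaults to 128, the core count of an Euler VI node.
nodes_for_job() {
    cores="$1"
    ptile="${2:-128}"
    # Integer ceiling division: ceil(cores / ptile)
    echo $(( (cores + ptile - 1) / ptile ))
}

nodes_for_job 256 128   # two full nodes
nodes_for_job 384 128   # three full nodes
nodes_for_job 200 64    # four nodes, the last one only partly filled
```

For instance, a 256-core job submitted with -n 256 (the usual bsub option for the total core count, which this page does not mention explicitly) together with -R "span[ptile=128]" would fill exactly two Euler VI nodes.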

Known issues

Troubleshooting