Difference between revisions of "Euler IV Beta Testing"

From ScientificComputing

Revision as of 07:45, 22 March 2018

The new Euler IV nodes are generally available for beta testing. Each node has 36 cores and 192 GB of memory, and the nodes are connected by a 100 Gb/s EDR InfiniBand fabric.

You are encouraged to test your jobs on the Euler IV nodes during this open beta testing phase.

Select or avoid Euler IV nodes

You can force your jobs to run on the Euler IV nodes or prevent them from running there. To force a job onto these nodes, request the “-R beta” bsub option:

bsub -R beta [other bsub options] ./my_command

To prevent your job from running on these nodes, request the “-R stable” bsub option:

bsub -R stable [other bsub options] ./my_command

If you encounter any problems running your jobs on the new Euler IV nodes, please report them to cluster support.

Changes in behavior

Since the new nodes have 36 cores compared to the 24 cores in the old nodes, there is a change in how a parallel job is split into nodes. During the beta testing phase this change only affects jobs submitted with the “-R beta” option. As before, the span[] options are honored. Otherwise, the defaults described below are used.
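If you would rather not depend on the defaults, an explicit span[] request can be combined with the node selection. The lines below are only a sketch: the core count, the ptile value, and ./my_command are placeholders, and the sketch assumes that the batch system accepts a second -R option next to “-R beta”. It only assembles and prints the command line, so you can check it before submitting.

```shell
# Sketch only: build a bsub command line that selects the Euler IV nodes
# and fixes the layout with span[ptile=...] instead of using the defaults.
# All values below are placeholders chosen for illustration.
cores=72   # total number of cores requested
ptile=36   # cores per node; 36 would fill one Euler IV node
cmd="bsub -R beta -n $cores -R \"span[ptile=$ptile]\" ./my_command"

# Print the command instead of running it.
echo "$cmd"
```

With these values the job would ask for two full 36-core nodes.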

Threaded jobs

If a job is submitted with OMP_NUM_THREADS set to a value other than 1, it will run on multiples of 24 cores by default. This behavior is unchanged.

Non-threaded jobs

For non-threaded jobs (most pure MPI jobs), the following defaults apply:

  • Jobs requesting up to 36 cores will run on a single node (previously: up to 24 cores).
  • Jobs requesting 192 or more cores and less than 5200 MB memory per core will run on multiple full 36-core nodes (previously: these ran on full 24-core nodes).
  • Jobs requesting between 37 and 191 cores and less than 5200 MB memory per core will run in blocks of 12, 6, or 4 cores (this is new).
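The three rules above can be restated as a small shell function. This is a hypothetical helper for reasoning about a request, not part of the batch system: the name euler4_default_layout is made up, the function does not query the scheduler, and it reports "not specified" for larger jobs with 5200 MB or more per core because the list gives no default for that case.

```shell
# Hypothetical helper: restates the default placement rules listed above
# for non-threaded jobs submitted with "-R beta". It only encodes the text;
# it does not talk to the batch system.
# Usage: euler4_default_layout <cores> <memory per core in MB>
euler4_default_layout() {
    local cores=$1 mem_mb=$2

    if [ "$cores" -le 36 ]; then
        # The list states no memory condition for this rule.
        echo "single node"
    elif [ "$mem_mb" -ge 5200 ]; then
        # No default is documented for this combination.
        echo "not specified"
    elif [ "$cores" -ge 192 ]; then
        echo "full 36-core nodes"
    else
        # 37 to 191 cores with less than 5200 MB per core
        echo "blocks of 12, 6, or 4 cores"
    fi
}

euler4_default_layout 24 1024    # falls under the first rule
euler4_default_layout 96 1024    # falls under the third rule
euler4_default_layout 216 1024   # falls under the second rule
```

If the reported layout is not what you want, an explicit span[] request overrides the defaults.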

Known issues

See the Troubleshooting section below for solutions to issues you may encounter.

InfiniBand and MPI
There have been reports of MVAPICH2 MPI not working.


Troubleshooting