Euler I upgrade

From ScientificComputing

Revision as of 12:35, 2 August 2018

The compute nodes in Euler I were bought at the end of 2013 and are reaching their end-of-life. Instead of throwing Euler I away, we have proposed to the management of IT Services and of ETH to upgrade it with new compute nodes equipped with the latest generation of Intel CPUs. The existing racks, chassis, and networks are still in perfect condition and will be reused for the new nodes, which will help reduce costs.

This proposal has just been approved by VPPR Prof. Ulrich Weidmann. The new compute nodes will be ordered and installed in the coming weeks. The existing Euler I nodes will be taken off-line on the morning of 20 August 2018 and shipped back to HPE as part of a trade-in program.

What is the difference between the old and new nodes?

To simplify job scheduling, we have decided to keep the same number of cores per node (24). The CPUs will be upgraded from Intel Xeon E5-2697v2 (Ivy Bridge) to Xeon Gold 5118 (Skylake), and the memory per node will be increased from 64 GB (DDR3-1866) to 96 GB (DDR4-2400).
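To put the memory increase in perspective, the following sketch works out the nominal memory per core from the figures above. The function name mem_per_core_mb is only illustrative, and the memory actually available to jobs will be somewhat lower, since the operating system uses part of it.

```shell
# Nominal memory per core before and after the upgrade.
# All numbers are taken from the paragraph above.
cores_per_node=24
old_mem_gb=64
new_mem_gb=96

# Hypothetical helper: integer arithmetic in MB, so no bc or awk is needed.
mem_per_core_mb() {
    # $1 = memory per node in GB, $2 = cores per node
    echo $(( $1 * 1024 / $2 ))
}

echo "old nodes: $(mem_per_core_mb "$old_mem_gb" "$cores_per_node") MB per core"
echo "new nodes: $(mem_per_core_mb "$new_mem_gb" "$cores_per_node") MB per core"
```

With these inputs the old nodes come out at about 2730 MB per core and the new nodes at 4096 MB per core.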

How will this affect shareholders?

This upgrade is completely transparent. Shareholders who bought Euler I nodes will have guaranteed access to the equivalent computing capacity during and after the upgrade, as long as their share is valid.

The new nodes will replace the current "standard" nodes in the Euler price list, which will be updated later this month.

How will this affect batch jobs?

The batch system has already been configured not to schedule jobs on Euler I nodes if they would not finish by 00:00 on 20 August.
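If you want to estimate whether a job could still be placed on Euler I, the rule above can be sketched as a simple time comparison. This is only an illustration of the rule, not the batch system's actual logic: the helper name fits_before_cutoff is hypothetical, GNU date is assumed, and the run-time limit is taken to be the wall-clock limit you request for the job.

```shell
# Cutoff from the announcement: 00:00 on 20 August 2018.
CUTOFF="2018-08-20 00:00"

# Hypothetical helper: succeeds if a job started at $1 with a run-time
# limit of $2 hours would end at or before the cutoff.
# Both times are parsed as UTC (-u) so that only elapsed hours matter.
fits_before_cutoff() {
    # $1 = start time, e.g. "2018-08-17 09:00"; $2 = run-time limit in hours
    start_s=$(date -u -d "$1" +%s) || return 2
    cutoff_s=$(date -u -d "$CUTOFF" +%s) || return 2
    end_s=$(( start_s + $2 * 3600 ))
    [ "$end_s" -le "$cutoff_s" ]
}

# Example: a 24-hour job starting at 09:00 on 17 August.
if fits_before_cutoff "2018-08-17 09:00" 24; then
    echo "job would finish before the cutoff"
else
    echo "job would run past the cutoff"
fi
```

A 24-hour job starting at 12:00 on 19 August, by contrast, would end after the cutoff, so under the rule described above it would not be scheduled on Euler I.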

Jobs that explicitly request Euler I nodes, e.g. by selecting Intel Xeon E5-2697v2 processors, will not run after 20 August. To be on the safe side, you should stop requesting this CPU model immediately.

The E5-2697v2 CPUs do not support AVX2 instructions, which has caused problems with some applications compiled for AVX2. After the upgrade, all nodes in Euler will support AVX2 instructions, so you should target this architecture when compiling your programs on Euler. You can already select AVX2 nodes when you submit a job by using the option

bsub -R avx2
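To see whether the node you are currently on supports AVX2, you can inspect the CPU flags that Linux reports. The following is a minimal sketch, assuming a Linux node with /proc/cpuinfo; the helper name has_avx2 and the program name ./my_program in the comment are hypothetical.

```shell
# Hypothetical helper: succeeds if the given cpuinfo-style file lists the
# avx2 flag. The -w option matches whole words only, so a flag such as
# avx512f does not count as avx2.
has_avx2() {
    # $1 = path to a cpuinfo-style file (defaults to /proc/cpuinfo)
    grep -qw 'avx2' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx2; then
    echo "this node supports AVX2"
else
    echo "this node does not report AVX2"
fi

# A submission restricted to AVX2 nodes would then look like this
# (not executed here; ./my_program stands for your own executable):
#   bsub -R avx2 ./my_program
```

Note that a program compiled for AVX2 will fail with an illegal-instruction error on a CPU without AVX2, which is why such jobs should be restricted to AVX2 nodes until the upgrade is complete.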

Euler will continue to run normally throughout the upgrade. However, its overall computing capacity will be temporarily reduced from 45,000 to 35,000 cores, which may lead to longer queuing times. Considering that the new nodes will be significantly faster than the old ones, we count on your understanding during this transition phase.

Other questions?

Please contact cluster support if you have any questions about the new nodes or about the upgrade procedure.