Euler I upgrade

From ScientificComputing
Revision as of 16:17, 2 August 2018 by Byrdeo (talk | contribs) (When will it happen?)


Why upgrade Euler I?

The compute nodes in Euler I were bought at the end of 2013 and are reaching their end-of-life. Instead of throwing Euler I away, we have proposed to the management of IT Services and of ETH to upgrade it with new compute nodes equipped with the latest generation of Intel CPUs. The existing racks, chassis, and networks are still in perfect condition and will be reused for the new nodes, which will help reduce costs.

When will this happen?

This proposal has just been approved by VPPR Prof. Ulrich Weidmann. The new compute nodes will be ordered and installed in the coming weeks. The existing Euler I nodes will be taken offline on the morning of 20 August 2018 and shipped back to HPE as part of a trade-in program.

What is the difference between old and new nodes?

To simplify job scheduling, we have decided to keep the same number of cores per node (24). More precisely, the CPU will be upgraded from Intel Xeon E5-2697v2 (Ivy Bridge) to Xeon Gold 5118 (Skylake) and the memory will be increased from 64 GB (DDR3-1866) to 96 GB (DDR4-2400).
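Because the core count stays at 24 while the memory grows, the memory available per core changes. The short sketch below works out that ratio from the figures above; the helper name `mem_per_core` is purely illustrative and not part of any Euler tooling, and the result is the raw hardware ratio, not necessarily what the batch system makes available to jobs.

```shell
#!/bin/sh
# Memory per core in GB, given node memory (GB) and cores per node.
# mem_per_core is a hypothetical helper used only for this illustration.
mem_per_core() {
    awk -v mem="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", mem / cores }'
}

echo "old nodes: $(mem_per_core 64 24) GB per core"   # Xeon E5-2697v2, 64 GB
echo "new nodes: $(mem_per_core 96 24) GB per core"   # Xeon Gold 5118, 96 GB
```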

How will this affect shareholders?

This upgrade is completely transparent. Shareholders who bought Euler I nodes will have guaranteed access to the equivalent computing capacity during and after the upgrade, as long as their share is valid.

The new nodes will replace the current "standard" nodes in the Euler price list, which will be updated later this month.

How will this affect batch jobs?

The batch system has already been configured not to schedule jobs on Euler I nodes if those jobs would not finish by 00:00 on 20 August.

Jobs that explicitly request Euler I nodes, e.g. by selecting Intel Xeon E5-2697v2 processors, will not run after 20 August. To be on the safe side, you should stop requesting this CPU model immediately.

The E5-2697v2 CPUs do not support AVX2 instructions, which has caused problems with some applications compiled for AVX2. After the upgrade, all nodes in Euler will support AVX2 instructions. You should target this architecture when compiling your programs on Euler. You can already select AVX2 nodes when you submit your job using the option:

bsub -R avx2
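If you are unsure whether a given node supports AVX2, one possible check is to look for the `avx2` flag in `/proc/cpuinfo`. The sketch below assumes a Linux node; the function name `has_avx2` is hypothetical and chosen only for this example. The comments also show how the check relates to compiling with GCC's `-mavx2` flag and to the `bsub` option above, where `my_program` is a placeholder for your own executable.

```shell
#!/bin/sh
# Return success if the given cpuinfo-style file lists the avx2 CPU flag.
# Defaults to /proc/cpuinfo (Linux only); has_avx2 is a hypothetical helper.
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx2; then
    echo "AVX2 supported on this node"
else
    echo "AVX2 not supported on this node"
fi

# Typical use (placeholders, adapt to your own code):
#   gcc -O2 -mavx2 -o my_program my_program.c   # compile for AVX2 with GCC
#   bsub -R avx2 ./my_program                   # request an AVX2 node
```

A binary built with `-mavx2` may fail with an illegal-instruction error on a CPU without AVX2, which is why it is worth requesting AVX2 nodes explicitly until the upgrade is complete.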

Euler will continue to run normally throughout the upgrade. However, its overall computing capacity will be temporarily reduced from 45,000 to 35,000 cores, which may lead to longer queuing times. Considering that the new nodes will be significantly faster than the old ones, we count on your understanding during this transition phase.

Other questions?

Please contact cluster support if you have any questions about the new nodes or about the upgrade procedure.