Integration of Leonhard Open into Euler

From ScientificComputing
Revision as of 11:18, 27 August 2021 by Sfux (talk | contribs) (When)

Why

The Leonhard Open cluster, introduced in 2017 as a new platform for big data analytics and GPU computing, has become a victim of its own success. Due to the very high demand for GPU nodes, it has reached the space, power and cooling limits of our data center in Zurich. For this reason, all new GPU nodes bought in the last 12 months have been installed in Euler in Lugano. This has led to a situation where customers who initially bought a share of Leonhard ended up with GPU nodes in both clusters. Since moving individual shareholders from Leonhard to Euler is not practical, the Scientific IT Services have decided to completely integrate Leonhard Open into Euler. (The Leonhard Med cluster is not affected by this change.) This will benefit not only existing shareholders who had to deal with two separate clusters, but also future customers who had difficulty choosing between Euler and Leonhard. It will also simplify the work of the cluster management team.

How

The existing Leonhard GPU nodes will physically remain in Zurich but will be logically moved into the Euler network. They will be integrated into the cluster management tools and batch system of Euler.

All files currently stored in the "work" and "project" file systems of Leonhard Open will be transferred to their equivalents in Euler. This operation will be done by the cluster management team and will be mostly transparent to the users. It will require a short downtime during which Leonhard Open users will not be able to access their data. Once the transfer is done, they will find their files in Euler under the usual path (with a few exceptions).

Due to potential conflicts between the two clusters, the contents of "home" and "scratch" will not be transferred. Users will have to copy any files they want to keep on their own. For this purpose, the login nodes and file systems of Leonhard Open will remain accessible (in read-only mode) for one month after the integration.
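As an illustration of the manual copy described above, the following is a minimal Python sketch that assembles an rsync command to pull a directory from the Leonhard Open login node while it is still readable. Only the hostname login.leonhard.ethz.ch comes from this page; the user name "jdoe" and both paths are hypothetical placeholders, and the sketch assumes that rsync over SSH is available to you. It is not an official migration tool.

```python
import shlex

# Hostname of the Leonhard Open login nodes, as given in this announcement.
LEONHARD_LOGIN = "login.leonhard.ethz.ch"


def build_rsync_command(user, remote_dir, local_dir, dry_run=True):
    """Return an rsync argument list that pulls remote_dir from Leonhard Open.

    user, remote_dir and local_dir are placeholders supplied by the caller;
    nothing here knows the real directory layout of either cluster.
    """
    cmd = ["rsync", "-av"]  # -a: archive mode, -v: verbose
    if dry_run:
        # Only list what would be copied, without transferring anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd.append(f"{user}@{LEONHARD_LOGIN}:{remote_dir.rstrip('/')}/")
    cmd.append(local_dir)
    return cmd


if __name__ == "__main__":
    # Hypothetical user name and paths, for illustration only.
    cmd = build_rsync_command(
        "jdoe", "/path/to/leonhard/home/jdoe", "leonhard_home_backup/"
    )
    print(shlex.join(cmd))
```

Running the script only prints the command. A cautious approach would be to run the printed command once with --dry-run to inspect the file list, then build it again with dry_run=False for the actual copy.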

All Leonhard Open shares will be transferred to Euler. Leonhard users will therefore enjoy the same shareholder privileges and priority on Euler as they did on Leonhard Open. Apart from the hostname (euler.ethz.ch instead of login.leonhard.ethz.ch), day-to-day usage will not change for the users.

The software environment of Euler has already been modified to support GPUs and features the same toolchains (GCC 4.8.5, 6.3.0, 8.2.0 and Intel 18.0.1). Packages that were only available on Leonhard Open are being installed on Euler to make the migration of your workflows as seamless as possible. The cluster support team will be happy to assist you in porting your workflows from Leonhard Open to Euler and will install any missing packages on demand.

When

The integration will take place on 14-15 September 2021. The detailed schedule is:

Date and time | Task
Now - 14.09.2021 | Batch queues of Leonhard Open will be progressively inactivated to drain the compute nodes and ensure that no job is running on 14.09.2021
14.09.2021, 07:00 | All batch queues will be closed, compute nodes will be taken out of operation and reconfigured as Euler nodes
14.09.2021, 15:00 | All login nodes will be closed, Leonhard Open will be taken off-line
14.09.2021, 15:00 - 15.09.2021, 12:00 (noon) | Work and project data will be transferred/synchronized from Leonhard Open to Euler
15.09.2021, 12:00 | Leonhard Open users will find their data in Euler under the usual path (with a few exceptions)
15.09.2021, 12:00 | The login nodes of Leonhard Open will be reopened for one month to allow users to copy data in their "home" and "scratch" directories (if needed)
14.10.2021, 12:00 | Access to Leonhard Open will be closed, all remaining user data will be deleted
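To make the time windows in the schedule concrete, here is a small Python sketch that encodes the three key timestamps and derives the length of the off-line period and of the read-only copy window. The dates are taken from the schedule above; the phase names and function names are made up for this example, and the timestamps are treated as naive local times, which is an assumption.

```python
from datetime import datetime, timedelta

FMT = "%d.%m.%Y %H:%M"

# Timestamps copied from the schedule (assumed to be local time).
OFFLINE_START = datetime.strptime("14.09.2021 15:00", FMT)   # login nodes closed
DATA_AVAILABLE = datetime.strptime("15.09.2021 12:00", FMT)  # data on Euler, login nodes reopened
ACCESS_CLOSED = datetime.strptime("14.10.2021 12:00", FMT)   # remaining data deleted


def phase(now):
    """Return an illustrative label for the state of Leonhard Open at time now."""
    if now < OFFLINE_START:
        return "login nodes open"
    if now < DATA_AVAILABLE:
        return "off-line"
    if now < ACCESS_CLOSED:
        return "read-only copy window"
    return "closed"


def time_left_to_copy(now):
    """Return the time remaining until access is closed (zero once it has passed)."""
    return max(ACCESS_CLOSED - now, timedelta(0))


if __name__ == "__main__":
    print("Off-line period:", DATA_AVAILABLE - OFFLINE_START)
    print("Copy window:", ACCESS_CLOSED - DATA_AVAILABLE)
    print(phase(datetime(2021, 9, 20, 9, 0)))
```

With these dates, the off-line period is 21 hours and the read-only window for copying "home" and "scratch" data lasts 29 days.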

FAQ

What happens to my Leonhard Open share?

Nothing. On Euler, you will get exactly the same resources that you had on Leonhard Open for the remainder of the four-year period during which a share is valid. Because the Euler cluster is much larger than the Leonhard Open cluster, there will be more elasticity, allowing for higher peak usage.

Do I need to transfer my data to Euler?

You will need to transfer data from your "home" directory and your personal "scratch" directory. We will take care of transferring the data from your "work" and "project" directories (if applicable).

Is all the software that was available on Leonhard Open also available on Euler?

To prepare for the integration of Leonhard Open into Euler, we began several months ago to identify differences between the software stacks of the two clusters and to install on Euler the packages that were previously available only on Leonhard Open. Some packages may still be missing; we can install them on request.