Integration of Leonhard Open into Euler

From ScientificComputing
Revision as of 12:53, 27 August 2021 by Byrdeo (talk | contribs) (FAQ)

Why

The Leonhard Open cluster, introduced in 2017 as a new platform for big data analytics and GPU computing, has become a victim of its own success. Due to the very high demand for GPU nodes, it has reached the space, power and cooling limits of our data center in Zurich. For this reason, all new GPU nodes bought in the last 12 months have been installed in Euler in Lugano. This has led to a situation where customers who initially bought a share of Leonhard ended up with GPU nodes in both clusters. Since moving individual shareholders from Leonhard to Euler is not practical, the Scientific IT Services have decided to completely integrate Leonhard Open into Euler. (The Leonhard Med cluster is not affected by this change.) This will benefit not only existing shareholders who had to deal with two separate clusters, but also future customers who had difficulty choosing between Euler and Leonhard. It will also simplify the work of the cluster management team.

How

The existing Leonhard GPU nodes will physically remain in Zurich but will be logically moved into the Euler network. They will be integrated into the cluster management tools and batch system of Euler.

All files currently stored in the "work" and "project" file systems of Leonhard Open will be transferred to their equivalents in Euler. This operation will be done by the cluster management team and will be mostly transparent to users. It will require a short downtime during which Leonhard Open users will not be able to access their data. Once the transfer is done, they will find their files in Euler under the usual path (with a few exceptions).

Due to potential conflicts between the two clusters, the contents of "home" and "scratch" will not be transferred. Each user is responsible for copying the files they want to keep. For this purpose, the login nodes and file systems of Leonhard Open will remain accessible (in read-only mode) for one month after the integration.
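As an illustration of the manual copy described above, here is a minimal Python sketch that assembles an rsync command line to pull a directory from the Leonhard Open login node. It is meant to be run on the Euler side. It assumes that rsync is installed, that SSH access to login.leonhard.ethz.ch works from where the command is run, and that the account name is the same on both clusters; none of this is stated in the announcement. The username `jdoe`, the directory names and the function name `build_rsync_command` are hypothetical.

```python
import shlex

# Hostname of the Leonhard Open login nodes, as given in this announcement
LEONHARD_LOGIN = "login.leonhard.ethz.ch"


def build_rsync_command(username, remote_dir, local_dir, dry_run=True):
    """Return an rsync argument list that pulls remote_dir from Leonhard Open.

    remote_dir is interpreted by rsync relative to the remote home directory
    unless it is an absolute path. All arguments are illustrative.
    """
    command = ["rsync", "--archive", "--verbose"]
    if dry_run:
        # Only list what would be copied; nothing is written
        command.append("--dry-run")
    # A trailing slash on the source copies the directory contents,
    # not the directory itself
    source = f"{username}@{LEONHARD_LOGIN}:{remote_dir.rstrip('/')}/"
    destination = f"{local_dir.rstrip('/')}/"
    command += [source, destination]
    return command


if __name__ == "__main__":
    # Hypothetical user and directories, for illustration only
    cmd = build_rsync_command("jdoe", "results", "leonhard_home/results")
    print(shlex.join(cmd))
```

The sketch defaults to a dry run so that the list of files can be checked first; passing `dry_run=False` produces the command that actually copies. The same approach applies to files in the personal "scratch" directory.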

All Leonhard Open shares will be transferred to Euler. Leonhard users will therefore enjoy the same shareholder privileges and priority on Euler as they did on Leonhard Open. Apart from the hostname (euler.ethz.ch instead of login.leonhard.ethz.ch), nothing will change for users.

The software environment of Euler has already been modified to support GPUs and features the same toolchains (GCC 4.8.5, 6.3.0, 8.2.0 and Intel 18.0.1). Packages that were only available on Leonhard Open are being installed on Euler to make the migration of your workflows as seamless as possible. The cluster support team will be happy to assist you in porting your workflows from Leonhard Open to Euler and will install any missing packages on demand.

When

The integration will take place on 14-15 September 2021. The detailed schedule is:

Date and time | Task
Now - 14.09.2021 | Batch queues of Leonhard Open will be progressively inactivated to drain the compute nodes and ensure that no job is running on 14.09.2021
14.09.2021, 07:00 | All batch queues will be closed, compute nodes will be taken out of operation and reconfigured as Euler nodes
14.09.2021, 15:00 | All login nodes will be closed, Leonhard Open will be taken off-line
14.09.2021, 15:00 - 15.09.2021, 12:00 (noon) | Work and project data will be transferred/synchronized from Leonhard Open to Euler
15.09.2021, 12:00 | Leonhard Open users will find their data in Euler under the usual path (with a few exceptions)
15.09.2021, 12:00 | The login nodes of Leonhard Open will be reopened for one month to allow users to copy data in their "home" and "scratch" directories (if needed)
14.10.2021, 12:00 | Access to Leonhard Open will be closed, all remaining user data will be deleted
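The schedule above can also be written as a small lookup, which may help when planning when to copy "home" and "scratch" data. The following Python sketch is only an illustration. The milestone times are copied from the schedule and are assumed to be local time in Zurich; the status descriptions are paraphrased, and the names `MILESTONES` and `leonhard_status` are hypothetical.

```python
from datetime import datetime

# Milestones copied from the schedule (local time assumed)
MILESTONES = [
    (datetime(2021, 9, 14, 7, 0), "queues closed, compute nodes being reconfigured"),
    (datetime(2021, 9, 14, 15, 0), "offline, work and project data being transferred"),
    (datetime(2021, 9, 15, 12, 0), "login nodes reopened read-only for home and scratch copies"),
    (datetime(2021, 10, 14, 12, 0), "closed, remaining user data deleted"),
]


def leonhard_status(moment):
    """Return the state of Leonhard Open at the given moment."""
    status = "queues being drained"  # state before the first milestone
    for start, description in MILESTONES:
        if moment >= start:
            status = description
    return status


if __name__ == "__main__":
    print(leonhard_status(datetime(2021, 9, 20, 9, 0)))
    # Length of the window for copying home and scratch data
    copy_window = MILESTONES[3][0] - MILESTONES[2][0]
    print(copy_window.days, "days")
```

With these dates, the window between the reopening of the login nodes and the final closure is 29 days.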

FAQ

Why are you doing this change now?

The decision to integrate Leonhard Open into Euler was already taken last year, but the change was delayed by Covid-19 and by last year's cyber-attack against many HPC sites. We used this time to carry out a proof of concept verifying that Leonhard Open nodes in Zurich could be integrated into the Euler cluster in Lugano. The date was set during the summer holiday, before the start of the Fall semester, to minimise the impact on students and teachers who rely on Leonhard for their courses.

Does this change affect Leonhard Med?

No. Leonhard Med will continue to be operated as a separate, high-security system. Its users and data are not affected by this integration.

What happens to my Leonhard Open share?

Nothing. You will get exactly the same resources on Euler that you had on Leonhard Open and the temporal validity of your share will remain the same.

Does this integration bring any benefits to Leonhard Open shareholders?

Since Euler is much larger than Leonhard Open, it will provide more elasticity, thus allowing for higher peak usage. Euler also contains newer GPUs that are not available on Leonhard, such as the Nvidia A100.

Do I need to transfer my data to Euler?

You will need to transfer data from your "home" directory and your personal "scratch" directory. We will take care of transferring the data from your "work" and "project" directories (if applicable).

Is all the software that was available on Leonhard Open also available on Euler?

Not yet entirely. To prepare for the integration, we started several months ago to identify differences between the software stacks of Leonhard Open and Euler and to install the missing packages on Euler. Some packages may still be missing; we can install them on request.

Need help?

If you have a question that is not covered by the FAQ above, or any concern about the integration of Leonhard Open into Euler (e.g., if you have a workflow tightly coupled with Leonhard Open), do not hesitate to contact our cluster support team.