Euler maintenance (August 2023)

Please note that the Euler cluster will be undergoing maintenance to replace the water-cooling elements that were installed with the first generation of Euler — Euler I — ten years ago.

Therefore

Euler will be OFFLINE from 12:00 (noon) Friday 4 August until 19:00 Friday 11 August (revised from 16:00).

As usual, batch queues will be inactivated prior to the maintenance to ensure that no jobs get killed when the cluster is shut down. Short jobs can still run until the cluster is offline. There is no action required on your part.
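To illustrate the paragraph above, here is a minimal sketch of how you could check whether a job's run-time limit would fit before the shutdown. It is not how the batch system itself decides what to run. The function name `fits_before_maintenance` is hypothetical, the sketch assumes the job starts immediately at the given time, and all times are treated as local time without time zone handling.

```python
from datetime import datetime, timedelta

# Shutdown time taken from the announcement (12:00 noon, Friday 4 August 2023).
MAINTENANCE_START = datetime(2023, 8, 4, 12, 0)


def fits_before_maintenance(start_time: datetime, run_time_limit: timedelta) -> bool:
    """Return True if a job that starts at start_time and runs for its full
    run-time limit would end no later than the maintenance start.

    Hypothetical helper for illustration only; it assumes the job starts
    immediately at start_time.
    """
    return start_time + run_time_limit <= MAINTENANCE_START


if __name__ == "__main__":
    morning = datetime(2023, 8, 4, 7, 0)
    # A 4-hour job starting at 07:00 would end at 11:00, before the shutdown.
    print(fits_before_maintenance(morning, timedelta(hours=4)))
    # A 24-hour job starting at 07:00 would still be running at noon.
    print(fits_before_maintenance(morning, timedelta(hours=24)))
```

In this sketch the first call prints True and the second prints False, which matches the statement that only short jobs can still run until the cluster goes offline.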

Sorry for the inconvenience.

Please watch this page for regular updates before, during and after the maintenance.

Updates

2023-08-11 11:30
The infrastructure work at CSCS is taking more time than expected. We do not have power yet, so it will not be possible for us to reopen the cluster today as originally planned. More information will follow.
2023-08-11 14:00
The infrastructure work at CSCS is complete. Power has just been restored. We are now starting to power up the cluster's core systems (networks, storage, admin nodes).
2023-08-11 15:45
We're having some issues with the cluster's InfiniBand network. Our specialists on site are investigating and will try to fix it. It seems unlikely that we'll be able to reopen the cluster before the weekend.
2023-08-11 18:45
We will reopen a few login nodes at 19:00 to allow users to access their files and submit jobs. We still need some time to power up and test all compute nodes, so the batch queues will remain inactive until Monday morning.
2023-08-14 11:40
The first compute nodes are ready to process jobs. We have opened the first queues (4h). If there are no problems, we will progressively open the remaining queues.
2023-08-14 13:40
The 24h queues are open.
2023-08-14 14:50
The maintenance has finished. The 120h queues are open.