Planned maintenance in June and August 2023

From ScientificComputing
Revision as of 09:21, 27 July 2023 by Sfux


Please note that the Euler cluster will be undergoing maintenance for several days in June and August 2023:

  • Monday 5 June (morning) to Wednesday 7 June (morning): Hardware and software upgrades of the cluster's core components (admin nodes, login nodes, file servers) and migration of some data to a new storage system.
  • Friday 4 August (morning) to Friday 11 August (evening): Replacement of the water-cooling elements that were installed ten years ago with the first generation of Euler (Euler I).

Both maintenance periods require a complete shutdown of Euler. The cluster's login nodes and file servers will be inaccessible throughout.

This is a pre-announcement to give you enough time to prepare for these downtimes. The exact times will be communicated 1-2 weeks prior to each maintenance.

As usual, batch queues will be progressively deactivated in the days and hours before each downtime to ensure that all nodes are empty when the cluster is shut down.
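The idea behind progressive deactivation can be illustrated with a simplified sketch. It assumes that a job may only start if its requested run time ends before the shutdown, and that each queue has a fixed maximum run time (the 4h, 24h and 120h queues mentioned in the updates on this page). The names `QUEUE_LIMITS`, `can_start` and `open_queues` are hypothetical; Euler's actual scheduler configuration may differ.

```python
from datetime import datetime, timedelta

# Hypothetical queue limits, taken from the queue names in the updates (assumption).
QUEUE_LIMITS = {
    "4h": timedelta(hours=4),
    "24h": timedelta(hours=24),
    "120h": timedelta(hours=120),
}


def can_start(requested: timedelta, now: datetime, shutdown: datetime) -> bool:
    """A job may start only if it would finish before the shutdown."""
    return now + requested <= shutdown


def open_queues(now: datetime, shutdown: datetime) -> list[str]:
    """Queues that stay open: a job using the queue's full limit still fits."""
    return [name for name, limit in QUEUE_LIMITS.items()
            if can_start(limit, now, shutdown)]


if __name__ == "__main__":
    shutdown = datetime(2023, 8, 4, 12, 0)  # start of the August maintenance
    for hours_before in (130, 48, 10, 2):
        now = shutdown - timedelta(hours=hours_before)
        print(f"{hours_before:>4}h before shutdown: {open_queues(now, shutdown)}")
```

In this model the 120h queue closes first and the 4h queue last, which is why short jobs can keep running until shortly before the cluster goes offline, while no job is ever killed by the shutdown.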

Sorry for the inconvenience.

5-7 June 2023

Please note that the Euler cluster will be undergoing maintenance to perform hardware and software upgrades of the cluster's core components (admin nodes, login nodes, file servers) and migration of some data to a new storage system.

Therefore

Euler will be OFFLINE from 07:00 Monday 5 June until 11:00 Wednesday 7 June

As usual, batch queues will be inactivated prior to the maintenance to ensure that no jobs get killed when the cluster is shut down. Short jobs can still run until the cluster is offline. There is no action required on your part.

Sorry for the inconvenience.

Please watch this page for regular updates before, during and after the maintenance.

Updates

2023-06-06 15:15
Euler's login nodes are open again. Users can access their data and already submit jobs, which will stay in the queue until the queues are reactivated. We are still running some tests and will progressively reactivate the queues starting tomorrow morning.
2023-06-07 09:25
The maintenance has finished. The short queues are open and we will progressively activate the longer queues during the day.

4-11 August 2023

Please note that the Euler cluster will be undergoing maintenance to replace the water-cooling elements that were installed ten years ago with the first generation of Euler (Euler I).

Therefore

Euler will be OFFLINE from 12:00 (noon) Friday 4 August until 19:00 Friday 11 August (originally announced as 16:00).

As usual, batch queues will be inactivated prior to the maintenance to ensure that no jobs get killed when the cluster is shut down. Short jobs can still run until the cluster is offline. There is no action required on your part.

Sorry for the inconvenience.

Please watch this page for regular updates before, during and after the maintenance.

Updates

2023-08-11 11:30
The infrastructure work at CSCS is taking more time than expected. We do not have power yet, so we will not be able to reopen the cluster today as originally planned. More information will follow.
2023-08-11 14:00
The infrastructure work at CSCS is complete. Power has just been restored. We are now starting to power up the cluster's core systems (networks, storage, admin nodes).
2023-08-11 15:45
We are having issues with the cluster's InfiniBand network. Our specialists on site are investigating and trying to fix the problem. It seems unlikely that we will be able to reopen the cluster before the weekend.
2023-08-11 18:45
We will reopen a few login nodes at 19:00 to allow users to access their files and submit jobs. We still need some time to power up and test all compute nodes, so the batch queues will remain inactive until Monday morning.
2023-08-14 11:40
The first compute nodes are ready to process jobs. We have opened the first queues (4h) and, if no problems occur, we will progressively open the remaining queues.
2023-08-14 13:40
The 24h queues are open.
2023-08-14 14:50
The maintenance has finished. The 120h queues are now open.