Euler maintenance (October 2023)

From ScientificComputing
Revision as of 09:18, 27 October 2023 by Urbanb (talk | contribs) (All queues open.)


CSCS has informed us that the data center will undergo site-wide cooling maintenance on Wednesday, 25 October 2023, which will require a shutdown of Euler.

The date of this maintenance is fixed by CSCS and is beyond our control. Consequently, this maintenance cannot be cancelled or postponed.


Tentative schedule (subject to change):

Tue 24 Oct, 15:00        Start of power-down procedure; Euler off-line
Wed 25 Oct, whole day    CSCS maintenance; Euler completely down
Thu 26 Oct, afternoon    Login nodes up; access to storage possible
Fri 27 Oct, afternoon    Gradual reactivation of batch queues: first 4h, then 24h, and finally 120h

As usual, batch queues will be progressively inactivated in the days and hours before the maintenance, so that no jobs are killed when the cluster is shut down. Short jobs can still run until the cluster is taken off-line. You will not be able to access your data between Tuesday afternoon (24 October) and Thursday afternoon (26 October). If all goes well, Euler will start running jobs again on the afternoon of Friday, 27 October, and will be fully operational by the evening.

This is for your information only. No action is required on your part.

We are sorry for the inconvenience.

Updates

2023-10-25 18:00
Cooling in the data center was already restored this afternoon, so we were able to start bringing up Euler somewhat sooner than planned. We expect the cluster to be back on-line on Thursday afternoon.
2023-10-26 16:45
Euler is on-line again: the login nodes are open and all file systems are accessible. The batch system will be reactivated shortly, once enough compute nodes have been powered up and tested.
2023-10-27 09:15
All queues are open.