Power outage 2024-06-19

From ScientificComputing

Due to a complete power outage in the CSCS datacenter at 1:35 AM, Euler went down and all running jobs were lost. Regular updates about this situation will be published on this wiki page.

We are sorry for the inconvenience.

Updates

2024-06-19 11:20
There is no news from CSCS yet. We are still waiting for the power to be restored in the datacenter.
2024-06-19 12:10
Power and cooling at CSCS have been restored. We can now start powering up and testing the various components of Euler.
2024-06-19 14:30
The login nodes are open again and the batch system accepts jobs. The queues are still closed, so jobs submitted now will stay pending until the queues are opened.
2024-06-19 17:00
The 4h queues for CPU jobs are open in both the CentOS and Ubuntu parts of the cluster.
2024-06-20 07:45
The 4h queues for GPU jobs are open in the CentOS part of the cluster.
The powering up and testing of compute nodes is ongoing. Most of them seem to have survived the power outage without significant problems.