Euler power outage (08 February 2023)

Due to a power outage in the CSCS datacenter this morning, most of the Euler compute nodes went down around 9:30 AM. All running jobs have been lost. The HPC team is in close contact with CSCS and working on bringing the cluster back online. All queues are closed as we need to check the state of the batch system.

We are sorry for the inconvenience.

We will update this page as the situation evolves.

Updates

2023-02-08 12:50
Power in the LCA datacenter has been restored. We are working on restoring the most important services and will publish an update later this afternoon.
2023-02-08 15:00
The storage systems, batch systems and login nodes are back online. You can access your files and submit jobs. The queues will remain closed until the compute nodes have been checked and are back online.
2023-02-09 10:25
We have opened the short queues (4h and 24h) and will open the remaining queues once we see that the system is running stably. The system status has been set back to green (fully operational).