Power outage 2023-08-29

From ScientificComputing

Due to a short power outage in the CSCS datacenter, hundreds of compute nodes went down around 11:15 today. All jobs running on these compute nodes were lost.

Many of these nodes rebooted and came back up when the power was restored, but some were left in a bad state. We are currently investigating this issue with CSCS.


2023-08-29 16:40
While we investigate a network issue, we are keeping all Euler VII nodes, which represent almost ⅔ of all CPU nodes, closed for the time being.
2023-08-30 11:15
We have resolved the network issue, and the cluster is fully operational again.