Euler power outage (26 October 2020)

From ScientificComputing
Revision as of 11:24, 27 October 2020 by Sfux (talk | contribs) (Updates)

Due to a power outage in the CSCS datacenter this morning, most of the Euler compute nodes went down around 7:50 AM. All running jobs have been lost. The HPC team is in close contact with CSCS and is working on bringing the cluster back online. The login nodes are online, so users can access the storage system. All queues are closed while we check the state of the batch system.

We are sorry for the inconvenience.

We will update this page as the situation evolves.

Updates

2020-10-26 09:30
Power in the LCA datacenter has been restored. Network is back. Our system administrators are now looking at the compute nodes and the batch system.
2020-10-26 17:00
Compute nodes are powered on and being checked. The job queues are being progressively activated as compute capacity becomes available.
2020-10-27 11:25
Most compute nodes are back in production. All queues are open again.