Euler power outage (5 July 2018)

From ScientificComputing
Revision as of 08:42, 5 July 2018 by Sfux

Shortly after 1 AM last night, a thunderstorm in Lugano caused a partial power outage in the CSCS data centre. Most of Euler's compute nodes went down, and all jobs that were running on those nodes crashed. (LSF will report the status of these jobs as "UNKNOWN" until the nodes are rebooted.)

The cluster’s storage systems, which are connected to an uninterruptible power supply (UPS), survived the outage, apparently without data loss.

The cluster team is bringing the cluster back online and testing all of its components. The login nodes are up and accessible as usual. Batch queues will remain inactive until we are sure that the cluster is healthy.

Sorry for the inconvenience.