Euler power outage (24 July 2018)

From ScientificComputing

Revision as of 11:02, 25 July 2018

Due to a massive power outage in Lugano (and apparently a big part of Ticino), most compute nodes of Euler went down at 10:42 this morning, causing the loss of all running jobs.

Since our colleagues at CSCS do not know when power will be restored, we have initiated an emergency shutdown of all systems connected to uninterruptible power supplies (UPS), including storage systems, login nodes and admin nodes.

We will update this page as the situation evolves.

Updates

24 July, 11:30 — More information about the outage (in Italian): https://www.ticinonews.ch/ticino/468791/e-luce-fu-elettricita-tornata

24 July, 12:30 — The power in Lugano is back online. CSCS is now restarting the data centre's cooling infrastructure.

24 July, 13:15 — Information about the outage from the electricity provider (in Italian): https://www.ail.ch/meta-navigation/media/news-comunicati/Ripristino-interruzione-di-servizio.html

24 July, 13:30 — The cooling infrastructure at CSCS is operational again. We can now progressively restart the network and storage systems of Euler. This will take a few hours.

24 July, 15:00 — All storage systems are up and healthy. We can now start powering up compute nodes.

24 July, 16:15 — Most compute nodes are up. We are now testing the health and performance of the nodes and networks.

24 July, 17:15 — Euler is back! Login nodes are open. Batch queues are currently inactive. They will be activated once we are sure that the batch system is working properly.

24 July, 19:00 — All 4-hour and 24-hour queues are active. If all goes well, 5-day and 30-day queues will be activated tomorrow morning.

25 July, 13:00 — All queues are active; Euler is fully operational.