Difference between revisions of "Leonhard Open maintenance (December 2018)"

From ScientificComputing

Revision as of 15:40, 11 December 2018

We would like to inform you about an upcoming maintenance of the Leonhard Open cluster.

The Leonhard Open cluster will go offline at 15:00 on Friday, 7 December 2018, so that data can be migrated to a new storage system. We expect to bring the cluster back online on the afternoon of Monday, 10 December 2018.

No action is required on your part. As usual, jobs that cannot start before the downtime will be held in the queues until the end of the maintenance, after which they will start normally.

We are sorry for any inconvenience this may cause.

We will update this page before and during the maintenance.

Updates

2018-12-10 10:20
Our storage experts have successfully migrated the data to the new storage system and finished the integrity checks. We are currently running tests and will provide further updates on the maintenance in the afternoon.
2018-12-10 16:30
Testing of the Leonhard Open cluster after the maintenance has revealed that the openmpi module does not work as expected. Jobs identified as MPI jobs have been suspended; we encourage you to kill them and resubmit them once we have resolved the MPI issues. In the meantime, please do not submit new MPI jobs or use the openmpi modules.
2018-12-11 16:40
We are still working on the OpenMPI issue. OpenMPI jobs will not fail, but they will produce many warnings. This is a known problem, so please do not report these warnings to cluster support. Since many jobs do not use OpenMPI, we have decided to open the 4h and 24h queues.