Difference between revisions of "Work storage /cluster/work/ partially available (10 May 2022)"
Revision as of 10:07, 11 May 2022
This morning a storage controller crashed, which affects the /cluster/work storage. Parts of /cluster/work are temporarily unavailable. Our storage specialists are in close contact with the vendor and are working on bringing the storage system back as fast as possible. Please note that only some users are affected by this incident, not all.
If your .bashrc or .bash_profile contains a command that accesses a storage volume that is temporarily unavailable, your login might get stuck. If you encounter this problem, please write to firstname.lastname@example.org and we can comment out those commands in your .bashrc and/or .bash_profile so that you can log in to Euler again.
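For users who want to make their own startup files more robust, the following is a minimal sketch of one possible guard, not an official recommendation. The helper name responds_within, the path /cluster/work/your_group and the variable MY_DATA are hypothetical placeholders. The helper runs a check in the background and polls it instead of relying on `timeout`, because `timeout` waits for its child and a process blocked on unavailable storage may not react to signals.

```shell
# Hypothetical helper: run a command in the background and report whether
# it finished within LIMIT seconds. The body is a subshell "( ... )" so that
# no job-control messages appear and no variables leak into the login shell.
responds_within() (
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &      # the check itself, output discarded
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            exit 1              # still running: treat the volume as unavailable
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                 # pass on the exit status of the check
)

# Example use in ~/.bashrc. The path and variable are placeholders only.
if responds_within 2 ls -d /cluster/work/your_group; then
    export MY_DATA=/cluster/work/your_group
fi
```

With this pattern the login continues after at most the given number of seconds. A check that is blocked on an unavailable volume may stay behind as a background process until the storage responds again.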
Volumes not affected:
We will update this news item whenever there is some new information.
We are sorry for the inconvenience.
- 2022-05-10 13:20
- The problem with the storage controller could not be fixed. The controller needs to be replaced. We don't know yet how long it will take until /cluster/work is back to normal operation (our current guess is 24 to 96 hours). After the replacement, we will also run some integrity checks on the data. We will publish another update later this afternoon.
- 2022-05-10 16:30
- The vendor is sending a new controller, which is already on the way to the data center. We will publish another update tomorrow morning.
- 2022-05-11 11:55
- We are still working on fixing the problem with the affected /cluster/work volumes. We now prevent jobs of users who own affected volumes from starting. This ensures that the jobs don't get stuck in D-state (uninterruptible sleep) when trying to access an affected volume.