Emergency maintenance to fix security vulnerability (CVE-2016-5195)

From ScientificComputing

Revision as of 15:04, 25 October 2016

A recently published vulnerability in the Linux kernel (CVE-2016-5195) allows any local user to gain full control of the operating system. This is a critical security issue, which leaves us with no choice but to take BOTH Brutus and Euler OFF-LINE until the issue has been fixed.

Since we cannot exclude the possibility that someone has already exploited this vulnerability, all login nodes and compute nodes will need to be wiped clean and their OS reinstalled from scratch before they can be put back into production.

The reinstallation of the login and compute nodes will affect only system files stored in these nodes' local file system (/bin, /etc, /sbin, /scratch, /tmp, /usr, etc.). User data (/cluster/home, /cluster/scratch, /cluster/work, /cluster/project) do not pose any security risk and will therefore not be touched in any way.

At the time of writing, neither Red Hat nor CentOS has released a patch for the operating system that we are using on Brutus and Euler. No one knows how long this will take. Please refrain from submitting tickets or sending emails asking when Brutus and Euler will be back on-line. We will publish regular status updates on this page and notify all cluster users by email when Brutus and Euler are on-line again.

Thank you for your understanding.

Updates

2016-10-25 13:30

Red Hat released a patch for RHEL 7 yesterday evening. It may take some time until they release one for RHEL 6, and then for CentOS to port it to the version we are using on our clusters (CentOS 6.8).

Our local kernel expert has therefore decided to write her own patch for CentOS 6.8, based on the information publicly available about the kernel's vulnerability. The cluster support team is testing it right now. As far as we can tell, it fixes the vulnerability, but we still have to make sure that the new kernel does not have any undesirable side effects. If these tests are successful, we will deploy it to the login nodes of Euler, and then progressively reinstall all compute nodes. That should allow us to (partly) reopen Euler while we wait for the official patch for CentOS 6.8.

2016-10-25 15:15

We have installed our custom-made patch on the login nodes of Euler and reopened them to all users. (You should have received a notification by email.)

Please note that we are doing this primarily to let Euler users access their data on the cluster. All compute nodes will remain closed until they are reinstalled. This process will span several days, as we need to wait until they are empty before we can reinstall them. (We don't want to kill the jobs running there unless absolutely necessary.) In the meantime, the computing capacity of Euler will remain severely limited, which will result in long queueing times. In a first phase, only short (4h) jobs will be allowed to run. Longer jobs (24h) will be allowed once we are certain that our custom-made patch does not have any undesirable side effects, and once a sufficient number of compute nodes have been reinstalled and put back into production. Very long jobs (120h) will not be allowed to run until CentOS releases an official patch for CentOS 6.8.
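
For illustration only, the phased reopening described above can be read as a small decision rule. The Python sketch below is not the batch system's actual configuration: the function name and its three flags are hypothetical, and it assumes that the release of the official CentOS 6.8 patch is by itself enough to allow 120h jobs, whereas the text above only says it is required.

```python
def longest_allowed_job_hours(patch_validated: bool,
                              enough_nodes_reinstalled: bool,
                              official_patch_released: bool) -> int:
    """Return the longest job class (in hours) allowed to run.

    Hypothetical helper: the names and flags are illustrative and do not
    correspond to any real scheduler setting.
    """
    # Assumption: an official CentOS 6.8 patch is treated as sufficient
    # for the 120h class; the announcement only states it is required.
    if official_patch_released:
        return 120
    # 24h jobs need BOTH conditions: the custom-made patch shows no
    # undesirable side effects, and enough compute nodes are reinstalled.
    if patch_validated and enough_nodes_reinstalled:
        return 24
    # First phase: only short jobs are allowed.
    return 4
```

Under these assumptions, a job requesting more than the returned number of hours would have to wait for the next phase.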