Migration to CentOS 7

From ScientificComputing
Revision as of 08:21, 28 August 2017 by Urbanb (talk | contribs) (Describes name change)

Introduction

The ID SIS HPC group is currently migrating the Euler cluster from CentOS 6 to CentOS 7. In a first phase in March 2017, the new Euler III nodes were installed with CentOS 7 to test the new major release. After carefully testing the new setup during the beta-test phase, we added some compatibility libraries to ensure that the software stack compiled under CentOS 6 still works on the Euler III nodes. Most locally installed software should work without problems on the upgraded nodes, but you may encounter missing libraries.

Schedule

On Monday 28 August 2017 we will start the second phase: upgrading the remaining Euler nodes to CentOS 7, rack by rack. The upgrade will take place without downtime or interruption of user jobs. We plan to upgrade one rack per working day, so the upgrade should be finished by 21 September 2017.

Known issues

NAS NFS mounts

Upgraded nodes will be moved to the same network as the Euler III nodes, which is different from Euler nodes not yet upgraded. For central NAS shares from the ID storage group, there is no change required (we already took care of this). If you use your own NAS, then you need to change the export rules and/or update your firewall to include the new IP addresses. You can test whether your NAS is affected by submitting a test job to list some files on your NAS:
bsub -R "select[centos==7]" -Ip ls /nfs/my-nas-server/my-nas-volume

Substitute the server and volume names of your own NAS.

/lib64/libc.so.6: version `GLIBC_2.14' not found

If you compile your software in a batch job that is dispatched to a compute node running CentOS 7, the resulting binary is linked against the newer glibc on that node (CentOS 7 ships glibc 2.17) and may require symbol versions such as GLIBC_2.14. If a job using this binary is later dispatched to a CentOS 6 node (glibc 2.12), it will crash with the error message:

/lib64/libc.so.6: version `GLIBC_2.14' not found

Therefore, please make sure either that jobs compiling software are dispatched to CentOS 6 nodes (the software can then run everywhere in the cluster), or that jobs using software compiled under CentOS 7 run only on compute nodes that have already been upgraded to CentOS 7.

Submitting a job to a CentOS 6 compute node:

bsub -R "select[centos==6]" ...

Submitting a job to a CentOS 7 compute node:

bsub -R "select[centos==7]" ...

If you do not request a specific CentOS version in this way, your job can be dispatched to either a CentOS 6 or a CentOS 7 node.
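To check in advance whether a binary could hit the GLIBC error described in this section, you can inspect which GLIBC symbol versions it requires. The following is a minimal sketch, not an official tool: the function name max_glibc_version and the path ./my_program are hypothetical, and it assumes GNU grep, GNU sort (for the -V option) and objdump from binutils are available.

```shell
# Print the highest GLIBC symbol version mentioned in the text on stdin.
# Intended input: the dynamic symbol table printed by `objdump -T <binary>`.
max_glibc_version() {
    # Extract every GLIBC_x.y[.z] tag, version-sort them, keep the last one.
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Hypothetical usage (./my_program stands for your own binary):
#   objdump -T ./my_program | max_glibc_version
```

If the reported version is higher than GLIBC_2.12, the binary should be expected to fail on CentOS 6 nodes, so jobs using it should request select[centos==7].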

Missing libraries

We have extensively tested CentOS 7 on the new Euler III nodes to determine which compatibility libraries are required to ensure that the existing software stack is supported on CentOS 7. If you receive an error message of the type

error while loading shared libraries: [library name].so: cannot open shared object file: No such file or directory

when running your application/software on a CentOS 7 compute node, then please contact cluster support and we will help you to resolve this problem.
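Before contacting support, it can help to collect the full list of libraries that cannot be resolved, since the loader error only names the first one. The sketch below is an illustration under stated assumptions: the function name missing_libs and the path ./my_program are hypothetical, and the command must be run on a CentOS 7 node (for example inside a job submitted with select[centos==7]) to reflect the libraries available there.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads the output of `ldd <binary>` on stdin.
missing_libs() {
    # ldd prints unresolved entries as "<name> => not found".
    awk '/=> not found/ { print $1 }' | sort -u
}

# Hypothetical usage (./my_program stands for your own binary):
#   ldd ./my_program | missing_libs
```

Including this list in your message to cluster support should make it easier to identify which compatibility libraries are missing.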

Jobs submitted from a CentOS 7 host

Jobs submitted from a CentOS 7 host (for instance, when a job is submitted from within another batch job) can only run on compute nodes with CentOS 7. If a job submitted from a CentOS 7 host requests a resource that is only available on compute nodes that have not been upgraded yet, the job will remain pending until these nodes are upgraded too.

Name change

The nodes running CentOS 7 have different names: instead of following the eXyyy pattern, they are now named eu-c7-XXX-YY.
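Scripts that need to know which kind of node they are running on can use the new naming pattern as a quick hint. This is a minimal sketch: the function name node_os_from_name is hypothetical, and it relies only on the eu-c7- prefix mentioned above, so any other name is simply reported as "other". A host name is a convention rather than a guarantee; the file /etc/redhat-release on the node is the more reliable source.

```shell
# Classify a node by its host name, using the eu-c7- prefix of upgraded nodes.
node_os_from_name() {
    case "$1" in
        eu-c7-*) echo "centos7" ;;  # new-style name: node runs CentOS 7
        *)       echo "other" ;;    # anything else: not known to be upgraded
    esac
}

# Hypothetical usage inside a job script:
#   node_os_from_name "$(hostname -s)"
```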