Intel compiler not compatible with CentOS 7.4

From ScientificComputing
Revision as of 10:11, 30 October 2017 by Urbanb


It has come to our attention that, due to a bug in the Intel compiler, code compiled with most versions of the Intel compilers may produce unexpected numerical values when run under newer versions of glibc, such as the one shipped with CentOS 7.4. We have been using this OS version only since the maintenance (October 2017). Jobs run before the maintenance (before 26 September 2017) are not affected.

What are the effects of the bug?

Code compiled with affected Intel compilers may give wrong results, NaN values, or abort.

For details, refer to the reports Intel compiler not compatible with glibc 2.24-9 and newer and Inconsistent program behavior on RedHat Enterprise Linux 7.4 if compiled with Intel.

Who is affected? Are my jobs affected?

Anyone who uses software compiled with an Intel compiler. This includes:

  • self-compiled code
  • libraries that we provide
  • third-party software
  • commercial software including (but not limited to): Abaqus, ANSYS, CFX, Comsol, FLUENT, IMSL, Maple, Marc, Mathematica, Maxwell, Mentat, STAR-CCM+, STAR-CD
  • possibly (to be checked) applications linked against Intel MKL

Any job you have run since the maintenance (since October 2017) may be affected if it uses programs compiled with the Intel compilers. If you have observed wrong results, NaN values, or other unexpected numerical behavior, please contact cluster support.

Most of the software that we compile is compiled with the GNU C compiler. This compiler is not affected by the Intel compiler bug.
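If you are unsure whether a given executable was built with an Intel compiler, a rough heuristic is to look for Intel signature strings in the binary or Intel runtime libraries among its dependencies. The sketch below makes assumptions: the script name is hypothetical, and the library names (libimf, libsvml, libintlc) are common Intel runtime libraries; stripped or statically linked binaries can evade this check.

```shell
#!/bin/sh
# check_intel.sh (hypothetical name) - heuristic check whether an
# executable was built with an Intel compiler.
# Usage: ./check_intel.sh /path/to/binary
# Caveat: stripped or statically linked binaries may not be detected.
bin="$1"
if grep -aq 'Intel(R)' "$bin"; then
    # Intel-compiled binaries typically embed "Intel(R)" banner strings
    echo "likely Intel-compiled"
elif ldd "$bin" 2>/dev/null | grep -qE 'libimf|libsvml|libintlc'; then
    # dynamically linked against Intel runtime libraries
    echo "links Intel runtime libraries"
else
    echo "no Intel signature found"
fi
```

A result of "no Intel signature found" is not a guarantee; when in doubt, check how the software was built.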

Is there a fix?

Although Intel originally said that version 2018.0 was not affected, that was apparently incorrect (or too optimistic).

Intel now claims that the problem will be fixed in the upcoming version 2018.0 update 1. We don't know when this version will be available.

Is there a workaround?

According to Intel compiler not compatible with glibc 2.24-9 and newer and inconsistent program behavior on RedHat Enterprise Linux 7.4 if compiled with Intel, a workaround for already compiled applications is to set:

export LD_BIND_NOW=1

before running the application. However, Euler users have reported that this workaround does not work in all cases.
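Note that LD_BIND_NOW is read by the dynamic linker of the program being started, so it only takes effect if it is exported to the application's environment; a plain (unexported) shell variable is not enough. A quick generic shell check (not Euler-specific) that the setting propagates to child processes:

```shell
#!/bin/sh
# LD_BIND_NOW must be *exported* so that the dynamic linker of the
# launched program sees it and resolves all symbols eagerly at startup.
export LD_BIND_NOW=1
# Spawn a child shell and read the variable from its environment.
value=$(sh -c 'echo "$LD_BIND_NOW"')
if [ "$value" = "1" ]; then
    echo "workaround will reach child processes"
else
    echo "LD_BIND_NOW not propagated"
fi
```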

For programs using MPI, the variable must be passed to the environment of the processes launched by the mpirun command. The syntax for Open MPI running on one node (up to 24 cores) is:

mpirun -x LD_BIND_NOW=1 [other MPI options] ./my_mpi_program

The -x option will not work if you run on multiple nodes. In that case, you can use a wrapper script:

mpirun [other MPI options] /cluster/apps/local/ld_bind_now.sh ./my_mpi_program
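The contents of /cluster/apps/local/ld_bind_now.sh are not reproduced here, but an equivalent wrapper can be sketched in a few lines (the actual cluster script may differ): mpirun starts the wrapper on every node, the wrapper sets the variable, and then replaces itself with the real program so that each MPI rank inherits the setting.

```shell
#!/bin/sh
# Sketch of a wrapper equivalent to /cluster/apps/local/ld_bind_now.sh
# (assumption: the real script may differ). mpirun launches this wrapper
# instead of the application; it exports LD_BIND_NOW and then exec's the
# real program with all its arguments, so every rank sees the variable.
export LD_BIND_NOW=1
exec "$@"
```

Because the wrapper uses exec, it adds no extra process between mpirun and your application.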

For MVAPICH2:

mpirun -genv LD_BIND_NOW 1 [other MPI options] ./my_mpi_program

and for Intel MPI:

mpirun -env LD_BIND_NOW 1 [other MPI options] ./my_mpi_program

What can I do if I am affected?

You can enable the workaround described above in the modules of software known to be affected by the bug. We advise you to rerun any calculations that may have been affected, using either the workaround or a different compiler.

Why don't you go back to CentOS 7.3?

CentOS 7.3 contains numerous security vulnerabilities that can only be fixed by upgrading to CentOS 7.4.