Intel compiler not compatible with CentOS 7.4

Revision as of 12:46, 25 October 2017 by Sfux (talk | contribs) (Is there a fix?)


It has come to our attention that code compiled with most versions of the Intel compilers may produce unexpected numerical values when run with newer versions of glibc, such as the one in CentOS 7.4, due to a bug in the Intel compiler. We have been using this OS version only since the maintenance (October 2017). Jobs run before the maintenance (before 26 September 2017) are not affected.

What are the effects of the bug?

Code compiled with affected Intel compilers may give wrong results, NaN values, or abort. Refer to Intel
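As a first, rough check for one of the symptoms listed above, you could scan a job's output for NaN values. The following is only a sketch: the function name count_nan_lines and the file name in the usage line are hypothetical, and a clean scan does not show that the results are correct, since the bug can also produce wrong values that are not NaN.

```shell
# Count the lines of a text file that contain "nan" as a whole word,
# ignoring case (matches NaN, nan, -nan; does not match "nanometer").
# -w is supported by GNU grep, which is what CentOS ships.
count_nan_lines() {
    grep -i -w -c 'nan' "$1"
}

# Usage (my_job.out is a hypothetical output file):
# count_nan_lines my_job.out
```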

Who is affected? Are my jobs affected?

Anyone who uses software compiled with an Intel compiler up to release 4 of Parallel Studio XE 2017. This includes:

  • self-compiled code
  • libraries that we provide
  • pre-built software provided by vendors

Any job you have run since the maintenance (since October 2017) may be affected if it uses programs compiled with the Intel compilers. If you observe wrong results, NaN values, or otherwise unexpected numerics, please contact cluster support.

Most of the software that we compile is compiled with the GNU C compiler. This compiler is not affected by the Intel compiler bug.
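If you are unsure whether an executable was built with an Intel compiler, one heuristic is to look for Intel runtime libraries among its shared-library dependencies. The sketch below assumes the usual Intel runtime library names (libimf, libsvml, libintlc, libirng, libifcore, libifport); the function name links_intel_runtime is hypothetical. A match is a strong hint. No match proves nothing, because the runtime may be linked statically.

```shell
# Read `ldd` output on stdin and succeed (exit status 0) if it mentions
# a library that normally ships with the Intel compiler runtime.
# The list of library names is an assumption, not an exhaustive list.
links_intel_runtime() {
    grep -E -q 'lib(imf|svml|intlc|irng|ifcore|ifport)\.so'
}

# Usage (./my_program is a placeholder for your executable):
# ldd ./my_program | links_intel_runtime && echo "links Intel runtime libraries"
```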

Is there a fix?

There is no fix for code that has already been compiled. The only fix is to recompile the code with a version of the Intel compiler that is not affected by the bug. We recommend using the intel/2017.5 module, in which the bug is fixed.

Since it has only been released recently, we are still in the process of compiling the corresponding libraries such as Open MPI, OpenBLAS, etc.

Why did you install Intel 2018.0.0 and tell users to use this version, even though according to the Intel erratum it does not have the bug fixed?

Intel has silently changed the page (and did not change the "updated" date of 22 March 2017) and now claims that the bug is fixed in Intel 2018.0 update 1. In the original text (we made a screenshot: Intelbug-DPD200419088.pdf) it was stated that the bug is fixed in Intel 2018.0, without mentioning any update version.
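The version information on this page can be summarized in a small helper. This is a sketch under a stated assumption: that Parallel Studio XE 2017 update N corresponds to compiler version 17.0.N and 2018 update N to 18.0.N. The function name intel_version_status is hypothetical, and the function expects a purely numeric version string with three components.

```shell
# Classify an Intel compiler version string such as "17.0.4".
# ASSUMPTION: "Parallel Studio XE 2017 update N" == compiler 17.0.N,
# and "2018 update N" == compiler 18.0.N.
# Prints one of: affected, fixed, disputed, unknown.
intel_version_status() {
    case "$1" in
        [0-9]*.[0-9]*.[0-9]*) ;;          # need at least major.minor.patch
        *) echo "unknown"; return 0 ;;
    esac
    major=${1%%.*}                        # text before the first dot
    rest=${1#*.}                          # text after the first dot
    patch=${rest#*.}                      # text after the second dot
    patch=${patch%%.*}                    # drop any build number
    if [ "$major" -lt 17 ]; then
        echo "affected"                   # older than Parallel Studio XE 2017
    elif [ "$major" -eq 17 ]; then
        if [ "$patch" -ge 5 ]; then echo "fixed"; else echo "affected"; fi
    elif [ "$major" -eq 18 ]; then
        # 2018.0 initial release: fixed per the original erratum text,
        # not fixed per the changed text.
        if [ "$patch" -ge 1 ]; then echo "fixed"; else echo "disputed"; fi
    else
        echo "unknown"                    # not covered by this page
    fi
}

# Usage: intel_version_status 17.0.4
```

Versions newer than the 2018 series are reported as unknown on purpose, because this page makes no statement about them.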

Is there a workaround?

You can set the LD_BIND_NOW=1 environment variable,

export LD_BIND_NOW=1

which makes the dynamic linker resolve all symbols at program startup rather than lazily on first use. Setting the variable globally has been shown to break some programs, for example Open MPI. The solution is to have the mpirun command pass the variable to the program executable that it launches. For Open MPI use the -x LD_BIND_NOW=1 option:

mpirun -x LD_BIND_NOW=1 [other MPI options] ./my_mpi_program

for MVAPICH2 the -genv LD_BIND_NOW 1 option:

mpirun -genv LD_BIND_NOW 1 [other MPI options] ./my_mpi_program

and for Intel MPI the -env LD_BIND_NOW 1 option:

mpirun -env LD_BIND_NOW 1 [other MPI options] ./my_mpi_program
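If the same job script is used with more than one MPI library, the three variants above could be wrapped in a small helper. This is a minimal sketch. The function name bind_now_option, the flavor keywords and the MPI_FLAVOR variable are hypothetical, and only the option strings themselves are taken from this page.

```shell
# Print the mpirun option that forwards LD_BIND_NOW=1 to the launched
# program for a given MPI library. Returns 1 for an unknown flavor.
bind_now_option() {
    case "$1" in
        openmpi)  echo "-x LD_BIND_NOW=1" ;;
        mvapich2) echo "-genv LD_BIND_NOW 1" ;;
        intelmpi) echo "-env LD_BIND_NOW 1" ;;
        *) echo "unknown MPI flavor: $1" >&2; return 1 ;;
    esac
}

# Usage (MPI_FLAVOR is a variable you would set yourself; the command
# substitution is left unquoted on purpose so that the option is split
# into separate words):
# mpirun $(bind_now_option "$MPI_FLAVOR") ./my_mpi_program
```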

What can I do if I am affected?

You can enable the workaround in modules for software known to be affected by the bug. You are advised to rerun any calculations that may have been affected, using either the workaround or a different compiler.

Why do you not revert to CentOS 7.3?

This version contains numerous security vulnerabilities that can only be fixed by upgrading to CentOS 7.4.