Intel compiler not compatible with CentOS 7.4

<span style="color:red;font-size:x-large;">The update of CentOS 7.4 to address the ''Meltdown'' and ''Spectre'' vulnerabilities also rolled back the change in ''libc'' that exposed the bug in the Intel compiler. As the bug is no longer visible on Euler, the information on this page is obsolete and the workaround below is no longer necessary.</span>
 
It has come to our attention that '''code compiled''' with most versions of the '''Intel compilers''' may produce '''unexpected numerical results''' when run under newer versions of glibc, such as the one in '''CentOS 7.4''', due to a bug in the Intel compiler. We have been using this OS version only since the maintenance (October 2017). '''Jobs run before the maintenance (before 26 September 2017) are not affected'''.
 
===What are the effects of the bug?===
 
Code compiled with affected Intel compilers may give '''wrong results, NaN values, or abort'''.

Refer to [https://software.intel.com/en-us/articles/intel-compiler-not-compatible-with-glibc-224-9-and-newer Intel compiler not compatible with glibc 2.24-9 and newer] and [https://software.intel.com/en-us/articles/inconsistent-program-behavior-on-red-hat-enterprise-linux-74-if-compiled-with-intel Inconsistent program behavior on Red Hat Enterprise Linux 7.4 if compiled with Intel].
  
 
===Who is affected? Are my jobs affected?===
 
Anyone who uses software compiled with an Intel compiler. This includes:
  
* self-compiled code
* libraries that we provide
* third-party software
* commercial software including (but not limited to): Abaqus, ANSYS, CFX, Comsol, FLUENT, IMSL, Maple, Marc, Mathematica, Maxwell, Mentat, STAR-CCM+, STAR-CD
* possibly (to be checked) applications linked against Intel MKL
  
 
'''Any job you have run since the maintenance''' (since October 2017) may be affected if it uses programs compiled with the Intel compilers. If you have observed wrong results, NaN values, or other unexpected numerical behavior, please contact {{Cluster_support}}.
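If you are not sure whether a particular executable was built with the Intel compiler, a quick heuristic check is to look for Intel compiler signatures and Intel runtime libraries in the binary. This is not an official procedure; <tt>./my_program</tt> below is a placeholder for your own executable:

 # Intel-compiled binaries usually embed Intel copyright strings
 strings ./my_program | grep -i -m1 "Intel(R)"
 # and, if dynamically linked, depend on Intel runtime libraries
 ldd ./my_program | grep -iE "libimf|libintlc|libsvml"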
Most of the software that we compile is compiled with the GNU C compiler. This compiler is not affected by the Intel compiler bug.
  
 
===Is there a fix?===
 
Although Intel originally said that version 2018.0 was not affected, that was apparently '''incorrect''' (or too optimistic).

Intel now claims that the problem '''will''' be fixed in the '''upcoming''' version 2018.0 update 1. We don't know when this version will be available.
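For reference, you can check which Intel compiler release is active in your environment. This is only a quick check; the module name <tt>intel/2018.0</tt> is the one mentioned in earlier revisions of this page and may differ in your setup:

 module list
 icc --version
 ifort --version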
  
 
===Is there a workaround?===
 
According to [https://software.intel.com/en-us/articles/intel-compiler-not-compatible-with-glibc-224-9-and-newer Intel compiler not compatible with glibc 2.24-9 and newer] and [https://software.intel.com/en-us/articles/inconsistent-program-behavior-on-red-hat-enterprise-linux-74-if-compiled-with-intel Inconsistent program behavior on Red Hat Enterprise Linux 7.4 if compiled with Intel], a workaround for already compiled applications is to set:
  
 
  export LD_BIND_NOW=1
 
before running the application; this makes the dynamic linker resolve all symbols at program startup instead of lazily. However, Euler users have reported that it does not work in all cases.
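For batch jobs, the variable has to be set inside the job itself. A minimal sketch of a job script, assuming the LSF batch system used on Euler and a placeholder program name <tt>./my_program</tt> (adjust the resource requests to your own case, and submit the script with the <tt>bsub</tt> command):

 #!/bin/bash
 #BSUB -n 1               # hypothetical resource request, adjust as needed
 #BSUB -W 1:00
 export LD_BIND_NOW=1     # enable the workaround before the program starts
 ./my_program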
For programs using MPI, that variable must be passed to the executable after it is launched by the mpirun command. The syntax for Open&nbsp;MPI '''running on one node''' (up to 24&nbsp;cores) is:
  
 
  mpirun -x LD_BIND_NOW=1 [other MPI options] ./my_mpi_program
 
This will not work if you run on multiple nodes. In that case, you can use a wrapper script:
 mpirun [other MPI options] /cluster/apps/local/ld_bind_now.sh ./my_mpi_program
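The contents of <tt>ld_bind_now.sh</tt> are not reproduced here; conceptually, such a wrapper only needs to set the variable and then start the real program, along the lines of this sketch:

 #!/bin/bash
 # sketch of an LD_BIND_NOW wrapper: set the variable, then replace the
 # shell with the program given as arguments
 export LD_BIND_NOW=1
 exec "$@"

Because mpirun starts the wrapper on every node, the variable is set in the environment of each rank before the program's dynamic linker runs.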
For MVAPICH2:
  
 
  mpirun -genv LD_BIND_NOW 1 [other MPI options] ./my_mpi_program
 
and for Intel MPI:
 
 
  mpirun -env LD_BIND_NOW 1 [other MPI options] ./my_mpi_program
 
===What can I do if I am affected?===
 
You can enable the workaround in modules for software known to be affected by the bug. You are advised to rerun any calculations that may have been affected, using either the workaround or a different compiler.
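If you want to verify that the workaround is active in your environment after loading such a module, you can simply check whether the variable is set. This is a generic check, not tied to any particular module; <tt>my_module</tt> is a placeholder:

 module load my_module
 echo "LD_BIND_NOW=${LD_BIND_NOW:-not set}"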
  
===Why don't you go back to CentOS 7.3?===
 
CentOS 7.3 contains numerous security vulnerabilities that can only be fixed by upgrading to CentOS 7.4.
 
