Intel compiler not compatible with CentOS 7.4

From ScientificComputing

Revision as of 10:37, 25 October 2017

It has come to our attention that code compiled with most versions of the Intel compilers may produce unexpected numerical values when run with newer versions of glibc, such as the one in CentOS 7.4, due to a bug in the Intel compiler. We have been using this OS version only since the maintenance (October 2017). Jobs run before the maintenance (before 26 September 2017) are not affected.

What are the effects of the bug?

Code compiled with affected Intel compilers may give wrong results, NaN values, or abort. Refer to Intel

Who is affected? Are my jobs affected?

Anyone who uses software compiled with an Intel compiler up to release 4 of Parallel Studio XE 2017. This includes:

  • self-compiled code
  • libraries that we provide
  • pre-built software provided by vendors

Any job you have run since the maintenance (since October 2017) may be affected if it uses programs compiled with the Intel compilers. If you have observed wrong results, NaN values, or other unexpected numerical behavior, please contact cluster support.
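As a first rough check before contacting support, a text output file can be scanned for NaN values. The following is only a minimal sketch: the helper name count_nan_lines and the file name results.txt are hypothetical, and it assumes a grep that supports the -w option (as GNU grep on CentOS does). A count of zero does not prove that a job is unaffected, since wrong results need not appear as NaN.

```shell
# Hypothetical helper: count the lines of a text output file that contain a NaN token.
count_nan_lines() {
    # -i: ignore case (nan, NaN, NAN)
    # -w: match "nan" only as a whole word, so "-nan" matches but "finance" does not
    # -c: print the number of matching lines instead of the lines themselves
    grep -ciw 'nan' "$1"
}

# Example with a placeholder file name:
# count_nan_lines results.txt
```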

Most of the software that we compile is compiled with the GNU C compiler. This compiler is not affected by the Intel compiler bug.

Is there a fix?

Not for code that is already compiled. The only fix is to recompile the code with a version of the Intel compiler that is not affected by the bug. We now provide the newest Intel compiler release (Parallel Studio XE 2018), in which the bug is fixed, in the intel/2018.0 module in the new section. Since it was released only recently, we are still in the process of compiling the corresponding libraries such as Open MPI, OpenBLAS, etc.

Is there a workaround?

You can set the LD_BIND_NOW=1 environment variable,

export LD_BIND_NOW=1

which changes the behavior of the dynamic linker so that it resolves all symbols at program startup instead of lazily. Setting the variable globally has been shown to break some programs, for example Open MPI. For MPI jobs, the solution is to pass the variable only to the program executable that is launched by the mpirun command. For Open MPI use the -x LD_BIND_NOW=1 option:

mpirun -x LD_BIND_NOW=1 [other MPI options] ./my_mpi_program

for MVAPICH2 the -genv LD_BIND_NOW 1 option:

mpirun -genv LD_BIND_NOW 1 [other MPI options] ./my_mpi_program

and for Intel MPI the -env LD_BIND_NOW 1 option:

mpirun -env LD_BIND_NOW 1 [other MPI options] ./my_mpi_program
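For a program that is not started through mpirun, the same idea of limiting the variable to one executable can be sketched with the standard env utility instead of export. This is a minimal sketch, not a tested recipe for any particular application; the helper name run_with_bind_now and the program name ./my_program are hypothetical.

```shell
# Hypothetical helper: run one command with LD_BIND_NOW=1 set only for that command.
run_with_bind_now() {
    # env places the variable in the environment of the launched command only;
    # the calling shell and any later commands are left unchanged.
    env LD_BIND_NOW=1 "$@"
}

# Example with a placeholder program name:
# run_with_bind_now ./my_program
```

Because the variable is never exported in the shell itself, other tools started from the same session keep their default linker behavior.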

What can I do if I am affected?

You can enable the workaround in modules for software known to be affected by the bug. You are advised to rerun any calculations that may have been affected using the workaround or a different compiler.

Why do you not revert back to CentOS 7.3?

CentOS 7.3 contains numerous security vulnerabilities that can only be fixed by upgrading to CentOS 7.4.