Shared memory parallelization

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.

If your application is parallelized using OpenMP or linked against a library using OpenMP (Intel MKL, OpenBLAS, etc.), the number of cores (or threads) that it can use is controlled by the environment variable OMP_NUM_THREADS. This variable must be set before you submit your job:

export OMP_NUM_THREADS=number_of_cores
sbatch --ntasks=1 --cpus-per-task=number_of_cores ...

Please note that for OpenMP, you request --ntasks=1 and then request the number of cores through the sbatch option --cpus-per-task.
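
As an alternative to exporting the variable by hand, the thread count can be derived from the allocation inside a job script. The following is a minimal sketch, assuming that Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; the value 4 and the name my_program are placeholders.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Use the number of cores granted by Slurm; fall back to a single
# thread if the variable is missing (e.g. when run outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Start the application here (my_program is a placeholder):
# ./my_program
```

Keeping the core request and the thread count in the same script makes it harder for the two values to drift apart.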

NOTE: If OMP_NUM_THREADS is not set, your application will either use only one core or attempt to use all cores that it can find, 'stealing' them from other jobs if needed. In other words, your job will use either too few or too many cores.
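
One way to catch a missing setting early is a small check at the top of a job script. This is only a sketch: the function name omp_threads_ok is made up for illustration, and it assumes OMP_NUM_THREADS holds a single positive integer rather than a comma-separated list for nested parallelism.

```shell
#!/bin/bash
# Succeed only if OMP_NUM_THREADS is set to a positive integer.
# (omp_threads_ok is a hypothetical helper, not a standard command.)
omp_threads_ok() {
    case "${OMP_NUM_THREADS:-}" in
        ''|*[!0-9]*|0) return 1 ;;   # empty, non-numeric, or zero
        *) return 0 ;;
    esac
}

if omp_threads_ok; then
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
else
    echo "OMP_NUM_THREADS is not set to a positive integer" >&2
fi
```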

Pthreads and other threaded applications

These applications behave similarly to OpenMP applications, so it is important to limit the number of threads that they spawn. There is no standard way to do this; check the application's documentation for the mechanism it supports. Usually a program supports at least one of four ways to limit itself to N threads:

  • it understands the OMP_NUM_THREADS=N environment variable,
  • it has its own environment variable, such as GMX_NUM_THREADS=N for Gromacs,
  • it has a command-line option, such as -nt N (for Gromacs), or
  • it has an input-file option, such as num_threads N.
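
The four mechanisms listed above can be sketched side by side as follows. The value 4, the name my_program, and the temporary input file with its num_threads line are illustrative assumptions; a real application will normally need only the one mechanism its documentation describes.

```shell
#!/bin/bash
# Thread limit to apply; 4 is an arbitrary example value.
N=4

# 1. Standard OpenMP variable.
export OMP_NUM_THREADS="$N"

# 2. Program-specific variable (the Gromacs one named in the list).
export GMX_NUM_THREADS="$N"

# 3. Command-line option, kept in a variable for later use.
nt_option="-nt $N"

# 4. Input-file option, written to a temporary file
#    (the file and its syntax are illustrative only).
input_file="$(mktemp)"
printf 'num_threads %s\n' "$N" > "$input_file"

echo "would run: my_program $nt_option $input_file"
```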

If you are unsure about the program's behavior, please contact us and we will analyze it.

Tips and Tricks

GNU libgomp

Virtually all OpenMP programs compiled with the GNU compilers use the GNU libgomp library. Some of its environment variables may make your program run faster, but they may also make it run much slower. It is safer not to use them at all than to use them without testing whether they help.

OMP_PROC_BIND
Binds threads to cores. This option is relatively safe to use, although some programs may run much slower with it. To use it, set the OMP_PROC_BIND environment variable to true before submitting the job (or in a job script):

export OMP_PROC_BIND=true
bsub -n 4 my_program

GOMP_CPU_AFFINITY
Binds specific threads to specific cores. Some programs may run much slower with this option. Its use is strongly discouraged: in most cases the OMP_PROC_BIND option is sufficient. To use it anyway, set the GOMP_CPU_AFFINITY environment variable in a job script (e.g., my_script.sh) according to the cores assigned by LSF:

#!/bin/bash
export GOMP_CPU_AFFINITY="${LSB_BIND_CPU_LIST//,/ }"
my_program

and submit the script to LSF:

bsub -n 4 < my_script.sh
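
The substitution in the GOMP_CPU_AFFINITY job script can be tried outside a job. The sketch below assumes that LSB_BIND_CPU_LIST holds a comma-separated list of core numbers, as the substitution implies; the value 0,1,2,3 is made up for illustration.

```shell
#!/bin/bash
# Example value only; inside a real job LSF provides this variable.
LSB_BIND_CPU_LIST="0,1,2,3"

# ${var//,/ } replaces every comma with a space (bash syntax),
# giving the space-separated list that libgomp accepts.
export GOMP_CPU_AFFINITY="${LSB_BIND_CPU_LIST//,/ }"
echo "$GOMP_CPU_AFFINITY"   # prints: 0 1 2 3
```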

Intel OpenMP library

Use the KMP_AFFINITY environment variable to control thread affinity. For example:

export KMP_AFFINITY=compact

Refer to the Intel compiler documentation for details and other options.
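
When testing whether an affinity setting helps, it can be useful to see what the runtime actually did. As a hedged sketch, the verbose modifier can be combined with the compact type so that the Intel OpenMP runtime reports the bindings it applies; whether compact is the right choice depends on the program and should be measured.

```shell
#!/bin/bash
# "verbose" asks the Intel OpenMP runtime to report the bindings it
# applies; "compact" places threads close to each other.
export KMP_AFFINITY=verbose,compact
echo "KMP_AFFINITY=${KMP_AFFINITY}"
```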