Using hyperthreading

From ScientificComputing
Revision as of 11:15, 21 June 2018 by Urbanb (talk | contribs)


Hyper-threading is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors.

For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With hyper-threading, one physical core appears as two processors to the operating system, allowing concurrent scheduling of two processes per core.

Hyperthreading on Euler

With hyperthreading enabled, the operating system sees 48 logical cores on a 24-core node. The batch system also sees these logical cores, but continues to allocate resources to batch jobs in units of physical cores. A job requesting 1 (physical) core thus gets two logical cores and can execute two threads simultaneously, provided the application supports it.
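To check how many logical cores a job can actually use, you can query the count from inside the job. The following is a minimal sketch using nproc from GNU coreutils, which honours the CPU affinity mask of the calling process. It assumes the batch system binds the job to its allocated cores; if it does not, the number of logical cores of the whole node is reported instead.

```shell
#!/bin/bash
# Count the logical CPUs this process is allowed to run on.
# nproc respects the affinity mask, so the result reflects the job's
# allocation only if the batch system binds the job to its cores (assumption).
logical_cpus=$(nproc)
echo "Logical CPUs available to this job: ${logical_cpus}"
```

For a job bound to a single physical core on a node with hyperthreading enabled, the reported value would be expected to be 2.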

Relation to LSF job slots

LSF is aware of hyperthreading, so there is no change to how jobs are assigned to physical cores: an Euler I or II node with 24 cores still provides 24 job slots. Each slot, however, covers both virtual cores of its physical core.

All of the supported MPI libraries we provide are also aware of hyperthreading and continue to place only one rank (MPI process) on each physical core.

Using hyperthreading

If you are running a loosely coupled parallel program, you can use hyperthreading to run twice as many processes as the number of cores you requested. While each individual process will run slower, the time-to-solution will probably be shorter than if the processes ran sequentially, one after the other.
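A job script that starts two processes per requested core could look like the sketch below. It assumes that LSF sets the environment variable LSB_DJOB_NUMPROC to the number of allocated slots inside a job (the script falls back to 1 otherwise); run_task is a hypothetical placeholder for one independent unit of work and is not part of any provided software.

```shell
#!/bin/bash
# Number of physical cores (job slots) allocated to the job.
# LSB_DJOB_NUMPROC is assumed to be set by LSF; default to 1 outside a job.
cores=${LSB_DJOB_NUMPROC:-1}

# With hyperthreading, each physical core offers two logical cores.
workers=$((2 * cores))

# Hypothetical placeholder: replace the body with your actual program call.
run_task() {
    echo "task $1 done"
}

# Start all workers in the background, then wait for every one to finish.
for i in $(seq 1 "$workers"); do
    run_task "$i" &
done
wait
```

Because the processes are independent of each other, no communication between them is needed; the final wait only ensures that the job does not end before all of them have finished.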

To ensure your job will run only on nodes with hyperthreading enabled, use the -R "select[nthreads==2]" bsub option:

bsub -R "select[nthreads==2]" ...     # request nodes where HT is enabled (2 threads per core)