Getting started with GPUs
Introduction
There are GPU nodes in the Euler cluster. The GPU nodes are reserved exclusively for the shareholder groups that have invested in them. Guest users and shareholders that purchased CPU nodes but no GPU nodes cannot use the GPU nodes.
CUDA and cuDNN
The cuDNN versions provided are compiled for a particular CUDA version. A table of the compatible versions will be added here soon.
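In the meantime, you can list the installed CUDA and cuDNN modules yourself on a login node (assuming the standard module system; the output will vary by software stack):

[sfux@eu-login-01 ~]$ module avail cuda
[sfux@eu-login-01 ~]$ module avail cudnn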
How to submit a GPU job
All GPUs in Slurm are configured in non-exclusive process mode. For single-node jobs, you can request a number of GPUs with the option --gpus=<number of GPUs>:
sbatch --gpus=<number of GPUs> ...
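For instance, a minimal test submission that allocates two GPUs on one node and prints their status (sbatch's --wrap option turns the quoted command into a one-line batch script):

[sfux@eu-login-01 ~]$ sbatch --gpus=2 --wrap="nvidia-smi"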
For multi-node jobs, you can use the option --gpus-per-node=<number of GPUs>:
sbatch --gpus-per-node=<number of GPUs> ...
or, for example, in a jobscript:
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --nodes=2
#SBATCH --gpus-per-node=1
command [argument]
This would request 2 nodes, each with 1 GPU and 4 tasks (one CPU core per task by default).
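A slightly more complete jobscript sketch is shown below; the wall-clock time, memory request, and the CUDA module version are assumptions to adapt to your own job:

#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --nodes=2
#SBATCH --gpus-per-node=1
#SBATCH --time=04:00:00          # wall-clock limit (assumed value)
#SBATCH --mem-per-cpu=2048       # MB of system memory per CPU core (assumed value)
module load cuda/11.8.0          # hypothetical module version; check module avail cuda
srun ./my_cuda_program           # srun starts the 8 tasks across the 2 nodes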
Software with GPU support
On Euler, packages with GPU support are only available in the new software stack; none of the packages in the old software stack support GPUs.
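For example, to switch a login shell to the new software stack and load a GPU-enabled package (the env2lmod helper is the usual switch on Euler; the module names and versions below are assumptions):

[sfux@eu-login-01 ~]$ env2lmod
[sfux@eu-login-01 ~]$ module load gcc/8.2.0 python_gpu/3.11.2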
Available GPU node types
Euler
GPU Model | Slurm specifier | GPUs per node | GPU memory per GPU | CPU cores per node | System memory per node | CPU cores per GPU | System memory per GPU | Compute capability | Minimal CUDA version required |
---|---|---|---|---|---|---|---|---|---|
NVIDIA GeForce GTX 1080 Ti | gtx_1080_ti | 8 | 11 GiB | 20 | 256 GiB | 2.5 | 32 GiB | 6.1 | 8.0 |
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 36 | 384 GiB | 4.5 | 48 GiB | 7.5 | 10.0 |
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0 |
NVIDIA GeForce RTX 3090 | rtx_3090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.6 | 11.0 |
NVIDIA GeForce RTX 4090 | rtx_4090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.9 | 11.8 |
NVIDIA TITAN RTX | titan_rtx | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0 |
NVIDIA Quadro RTX 6000 | quadro_rtx_6000 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0 |
NVIDIA Tesla V100-SXM2 32 GiB | v100 | 8 | 32 GiB | 48 | 768 GiB | 6 | 96 GiB | 7.0 | 9.0 |
NVIDIA Tesla V100-SXM2 32 GiB | v100 | 8 | 32 GiB | 40 | 512 GiB | 5 | 64 GiB | 7.0 | 9.0 |
NVIDIA Tesla A100 (40 GiB) | a100-pcie-40gb | 8 | 40 GiB | 48 | 768 GiB | 6 | 96 GiB | 8.0 | 11.0 |
NVIDIA Tesla A100 (80 GiB) | a100_80gb | 10 | 80 GiB | 48 | 1024 GiB | 4.8 | 102.4 GiB | 8.0 | 11.0 |
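To check which model your job actually received, you can query the allocated GPU from within the job; the nvidia-smi fields below are standard, though compute_cap requires a reasonably recent driver:

[sfux@eu-login-01 ~]$ srun --gpus=1 --pty nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv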
How to select GPU memory
If you know that your job needs more GPU memory than the smallest models provide (11 GiB in the table above), you can request that it run only on GPUs with enough memory. Use the --gres=gpumem:XXg option, where XX is the amount of GPU memory in GB. For example, if you need 10 GB per GPU:
[sfux@eu-login-01 ~]$ sbatch --gpus=1 --gres=gpumem:10g ./my_cuda_program
This ensures your job will not run on GPUs with less than 10 GB of GPU memory.
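The same memory constraint can also be written as directives in a jobscript, for example to request one GPU with at least 24 GB of memory:

#!/bin/bash
#SBATCH --gpus=1
#SBATCH --gres=gpumem:24g
./my_cuda_program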
How to select a GPU model
In some cases it is desirable or necessary to select the GPU model on which your job runs, for example if you know your code runs much faster on a newer model. However, keep in mind that narrowing down the list of allowable GPUs may increase the time your job waits in the queue.
To select a certain GPU model, pass the --gpus=GPUMODEL:number option to sbatch:
[sfux@eu-login-01 ~]$ sbatch --gpus=gtx_1080_ti:1 ./my_cuda_program
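The model specifier can be combined with a GPU count, here as jobscript directives requesting four RTX 3090 cards (rtx_3090 is the Slurm specifier from the table above):

#!/bin/bash
#SBATCH --gpus=rtx_3090:4
./my_cuda_program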
Python and GPUs
We provide separate Python modules (python/XXX and python_gpu/XXX) that point to the same Python installation. The python_gpu modules additionally load a CUDA, a cuDNN, and an NCCL module automatically.
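As a quick sanity check that Python can see the GPU (this sketch assumes the module provides PyTorch and uses a hypothetical version number):

[sfux@eu-login-01 ~]$ module load python_gpu/3.11.2
[sfux@eu-login-01 ~]$ srun --gpus=1 --pty python -c "import torch; print(torch.cuda.is_available())"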