GPU job submission with SLURM
Figure: example of a CPU & GPU system architecture (several system architectures exist on the cluster).
To use GPUs for a job, you need to request them with the -G (or --gpus) option. For example, to run a serial job with one GPU:

$ sbatch -G 1 --wrap="./my_cuda_program"

How to select GPU memory
If you know that you will need more memory on a GPU than some models provide, i.e., more than 8 GB, then you can request that your job runs only on GPUs that have enough memory. Use the --gres=gpumem: option to do this. For example, if you need 10 GB (= 10240 MB) per GPU:

$ sbatch -G 1 --gres=gpumem:10g --wrap="./my_cuda_program"

This ensures that your job will not run on GPUs with less than 10 GB of GPU memory. The default unit for the gpumem option is bytes; you are therefore advised to specify units, for example 20g or 11000m.

How to select a GPU model
In some cases it is desirable or necessary to select the GPU model on which your job runs, for example if you know that your code runs much faster on a newer model. Keep in mind, however, that narrowing down the list of allowable GPUs may increase your job's waiting time. To select a certain GPU model, add the --gpus=slurm_specifier:N resource requirement to sbatch, where the SLURM specifier for each GPU model is given in the table below and N is the number of requested GPUs. For example, to request one RTX 2080 Ti:

$ sbatch --gpus=rtx_2080_ti:1 --wrap="./my_cuda_program"
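These options can also be given as #SBATCH directives inside a job script rather than on the command line, which is standard SLURM behavior. Below is a minimal sketch of such a script; the job name, run-time limit, and program name are placeholders to adapt to your own job:

#!/bin/bash
# Minimal GPU job script (sketch); the names below are placeholders.
#SBATCH --job-name=gpu_job
#SBATCH --time=01:00:00
# Request one GPU with at least 10 GB of GPU memory.
#SBATCH --gpus=1
#SBATCH --gres=gpumem:10g

# Run the CUDA program on the allocated GPU.
./my_cuda_program

Save it as, e.g., jobscript.sh and submit it with $ sbatch jobscript.sh.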
Available GPU node types
GPU model | Slurm specifier | GPUs per node | GPU memory per GPU | CPU cores per node | System memory per node | CPU cores per GPU | System memory per GPU | Compute capability | Minimum CUDA version |
---|---|---|---|---|---|---|---|---|---|
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 36 | 384 GiB | 4.5 | 48 GiB | 7.5 | 10.0 |
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0 |
NVIDIA GeForce RTX 3090 | rtx_3090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.6 | 11.0 |
NVIDIA GeForce RTX 4090 | rtx_4090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.9 | 11.8 |
NVIDIA TITAN RTX | titan_rtx | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0 |
NVIDIA Quadro RTX 6000 | quadro_rtx_6000 | 8 | 24 GiB | 64 | 512 GiB | 8 | 64 GiB | 7.5 | 10.0 |
NVIDIA Tesla V100-SXM2 32 GiB | v100 | 8 | 32 GiB | 48 | 768 GiB | 6 | 96 GiB | 7.0 | 9.0 |
NVIDIA Tesla V100-SXM2 32 GiB | v100 | 8 | 32 GiB | 40 | 512 GiB | 5 | 64 GiB | 7.0 | 9.0 |
NVIDIA Tesla A100 (40 GiB) | a100-pcie-40gb | 8 | 40 GiB | 48 | 768 GiB | 6 | 96 GiB | 8.0 | 11.0 |
NVIDIA Tesla A100 (80 GiB) | a100_80gb | 10 | 80 GiB | 48 | 1024 GiB | 4.8 | 102.4 GiB | 8.0 | 11.0 |
Example
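As an illustration of the options above (./my_cuda_program is a placeholder, and rtx_3090 is just one of the specifiers from the table), the following submits a serial job to a single RTX 3090 and prints the allocated GPU before running the program:

$ sbatch --gpus=rtx_3090:1 --wrap="nvidia-smi; ./my_cuda_program"

After the job completes, the default output file slurm-<jobid>.out contains the nvidia-smi report for the GPU the job ran on, followed by the program's output.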
Further reading