Getting started with GPUs

 

==Introduction==

There are GPU nodes in the Euler cluster. These nodes are reserved exclusively for the shareholder groups that invested in them. Guest users and shareholders that purchased CPU nodes but no GPU nodes cannot use the GPU nodes.

==CUDA and cuDNN==

Each cuDNN version we provide is compiled against a particular CUDA version. We will soon add a table of the compatible versions here.
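
Until that table is available, you can check which CUDA and cuDNN versions are installed by querying the module system yourself. A minimal sketch, using the standard <tt>module</tt> command referred to elsewhere on this page:

 # list the installed CUDA and cuDNN modules and their versions
 module avail cuda
 module avail cudnn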

==How to submit a GPU job==

All GPUs in the Slurm batch system are configured in non-exclusive process mode. For single-node jobs, you can request a number of GPUs with the option <tt>--gpus=''number of GPUs''</tt>:

 sbatch --gpus=''number of GPUs'' ...
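
For instance, a complete single-node submission could look like the following. This is only a sketch; the resource values and the program name <tt>./my_cuda_program</tt> are placeholders to adapt to your job:

 # 4 cores, 2 GPUs, 2 GB of RAM per core, 4 hours of run time (example values)
 sbatch --ntasks=4 --gpus=2 --mem-per-cpu=2048 --time=04:00:00 --wrap="./my_cuda_program"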

For multi-node jobs, you can use the option <tt>--gpus-per-node=''number of GPUs''</tt>:

 sbatch --gpus-per-node=''number of GPUs'' ...

or, for example, in a job script:

 #!/bin/bash
 #SBATCH --ntasks=8          # 8 tasks in total
 #SBATCH --nodes=2           # spread over 2 nodes, i.e., 4 tasks (CPU cores) per node
 #SBATCH --gpus-per-node=1   # 1 GPU on each of the 2 nodes

 command [argument]

This would request 2 nodes, each with 1 GPU and 4 CPU cores.
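
To check which GPUs a running job was actually assigned, you can print the <tt>CUDA_VISIBLE_DEVICES</tt> variable or call <tt>nvidia-smi</tt> from within the job script. A sketch, assuming the batch system exports <tt>CUDA_VISIBLE_DEVICES</tt> and that <tt>nvidia-smi</tt> is available on the GPU nodes:

 #!/bin/bash
 #SBATCH --gpus=1

 # show the GPUs visible to this job
 echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
 nvidia-smi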

==Software with GPU support==

On Euler, packages with GPU support are only available in the [[Euler_applications_and_libraries|new software stack]]. None of the packages in the old software stack on Euler has support for GPUs.
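
To check whether a GPU-enabled build of a particular package exists, you can search the module system of the new software stack. A sketch, using the <tt>python_gpu</tt> modules described below as the example package:

 # list GPU-enabled Python modules available in the software stack
 module avail python_gpu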

==Available GPU node types==

===Euler===

{| class="wikitable"
! GPU Model !! Slurm specifier !! GPUs per node !! GPU memory per GPU !! CPU cores per node !! System memory per node !! CPU cores per GPU !! System memory per GPU !! Compute capability !! Minimal CUDA version required
|-
| NVIDIA GeForce GTX 1080 Ti || gtx_1080_ti || 8 || 11 GiB || 20 || 256 GiB || 2.5 || 32 GiB || 6.1 || 8.0
|-
| NVIDIA GeForce RTX 2080 Ti || rtx_2080_ti || 8 || 11 GiB || 36 || 384 GiB || 4.5 || 48 GiB || 7.5 || 10.0
|-
| NVIDIA GeForce RTX 2080 Ti || rtx_2080_ti || 8 || 11 GiB || 128 || 512 GiB || 16 || 64 GiB || 7.5 || 10.0
|-
| NVIDIA GeForce RTX 3090 || rtx_3090 || 8 || 24 GiB || 128 || 512 GiB || 16 || 64 GiB || 8.6 || 11.0
|-
| NVIDIA GeForce RTX 4090 || rtx_4090 || 8 || 24 GiB || 128 || 512 GiB || 16 || 64 GiB || 8.9 || 11.8
|-
| NVIDIA TITAN RTX || titan_rtx || 8 || 24 GiB || 128 || 512 GiB || 16 || 64 GiB || 7.5 || 10.0
|-
| NVIDIA Quadro RTX 6000 || quadro_rtx_6000 || 8 || 24 GiB || 128 || 512 GiB || 8 || 64 GiB || 7.5 || 10.0
|-
| NVIDIA Tesla V100-SXM2 32 GiB || v100 || 8 || 32 GiB || 48 || 768 GiB || 6 || 96 GiB || 7.0 || 9.0
|-
| NVIDIA Tesla V100-SXM2 32 GB || v100 || 8 || 32 GiB || 40 || 512 GiB || 5 || 64 GiB || 7.0 || 9.0
|-
| Nvidia Tesla A100 (40 GiB) || a100-pcie-40gb || 8 || 40 GiB || 48 || 768 GiB || 6 || 96 GiB || 8.0 || 11.0
|-
| Nvidia Tesla A100 (80 GiB) || a100_80gb || 10 || 80 GiB || 48 || 1024 GiB || 4.8 || 96 GiB || 8.0 || 11.0
|}

==How to select GPU memory==

If you know that you will need more GPU memory than some models provide (<em>i.e.,</em> more than 8 GB), you can request that your job runs only on GPUs that have enough memory. Use the <tt>--gres=gpumem:''XX''g</tt> option, where ''XX'' is the amount of GPU memory in GB. For example, if you need 10 GB per GPU:

 [sfux@eu-login-01 ~]$ sbatch --gpus=1 --gres=gpumem:10g ./my_cuda_program

This ensures your job will not run on GPUs with less than 10 GB of GPU memory.
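
The same request can also be written into a job script. A sketch combining the GPU count and GPU memory options shown above (the values are examples):

 #!/bin/bash
 #SBATCH --gpus=1              # one GPU ...
 #SBATCH --gres=gpumem:10g     # ... with at least 10 GB of GPU memory

 ./my_cuda_program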

==How to select a GPU model==

In some cases it is desirable or necessary to select the GPU model on which your job runs, for example if you know that your code runs much faster on a newer model. However, keep in mind that by narrowing down the list of allowable GPUs, your job may have to wait longer for resources.

To select a certain GPU model, use the <tt>--gpus=''GPUMODEL'':''number''</tt> option of sbatch, with the Slurm specifier from the table above as ''GPUMODEL'':

 [sfux@eu-login-01 ~]$ sbatch --gpus=gtx_1080_ti:1 ./my_cuda_program
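
The same selection can be made in a job script. A sketch using the <tt>rtx_3090</tt> specifier from the table above as an example:

 #!/bin/bash
 #SBATCH --ntasks=4
 #SBATCH --gpus=rtx_3090:1     # request one NVIDIA GeForce RTX 3090

 ./my_cuda_program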

==Python and GPUs==

We provide separate Python modules (<tt>python/XXX</tt> and <tt>python_gpu/XXX</tt>) that point to the same Python installation. The <tt>python_gpu</tt> modules additionally load a CUDA, a cuDNN and an NCCL module automatically.
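
Putting the pieces together, a GPU job using one of these Python modules could be submitted with a job script along the following lines. This is only a sketch: the module version <tt>python_gpu/3.11.2</tt> and the script name <tt>my_script.py</tt> are placeholders, so check <tt>module avail python_gpu</tt> for the versions that are actually installed:

 #!/bin/bash
 #SBATCH --gpus=1
 #SBATCH --time=04:00:00

 # example module name; the python_gpu module also pulls in CUDA, cuDNN and NCCL
 module load python_gpu/3.11.2

 python my_script.py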