Introduction
Currently we only provide GPUs in the Leonhard cluster, where access is restricted to Shareholders. The instructions on this wiki page therefore refer only to the Leonhard cluster.
How to submit a GPU job
All GPUs in Leonhard are configured in Exclusive Process mode. To run a multi-node job, you will need to request span[ptile=XX], with XX being the number of CPU cores per GPU node, which depends on the node type (the node types are listed in the table below).
The LSF batch system has partially integrated support for GPUs. To use GPUs in a job, you need to request the ngpus_excl_p resource. It refers to the number of GPUs per node, unlike other resources, which are requested per core.
For example, to run a serial job with one GPU,
bsub -R "rusage[ngpus_excl_p=1]" ./my_cuda_program
or on a full node with all 8 GeForce GTX 1080 Ti GPUs and up to 90 GB of RAM,
bsub -n 20 -R "rusage[mem=4500,ngpus_excl_p=8]" -R "select[gpu_model0==GeForceGTX1080Ti]" ./my_cuda_program
or on two full nodes:
bsub -n 40 -R "rusage[mem=4500,ngpus_excl_p=8]" -R "select[gpu_model0==GeForceGTX1080Ti]" -R "span[ptile=20]" ./my_cuda_program
Although your job can see all GPUs of a node, LSF sets the CUDA_VISIBLE_DEVICES environment variable to the GPUs assigned to the job; this variable is honored by CUDA programs.
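For example, a quick way to see which GPU(s) were assigned to a job is to print the variable from within the job itself. This is a minimal sketch, assuming LSF exports the variable to the job environment as described above; the single quotes keep the login shell from expanding the variable before submission, and the output appears in the job's lsf.o<JOBID> file:
bsub -R "rusage[ngpus_excl_p=1]" 'echo $CUDA_VISIBLE_DEVICES'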
Available GPU node types
GPU Model                    | Specifier           | GPU memory per GPU | CPU cores per node | CPU memory per node
-----------------------------|---------------------|--------------------|--------------------|--------------------
NVIDIA GeForce GTX 1080      | GeForceGTX1080      | 8 GiB              | 20                 | 256 GiB
NVIDIA GeForce GTX 1080 Ti   | GeForceGTX1080Ti    | 11 GiB             | 20                 | 256 GiB
NVIDIA GeForce RTX 2080 Ti   | GeForceRTX2080Ti    | 11 GiB             | 36                 | 384 GiB
NVIDIA Tesla V100-SXM2 32 GB | TeslaV100_SXM2_32GB | 32 GiB             | 40                 | 512 GiB
How to select GPU memory
If you know that you will need more GPU memory than some models provide (e.g., more than 8 GB), then you can request that your job run only on GPUs with enough memory. Use the gpu_mtotal0 host selection to do this. For example, if you need 10 GB (= 10240 MB) per GPU:
[sfux@lo-login-01 ~]$ bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=10240]" ./my_cuda_program
This ensures your job will not run on GPUs with less than 10 GB of GPU memory.
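To double-check which GPU a job actually received, you can query the device from within the job, for example with nvidia-smi. This is a sketch; it assumes nvidia-smi is available in the job environment on the GPU nodes, and the output ends up in the job's lsf.o<JOBID> file:
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=10240]" "nvidia-smi --query-gpu=name,memory.total --format=csv"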
How to select a GPU model
In some cases it is desirable or necessary to select the GPU model on which your job runs, for example if you know that your code runs much faster on a newer model. However, you should consider that by narrowing down the list of allowable GPUs, your job may have to wait longer before it starts.
To select a certain GPU model, add the -R "select[gpu_model0==GPU_MODEL]" resource requirement to bsub,
[sfux@lo-login-01 ~]$ bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==GeForceGTX1080]" ./my_cuda_program
Python and GPUs
Because some Python packages need different installations for their CPU and GPU versions, we decided to provide separate Python installations for CPU and GPU usage. For instance, running the GPU version of TensorFlow on a CPU node will immediately crash, because TensorFlow checks on startup whether the compute node has a GPU driver. From TensorFlow 2.0.0 on, it is possible to have one installation for both CPU and GPU.
For an overview of the available Python and TensorFlow versions, please have a look at Python on Leonhard.
CPU version                  | GPU version
-----------------------------|-----------------------------
module load python_cpu/3.6.1 | module load python_gpu/3.6.1
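If you are unsure whether your current Python installation can see a GPU at all, a small check script can help. The following is a minimal sketch for TensorFlow 2.x (in TensorFlow 2.0 the function still lives under tf.config.experimental; the file name gpucheck.py is just an example), to be submitted to a GPU node like the example jobs below:

#!/usr/bin/env python
# gpucheck.py - minimal GPU visibility check (sketch, TensorFlow 2.x)
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
# Returns an empty list when TensorFlow cannot see any GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
print("Visible GPUs:", gpus)

Submitted with, e.g., bsub -R "rusage[ngpus_excl_p=1]" python gpucheck.py, it should report one GPU. With TensorFlow 1.x you would use tf.test.is_gpu_available() instead.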
Tensorflow 1.x example
As an example of running a TensorFlow job on a GPU node, we print out the TensorFlow version, the string Hello, TensorFlow!, and the result of a simple matrix multiplication:
[sfux@lo-login-01 ~]$ cd testrun/python
[sfux@lo-login-01 python]$ module load python_gpu/2.7.13
[sfux@lo-login-01 python]$ cat tftest1.py
#!/usr/bin/env python
from __future__ import print_function
import tensorflow as tf

vers = tf.__version__
print(vers)

hello = tf.constant('Hello, TensorFlow!')
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

sess = tf.Session()
print(sess.run(hello))
print(sess.run(product))
sess.close()
[sfux@lo-login-01 python]$ bsub -n 1 -W 4:00 -R "rusage[mem=2048, ngpus_excl_p=1]" python tftest1.py
Generic job.
Job <10620> is submitted to queue <gpu.4h>.
[sfux@lo-login-01 python]$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
10620      sfux    PEND  gpu.4h     lo-login-01             *ftest1.py Sep 28 08:02
[sfux@lo-login-01 python]$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
10620      sfux    RUN   gpu.4h     lo-login-01 lo-gtx-001  *ftest1.py Sep 28 08:03
[sfux@lo-login-01 python]$ bjobs
No unfinished job found
[sfux@lo-login-01 python]$ grep -A3 "Creating TensorFlow device" lsf.o10620
2017-09-28 08:08:43.235886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:04:00.0)
1.3.0
Hello, TensorFlow!
[[ 12.]]
[sfux@lo-login-01 python]$
Please note that your job will crash if you run the GPU version of TensorFlow on a CPU node, because TensorFlow checks on startup whether the compute node has a GPU driver.
Tensorflow 2.x example
TensorFlow 2.x no longer uses sessions. Please find below an updated example job for TensorFlow 2.0, in which we create two 2000x2000 matrices filled with random numbers, carry out a matrix multiplication once on the CPU and once on the GPU, and compare the run times.
[sfux@lo-login-01 ~]$ cd testrun/tf/test2
[sfux@lo-login-01 test2]$ module load gcc/6.3.0 python_gpu/3.7.4

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

[sfux@lo-login-01 test2]$ cat tf2test.py
#!/usr/bin/env python
import time
import tensorflow as tf

k = 2000
a = tf.random.uniform(shape=[k,k], minval=0, maxval=20, dtype=tf.float16)
b = tf.random.uniform(shape=[k,k], minval=0, maxval=20, dtype=tf.float16)

cpu_slot = 0
gpu_slot = 0

# Using CPU at slot 0
with tf.device('/CPU:' + str(cpu_slot)):
    start = time.time()
    c1 = tf.matmul(a,b)
    print("Time on CPU:")
    end = time.time() - start
    print(end)

# Using the GPU at slot 0
with tf.device('/GPU:' + str(gpu_slot)):
    start = time.time()
    c2 = tf.matmul(a,b)
    print("Time on GPU:")
    end = time.time() - start
    print(end)
[sfux@lo-login-01 test2]$ bsub -n 1 -W 4:00 -R "rusage[mem=2048, ngpus_excl_p=1]" python tf2test.py
Generic job.
Job <5074756> is submitted to queue <gpu.4h>.
[sfux@lo-login-01 test2]$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5074756    sfux    PEND  gpu.4h     lo-login-01             *f2test.py Mar  5 12:28
[sfux@lo-login-01 test2]$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5074756    sfux    RUN   gpu.4h     lo-login-01 lo-s4-082   *f2test.py Mar  5 12:28
[sfux@lo-login-01 test2]$ bjobs
No unfinished job found
[sfux@lo-login-01 test2]$ grep -A1 "Time on" lsf.o5074756
Time on CPU:
63.97628474235535
Time on GPU:
0.4504997730255127
[sfux@lo-login-01 test2]$
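Note that this simple measurement is biased: in eager mode the first operation on a device also triggers one-time setup (loading libraries, initializing the memory allocator), and GPU operations may return before the result is actually computed. A slightly fairer comparison warms up each device and synchronizes before stopping the clock. The following is a sketch under the same setup as the script above; the helper name bench is just an example:

#!/usr/bin/env python
import time
import tensorflow as tf

k = 2000
a = tf.random.uniform(shape=[k, k], minval=0, maxval=20, dtype=tf.float16)
b = tf.random.uniform(shape=[k, k], minval=0, maxval=20, dtype=tf.float16)

def bench(device, runs=10):
    with tf.device(device):
        tf.matmul(a, b)        # warm-up run absorbs one-time device setup
        start = time.time()
        for _ in range(runs):
            c = tf.matmul(a, b)
        c.numpy()              # forces the computation to finish before timing stops
        return (time.time() - start) / runs

print("Time on CPU:", bench('/CPU:0'))
print("Time on GPU:", bench('/GPU:0'))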