GPU job submission
To use GPUs for a job, you need to request the ngpus_excl_p resource. It refers to the number of GPUs per node. This is unlike other resources, which are requested per core.
For example, to run a serial job with one GPU,
bsub -R "rusage[ngpus_excl_p=1]" ./my_cuda_program
or on a full node with all 8 GeForce GTX 1080 Ti GPUs and up to 90 GB of RAM (memory is requested per core, so 20 cores × 4500 MB ≈ 90 GB),
bsub -n 20 -R "rusage[mem=4500,ngpus_excl_p=8]" -R "select[gpu_model0==GeForceGTX1080Ti]" ./my_cuda_program
or on two such full nodes (span[ptile=20] places 20 cores on each node, giving 16 GPUs in total):
bsub -n 40 -R "rusage[mem=4500,ngpus_excl_p=8]" -R "select[gpu_model0==GeForceGTX1080Ti]" -R "span[ptile=20]" ./my_cuda_program
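The same resource requests can also be placed in a job script as #BSUB directives and submitted by redirecting the script to bsub. A minimal sketch for the serial one-GPU case, assuming a script named gpu_job.sh (the file name is only illustrative):

#!/bin/bash
# Request one GPU on the node; all other resources keep their defaults
#BSUB -R "rusage[ngpus_excl_p=1]"
./my_cuda_program

Submit it with

bsub < gpu_job.sh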
While your jobs will see all GPUs on a node, LSF will set the CUDA_VISIBLE_DEVICES environment variable to the GPUs allocated to your job; this variable is honored by CUDA programs, so they only use the allocated devices.
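To verify which GPUs were assigned, you can print the variable from inside a job, for example (a minimal sketch; the single quotes keep the variable from being expanded on the login node instead of in the job):

bsub -R "rusage[ngpus_excl_p=1]" 'echo $CUDA_VISIBLE_DEVICES'

The job output should then contain the comma-separated indices of the devices allocated by LSF.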
Example

* Neural network training with TensorFlow on GPU
Further reading

* Getting started with GPUs