CUDA 10 on Leonhard

From ScientificComputing
Revision as of 12:41, 21 September 2021 by Sfux (talk | contribs)

This page contains information about the Leonhard Open cluster, which is now obsolete: the cluster was integrated into the Euler cluster on 14/15 September 2021.


Introduction

Nvidia released CUDA 10 at the end of February 2019. This CUDA SDK release requires a sufficiently recent GPU driver (>= 410.48), which is why CUDA 10 had not been provided on Leonhard until now. We have now updated the GPU driver on all GPU nodes in Leonhard Open and installed the CUDA 10.0.130 SDK. To make use of the new CUDA 10 SDK, we provide a new Python installation (python_gpu/3.7.1) with the most recent versions of the common machine learning frameworks.
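
The driver requirement can be checked from Python. The following is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes that nvidia-smi is on the PATH of the GPU node and reports the driver version through its --query-gpu=driver_version option.

```python
import subprocess

MIN_DRIVER = (410, 48)  # minimum driver version quoted above for CUDA 10


def parse_driver_version(text):
    """Turn a version string such as '410.48' or '418.87.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(text, minimum=MIN_DRIVER):
    """Return True if the driver version string is at least `minimum`."""
    return parse_driver_version(text) >= minimum


def query_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU (needs a GPU node)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = query_driver_version()
    except (OSError, IndexError, subprocess.CalledProcessError):
        print("nvidia-smi not available; run this on a GPU node")
    else:
        status = "OK" if driver_meets_minimum(version) else "too old for CUDA 10"
        print(version, status)
```

The comparison works component by component, so a three-part version such as 418.87.01 is handled the same way as 410.48.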

Modules

To use the new CUDA 10 release and cuDNN 7.5, please load the cuda/10.0.130 and cudnn/7.5 modules:

[sfux@lo-gtx-001 ~]$ module list 

Currently Loaded Modules:
  1) StdEnv   2) gcc/4.8.5

[sfux@lo-gtx-001 ~]$ module load cuda/10.0.130 cudnn/7.5
[sfux@lo-gtx-001 ~]$ module list

Currently Loaded Modules:
  1) StdEnv   2) gcc/4.8.5   3) cuda/10.0.130   4) cudnn/7.5 

If you load the python_gpu/3.7.1 module, it automatically loads cuda/10.0.130 and cudnn/7.5:

[sfux@lo-gtx-001 ~]$ module list 

Currently Loaded Modules:
  1) StdEnv   2) gcc/4.8.5

[sfux@lo-gtx-001 ~]$ module load python_gpu/3.7.1
[sfux@lo-gtx-001 ~]$ module list

Currently Loaded Modules:
  1) StdEnv      3) openblas/0.2.19   5) cudnn/7.5      7) jpeg/9b         9) python_gpu/3.7.1
  2) gcc/4.8.5   4) cuda/10.0.130     6) nccl/2.3.7-1   8) libpng/1.6.27
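
A job script can guard against a forgotten module load before it starts any GPU work. The sketch below assumes that the module system exports the loaded modules as a colon-separated list in the LOADEDMODULES environment variable, as Lmod and Environment Modules commonly do; the function name is hypothetical.

```python
import os

# Modules named in the section above
REQUIRED = ["cuda/10.0.130", "cudnn/7.5"]


def missing_modules(required, loaded):
    """Return the entries of `required` absent from the colon-separated `loaded` string."""
    loaded_set = {name for name in loaded.split(":") if name}
    return [name for name in required if name not in loaded_set]


if __name__ == "__main__":
    # LOADEDMODULES is an assumption about the module system; see the lead-in
    missing = missing_modules(REQUIRED, os.environ.get("LOADEDMODULES", ""))
    if missing:
        print("Please run: module load " + " ".join(missing))
    else:
        print("CUDA 10 and cuDNN modules are loaded")
```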

Available frameworks

  • TensorFlow 1.13.1
  • Scikit-learn 0.20.3
  • Keras 2.2.4
  • Theano 1.0.4
  • PyTorch 1.0.1

[sfux@lo-gtx-001 ~]$ module load python_gpu/3.7.1
[sfux@lo-gtx-001 ~]$ python
Python 3.7.1 (default, Mar 20 2019, 09:01:27) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> tensorflow.__version__
'1.13.1'
>>> import sklearn
>>> sklearn.__version__
'0.20.3'
>>> import keras
Using TensorFlow backend.
>>> keras.__version__
'2.2.4'
>>> import theano
>>> theano.__version__
'1.0.4'
>>> import torch
>>> torch.__version__
'1.0.1.post2'
>>>
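
The interactive check above can also be run as a script, for example at the start of a batch job. This is a minimal sketch with a hypothetical helper name; it only assumes that each framework exposes a `__version__` attribute, as the session above shows.

```python
import importlib

# Import names of the frameworks listed above
FRAMEWORKS = ["tensorflow", "sklearn", "keras", "theano", "torch"]


def framework_versions(names):
    """Map each module name to its __version__, or None if it is unavailable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    for name, version in framework_versions(FRAMEWORKS).items():
        print("{:12s} {}".format(name, version or "not available"))
```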

TensorFlow 1.13.1

The precompiled TensorFlow 1.13.1 wheels provided on PyPI do not work on Leonhard, because they require a newer libc than CentOS 7.5 provides. We have therefore compiled TensorFlow 1.13.1 ourselves with support for CUDA 10.0.130 and cuDNN 7.5. The CPU code is optimized for AVX2, and the GPU code targets compute capabilities 6.1, 7.0 and 7.5. The build is therefore optimized for the GeForce GTX 1080 Ti, our new DGX-1 (Tesla V100) and the coming generation of GPUs (RTX 2080 and similar).
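
To see whether the GPU assigned to a job is one of the architectures this build targets, its compute capability can be compared against the list above. The sketch below is an illustration with hypothetical function names. For convenience it queries the GPU through PyTorch (torch.cuda.get_device_capability) rather than through TensorFlow, which is an assumption about what is handy and not part of the build itself.

```python
# Compute capabilities the TensorFlow build targets, taken from the text above
OPTIMIZED_CAPABILITIES = {(6, 1), (7, 0), (7, 5)}


def is_targeted(capability):
    """Return True if a (major, minor) compute capability is one of the build targets."""
    return tuple(capability) in OPTIMIZED_CAPABILITIES


def local_capability():
    """Return the compute capability of GPU 0 via PyTorch, or None without a GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    capability = local_capability()
    if capability is None:
        print("No GPU visible (or PyTorch is not installed)")
    elif is_targeted(capability):
        print("Compute capability {}.{} is a build target".format(*capability))
    else:
        print("Compute capability {}.{} is not among the build targets".format(*capability))
```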