Python on Leonhard


Introduction

Because certain Python packages require different builds for CPUs and GPUs, we provide separate Python installations for CPU and GPU use.

CPU version                     GPU version
module load python_cpu/3.6.1    module load python_gpu/3.6.1

TensorFlow

On Leonhard, we provide several versions of TensorFlow. The following combinations are available:

CPU
Module command                   Python    TensorFlow
module load python_cpu/2.7.12    2.7.12    1.2.1
module load python_cpu/2.7.13    2.7.13    1.3
module load python_cpu/2.7.14    2.7.14    1.7
module load python_cpu/3.6.0     3.6.0     1.2.1
module load python_cpu/3.6.1     3.6.1     1.3
module load python_cpu/3.6.4     3.6.4     1.7
module load python_cpu/3.7.1     3.7.1     1.13.1

GPU
Module command                   Python    TensorFlow    CUDA        cuDNN
module load python_gpu/2.7.12    2.7.12    1.2.1         8.0.61      5.1
module load python_gpu/2.7.13    2.7.13    1.3           8.0.61      6.0
module load python_gpu/2.7.14    2.7.14    1.7           9.0.176     7.0
module load python_gpu/3.6.0     3.6.0     1.2.1         8.0.61      5.1
module load python_gpu/3.6.1     3.6.1     1.3           8.0.61      6.0
module load python_gpu/3.6.4     3.6.4     1.7           9.0.176     7.0
module load python_gpu/3.7.1     3.7.1     1.13.1        10.0.130    7.5

To run a TensorFlow job on a CPU node, load a CPU version of TensorFlow (a python_cpu module). To run a TensorFlow job on a GPU node, load a GPU version (a python_gpu module).
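
The choice between the two module families can be sketched in a few lines of Python. The table below is transcribed from the lists above; module_command is a hypothetical helper written for illustration, not a tool provided on the cluster.

```python
# Python version -> TensorFlow version, transcribed from the tables above.
# The pairing is the same for the python_cpu and python_gpu modules.
TENSORFLOW_BY_PYTHON = {
    "2.7.12": "1.2.1",
    "2.7.13": "1.3",
    "2.7.14": "1.7",
    "3.6.0": "1.2.1",
    "3.6.1": "1.3",
    "3.6.4": "1.7",
    "3.7.1": "1.13.1",
}


def module_command(python_version, use_gpu):
    """Return the 'module load' command for a listed Python version.

    Hypothetical helper: it only formats the module names shown above.
    """
    if python_version not in TENSORFLOW_BY_PYTHON:
        raise ValueError("no module listed for Python " + python_version)
    flavour = "python_gpu" if use_gpu else "python_cpu"
    return "module load {}/{}".format(flavour, python_version)


print(module_command("3.6.4", use_gpu=True))   # module load python_gpu/3.6.4
print(module_command("3.6.4", use_gpu=False))  # module load python_cpu/3.6.4
```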

Troubleshooting

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

libcuda.so.1 is part of the Nvidia GPU driver, not of the CUDA SDK. If you get this error message, you are most likely running code that requires a GPU on a host that does not have one. To run software that requires access to the GPU driver, submit it as a batch job and request a GPU from the batch system.
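
As a quick check before importing a GPU package, a script can test whether the driver library is loadable on the current host. The following is a minimal sketch using only the standard ctypes module; the function name gpu_driver_available is an assumption made for this example.

```python
import ctypes


def gpu_driver_available(library="libcuda.so.1"):
    """Return True if the given driver library can be loaded on this host."""
    try:
        ctypes.CDLL(library)
    except OSError:
        # The dynamic loader could not find or open the library.
        return False
    return True


if gpu_driver_available():
    print("GPU driver found")
else:
    print("libcuda.so.1 not found: submit a batch job that requests a GPU")
```

On a host without a GPU driver, the check returns False instead of failing later with the ImportError shown above.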

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Code that is linked to a CUDA library is always linked to a versioned library. For example, if you link code against libcublas from the CUDA X.Y release, it links against libcublas.so.X.Y.

This error message indicates that the CUDA version you have loaded differs from the one the code was compiled with.

To resolve this error, load the same CUDA release that the code was compiled with. In this example, that is cuda/9.0.176.
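
The versioned library name in the error message tells you which CUDA release is needed. The sketch below extracts the X.Y version and looks up a module name. Only cuda/9.0.176 is named in the text above; the entries cuda/8.0.61 and cuda/10.0.130 are assumptions derived from the CUDA versions in the GPU table, and cuda_module_for is a hypothetical helper.

```python
import re

# CUDA X.Y release -> module name. Only cuda/9.0.176 is confirmed by the text;
# the other two names are assumed to follow the same naming pattern.
CUDA_MODULES = {
    "8.0": "cuda/8.0.61",
    "9.0": "cuda/9.0.176",
    "10.0": "cuda/10.0.130",
}


def cuda_module_for(error_message):
    """Return the CUDA module matching a 'libNAME.so.X.Y' error, or None."""
    match = re.search(r"lib\w+\.so\.(\d+\.\d+)", error_message)
    if match is None:
        return None
    return CUDA_MODULES.get(match.group(1))


message = ("ImportError: libcublas.so.9.0: cannot open shared object file: "
           "No such file or directory")
print(cuda_module_for(message))  # cuda/9.0.176
```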

Cluster is missing the h5py Python package

The h5py Python package is linked against the HDF5 library, so you also need to load the HDF5 module so that h5py can locate the HDF5 libraries.

[leonhard@lo-s4-019 ~]$ module load python_gpu/3.6.1 hdf5/1.10.1
[leonhard@lo-s4-019 ~]$ python
Python 3.6.1 (default, Sep 27 2017, 13:27:13)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> h5py.__version__
'2.7.1'
>>>
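
The same check can be scripted so that a job reports a missing HDF5 module early. This is a minimal sketch; check_import is a hypothetical helper, and the hint reuses the hdf5/1.10.1 module from the session above.

```python
import importlib


def check_import(module_name):
    """Try to import module_name and return (ok, detail).

    detail is the module's __version__ on success (or "unknown" if it has
    none), and the ImportError text on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as error:
        return False, str(error)
    return True, getattr(module, "__version__", "unknown")


ok, detail = check_import("h5py")
if ok:
    print("h5py version:", detail)
else:
    print("h5py import failed:", detail)
    print("hint: run 'module load hdf5/1.10.1' before starting Python")
```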