Difference between revisions of "Python on Leonhard"
Revision as of 09:29, 19 March 2019
Introduction
Because certain Python packages need different installations for their CPU and GPU versions, we provide separate Python installations for use on CPUs and on GPUs.
CPU version | GPU version |
---|---|
module load python_cpu/3.6.1 | module load python_gpu/3.6.1 |
TensorFlow
On Leonhard, we provide several versions of TensorFlow. The following combinations are available:
CPU | |
---|---|
Module command | TensorFlow version |
module load python_cpu/2.7.12 | Python 2.7.12, TensorFlow 1.2.1 |
module load python_cpu/2.7.13 | Python 2.7.13, TensorFlow 1.3 |
module load python_cpu/2.7.14 | Python 2.7.14, TensorFlow 1.7 |
module load python_cpu/3.6.0 | Python 3.6.0, TensorFlow 1.2.1 |
module load python_cpu/3.6.1 | Python 3.6.1, TensorFlow 1.3 |
module load python_cpu/3.6.4 | Python 3.6.4, TensorFlow 1.7 |
module load python_cpu/3.7.1 | Python 3.7.1, TensorFlow 1.13.1 |
GPU | |
Module command | TensorFlow version |
module load python_gpu/2.7.12 | Python 2.7.12, TensorFlow 1.2.1, CUDA 8.0.61, cuDNN 5.1 |
module load python_gpu/2.7.13 | Python 2.7.13, TensorFlow 1.3, CUDA 8.0.61, cuDNN 6.0 |
module load python_gpu/2.7.14 | Python 2.7.14, TensorFlow 1.7, CUDA 9.0.176, cuDNN 7.0 |
module load python_gpu/3.6.0 | Python 3.6.0, TensorFlow 1.2.1, CUDA 8.0.61, cuDNN 5.1 |
module load python_gpu/3.6.1 | Python 3.6.1, TensorFlow 1.3, CUDA 8.0.61, cuDNN 6.0 |
module load python_gpu/3.6.4 | Python 3.6.4, TensorFlow 1.7, CUDA 9.0.176, cuDNN 7.0 |
To run a TensorFlow job on a CPU node, load a CPU version of TensorFlow (a python_cpu module). To run a TensorFlow job on a GPU node, load a GPU version of TensorFlow (a python_gpu module).
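The rule above can be sketched as a small lookup. This is a minimal illustration only: the helper name `module_command` and the `AVAILABLE` dictionary are hypothetical, and the version lists are copied from the tables on this page.

```python
# Python versions copied from the CPU and GPU tables above.
# The helper itself is hypothetical and not part of the cluster software.
AVAILABLE = {
    "cpu": ["2.7.12", "2.7.13", "2.7.14", "3.6.0", "3.6.1", "3.6.4", "3.7.1"],
    "gpu": ["2.7.12", "2.7.13", "2.7.14", "3.6.0", "3.6.1", "3.6.4"],
}


def module_command(node_type, python_version):
    """Return the 'module load' command matching a CPU or GPU node."""
    versions = AVAILABLE.get(node_type)
    if versions is None:
        raise ValueError("node_type must be 'cpu' or 'gpu'")
    if python_version not in versions:
        raise ValueError(
            "python_%s/%s is not listed in the table" % (node_type, python_version)
        )
    return "module load python_%s/%s" % (node_type, python_version)


if __name__ == "__main__":
    print(module_command("gpu", "3.6.4"))
```

Requesting a combination that is not in the tables, such as a GPU module for Python 3.7.1, raises a ValueError instead of producing a command that would fail on the cluster.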
Troubleshooting
libcuda.so.1 is part of the Nvidia GPU driver, not of the CUDA SDK. If you get an error message about a missing libcuda.so.1, then you are most likely running code that requires a GPU on a host that does not have one. To run software that requires access to the GPU driver, submit it as a batch job and request a GPU from the batch system.
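To check whether the current host provides the GPU driver before starting a larger program, you can try to load the library directly. The following is a minimal sketch using the standard `ctypes` module; the function name `gpu_driver_available` is hypothetical.

```python
import ctypes


def gpu_driver_available(library="libcuda.so.1"):
    """Return True if the Nvidia driver library can be loaded on this host."""
    try:
        # CDLL raises OSError when the shared object cannot be opened.
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if gpu_driver_available():
        print("libcuda.so.1 found: this host has a GPU driver")
    else:
        print("libcuda.so.1 not found: request a GPU from the batch system")
```

A successful load only shows that the driver library is present. It does not show that a GPU has been allocated to your job.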
If you link code against a CUDA library, it is always linked to a versioned library. For example, if you link against libcublas from the CUDA X.Y release, the code links against libcublas.so.X.Y.
The following error message indicates that the CUDA version you have loaded differs from the one the code was compiled with:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
To resolve this error, load the same CUDA release that the code was compiled with. For this example, load cuda/9.0.176.
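Because the library name carries the CUDA release, the required release can be read from the error message itself. The sketch below shows one way to do this with a regular expression; the function name `required_cuda_release` and the pattern are illustrative assumptions. It returns only the X.Y release, so the full module name (for example cuda/9.0.176) still has to be looked up among the available modules.

```python
import re

# Matches versioned libraries such as libcublas.so.9.0 (name, then X.Y).
_PATTERN = re.compile(r"(lib\w+)\.so\.(\d+\.\d+)")


def required_cuda_release(message):
    """Return (library, 'X.Y') parsed from an ImportError message, or None."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    error = ("ImportError: libcublas.so.9.0: cannot open shared object file: "
             "No such file or directory")
    print(required_cuda_release(error))
```

A message about libcuda.so.1 does not match the pattern, since that library belongs to the GPU driver and has no X.Y release suffix.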
Cluster is missing the h5py Python package
The h5py Python package is linked against the HDF5 library. You therefore also need to load the HDF5 module so that h5py can locate the HDF5 libraries.
[leonhard@lo-s4-019 ~]$ module load python_gpu/3.6.1 hdf5/1.10.1
[leonhard@lo-s4-019 ~]$ python
Python 3.6.1 (default, Sep 27 2017, 13:27:13)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> h5py.__version__
'2.7.1'
>>>
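The same check can be scripted, for example at the start of a job, so that a missing HDF5 module is reported clearly. This is a minimal sketch using the standard `importlib` module; the helper name `try_import` is hypothetical.

```python
import importlib


def try_import(name):
    """Try to import a package; return (True, version) or (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as error:
        # A missing shared library (e.g. HDF5) also surfaces as ImportError.
        return False, str(error)
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = try_import("h5py")
    if ok:
        print("h5py version:", detail)
    else:
        print("h5py could not be imported:", detail)
        print("Check that the hdf5 module is loaded together with Python.")
```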