Neural network training with TensorFlow on GPU
From ScientificComputing
Revision as of 09:55, 15 June 2021
Load modules
We will use the new software stack in this tutorial:
[jarunanp@eu-login-11 ~]$ env2lmod
Load the Python module that contains the TensorFlow 2.3.0 package:
[jarunanp@eu-login-11 ~]$ module load gcc/6.3.0 python_gpu/3.8.5 eth_proxy
The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0
[jarunanp@eu-login-29 ~]$ module list
Currently Loaded Modules:
  1) StdEnv            4) cuda/11.0.3    7) python_gpu/3.8.5
  2) gcc/6.3.0         5) cudnn/8.0.5    8) eth_proxy
  3) openblas/0.2.20   6) nccl/2.7.8-1
Check that the TensorFlow package can be imported:
[jarunanp@eu-login-11 tf_gpu]$ python -c "import tensorflow as tf; print(tf.__version__)"
2.3.0
A neural network model
Create a working directory on $SCRATCH
[jarunanp@eu-login-11 ~]$ cd $SCRATCH
[jarunanp@eu-login-11 jarunanp]$ mkdir tf_gpu
[jarunanp@eu-login-11 jarunanp]$ cd tf_gpu
Download the script train_mnist_gpu.py, which contains a neural network model trained on the MNIST dataset. This example is taken from the TensorFlow tutorials.
[jarunanp@eu-login-11 tf_gpu]$ wget https://scicomp.ethz.ch/public/examples/tensorflow/train_mnist_gpu.py
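The contents of train_mnist_gpu.py are not reproduced on this page. For orientation, the following is a minimal sketch of what a comparable multi-GPU MNIST training script could look like. It assumes the small dense model from the TensorFlow beginner tutorial and tf.distribute.MirroredStrategy for data-parallel training; the function names build_model, global_batch_size and main, as well as the default hyperparameters, are hypothetical and may differ from the downloaded script.

```python
def global_batch_size(per_replica, num_replicas):
    """Batch size seen by model.fit when each replica gets `per_replica` samples."""
    return per_replica * num_replicas


def build_model():
    """Small dense classifier for 28x28 MNIST images (assumed architecture)."""
    import tensorflow as tf  # imported lazily so the helpers load without TensorFlow

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),  # one logit per digit class
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model


def main(epochs=5, per_replica_batch=64):
    """Train on MNIST across all visible GPUs and return [loss, accuracy] on the test set."""
    import tensorflow as tf

    # Downloads the dataset on first use, then scales pixel values to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # MirroredStrategy replicates the model on every visible GPU.
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    # Variables must be created inside the strategy scope to be mirrored.
    with strategy.scope():
        model = build_model()

    model.fit(x_train, y_train, epochs=epochs, batch_size=batch)
    return model.evaluate(x_test, y_test, verbose=2)
```

Calling main() starts the training. With two GPUs and the assumed default of 64 samples per replica, each training step would process a global batch of 128 samples.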
Request an interactive session on a compute node with 4 CPU cores and 2 GPUs:
[jarunanp@eu-login-11 tf_gpu]$ bsub -n 4 -R "rusage[ngpus_excl_p=2]" -Is bash
Generic job.
Job <175537249> is submitted to queue <gpu.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-g3-001>>
FILE: /sys/fs/cgroup/cpuset/lsf/euler/job.175537249.83337.1623700766/tasks
[jarunanp@eu-g3-001 tf_gpu]$
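Once the session has started on the compute node, it can be useful to confirm that TensorFlow actually sees the requested GPUs before launching the training. The following is a small sketch of such a check; the helper name visible_gpus is hypothetical, and tf.config.list_physical_devices is assumed to be available, which is the case for TensorFlow 2.1 and later.

```python
import importlib.util


def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not available; check that the modules are loaded.")
else:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

For the job requested above, two GPUs would be expected in the list. An empty list usually suggests that the command was run on a login node or that the job was submitted without a GPU request.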
Launch the training
[jarunanp@eu-g3-001 tf_gpu]$ python train_mnist_gpu.py