Neural network training with TensorFlow on CPU

From ScientificComputing

Load modules

We will use the new software stack in this tutorial:

 $ env2lmod  

Load the Python module, which contains the TensorFlow package (version 2.4.0, as the check below confirms):

 $ module load gcc/6.3.0 python/3.8.5 eth_proxy

 The following have been reloaded with a version change:
   1) gcc/4.8.5 => gcc/6.3.0

Check that the TensorFlow package can be imported:

 $ python -c "import tensorflow as tf; print(tf.__version__)"
 2.4.0

To run on CPU only, set the number of threads in the environment variable OMP_NUM_THREADS. For example, to run on 4 cores:

 $ export OMP_NUM_THREADS=4

The value of this environment variable is then used to configure the threading in a TensorFlow script:

 import os
 import tensorflow as tf
 nthreads = int(os.environ['OMP_NUM_THREADS'])
 tf.config.threading.set_intra_op_parallelism_threads(nthreads)

The two configuration lines are already included in our example script train_mnist.py.
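
Note that the os.environ lookup above raises a KeyError if OMP_NUM_THREADS is not set. A minimal sketch of a more defensive lookup is given here; the helper name threads_from_env and the fallback of 1 thread are assumptions made for illustration and are not part of train_mnist.py.

```python
import os


def threads_from_env(var="OMP_NUM_THREADS", default=1):
    """Return a positive thread count read from the environment, or `default`."""
    raw = os.environ.get(var, "")
    try:
        count = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric value: fall back to the default.
        return default
    # Reject zero and negative values as well.
    return count if count > 0 else default
```

The returned value could then be passed to tf.config.threading.set_intra_op_parallelism_threads in place of nthreads.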

A neural network model

Download the script train_mnist.py, which contains a neural network model that is trained on the MNIST dataset. This example is taken from the TensorFlow beginner tutorials.

 $ wget "https://gitlab.ethz.ch/jarunanp/hpc-examples/-/raw/main/tensorflow/tf_cpu/train_mnist.py?inline=false" -O train_mnist.py

Submit a batch job

Submit the job to a compute node, requesting 4 cores and a run time of one hour:

 $ bsub -n 4 -W 01:00 python train_mnist.py
 Generic job.
 Job <153279665> is submitted to queue <normal.4h>.
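
The core count passed to -n should match OMP_NUM_THREADS, and the job ID in the submission message is useful for later status checks. The following is a minimal sketch, assuming the message format shown above; the helper names bsub_command and parse_job_id are hypothetical, and only the -n and -W options from the command above are used.

```python
import re


def bsub_command(cores, walltime, script):
    """Build the bsub argument list shown above for a given core count."""
    return ["bsub", "-n", str(cores), "-W", walltime, "python", script]


def parse_job_id(message):
    """Extract the numeric job ID from a 'Job <ID> is submitted' line."""
    match = re.search(r"Job <(\d+)> is submitted", message)
    return int(match.group(1)) if match else None


cmd = bsub_command(4, "01:00", "train_mnist.py")
print(" ".join(cmd))
print(parse_job_id("Job <153279665> is submitted to queue <normal.4h>."))
```

The argument list can be handed to subprocess.run on a login node; parse_job_id returns None for lines that do not contain a job ID, such as "Generic job."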

Alternatively, create a job script and submit it with the bsub command. Please see an example here.

Check the job status

 $ bjobs
 JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
 153279665  jarunan PEND  normal.4h  eu-login-02             *mnist.py  Nov 25 14:12
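
For scripted status checks, the bjobs table can be split into fields. The sketch below assumes the fixed-width layout shown above, where every value starts at the same offset as its column header, so that an empty column such as EXEC_HOST for a pending job is handled correctly. The function names parse_bjobs and current_jobs are hypothetical.

```python
import re
import subprocess


def parse_bjobs(text):
    """Parse fixed-width bjobs output into a list of dicts keyed by header name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # Each column starts where its header name starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    names = [name for name, _ in columns]
    starts = [start for _, start in columns]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    jobs = []
    for line in lines[1:]:
        jobs.append({
            name: line[start:end].strip()
            for name, start, end in zip(names, starts, ends)
        })
    return jobs


def current_jobs():
    """Run bjobs and return the parsed rows (requires LSF on the PATH)."""
    result = subprocess.run(["bjobs"], capture_output=True, text=True, check=True)
    return parse_bjobs(result.stdout)
```

For the output above, the single parsed row would carry STAT equal to PEND and an empty EXEC_HOST. Values that overflow their column would break the fixed-width assumption.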

