Neural network training with TensorFlow on CPU
We will use the new software stack in this tutorial:
Load the Python module, which provides the TensorFlow 2.4.0 package:
$ module load gcc/6.3.0 python/3.8.5 eth_proxy
The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0
Check that the TensorFlow package can be imported:
$ python -c "import tensorflow as tf; print(tf.__version__)"
2.4.0
To run on CPU only, set the number of threads in the environment variable OMP_NUM_THREADS. In this example we run on 4 processor cores:
$ export OMP_NUM_THREADS=4
The value of this environment variable is then used to configure the threading in a TensorFlow script:
nthreads = int(os.environ['OMP_NUM_THREADS'])
tf.config.threading.set_intra_op_parallelism_threads(nthreads)
These two lines are already included in our example script train_mnist.py.
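The two lines above raise an error if OMP_NUM_THREADS is unset or not an integer. As a minimal sketch of a more defensive variant: the helper names `thread_count_from_env` and `configure_tensorflow_threads` and the fallback of 1 thread are assumptions for illustration and are not taken from train_mnist.py.

```python
import os


def thread_count_from_env(env=None, default=1):
    """Return the thread count from OMP_NUM_THREADS, or `default` if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS", "")
    try:
        nthreads = int(raw)
    except ValueError:
        # Unset (empty string) or non-integer value: fall back to the default.
        return default
    # Zero or negative values are treated as invalid as well.
    return nthreads if nthreads > 0 else default


def configure_tensorflow_threads(nthreads):
    """Apply the thread count to TensorFlow's intra-op thread pool."""
    # Imported lazily so that thread_count_from_env works without TensorFlow.
    import tensorflow as tf

    tf.config.threading.set_intra_op_parallelism_threads(nthreads)


# Usage inside a training script (requires TensorFlow):
#   configure_tensorflow_threads(thread_count_from_env())
```

The threading configuration should be applied at the top of the script, before TensorFlow executes any operation.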
A neural network model
Download the script train_mnist.py, which contains a neural network model that is trained on the MNIST dataset. This example is taken from the TensorFlow beginner tutorials.
$ wget "https://gitlab.ethz.ch/jarunanp/hpc-examples/-/raw/main/tensorflow/tf_cpu/train_mnist.py?inline=false" -O train_mnist.py
Submit a batch job
Submit a job to a compute node
$ bsub -n 4 -W 01:00 python train_mnist.py
Generic job.
Job <153279665> is submitted to queue <normal.4h>.
Alternatively, create a job script and submit it with the bsub command. Please see an example here.
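As a rough sketch of what such a job script could contain, the following Python snippet builds one with the same resource requests as the bsub command above. The function name `make_job_script` and the choice to export OMP_NUM_THREADS inside the script are assumptions for illustration, not part of the original example.

```python
def make_job_script(command, cores=4, walltime="01:00"):
    """Build the text of an LSF job script that uses #BSUB directives."""
    lines = [
        "#!/bin/bash",
        f"#BSUB -n {cores}",                # number of cores, same as `bsub -n`
        f"#BSUB -W {walltime}",             # wall-clock limit, same as `bsub -W`
        f"export OMP_NUM_THREADS={cores}",  # match the thread count to the cores
        command,
    ]
    return "\n".join(lines) + "\n"


# Print the script text; redirect it to a file of your choice.
print(make_job_script("python train_mnist.py"), end="")
```

If the output is saved to a file, for example a hypothetically named train_mnist.sh, it can be submitted with `bsub < train_mnist.sh`, so that LSF reads the #BSUB lines as options.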
Check the job status
$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
153279665  jarunan PEND  normal.4h  eu-login-02             *mnist.py  Nov 25 14:12
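To check the status from a script rather than by eye, the STAT column can be extracted from the bjobs output. The snippet below is a minimal sketch; the function name `job_status` is hypothetical, and it assumes the default bjobs column layout shown above, in which JOBID, USER and STAT are the first three columns.

```python
def job_status(bjobs_output, job_id):
    """Return the STAT column for `job_id`, or None if the job is not listed."""
    for line in bjobs_output.splitlines():
        fields = line.split()
        # Only the first three columns are used: later columns such as
        # EXEC_HOST can be empty (e.g. for pending jobs), which would
        # shift the positions of whitespace-separated fields.
        if len(fields) >= 3 and fields[0] == str(job_id):
            return fields[2]
    return None


sample = """\
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
153279665  jarunan PEND  normal.4h  eu-login-02             *mnist.py  Nov 25 14:12
"""
print(job_status(sample, 153279665))  # prints PEND
```

On the cluster, the text could be obtained with `subprocess.run(["bjobs"], capture_output=True, text=True).stdout` instead of the sample string.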