Difference between revisions of "Neural network training with TensorFlow on CPU"
m (Jarunanp moved page Train a neural network model with TensorFlow to Neural network training with TensorFlow on CPU without leaving a redirect) |
Revision as of 12:12, 29 November 2021
Load modules
We will use the new software stack in this tutorial:
$ env2lmod
Load the Python module, which includes the TensorFlow 2.0.0 package:
$ module load gcc/6.3.0 python/3.7.4 eth_proxy

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0
Check that the TensorFlow package can be imported:
$ python -c "import tensorflow as tf; print(tf.__version__)"
2.0.0
To run on CPUs only, set the number of threads in the environment variable OMP_NUM_THREADS. In this example we run on 4 processors:
$ export OMP_NUM_THREADS=4
The value of this environment variable is then used to configure the threading in a TensorFlow script:
nthreads = int(os.environ['OMP_NUM_THREADS'])
tf.config.threading.set_intra_op_parallelism_threads(nthreads)
These two lines are already included in our example script train_mnist.py.
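The two lines above raise a KeyError if OMP_NUM_THREADS is not set. A minimal sketch of a more defensive variant follows. The helper names read_thread_count and configure_tensorflow_threads and the fallback of 1 thread are assumptions made for illustration; they are not taken from train_mnist.py.

```python
import os


def read_thread_count(default=1):
    """Return the thread count from OMP_NUM_THREADS.

    Falls back to `default` (an assumed value, not from the tutorial)
    when the variable is unset, not an integer, or not positive.
    """
    raw = os.environ.get("OMP_NUM_THREADS", "")
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default


def configure_tensorflow_threads():
    """Apply the thread count to TensorFlow's intra-op thread pool."""
    # Imported here so the helper above can be used without TensorFlow.
    import tensorflow as tf

    nthreads = read_thread_count()
    # Must be called before TensorFlow executes any operation.
    tf.config.threading.set_intra_op_parallelism_threads(nthreads)
    return nthreads
```

Calling configure_tensorflow_threads() once at the top of a training script, before any model is built, has the same effect as the two lines above when OMP_NUM_THREADS is set.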
A neural network model
Download the script train_mnist.py, which contains a neural network model trained on the MNIST dataset. This example is taken from the TensorFlow beginner tutorials.
$ wget https://gitlab.ethz.ch/jarunanp/hpc-examples/-/raw/main/tensorflow/tf_cpu/train_mnist.py?inline=false -O train_mnist.py
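For orientation, the sketch below shows the kind of model used in the TensorFlow beginner quickstart from which the example is taken. It assumes the script follows that tutorial closely; the function names build_model and train, the layer sizes, and the default of 5 epochs are illustrative and may differ from the actual train_mnist.py.

```python
def build_model():
    """Build a small fully connected classifier for 28x28 MNIST images."""
    import tensorflow as tf  # imported lazily; requires the Python module loaded above

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28x28 image -> 784 vector
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),                     # regularization during training
        tf.keras.layers.Dense(10, activation="softmax"),  # one probability per digit
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(epochs=5):
    """Train on MNIST and return the [loss, accuracy] measured on the test set."""
    import tensorflow as tf

    # Downloads the dataset on first use, so network access is needed.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = build_model()
    model.fit(x_train, y_train, epochs=epochs)
    return model.evaluate(x_test, y_test, verbose=2)
```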
Submit a batch job
Submit a job to a compute node:
$ bsub -n 4 -W 01:00 python train_mnist.py
Generic job.
Job <153279665> is submitted to queue <normal.4h>.
Alternatively, create a job script and submit it with the bsub command. Please see an example here.
Check the job status:
$ bjobs
JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST  JOB_NAME   SUBMIT_TIME
153279665  jarunan  PEND  normal.4h  eu-login-02             *mnist.py  Nov 25 14:12