Neural network training with TensorFlow on CPU
Latest revision as of 09:50, 9 May 2022
Load modules
We will use the new software stack in this tutorial:
$ env2lmod
Load the Python module that contains the TensorFlow 2.4.0 package:
$ module load gcc/6.3.0 python/3.8.5 eth_proxy

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0
Check that the TensorFlow package can be imported:
$ python -c "import tensorflow as tf; print(tf.__version__)"
2.4.0
To run on CPU only, set the number of threads in the environment variable OMP_NUM_THREADS. In this example, we run on 4 cores:
$ export OMP_NUM_THREADS=4
The value of this environment variable is then used to configure the threading in a TensorFlow script:
nthreads = int(os.environ['OMP_NUM_THREADS'])
tf.config.threading.set_intra_op_parallelism_threads(nthreads)
These two lines are already included in our example script train_mnist.py.
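Note that os.environ['OMP_NUM_THREADS'] raises a KeyError when the variable is not set, for example when the script is tried out in a shell where the export was forgotten. The following is a minimal sketch of a more defensive variant. The helper names thread_count_from_env and configure_tensorflow_threads and the fallback value of 1 are assumptions made for illustration; they are not part of train_mnist.py.

```python
import os


def thread_count_from_env(env=None, default=1):
    """Return the thread count from OMP_NUM_THREADS, or `default` if unusable."""
    if env is None:
        env = os.environ
    raw = env.get("OMP_NUM_THREADS", "")
    try:
        nthreads = int(raw)
    except ValueError:
        # Variable is unset or not an integer: use the fallback.
        return default
    # Zero or negative values make no sense as a thread count.
    return nthreads if nthreads > 0 else default


def configure_tensorflow_threads(nthreads):
    """Apply the thread count to TensorFlow's intra-op thread pool."""
    # Imported here so that the helper above also works without TensorFlow.
    import tensorflow as tf

    # Call this before TensorFlow executes any operation.
    tf.config.threading.set_intra_op_parallelism_threads(nthreads)
```

In a training script, configure_tensorflow_threads(thread_count_from_env()) would then take the place of the two lines above.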
A neural network model
Download the script train_mnist.py, which contains a neural network model that is trained on the MNIST dataset. This example is taken from the TensorFlow beginner tutorials (https://www.tensorflow.org/tutorials/quickstart/beginner).
$ wget "https://gitlab.ethz.ch/jarunanp/hpc-examples/-/raw/main/tensorflow/tf_cpu/train_mnist.py?inline=false" -O train_mnist.py
Submit a batch job
Submit a job to a compute node
$ bsub -n 4 -W 01:00 python train_mnist.py
Generic job.
Job <153279665> is submitted to queue <normal.4h>.
Alternatively, create a job script and submit it with the bsub command. For an example, see the section "Create a job script" on the page "MPI hello world in C".
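When submission is automated, the job ID in the confirmation message printed by bsub is useful for later status checks. The following is a minimal sketch that extracts the job ID and queue name from a message of the form shown above. The function names parse_bsub_output and submit_training_job are hypothetical, and the sketch assumes that bsub writes the confirmation to standard output.

```python
import re
import subprocess

# Matches e.g. "Job <153279665> is submitted to queue <normal.4h>."
_SUBMIT_RE = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")


def parse_bsub_output(text):
    """Return (job_id, queue) parsed from the bsub confirmation message."""
    match = _SUBMIT_RE.search(text)
    if match is None:
        raise ValueError("no job submission message found")
    return int(match.group(1)), match.group(2)


def submit_training_job():
    """Run the bsub command from this tutorial and return (job_id, queue)."""
    # Assumption: bsub is on the PATH and prints the confirmation to stdout.
    result = subprocess.run(
        ["bsub", "-n", "4", "-W", "01:00", "python", "train_mnist.py"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_bsub_output(result.stdout)
```

Only parse_bsub_output can be tried without access to the cluster; submit_training_job has to be run on a login node.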
Check the job status
$ bjobs
JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST   JOB_NAME   SUBMIT_TIME
153279665  jarunan  PEND  normal.4h  eu-login-02              *mnist.py  Nov 25 14:12
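The job state in the STAT column can also be read by a script. The following is a minimal sketch that maps each job ID to its state from text in the bjobs format shown above. The function name job_states is hypothetical. The sketch reads only the first three columns (JOBID, USER, STAT), because EXEC_HOST is empty while a job is pending, so the later columns cannot be aligned reliably by splitting on whitespace.

```python
def job_states(bjobs_text):
    """Map job ID to job state (STAT column) from bjobs-style output."""
    states = {}
    for line in bjobs_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; this skips the header
        # line and any blank lines.
        if len(fields) >= 3 and fields[0].isdigit():
            states[fields[0]] = fields[2]
    return states


sample = (
    "JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME\n"
    "153279665 jarunan PEND normal.4h eu-login-02 *mnist.py Nov 25 14:12\n"
)
print(job_states(sample))
```

For the sample above, the job 153279665 is reported with the state PEND, meaning it is still waiting in the queue.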