Difference between revisions of "Neural network training with TensorFlow on CPU"
From ScientificComputing
m (Jarunanp moved page Tutorial TensorFlow to Training a CNN model with TensorFlow)
Revision as of 15:24, 8 December 2020
Load Python with TensorFlow module
We will use the new software stack in this tutorial:
$ env2lmod
Load the Python module that contains the TensorFlow 2.0.0 package:
$ module load gcc/6.3.0 python/3.7.4 hdf5
The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0
Check that the TensorFlow package can be imported:
$ python -c "import tensorflow as tf; print(tf.__version__)"
2.0.0
To run on CPU only, set the number of threads in the environment variable OMP_NUM_THREADS. For example, to run on 4 processors:
$ export OMP_NUM_THREADS=4
The value of this environment variable is then used to configure the threading in a TensorFlow script:
nthreads = int(os.environ['OMP_NUM_THREADS'])
tf.config.threading.set_intra_op_parallelism_threads(nthreads)
These two lines are already included in our example script train_mnist.py.
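For context, the two lines above rely on the os and tensorflow modules having been imported. The following is a minimal sketch of how the same configuration could be wrapped in helper functions. The function names and the fallback value used when OMP_NUM_THREADS is unset are assumptions made for illustration; they are not taken from train_mnist.py.

```python
import os


def read_thread_count(env=os.environ, default=1):
    """Return the thread count from OMP_NUM_THREADS, or `default` if unset.

    The `default` fallback is an assumption of this sketch; the two-line
    version in the tutorial requires the variable to be set.
    """
    value = env.get("OMP_NUM_THREADS")
    if value is None:
        return default
    nthreads = int(value)
    if nthreads < 1:
        raise ValueError("OMP_NUM_THREADS must be a positive integer")
    return nthreads


def configure_tensorflow_threads(nthreads):
    """Apply the thread count to TensorFlow's intra-op thread pool."""
    # Imported here so read_thread_count() can be used without TensorFlow.
    import tensorflow as tf

    # This has to be called before TensorFlow executes any operation.
    tf.config.threading.set_intra_op_parallelism_threads(nthreads)
```

In a training script, calling configure_tensorflow_threads(read_thread_count()) before the model is built would apply the setting.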
Create a neural network model
The script train_mnist.py contains a neural network model which is trained on the MNIST dataset. This example is taken from the TensorFlow beginner tutorials (https://www.tensorflow.org/tutorials/quickstart/beginner).
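The contents of train_mnist.py are not reproduced on this page. As a rough illustration, a model in the style of the TensorFlow 2.0 beginner quickstart might look like the sketch below. The function names, layer sizes, and number of epochs are assumptions based on that tutorial, and the actual script may differ.

```python
def scale_pixels(images):
    """Scale pixel values from the range 0..255 to 0..1."""
    return images / 255.0


def build_model():
    """Build a small dense classifier for 28x28 grayscale digit images."""
    # Imported lazily so this file can be loaded without TensorFlow.
    import tensorflow as tf

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train(epochs=5):
    """Train on MNIST and return the evaluation result on the test set."""
    import tensorflow as tf

    # Downloads the dataset on first use.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = scale_pixels(x_train), scale_pixels(x_test)

    model = build_model()
    model.fit(x_train, y_train, epochs=epochs)
    return model.evaluate(x_test, y_test, verbose=2)
```

Calling train() runs the whole workflow. Note that the dataset download needs network access, which may not be available on compute nodes.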
Submit a batch job
Submit the job to a compute node, requesting 4 cores and a run time of one hour:
$ bsub -n 4 -W 01:00 python train_mnist.py
Generic job.
Job <153279665> is submitted to queue <normal.4h>.
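The number of cores requested with -n should match the value of OMP_NUM_THREADS. One way to keep the two in sync is to generate both from a single number, as in the hedged sketch below. The helper names are hypothetical, and the sketch assumes that bsub passes the submission environment on to the job, which the export step in this tutorial also relies on.

```python
import os
import re
import subprocess


def build_bsub_command(ncores, walltime, script="train_mnist.py"):
    """Return the bsub argument list from the tutorial for `ncores` cores."""
    return ["bsub", "-n", str(ncores), "-W", walltime, "python", script]


def parse_job_id(bsub_output):
    """Extract the numeric job ID from bsub's confirmation message."""
    match = re.search(r"Job <(\d+)> is submitted", bsub_output)
    return int(match.group(1)) if match else None


def submit(ncores=4, walltime="01:00"):
    """Submit the job with OMP_NUM_THREADS equal to the requested cores."""
    # Copy the current environment and set the thread count for the job.
    env = dict(os.environ, OMP_NUM_THREADS=str(ncores))
    result = subprocess.run(build_bsub_command(ncores, walltime),
                            env=env, capture_output=True, text=True,
                            check=True)
    return parse_job_id(result.stdout)
```

On a login node, submit(ncores=4) would be expected to return the job ID that bsub reports, which can then be looked up with bjobs.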
Check the job status
$ bjobs
JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST   JOB_NAME   SUBMIT_TIME
153279665  jarunan  PEND  normal.4h  eu-login-02              *mnist.py  Nov 25 14:12