GNU Parallel

From ScientificComputing
Revision as of 14:36, 10 April 2018 by Urbanb (Tutorial on GNU Parallel)

If you have many very short calculations to run, you can use GNU Parallel to run them within a single batch job. This job can be serial, parallel, or even distributed over several nodes.

Using GNU Parallel

Assume you have a file listing several commands to run:

[leonhard@eu-login-00 ~]$ cat parcommands.txt 
printf "Running 1 on $HOSTNAME\n"
printf "Running 2 on $HOSTNAME\n"
printf "Running 3 on $HOSTNAME\n"
printf "Running 4 on $HOSTNAME\n"
printf "Running 5 on $HOSTNAME\n"
printf "Running 6 on $HOSTNAME\n"
printf "Running 7 on $HOSTNAME\n"
printf "Running 8 on $HOSTNAME\n"
printf "Running 9 on $HOSTNAME\n"
printf "Running 10 on $HOSTNAME\n"
printf "Running 11 on $HOSTNAME\n"
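For longer lists it is usually easier to generate the command file than to type it. The following is a minimal sketch that recreates the file shown above; the loop bounds and the file name parcommands.txt are taken from this example and should be adapted to your own commands.

```shell
# Generate parcommands.txt with one command per line.
# The format string is single-quoted so that $HOSTNAME is written literally
# and only expanded later, when the command runs on the compute node.
for i in $(seq 1 11); do
    printf 'printf "Running %d on $HOSTNAME\\n"\n' "$i"
done > parcommands.txt
```

Each line of the resulting file is an independent shell command, which is the form GNU Parallel expects when it reads commands from standard input.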

GNU Parallel can take this list of commands and run them concurrently within a parallel job. For example,

[leonhard@eu-login-00 ~]$ bsub -n 4 "parallel < parcommands.txt > paroutput.txt"

When the job has finished, the output file contains one line per command:

[leonhard@eu-login-00 ~]$ cat paroutput.txt
Running 1 on eu-ms-001-01
Running 2 on eu-ms-001-01
Running 3 on eu-ms-001-01
Running 4 on eu-ms-001-01
Running 5 on eu-ms-001-01
Running 6 on eu-ms-001-01
Running 7 on eu-ms-001-01
Running 8 on eu-ms-001-01
Running 9 on eu-ms-001-01
Running 10 on eu-ms-001-01
Running 11 on eu-ms-001-01
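The output above happens to appear in input order, but GNU Parallel prints each command's output when that command finishes, so the order is not guaranteed in general. By default it also starts one job per CPU core it detects, which may differ from the number of slots requested with bsub -n. The sketch below writes a small job script that sets both explicitly. The script name parjob.sh is arbitrary, and the use of the LSF variable LSB_DJOB_NUMPROC as the allocated slot count is an assumption about the cluster setup.

```shell
# Write a small LSF job script; the file name parjob.sh is arbitrary.
cat > parjob.sh <<'EOF'
#!/bin/bash
#BSUB -n 4
# LSB_DJOB_NUMPROC is assumed to hold the number of slots LSF allocated;
# fall back to 4 if it is unset.
# --keep-order prints results in the order of the input lines.
parallel --keep-order -j "${LSB_DJOB_NUMPROC:-4}" < parcommands.txt > paroutput.txt
EOF

# Submit it from the login node with:
#   bsub < parjob.sh
```

Putting the options into a script avoids nested quoting on the bsub command line and ensures that variables are expanded inside the job rather than on the login node.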

Using GNU Parallel on multiple nodes

You can also use GNU Parallel to run the commands over several nodes. For example,

[leonhard@eu-login-00 ~]$ bsub -n 48 'parallel --ssh "blaunch" -S "$(/cluster/apps/local/hostlist_parallel.sh)" < parcommands.txt'
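The -S option expects a comma-separated list of hosts, each optionally prefixed with the number of job slots to use there, written as N/host. The contents of the site script hostlist_parallel.sh are not shown here. As an illustration only, the sketch below shows how such a list could be derived from a space-separated host list with one entry per slot, which is assumed to be the format of the LSF variable LSB_HOSTS. The function name hosts_to_sshlogins is hypothetical, and the host names are taken from the example output above.

```shell
# Hypothetical helper: turn "hostA hostA hostB" into "2/hostA,1/hostB".
# $1 is left unquoted on purpose so the shell splits it into one host per word.
hosts_to_sshlogins() {
    printf '%s\n' $1 | sort | uniq -c |
        awk '{ printf "%s%s/%s", sep, $1, $2; sep = "," } END { print "" }'
}

# Example input for illustration; inside a job one might pass "$LSB_HOSTS".
hosts_to_sshlogins "eu-ms-001-01 eu-ms-001-01 eu-ms-001-02"
```

In the bsub command above, the outer single quotes matter: they keep the login shell from evaluating the $(...) substitution, so the host list is computed inside the job, where the allocated hosts are known.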