GNU Parallel

From ScientificComputing

If you have many very short calculations to run, you can use GNU Parallel to run them within a single batch job. This job can be serial, parallel, or even distributed over several nodes.

Using GNU Parallel

Assume you have a file listing several commands to run:

[sfux@eu-login-01 ~]$ cat parcommands.txt 
printf "Running 1 on $HOSTNAME\n"
printf "Running 2 on $HOSTNAME\n"
printf "Running 3 on $HOSTNAME\n"
printf "Running 4 on $HOSTNAME\n"
printf "Running 5 on $HOSTNAME\n"
printf "Running 6 on $HOSTNAME\n"
printf "Running 7 on $HOSTNAME\n"
printf "Running 8 on $HOSTNAME\n"
printf "Running 9 on $HOSTNAME\n"
printf "Running 10 on $HOSTNAME\n"
printf "Running 11 on $HOSTNAME\n"
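
A command file like this does not have to be written by hand. As a minimal sketch, assuming the same file name parcommands.txt and the same eleven commands, a shell loop can generate it:

```shell
# Write one command per line to parcommands.txt.
# The single quotes keep $HOSTNAME literal, so it is expanded later,
# on the node where the command actually runs, not while the file is written.
for i in $(seq 1 11); do
    printf 'printf "Running %s on $HOSTNAME\\n"\n' "$i"
done > parcommands.txt
```

Each line of the resulting file is an independent command, which is the form GNU Parallel expects when it reads commands from standard input.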

GNU Parallel can take this list of commands and run them concurrently within a parallel job. For example,

[sfux@eu-login-01 ~]$ sbatch --ntasks=4 --wrap="parallel < parcommands.txt > paroutput.txt"

When the job is done, the output file contains one line per command:

[sfux@eu-login-01 ~]$ cat paroutput.txt
Running 1 on eu-ms-001-01
Running 2 on eu-ms-001-01
Running 3 on eu-ms-001-01
Running 4 on eu-ms-001-01
Running 5 on eu-ms-001-01
Running 6 on eu-ms-001-01
Running 7 on eu-ms-001-01
Running 8 on eu-ms-001-01
Running 9 on eu-ms-001-01
Running 10 on eu-ms-001-01
Running 11 on eu-ms-001-01
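
The same job can also be submitted as a job script instead of using --wrap. The following is a sketch under assumptions, not a site-recommended setup: the script name parjob.sh is hypothetical, and tying --jobs to SLURM_NTASKS is one possible way to limit the number of simultaneous commands to the number of requested tasks. The --keep-order option makes GNU Parallel print the output in the order of the input lines rather than in the order in which the commands finish.

```shell
# Create a job script (the name parjob.sh is only an example).
# The quoted 'EOF' keeps $SLURM_NTASKS literal, so it is expanded
# when the job runs, not when the script is written.
cat > parjob.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4

# Run at most as many commands at once as tasks were requested,
# and keep the output in the same order as the input file.
parallel --jobs "$SLURM_NTASKS" --keep-order < parcommands.txt > paroutput.txt
EOF
```

The script would then be submitted with sbatch parjob.sh from the directory that contains parcommands.txt.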

Using GNU Parallel on multiple nodes

You can also use GNU Parallel to run the commands over several nodes. For example,

[sfux@eu-login-01 ~]$ sbatch --ntasks=48 --wrap='parallel --ssh "srun" -S "$(/cluster/apps/local/hostlist_parallel.sh)" < parcommands.txt'
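
In this command, --ssh replaces the ssh program that GNU Parallel uses by default to start remote commands with srun, and -S (--sshlogin) passes the list of hosts to use. GNU Parallel accepts that list as comma-separated entries, each optionally prefixed with N/ to give the number of job slots on that host. What /cluster/apps/local/hostlist_parallel.sh actually prints is not shown here, so the following is only an illustration of that general list format. The helper name make_sshlogin_list, the slot count and the second node name are hypothetical, and the entries the site script produces for use with srun may look different.

```shell
# Hypothetical helper (not part of GNU Parallel or of the cluster setup):
# read node names, one per line, prefix each with "<slots>/" and join
# them with commas, as accepted by parallel's -S option.
make_sshlogin_list() {
    slots="$1"
    sed "s|^|${slots}/|" | paste -sd, -
}

# Example node names; inside a Slurm job they could instead come from
#   scontrol show hostnames "$SLURM_JOB_NODELIST"
hosts=$(printf 'eu-ms-001-01\neu-ms-001-02\n' | make_sshlogin_list 24)
echo "$hosts"
```

With the two example nodes and 24 slots each, this prints 24/eu-ms-001-01,24/eu-ms-001-02.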