Job arrays
Introduction
Many cluster users are running embarrassingly parallel simulations consisting of hundreds or thousands of similar calculations, each one executing the same program but with slightly different — or random in the case of Monte-Carlo simulation — parameters. The usual approach is to submit each one as an independent job. This works fine, although keeping track of all these jobs is not easy, and can get quite complicated if these jobs must be executed in a coordinated fashion (e.g. master/slave). It would be much simpler if one could submit all these jobs at once, and manage them as a single entity. The good news is that it is indeed possible using a so-called job array. Jobs in an array have a common job-ID, plus a specific job-index ($SLURM_ARRAY_TASK_ID) corresponding to their position in the array.
Submitting a job array
Let's take for example a simulation consisting of 4 independent calculations. Normally, one would submit them as 4 individual jobs:
sbatch --job-name="calc 1" --wrap="./program [arguments]"
sbatch --job-name="calc 2" --wrap="./program [arguments]"
sbatch --job-name="calc 3" --wrap="./program [arguments]"
sbatch --job-name="calc 4" --wrap="./program [arguments]"
or
for ((n=1;n<=4;n++)); do
    sbatch --job-name="calc $n" --wrap="./program [arguments]"
done
Using a job array, however, one can submit these calculations all at once, using a single sbatch command:
sbatch --array=1-4 --wrap="./program [arguments]"
[sfux@eu-login-40 ~]$ sbatch --array=1-4 --wrap="echo \"Hello, I am an independent job\""
Submitted batch job 1189055
[sfux@eu-login-40 ~]$ squeue -u sfux
        JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
1189055_[1-4] normal.4h     wrap     sfux PD   0:00      1 (None)
A job array creates a Slurm logfile for each element, named slurm-JOBID_ELEMENT.out:
[sfux@eu-login-40 ~]$ ls -ltr slurm*
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_1.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_2.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_3.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_4.out
[sfux@eu-login-40 ~]$ cat slurm-1189055_1.out
Hello, I am an independent job
Setting a range of 1-4 will submit 4 jobs (using the default step size 1).
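Instead of using --wrap, the array can also be submitted from a batch script containing an #SBATCH --array directive. A minimal sketch (the script name job_array.sh is just an example):

```shell
# Write a minimal job script; the "#SBATCH --array" directive replaces the
# --array command-line option (the script name is an arbitrary example).
cat > job_array.sh <<'EOF'
#!/bin/bash
#SBATCH --array=1-4
#SBATCH --job-name=hello
echo "Hello, I am task $SLURM_ARRAY_TASK_ID"
EOF

# Submit the whole array with a single command:
#   sbatch job_array.sh
```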
Limiting the number of jobs that are allowed to run at the same time
A job array allows a large number of jobs to be submitted with a single command, which can potentially flood the system. Job slot limits provide a way to limit the impact a job array may have. You can set this limit by appending %job_slot_limit to the range of the array:
sbatch --array=1-10000%10 --wrap="echo \"Hello, I am an independent job\""
In this example the array contains 10000 elements, but at most 10 jobs are allowed to run at the same time.
Simulation parameters
Since all jobs in an array execute the same program (or script), you need to define specific parameters for each calculation. You can do this using different mechanisms:
- create a different input file for each job
- pass the job index as argument to the program
- use a "commands" file with 1 command per line
Input and output files
One can use the special strings %A (jobid) and %a (task/element id) in the job's input file name as a placeholder. For example:
sbatch --job-name="testjob" --array=1-4 --input="param.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --input="calc%A.%a.in" --wrap="command [argument]"
The same mechanism also applies to the output file:
sbatch --job-name="testjob" --array=1-4 --output="result.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --output="calc%A.%a.out" --wrap="command [argument]"
or the error file:
sbatch --job-name="testjob" --array=1-4 --error="error.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --error="%A.%a.err" --wrap="command [arguments]"
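Since %A (the job ID) is only known after submission, input files are often easier to name by the task index alone. A minimal sketch that generates one input file per array element (the file names and contents are illustrative, not a fixed convention):

```shell
# Generate one input file per array element; with --input="calc%a.in",
# array element n would then read calc<n>.in as its standard input.
for n in 1 2 3 4; do
    echo "parameter = $n" > "calc${n}.in"
done

# Then submit, for example:
#   sbatch --array=1-4 --input="calc%a.in" --wrap="command [argument]"
```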
Program arguments
A common case is to pass the parameter value (the array index $SLURM_ARRAY_TASK_ID) as a command-line argument. Here is an example for a MATLAB function with the parameter as its sole argument:
sbatch --job-name="hello" --array=1-4 --wrap="matlab -nodisplay -singleCompThread -r \"my_function(\$SLURM_ARRAY_TASK_ID)\""
It is important that the $ sign in front of SLURM_ARRAY_TASK_ID is escaped with a backslash (\$), so that the variable is evaluated at runtime rather than at submission time. This example would be equivalent to submitting 4 jobs in a row:
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(1)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(2)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(3)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(4)\""
You can specify the range for the job array by using the format
start-end:step
For example
sbatch --job-name="testjob" --array=10-20:2 --wrap="echo \$SLURM_ARRAY_TASK_ID"
would create a job array with 6 elements that would be equivalent to submitting the following six commands:
sbatch --job-name="testjob" --wrap="echo 10"
sbatch --job-name="testjob" --wrap="echo 12"
sbatch --job-name="testjob" --wrap="echo 14"
sbatch --job-name="testjob" --wrap="echo 16"
sbatch --job-name="testjob" --wrap="echo 18"
sbatch --job-name="testjob" --wrap="echo 20"
Please find below an overview on the available environment variables for job arrays in Slurm:
| Environment variable | Description |
|---|---|
| $SLURM_ARRAY_TASK_COUNT | Number of Slurm jobs in the array |
| $SLURM_ARRAY_TASK_ID | Array index of the current job |
| $SLURM_ARRAY_TASK_MIN | Minimum index in the job array |
| $SLURM_ARRAY_TASK_MAX | Maximum index in the job array |
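These variables can be combined inside a job script, for example to split a list of work items evenly across the array. A hedged sketch (the function name my_items is hypothetical; outside Slurm the variables are set by hand below to simulate one job of the array):

```shell
# Sketch: use the array environment variables to pick this task's slice
# of N work items, giving each task a contiguous block of indices.
my_items() {
    local total_items=$1
    # Items per task, rounded up so all items are covered.
    local per_task=$(( (total_items + SLURM_ARRAY_TASK_COUNT - 1) / SLURM_ARRAY_TASK_COUNT ))
    local first=$(( (SLURM_ARRAY_TASK_ID - SLURM_ARRAY_TASK_MIN) * per_task + 1 ))
    local last=$(( first + per_task - 1 ))
    (( last > total_items )) && last=$total_items
    seq "$first" "$last"
}

# Simulate task 2 of an array 1-4 splitting 10 items; inside a real job
# Slurm sets these variables automatically.
SLURM_ARRAY_TASK_COUNT=4
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_MIN=1
my_items 10    # prints the items assigned to task 2: 4, 5 and 6
```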
Using a "commands" file
The approach of using the job index works well for a single parameter, or for a set of parameters that can be mapped to natural numbers (in which case the parameters are calculated from the job index). When multiple parameters cannot be mapped to natural numbers, an alternative technique is to create a text file "commands" that contains one command per line. The variable $SLURM_ARRAY_TASK_ID then determines which line of the file a job executes.
sbatch --job-name="testjob" --array=1-4 --wrap="awk -v jindex=\$SLURM_ARRAY_TASK_ID 'NR==jindex' commands | bash"
The awk command extracts line number $SLURM_ARRAY_TASK_ID from the "commands" file and passes it to bash, which executes it.
The first job would then execute the first command from the commands files, the second job the second command etc.
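This mechanism can be tried locally by setting $SLURM_ARRAY_TASK_ID by hand (inside a job, Slurm sets it automatically). A minimal sketch with an illustrative "commands" file:

```shell
# Build a "commands" file with one command per line (contents illustrative).
cat > commands <<'EOF'
echo "first calculation"
echo "second calculation"
echo "third calculation"
echo "fourth calculation"
EOF

# Simulate the second job of the array: extract line 2 and execute it.
SLURM_ARRAY_TASK_ID=2
awk -v jindex=$SLURM_ARRAY_TASK_ID 'NR==jindex' commands | bash
# prints: second calculation
```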
Group calculations into fewer jobs
Often the jobs within a job array are too short (anything below a few minutes) because every job in the array runs just one small calculation. You can increase the throughput of the entire job array by grouping several calculations into fewer jobs instead of running a single calculation per job. Aim for each job to run for at least about half an hour, and never less than about 5 minutes.
In the previous example, we showed how to run four matlab function calls (matlab -nodisplay -singleCompThread -r "my_function(\$SLURM_ARRAY_TASK_ID)") as a job array with four jobs. Now let us convert this to a job array with two jobs, each of which runs two of the function calls. In the first step we will put the matlab call into a script, run_my_function.sh:
#!/bin/bash
matlab -nodisplay -singleCompThread -r "my_function($SLURM_ARRAY_TASK_ID)"
which can be submitted by redirecting it to the sbatch command:
sbatch --job-name="hello" --array=1-4 < run_my_function.sh
So far nothing has changed except for how the command is passed to sbatch. Note that there is no backslash before $SLURM_ARRAY_TASK_ID in the script. In the second step, change the run_my_function.sh script to run two matlab function calls by writing a for loop. Define the STEP variable to be the number of calculations to run in each loop. In our case this is 2:
#!/bin/bash
STEP=2
for ((i=1;i<=$STEP;i++)); do
    MY_JOBINDEX=$((($SLURM_ARRAY_TASK_ID-1)*$STEP + $i))
    matlab -nodisplay -singleCompThread -r "my_function($MY_JOBINDEX)"
done
Note that we now pass MY_JOBINDEX instead of SLURM_ARRAY_TASK_ID to the my_function call so that each calculation gets its own unique index. Submit this script but tell Slurm to run just two jobs in the job array (4 calculations/(2 calculations/job) = 2 jobs):
sbatch --job-name="hello" --array=1-2 < run_my_function.sh
If the number of calculations to run is not divisible by the number of calculations per job (let's say we want to run 3 calculations per job), then expand the script to be as follows:
#!/bin/bash
STEP=3
MAXINDEX=4
for ((i=1;i<=$STEP;i++)); do
    MY_JOBINDEX=$((($SLURM_ARRAY_TASK_ID-1)*$STEP + $i))
    if [ $MY_JOBINDEX -gt $MAXINDEX ]; then
        break
    fi
    matlab -nodisplay -singleCompThread -r "my_function($MY_JOBINDEX)"
done
Submit this script with the ending value of the array range set to ceiling(MAXINDEX/STEP) = ceiling(4/3) = 2:
sbatch --job-name="hello" --array=1-2 < run_my_function.sh
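The index arithmetic can be checked locally by simulating both array tasks; in this sketch an echo stands in for the matlab call:

```shell
# Verify the grouping arithmetic: with STEP=3 and MAXINDEX=4,
# task 1 should handle calculations 1-3 and task 2 only calculation 4.
run_task() {
    local SLURM_ARRAY_TASK_ID=$1 STEP=3 MAXINDEX=4
    for ((i=1;i<=STEP;i++)); do
        local MY_JOBINDEX=$(( (SLURM_ARRAY_TASK_ID-1)*STEP + i ))
        if [ $MY_JOBINDEX -gt $MAXINDEX ]; then
            break
        fi
        # The real script would call matlab here.
        echo "calculation $MY_JOBINDEX"
    done
}

run_task 1    # prints calculation 1, calculation 2, calculation 3
run_task 2    # prints calculation 4
```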
Monitoring job arrays
You can monitor a job array with the squeue, scontrol or sacct command:
squeue -j JOBID                          # all jobs in an array
squeue -j JOBID_ELEMENT                  # specific job in an array
scontrol show jobid -dd JOBID            # all jobs in an array
scontrol show jobid -dd JOBID_ELEMENT    # specific job in an array
sacct --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode JOBID            # all jobs in an array
sacct --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode JOBID_ELEMENT    # specific job in an array
For instance
scontrol show jobid -dd 1010910      # all jobs in 1010910
scontrol show jobid -dd 1010910_4    # fourth job in the array 1010910
Rerunning failed jobs
If some jobs in the job array fail, then Slurm will automatically try to rerun them.