Job arrays

From ScientificComputing
Revision as of 15:58, 26 October 2022 by Sfux (talk | contribs) (Monitoring job arrays)


Introduction

Many cluster users are running embarrassingly parallel simulations consisting of hundreds or thousands of similar calculations, each one executing the same program but with slightly different — or random in the case of Monte-Carlo simulation — parameters. The usual approach is to submit each one as an independent job. This works fine, although keeping track of all these jobs is not easy, and can get quite complicated if these jobs must be executed in a coordinated fashion (e.g. master/slave). It would be much simpler if one could submit all these jobs at once, and manage them as a single entity. The good news is that it is indeed possible using a so-called job array. Jobs in an array have a common job-ID, plus a specific job-index ($SLURM_ARRAY_TASK_ID) corresponding to their position in the array.

Submitting a job array

Let's take for example a simulation consisting of 4 independent calculations. Normally, one would submit them as 4 individual jobs:

sbatch --job-name="calc 1" --wrap="./program [arguments]"
sbatch --job-name="calc 2" --wrap="./program [arguments]"
sbatch --job-name="calc 3" --wrap="./program [arguments]"
sbatch --job-name="calc 4" --wrap="./program [arguments]"

or

for ((n=1;n<=4;n++)); do
    sbatch --job-name="calc $n" --wrap="./program [arguments]"
done

Using a job array, however, one can submit these calculations all at once, using a single sbatch command:

sbatch --array=1-4 --wrap="./program [arguments]"

For example:

[sfux@eu-login-40 ~]$ sbatch --array=1-4 --wrap="echo \"Hello, I am an independent job\""
Submitted batch job 1189055
[sfux@eu-login-40 ~]$ squeue -u sfux
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     1189055_[1-4] normal.4h     wrap     sfux PD       0:00      1 (None) 

A job array creates a Slurm log file for each element, named slurm-JOBID_ELEMENT.out:

[sfux@eu-login-40 ~]$ ls -ltr slurm*
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_1.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_2.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_3.out
-rw-r--r-- 1 sfux sfux-group 31 Oct 24 10:50 slurm-1189055_4.out
[sfux@eu-login-40 ~]$ cat slurm-1189055_1.out
Hello, I am an independent job

Setting a range of 1-4 will submit 4 jobs (using the default step size 1).

Limiting the number of jobs that are allowed to run at the same time

A job array allows a large number of jobs to be submitted with one command, potentially flooding the system. Job slot limits provide a way to limit the impact a job array may have. You can set this limit by appending %job_slot_limit to the range of the array:

sbatch --array=1-10000%10 --wrap="echo \"Hello, I am an independent job\""

In this example the array contains 10000 elements and at most 10 jobs are allowed to run at the same time.

Simulation parameters

Since all jobs in an array execute the same program (or script), you need to define specific parameters for each calculation. You can do this using different mechanisms:

  • create a different input file for each job
  • pass the job index as argument to the program
  • use a "commands" file with 1 command per line

Input and output files

One can use the special strings %A (job ID) and %a (task/element ID) as placeholders in the job's input file name. For example:

sbatch --job-name="testjob" --array=1-4 --input="param.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --input="calc%A.%a.in" --wrap="command [argument]"
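A minimal sketch of preparing such input files: since the job ID (%A) is only known after submission, it is often simpler to name the inputs by task index alone and match them with a pattern such as --input="param.%a" (assuming the filename pattern works as it does for --output; the param.* names are hypothetical):

```shell
# Create one input file per array task, named by task index only,
# so they can be matched with --input="param.%a" at submission time.
for n in 1 2 3 4; do
    echo "parameter set $n" > param.$n
done
cat param.2   # parameter set 2
```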

The same mechanism also applies to the output file:

sbatch --job-name="testjob" --array=1-4 --output="result.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --output="calc%A.%a.out" --wrap="command [argument]"

or the error file:

sbatch --job-name="testjob" --array=1-4 --error="error.%A.%a" --wrap="command [argument]"
sbatch --job-name="testjob" --array=1-4 --error="%A.%a.err" --wrap="command [arguments]"

Program arguments

A common case is to pass the parameter value (the array index $SLURM_ARRAY_TASK_ID) as a command-line argument. Here is an example for a MATLAB function with the parameter as its sole argument:

sbatch --job-name="hello" --array=1-4 --wrap="matlab -nodisplay -singleCompThread -r \"my_function(\$SLURM_ARRAY_TASK_ID)\""

It is important that the $ sign in front of SLURM_ARRAY_TASK_ID is escaped with a backslash (\$), as the variable needs to be evaluated at runtime inside the job, not at submission time. This example would be equivalent to submitting 4 jobs in a row:

sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(1)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(2)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(3)\""
sbatch --job-name="hello" --wrap="matlab -nodisplay -singleCompThread -r \"my_function(4)\""

You can specify the range for the job array by using the format

start-end:step

For example

sbatch --job-name="testjob" --array=10-20:2 --wrap="echo \$SLURM_ARRAY_TASK_ID"

would create a job array with 6 elements that would be equivalent to submitting the following six commands:

sbatch --job-name="testjob" --wrap="echo 10"
sbatch --job-name="testjob" --wrap="echo 12"
sbatch --job-name="testjob" --wrap="echo 14"
sbatch --job-name="testjob" --wrap="echo 16"
sbatch --job-name="testjob" --wrap="echo 18"
sbatch --job-name="testjob" --wrap="echo 20"
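As a quick sanity check, the indices generated by a start-end:step range can be reproduced locally with seq (a sketch run outside Slurm):

```shell
# The task IDs produced by --array=10-20:2 form the arithmetic
# sequence 10, 12, ..., 20, i.e. exactly what seq 10 2 20 prints.
task_ids=$(seq 10 2 20)
echo $task_ids   # 10 12 14 16 18 20
```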

Please find below an overview of the environment variables available for job arrays in Slurm:

Environment variable       Description
$SLURM_ARRAY_TASK_COUNT    Number of Slurm jobs in the array
$SLURM_ARRAY_TASK_ID       Array index of the current element in the array
$SLURM_ARRAY_TASK_MIN      Minimum index in the job array
$SLURM_ARRAY_TASK_MAX      Maximum index in the job array
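A small sketch of how a job script might use these variables. Here they are set by hand because the snippet runs outside a Slurm job; inside a real array job, Slurm exports them automatically:

```shell
# Simulate the variables Slurm would export for task 2 of --array=1-4.
SLURM_ARRAY_TASK_MIN=1
SLURM_ARRAY_TASK_MAX=4
SLURM_ARRAY_TASK_COUNT=4
SLURM_ARRAY_TASK_ID=2
msg="Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT (indices $SLURM_ARRAY_TASK_MIN-$SLURM_ARRAY_TASK_MAX)"
echo "$msg"
```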

Using a "commands" file

The job-index approach works well for a single parameter, or for a set of parameters that can be mapped to natural numbers (in which case the actual parameters are calculated from the job index). When the parameters cannot be mapped to natural numbers, an alternative technique is to create a text file "commands" that contains one command per line.

The variable $SLURM_ARRAY_TASK_ID then serves as a pointer determining which line of the file a job executes.

sbatch --job-name="testjob" --array=1-4 --wrap="awk -v jindex=\$SLURM_ARRAY_TASK_ID 'NR==jindex' commands | bash"

The awk command extracts line number $SLURM_ARRAY_TASK_ID from the "commands" file and passes it to bash, which executes it.

The first job would then execute the first command from the commands file, the second job the second command, etc.
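The mechanism can be tried out locally by setting $SLURM_ARRAY_TASK_ID by hand (a sketch; inside an array job, Slurm sets the variable for you):

```shell
# Build a small "commands" file with one command per line.
printf '%s\n' 'echo first' 'echo second' 'echo third' > commands
# Emulate what array task 2 would execute.
SLURM_ARRAY_TASK_ID=2
result=$(awk -v jindex=$SLURM_ARRAY_TASK_ID 'NR==jindex' commands | bash)
echo "$result"   # second
```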

Group calculations into fewer jobs

Often the jobs within a job array are too short (anything below a few minutes), because every job in the array runs just one short calculation.

You can increase the throughput of your entire job array by grouping several calculations into fewer jobs instead of running a single calculation per job. You should aim for each job to run for about half an hour, and for at least 5 minutes at the very minimum.

In the previous example, we showed how to run four matlab function calls (matlab -nodisplay -singleCompThread -r "my_function(\$SLURM_ARRAY_TASK_ID)") as a job array with four jobs. Now let us convert this to a job array with two jobs, each of which runs two of the function calls. In the first step we will put the matlab call into a script, run_my_function.sh:

#!/bin/bash
matlab -nodisplay -singleCompThread -r "my_function($SLURM_ARRAY_TASK_ID)"

which can be submitted by redirecting it to the sbatch command:

sbatch --job-name="hello" --array=1-4 < run_my_function.sh

So far nothing has changed except for how the command is passed to sbatch. Note that there is no backslash before $SLURM_ARRAY_TASK_ID in the script. In the second step, change the run_my_function.sh script to run two matlab function calls by writing a for loop. Define the STEP variable to be the number of calculations to run in each loop. In our case this is 2:

#!/bin/bash
STEP=2
for ((i=1;i<=$STEP;i++)); do
    MY_JOBINDEX=$((($SLURM_ARRAY_TASK_ID-1)*$STEP + $i))
    matlab -nodisplay -singleCompThread -r "my_function($MY_JOBINDEX)"
done

Note that we now pass MY_JOBINDEX instead of SLURM_ARRAY_TASK_ID to the my_function call so that each calculation gets its own unique index. Submit this script, but tell Slurm to run just two jobs in the job array (4 calculations / (2 calculations/job) = 2 jobs):

sbatch --job-name="hello" --array=1-2 < run_my_function.sh

If the number of calculations to run is not divisible by the number of calculations per job (let's say we want to run 3 calculations per job), then expand the script to be as follows:

#!/bin/bash
STEP=3
MAXINDEX=4
for ((i=1;i<=$STEP;i++)); do
    MY_JOBINDEX=$((($SLURM_ARRAY_TASK_ID-1)*$STEP + $i))
    if [ $MY_JOBINDEX -gt $MAXINDEX ]; then
        break
    fi
    matlab -nodisplay -singleCompThread -r "my_function($MY_JOBINDEX)"
done

Submit this script with the array's ending value set to ceiling(MAXINDEX/STEP) = ceiling(4/3) = 2:

sbatch --job-name="hello" --array=1-2 < run_my_function.sh
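To see which calculation each job ends up running, the index arithmetic from run_my_function.sh can be replayed outside Slurm, with SLURM_ARRAY_TASK_ID set by hand for both jobs (a sketch; the matlab call is replaced by echo):

```shell
# Replay the grouping logic for STEP=3, MAXINDEX=4 and --array=1-2.
schedule=$(
    STEP=3
    MAXINDEX=4
    for SLURM_ARRAY_TASK_ID in 1 2; do
        for ((i=1;i<=STEP;i++)); do
            MY_JOBINDEX=$(( (SLURM_ARRAY_TASK_ID-1)*STEP + i ))
            if [ $MY_JOBINDEX -gt $MAXINDEX ]; then
                break
            fi
            echo "job $SLURM_ARRAY_TASK_ID runs calculation $MY_JOBINDEX"
        done
    done
)
echo "$schedule"
```

Job 1 runs calculations 1-3; job 2 runs calculation 4 and then stops, since index 5 exceeds MAXINDEX.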

Monitoring job arrays

You can monitor a job array with the squeue, scontrol or sacct command:

squeue -j JOBID                         # all jobs in an array
squeue -j JOBID_ELEMENT                 # specific job in an array
scontrol show jobid -dd JOBID           # all jobs in an array
scontrol show jobid -dd JOBID_ELEMENT   # specific job in an array
sacct --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode JOBID # all jobs in an array
sacct --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode JOBID_ELEMENT

For instance

scontrol show jobid -dd 1010910         # all jobs in 1010910
scontrol show jobid -dd 1010910_4       # fourth job in the array 1010910

Rerunning failed jobs

If some jobs in a job array fail, Slurm will automatically try to rerun them.
