LSF to Slurm quick reference
Contents
- 1 Introduction
- 2 Job submission
- 3 Job control
- 4 Environment variables
Introduction
The commands for Slurm are similar to the ones used in LSF. You can find a mapping of the relevant commands below.
Job submission
Simple command
LSF | Slurm |
---|---|
bsub command | sbatch --wrap=command |
bsub "command1 ; command2" | sbatch --wrap="command1 ; command2" |
bsub "command1 | command2" | sbatch --wrap="command1 | command2" |
bsub [LSF options] command | sbatch [slurm options] --wrap="command" |
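For example, a one-hour run of a hypothetical program ./myprog (the program name and resource values are placeholders) could be submitted as:
LSF example:
bsub -W 01:00 -R "rusage[mem=2000]" ./myprog
Slurm example:
sbatch --time=01:00:00 --mem-per-cpu=2000 --wrap="./myprog"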
Frequently used bsub/sbatch options
Parameter | bsub | sbatch |
---|---|---|
Job name | -J job_name | -J job_name or --job-name=job_name |
Job array consisting of N sub-jobs | -J "job_name[1-N]" | -a 1-N or --array=1-N |
Output file (stdout) | -o file_name (default: lsf.oJOBID) | -o file_name or --output=file_name (default: slurm-JOBID.out) |
Error file (stderr) | -e file_name (default: merged with output file) | -e file_name or --error=file_name (default: merged with output file) |
Wall-clock time (default: 4h) | -W HH:MM | -t DD-HH[:MM] or --time=MM or --time=HH:MM:SS |
Number of cores (default: 1) | -n cores | -n cores or --ntasks=cores for MPI jobs and --ntasks=1 --cpus-per-task=cores for OpenMP jobs |
Number of cores per node | -R "span[ptile=cores_per_node]" | --ntasks-per-node=cores_per_node |
Memory per core (default: 1024 MB) | -R "rusage[mem=MB]" | --mem-per-cpu=MB (can also be expressed in GB using "G" suffix) |
Number of GPUs (default: 0) | -R "rusage[ngpus_excl_p=N]" | -G N or --gpus=N |
Memory per GPU | -R "select[gpu_mtotal0>=MB]" | --gres=gpumem:MB (can also be expressed in GB using "G" suffix) |
Local scratch space per core | -R "rusage[scratch=MB]" | not available |
Local scratch space per node | not available | --tmp=MB (can also be expressed in GB using "G" suffix) |
Run job under a specific shareholder group | -G shareholder_group | -A shareholder_group or --account=shareholder_group |
Notify user by email when job starts | -B | --mail-type=BEGIN |
Notify user by email when job ends | -N | --mail-type=END,FAIL (multiple types can be combined in one option, e.g. --mail-type=BEGIN,END,FAIL) |
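As an illustration, several of these options can be combined in a single submission (job name, program, and resource values below are placeholders):
Slurm example:
sbatch -J analysis1 --time=04:00:00 -n 1 --cpus-per-task=4 --mem-per-cpu=2G --mail-type=END,FAIL --wrap="./myprog input.txt"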
Shell script
LSF | Slurm |
---|---|
bsub [options] < jobscript.sh | sbatch [options] < jobscript.sh or sbatch [options] jobscript.sh [arguments] |
Job parameters can be passed as options to bsub or placed inside jobscript.sh using #BSUB pragmas:
#!/bin/bash
#BSUB -n 4
#BSUB -W 08:00
#BSUB -R "rusage[mem=2000]"
#BSUB -R "rusage[scratch=1000]"   # per core
#BSUB -J analysis1
#BSUB -o analysis1.out
#BSUB -e analysis1.err
module load xyz/123
command1
command2
...

Job parameters can be passed as options to sbatch or placed inside jobscript.sh using #SBATCH pragmas:
#!/bin/bash
#SBATCH -n 4
#SBATCH --time=8:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --tmp=4000                # per node!!
#SBATCH --job-name=analysis1
#SBATCH --output=analysis1.out
#SBATCH --error=analysis1.err
module load xyz/123
command1
command2
...
Note:
- In LSF, the jobscript.sh must be passed to bsub via the "<" operator
- In LSF, scratch space is expressed per core, while in Slurm it is per node
- In LSF, the default output file is "lsf.oJOBID", while in Slurm it is "slurm-JOBID.out"
Interactive job
LSF | Slurm |
---|---|
bsub -Is [LSF options] bash | srun --pty bash |
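For example, an interactive shell with more resources than the defaults might be requested as follows (the resource values are placeholders):
LSF example:
bsub -Is -n 4 -W 01:00 bash
Slurm example:
srun -n 1 --cpus-per-task=4 --time=01:00:00 --mem-per-cpu=2G --pty bash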
Parallel job
Shared memory (OpenMP, threads)
LSF | Slurm |
---|---|
bsub -n 128 -R "span[ptile=128]" | sbatch -n 1 --cpus-per-task=128 |
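For example, a hypothetical OpenMP program ./myprog running with 8 threads could be submitted like this (the thread count should match the number of requested cores):
LSF example:
bsub -n 8 -R "span[ptile=8]" "OMP_NUM_THREADS=8 ./myprog"
Slurm example:
sbatch -n 1 --cpus-per-task=8 --wrap="OMP_NUM_THREADS=8 ./myprog"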
Distributed memory (MPI, processes)
LSF | Slurm |
---|---|
bsub -n 256 -R "span[ptile=128]" | sbatch -n 256 --ntasks-per-node=128 or sbatch -n 256 --nodes=2 |
The Slurm options
- --ntasks-per-core,
- --cpus-per-task,
- --nodes, and
- --ntasks-per-node
are supported.
Please note that for larger parallel MPI jobs that use more than a single node (more than 128 cores), you should add the sbatch option
-C ib
to make sure that they are dispatched to nodes with the InfiniBand high-speed interconnect, as this results in much better performance.
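For example, a 256-rank MPI job spread over two nodes could be submitted roughly as follows (./myprog is a placeholder; whether to launch with srun or mpirun depends on the MPI installation):
Slurm example:
sbatch -n 256 --ntasks-per-node=128 -C ib --wrap="srun ./myprog"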
Job array
LSF | Slurm |
---|---|
bsub -J jobname[1-N]" | sbatch --array=1-N |
bsub -J jobname[1-N%step]" | sbatch --array=1-N:step |
Environment variables defined in each job: $LSB_JOBINDEX, $LSB_JOBINDEX_END | Environment variables defined in each job: $SLURM_ARRAY_TASK_ID, $SLURM_ARRAY_TASK_COUNT |
LSF example:
bsub -J "myarray[1-4]" 'echo "Hello, I am task $LSB_JOBINDEX of $LSB_JOBINDEX_END"'
Slurm example:
sbatch --array=1-4 --wrap='echo "Hello, I am task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"'
GPU job
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" | sbatch --gpus=1 |
For multi-node jobs you need to use the --gpus-per-node option instead.
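For example, a hypothetical multi-node GPU job using 4 GPUs on each of 2 nodes could be requested roughly like this (./my_gpu_prog is a placeholder):
Slurm example:
sbatch --nodes=2 --ntasks-per-node=4 --gpus-per-node=4 --wrap="srun ./my_gpu_prog"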
GPU job requiring a specific GPU model
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceGTX1080]" | sbatch --gpus=gtx_1080:1 |
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceRTX3090]" | sbatch --gpus=rtx_3090:1 |
- For Slurm, only the specifiers gtx_1080 and rtx_3090 are currently supported; more GPU types will be added later.
GPU job requiring a given amount of GPU memory
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=20480]" | sbatch --gpus=1 --gres=gpumem:20g |
The default unit for gpumem is bytes. You are therefore advised to specify units, for example 20g or 11000m.
Run job under a specific shareholder group
LSF | Slurm |
---|---|
bsub -G es_example | sbatch -A es_example |
In Slurm, one can define a default shareholder group using the command: "echo account=es_example >> $HOME/.slurm/defaults"
Submit a job on a specific CPU model
LSF | Slurm |
---|---|
bsub -R "select[model==EPYC_7H12]" | sbatch --constraint=EPYC_7H12 |
Job chains
LSF | Slurm |
---|---|
bsub -J job_chain followed by bsub -J job_chain -w "done(job_chain)" | sbatch -J job_chain -d singleton |
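For example, a chain of two jobs sharing the same name could be submitted as follows (command1 and command2 are placeholders); with -d singleton, each job starts only after all previously submitted jobs with the same name and user have finished:
Slurm example:
sbatch -J job_chain -d singleton --wrap="command1"
sbatch -J job_chain -d singleton --wrap="command2"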
Job dependencies
LSF | Slurm |
---|---|
Job #1: bsub -J job1 command1; Job #2: bsub -J job2 -w "done(job1)" command2 | Job #1: myjobid=$(sbatch --parsable -J job1 --wrap="command1"); Job #2: sbatch -J job2 -d afterany:$myjobid --wrap="command2" |
In Slurm, sbatch --parsable returns only the JOBID of the submitted job.
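If the second job should only run when the first one finished successfully, the afterok dependency type can be used instead of afterany, for example:
Slurm example:
sbatch -J job2 -d afterok:$myjobid --wrap="command2"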
Job control
Job status
LSF | Slurm |
---|---|
bjobs [JOBID] | squeue [-j JOBID] |
bjobs -p | squeue -u USERNAME -t PENDING |
bjobs -r | squeue -u USERNAME -t RUNNING |
Resource usage
LSF | Slurm |
---|---|
bbjobs [JOBID] | myjobs -j JOBID, scontrol show jobid -dd JOBID, sacct -l -j JOBID (for finished jobs), or sstat [--all] JOBID (for running jobs) |
Use --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode instead of -l for a customizable, more readable output.
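For example, for a hypothetical job ID 1234567 a readable summary of a finished job could be requested as:
Slurm example:
sacct -j 1234567 --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode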
Killing a job
LSF | Slurm |
---|---|
bkill [JOBID] | scancel [JOBID] |
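A few further Slurm examples for cancelling jobs (USERNAME, job_name, and the job ID are placeholders):
scancel -u USERNAME (cancel all of your jobs)
scancel --name=job_name (cancel all jobs with a given name)
scancel 1234567_3 (cancel a single task of a job array)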
Environment variables
LSF | Slurm |
---|---|
$LSB_JOBID | $SLURM_JOB_ID |
$LSB_SUBCWD | $SLURM_SUBMIT_DIR |
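For example, a minimal Slurm jobscript that reports these variables (sketch only):
#!/bin/bash
#SBATCH --job-name=envtest
echo "Job $SLURM_JOB_ID was submitted from $SLURM_SUBMIT_DIR"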