LSF to Slurm quick reference
Introduction
The commands for Slurm are similar to the ones used in LSF. You can find a mapping of the relevant commands below.
Job submission
Simple command
LSF | Slurm |
---|---|
bsub command | sbatch --wrap command |
bsub "command1 ; command2" | sbatch --wrap "command1 ; command2" |
bsub "command1 | command2" | sbatch --wrap "command1 | command2" |
Frequently used bsub/sbatch options
Parameter | bsub | sbatch |
---|---|---|
Job name | -J job_name | -J job_name or --job-name=job_name |
Job array consisting of N sub-jobs | -J "job_name[1-N]" | -a 1-N or --array=1-N |
Output file (stdout) | -o file_name (default: lsf.oJOBID) | -o file_name or --output=file_name (default: slurm-JOBID.out) |
Error file (stderr) | -e file_name (default: merged with output file) | -e file_name or --error=file_name (default: merged with output file) |
Wall-clock time (default: 4h) | -W HH:MM | -t DD-HH[:MM] or --time=MM or --time=HH:MM:SS |
Number of cores (default: 1) | -n cores | -n cores or --ntasks=cores |
Number of cores per node | -R "span[ptile=cores_per_node]" | --ntasks-per-node=cores_per_node |
Memory per core (default: 1024 MB) | -R "rusage[mem=MB]" | --mem-per-cpu=MB (can also be expressed in GB using "G" suffix) |
Number of GPUs (default: 0) | -R "rusage[ngpus_excl_p=N]" | -G N or --gpus=N |
Memory per GPU | -R "select[gpu_mtotal0>=MB]" | --gres=gpumem:MB (can also be expressed in GB using "G" suffix) |
Local scratch space per core | -R "rusage[scratch=MB]" | not available |
Local scratch space per node | not available | --tmp=MB (can also be expressed in GB using "G" suffix) |
Run job under a specific shareholder group | -G shareholder_group | -A shareholder_group or --account=shareholder_group |
Notify user by email when job starts | -B | --mail-type=BEGIN |
Notify user by email when job ends | -N | --mail-type=END (multiple types can be combined in one option, e.g. --mail-type=BEGIN,END) |
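As an illustrative sketch (my_program and all resource values are placeholders), a job requesting 4 cores, 2048 MB of memory per core, 2 hours of wall-clock time and an email at job end could be submitted as:
LSF example:
bsub -n 4 -W 02:00 -R "rusage[mem=2048]" -N ./my_program
Slurm example:
sbatch -n 4 --time=02:00:00 --mem-per-cpu=2048 --mail-type=END --wrap ./my_program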
Shell script
LSF | Slurm |
---|---|
bsub [options] < jobscript.sh | sbatch [options] jobscript.sh |
In LSF, job parameters can be passed as options to bsub or placed inside jobscript.sh using #BSUB pragmas:
#!/bin/bash
#BSUB -n 4
#BSUB -W 08:00
#BSUB -R "rusage[mem=2000]"
#BSUB -R "rusage[scratch=1000]"   # per core
#BSUB -J analysis1
#BSUB -o analysis1.out
#BSUB -e analysis1.err
module load xyz/123
command1
command2
...
In Slurm, job parameters can be passed as options to sbatch or placed inside jobscript.sh using #SBATCH pragmas:
#!/bin/bash
#SBATCH -n 4
#SBATCH --time=8:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --tmp=4000   # per node!!
#SBATCH --job-name=analysis1
#SBATCH --output=analysis1.out
#SBATCH --error=analysis1.err
module load xyz/123
command1
command2
...
Note:
- In LSF, the jobscript.sh must be passed to bsub via the "<" operator
- In LSF, scratch space is expressed per core, while in Slurm it is per node
- In LSF, the default output file is "lsf.oJOBID", while in Slurm it is "slurm-JOBID.out"
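For example, assuming the script above is saved as jobscript.sh in the current directory, it is submitted as:
LSF example:
bsub < jobscript.sh
Slurm example:
sbatch jobscript.sh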
Interactive job
LSF | Slurm |
---|---|
bsub -Is [LSF options] bash | srun --pty bash |
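For example, to request an interactive shell with a one-hour time limit (the value is only an illustration):
LSF example:
bsub -Is -W 01:00 bash
Slurm example:
srun --time=01:00:00 --pty bash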
Parallel job
LSF | Slurm |
---|---|
bsub -n 256 -R "span[ptile=128]" | sbatch -n 256 --ntasks-per-node=128 or sbatch -n 256 --nodes=2 |
The Slurm options --ntasks-per-core, --cpus-per-task, --nodes, and --ntasks-per-node are supported.
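As a sketch (mpi_program is a placeholder and srun is assumed as the MPI launcher), a 256-rank job spread over two 128-core nodes could be submitted as:
Slurm example:
sbatch -n 256 --ntasks-per-node=128 --wrap "srun ./mpi_program"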
Job array
LSF | Slurm |
---|---|
bsub -J "jobname[1-N]" | sbatch --array=1-N |
bsub -J "jobname[1-N:step]" | sbatch --array=1-N:step |
Environment variables defined in each job: $LSB_JOBINDEX, $LSB_JOBINDEX_END | Environment variables defined in each job: $SLURM_ARRAY_TASK_ID, $SLURM_ARRAY_TASK_COUNT |
LSF example:
bsub -J "myarray[1-4]" 'echo "Hello, I am task $LSB_JOBINDEX of $LSB_JOBINDEX_END"'
Slurm example:
sbatch --array=1-4 --wrap 'echo "Hello, I am task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"'
GPU job
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" | sbatch --gpus=1 |
For multi-node jobs you need to use the --gpus-per-node option instead.
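For example, a minimal single-GPU job (nvidia-smi only serves as a placeholder workload):
LSF example:
bsub -R "rusage[ngpus_excl_p=1]" nvidia-smi
Slurm example:
sbatch --gpus=1 --wrap nvidia-smi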
GPU job requiring a specific GPU model
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceGTX1080]" | sbatch --gpus=gtx_1080:1 |
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceRTX3090]" | sbatch --gpus=rtx_3090:1 |
- For Slurm, only the specifiers gtx_1080 and rtx_3090 are currently supported; more GPU types will be added later.
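For example, a placeholder workload on a single RTX 3090 (nvidia-smi is only an illustration):
Slurm example:
sbatch --gpus=rtx_3090:1 --wrap nvidia-smi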
GPU job requiring a given amount of GPU memory
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=20480]" | sbatch --gpus=1 --gres=gpumem:20g |
The default unit for gpumem is bytes. You are therefore advised to specify units, for example 20g or 11000m.
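For example, a placeholder workload on one GPU with at least 20 GB of GPU memory:
Slurm example:
sbatch --gpus=1 --gres=gpumem:20g --wrap nvidia-smi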
Submit a job using a specific share
LSF | Slurm |
---|---|
bsub -G es_example | sbatch -A es_example |
To set a share as the default for future jobs: echo account=es_example >> $HOME/.slurm/defaults
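For example, to charge a placeholder job to the es_example share:
Slurm example:
sbatch -A es_example --wrap hostname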
Job control
Job status
LSF | Slurm |
---|---|
bjobs [JOBID] | squeue [-j JOBID] |
Resources usage
LSF | Slurm |
---|---|
bbjobs [JOBID] | sacct -l -j JOBID or sstat [--all] JOBID for running jobs |
Use --format JobID,AveCPU,MaxRSS instead of -l for a customizable, more readable output.
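For example, a compact usage summary of a finished job (1234567 stands for an actual job ID):
sacct -j 1234567 --format JobID,AveCPU,MaxRSS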
Killing a job
LSF | Slurm |
---|---|
bkill [JOBID] | scancel [JOBID] |
Environment variables
LSF | Slurm |
---|---|
$LSB_JOBID | $SLURM_JOB_ID |
$LSB_SUBCWD | $SLURM_SUBMIT_DIR |
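For example, a minimal sketch of a job script that prints both Slurm variables (the script content is purely illustrative):
#!/bin/bash
#SBATCH -n 1
echo "This is job $SLURM_JOB_ID, submitted from $SLURM_SUBMIT_DIR"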