Difference between revisions of "LSF to Slurm quick reference"

Latest revision as of 08:20, 19 October 2022

Introduction

The commands for Slurm are similar to the ones used in LSF. You can find a mapping of the relevant commands below.

Job submission

Simple command

LSF                             Slurm
bsub command                    sbatch --wrap=command
bsub "command1 ; command2"      sbatch --wrap="command1 ; command2"
bsub "command1 | command2"      sbatch --wrap="command1 | command2"
bsub [LSF options] command      sbatch [Slurm options] --wrap="command"

Frequently used bsub/sbatch options

Parameter                                   bsub                                sbatch
Job name                                    -J job_name                         -J job_name   or   --job-name=job_name
Job array consisting of N sub-jobs          -J "job_name[1-N]"                  -a 1-N   or   --array=1-N
Output file (stdout)                        -o file_name                        -o file_name   or   --output=file_name
                                            (default: lsf.oJOBID)               (default: slurm-JOBID.out)
Error file (stderr)                         -e file_name                        -e file_name   or   --error=file_name
                                            (default: merged with output file)  (default: merged with output file)
Wall-clock time (default: 4h)               -W HH:MM                            -t DD-HH[:MM]   or   --time=MM   or   --time=HH:MM:SS
Number of cores (default: 1)                -n cores                            -n cores   or   --ntasks=cores
Number of cores per node                    -R "span[ptile=cores_per_node]"     --ntasks-per-node=cores_per_node
Memory per core (default: 1024 MB)          -R "rusage[mem=MB]"                 --mem-per-cpu=MB   (can also be expressed in GB using the "G" suffix)
Number of GPUs (default: 0)                 -R "rusage[ngpus_excl_p=N]"         -G N   or   --gpus=N
Memory per GPU                              -R "select[gpu_mtotal0>=MB]"        --gres=gpumem:SIZE   (give SIZE with a unit suffix such as "m" or "g"; a bare number is read as bytes)
Local scratch space per core                -R "rusage[scratch=MB]"             not available
Local scratch space per node                not available                       --tmp=MB   (can also be expressed in GB using the "G" suffix)
Run job under a specific shareholder group  -G shareholder_group                -A shareholder_group   or   --account=shareholder_group
Notify user by email when job starts        -B                                  --mail-type=BEGIN
Notify user by email when job ends          -N                                  --mail-type=END,FAIL
                                                                                (multiple types can be combined in one option, e.g. --mail-type=BEGIN,END,FAIL)
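The wall-clock formats differ in a way that is easy to miss: LSF reads -W 08:00 as hours and minutes, whereas Slurm reads a two-field value such as 8:00 as minutes and seconds. A minimal sketch of a conversion helper follows; the function name lsf_to_slurm_time is hypothetical and not part of either batch system.

```shell
#!/bin/bash
# Convert an LSF wall-clock limit (-W HH:MM or -W MINUTES) into a value
# that Slurm's --time option interprets the same way.
# lsf_to_slurm_time is a hypothetical helper name used for illustration.
lsf_to_slurm_time() {
    case "$1" in
        *:*) printf '%s:00\n' "$1" ;;   # HH:MM -> HH:MM:SS
        *)   printf '%s\n' "$1" ;;      # a plain number means minutes in both systems
    esac
}

echo "sbatch --time=$(lsf_to_slurm_time 08:00)"   # prints: sbatch --time=08:00:00
```

Writing the seconds field explicitly avoids any ambiguity between the two-field forms.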

Shell script

LSF                              Slurm
bsub [options] < jobscript.sh    sbatch [options] < jobscript.sh   or
                                 sbatch [options] jobscript.sh [arguments]
Job parameters can be passed as options to bsub or placed inside jobscript.sh using #BSUB pragmas:
#!/bin/bash

#BSUB -n 4
#BSUB -W 08:00
#BSUB -R "rusage[mem=2000]"
#BSUB -R "rusage[scratch=1000]"    # per core
#BSUB -J analysis1
#BSUB -o analysis1.out
#BSUB -e analysis1.err

module load xyz/123
command1
command2
...
Job parameters can be passed as options to sbatch or placed inside jobscript.sh using #SBATCH pragmas:
#!/bin/bash

#SBATCH -n 4
#SBATCH --time=08:00:00                   # 8 hours; "8:00" would mean 8 minutes
#SBATCH --mem-per-cpu=2000
#SBATCH --tmp=4000                        # per node!!
#SBATCH --job-name=analysis1
#SBATCH --output=analysis1.out
#SBATCH --error=analysis1.err

module load xyz/123
command1
command2
...

Note:

  • In LSF, the jobscript.sh must be passed to bsub via the "<" operator
  • In LSF, scratch space is expressed per core, while in Slurm it is per node
  • In LSF, the default output file is "lsf.oJOBID", while in Slurm it is "slurm-JOBID.out"
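Because scratch space is requested per core in LSF but per node in Slurm, the value has to be multiplied by the number of cores that run on one node. The sketch below reproduces the arithmetic behind the two example scripts, assuming all 4 cores land on a single node; the variable names are illustrative only.

```shell
#!/bin/bash
# LSF:   -R "rusage[scratch=MB]" is counted per core.
# Slurm: --tmp=MB is counted per node.
# Assumption: all requested cores run on the same node.
cores_per_node=4
scratch_per_core_mb=1000

tmp_per_node_mb=$((cores_per_node * scratch_per_core_mb))

echo "#SBATCH --tmp=${tmp_per_node_mb}"   # prints: #SBATCH --tmp=4000
```

For multi-node jobs, use the number of cores per node (for example the value given to --ntasks-per-node) rather than the total number of cores.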

Interactive job

LSF                             Slurm
bsub -Is [LSF options] bash     srun --pty bash

Parallel job

LSF                                 Slurm
bsub -n 256 -R "span[ptile=128]"    sbatch -n 256 --ntasks-per-node=128   or
                                    sbatch -n 256 --nodes=2

The Slurm options

  • --ntasks-per-core,
  • --cpus-per-task,
  • --nodes, and
  • --ntasks-per-node

are supported.

Job array

LSF                            Slurm
bsub -J "jobname[1-N]"         sbatch --array=1-N
bsub -J "jobname[1-N:step]"    sbatch --array=1-N:step

Environment variables defined in each job:

LSF
  • index of current sub-job: LSB_JOBINDEX
  • maximum index (N): LSB_JOBINDEX_END
  • step: LSB_JOBINDEX_STEP
Slurm
  • number of sub-jobs: SLURM_ARRAY_TASK_COUNT
  • index of current sub-job: SLURM_ARRAY_TASK_ID
  • minimum index: SLURM_ARRAY_TASK_MIN
  • maximum index: SLURM_ARRAY_TASK_MAX

LSF example:

bsub -J "myarray[1-4]" 'echo "Hello, I am task $LSB_JOBINDEX of $LSB_JOBINDEX_END"'

Slurm example:

sbatch --array=1-4 --wrap='echo "Hello, I am task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"'
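In practice the array index is mostly used to pick a different input for each sub-job. The following is a minimal sketch of such a job script; the input file naming scheme (input_001.dat, input_002.dat, ...) is a made-up example, and the fallback values only exist so that the script can also be tried outside of Slurm.

```shell
#!/bin/bash
#SBATCH --array=1-4
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = index of the sub-job

# Fall back to 1 so the script can also be tested outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
task_count="${SLURM_ARRAY_TASK_COUNT:-1}"

# Hypothetical naming scheme for the input files: input_001.dat, input_002.dat, ...
input_file=$(printf 'input_%03d.dat' "$task_id")

echo "Task $task_id of $task_count processes $input_file"
```

Submitted with sbatch jobscript.sh, each of the four sub-jobs would see its own value of SLURM_ARRAY_TASK_ID and therefore a different input file name.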

GPU job

LSF                                 Slurm
bsub -R "rusage[ngpus_excl_p=1]"    sbatch --gpus=1

For multi-node jobs you need to use the --gpus-per-node option instead.

GPU job requiring a specific GPU model

LSF                                                                               Slurm
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceGTX1080]"    sbatch --gpus=gtx_1080:1
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceRTX3090]"    sbatch --gpus=rtx_3090:1

  • GPU model strings for LSF: see the wiki page "Change of GPU specifiers in the batch system".
  • For Slurm, only the specifiers gtx_1080 and rtx_3090 are currently supported; more GPU types will be added later.

GPU job requiring a given amount of GPU memory

LSF                                                                 Slurm
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=20480]"    sbatch --gpus=1 --gres=gpumem:20g

The default unit for gpumem is bytes. You are therefore advised to specify units, for example 20g or 11000m.
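When translating existing LSF submissions, the MB value from gpu_mtotal0 can be carried over unchanged as long as the unit is appended. A small sketch, using a hypothetical helper named gpumem_opt:

```shell
#!/bin/bash
# Turn an LSF gpu_mtotal0 threshold (given in MB) into a Slurm gpumem
# request with an explicit unit, so it is not read as bytes.
# gpumem_opt is a hypothetical helper name used for illustration.
gpumem_opt() {
    printf '%s\n' "--gres=gpumem:${1}m"
}

echo "sbatch --gpus=1 $(gpumem_opt 20480)"   # prints: sbatch --gpus=1 --gres=gpumem:20480m
```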

Submit a job using a specific share

LSF                   Slurm
bsub -G es_example    sbatch -A es_example

In Slurm, one can define a default share using the command: "echo account=es_example >> $HOME/.slurm/defaults"
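Running the echo command above more than once appends the same line repeatedly. The sketch below adds the line only if it is missing; set_default_account is a hypothetical helper, and the optional second argument (the file to edit) exists only so that the example can be tried on a temporary file instead of the real $HOME/.slurm/defaults.

```shell
#!/bin/bash
# Append "account=NAME" to the defaults file unless that exact line exists.
# set_default_account is a hypothetical helper name used for illustration.
set_default_account() {
    account=$1
    defaults_file=${2:-$HOME/.slurm/defaults}
    mkdir -p "$(dirname "$defaults_file")"
    if ! grep -qxF "account=$account" "$defaults_file" 2>/dev/null; then
        echo "account=$account" >> "$defaults_file"
    fi
}

# Demonstration on a temporary file rather than the real defaults file.
demo_file=$(mktemp)
set_default_account es_example "$demo_file"
set_default_account es_example "$demo_file"   # second call changes nothing
cat "$demo_file"                              # prints: account=es_example
```

Note that the helper does not replace an existing line for a different share; such a line would have to be edited by hand.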

Submit a job on a specific CPU model

LSF                                   Slurm
bsub -R "select[model==EPYC_7H12]"    sbatch --constraint=EPYC_7H12

Job chains

LSF                                       Slurm
bsub -J job_chain                         sbatch -J job_chain -d singleton
bsub -J job_chain -w "done(job_chain)"

Job dependencies

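LSF
  Job #1: bsub -J job1 command1
  Job #2: bsub -J job2 -w "done(job1)" command2
Slurm
  Job #1: myjobid=$(sbatch --parsable -J job1 --wrap="command1")
  Job #2: sbatch -J job2 -d afterok:$myjobid --wrap="command2"

In Slurm, sbatch --parsable prints only the job ID, so it can be captured in a shell variable. The afterok dependency starts job2 only if job1 completes successfully, which matches LSF's done(). Use afterany to start job2 regardless of how job1 ends (the counterpart of LSF's ended()).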
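A job can also depend on several earlier jobs: Slurm accepts a colon-separated list of job IDs after the dependency type. The sketch below only builds the option string, so it runs without a batch system; dependency_opt is a hypothetical helper name, and the job IDs 101 and 102 are placeholders.

```shell
#!/bin/bash
# Join a dependency type and any number of job IDs into one option,
# e.g. --dependency=afterok:101:102 (-d is the short form of --dependency).
# dependency_opt is a hypothetical helper name used for illustration.
dependency_opt() (
    type=$1
    shift
    IFS=:                                   # "$*" joins the IDs with ":"
    printf '%s\n' "--dependency=${type}:$*"
)

dependency_opt afterok 101 102   # prints: --dependency=afterok:101:102

# Typical use on a cluster (requires sbatch, therefore commented out):
# id1=$(sbatch --parsable --wrap="command1"); id1=${id1%%;*}
# id2=$(sbatch --parsable --wrap="command2"); id2=${id2%%;*}
# sbatch $(dependency_opt afterok "$id1" "$id2") --wrap="command3"
```

The ${id1%%;*} expansion strips a trailing ";cluster" part, which --parsable appends on multi-cluster installations.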

Job control

Job status

LSF               Slurm
bjobs [JOBID]     squeue [-j JOBID]
bjobs -p          squeue -u USERNAME -t PENDING
bjobs -r          squeue -u USERNAME -t RUNNING

Resource usage

LSF               Slurm
bbjobs [JOBID]    scontrol show jobid -dd JOBID
                  sacct -l -j JOBID      (for finished jobs)
                  sstat [--all] JOBID    (for running jobs)

With sacct, use --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode instead of -l for a customizable, more readable output.
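Since the field list is long, it can be convenient to keep it in one place. A minimal sketch of a wrapper for a shell startup file follows; the names SACCT_FIELDS and job_usage are hypothetical, and the function is only defined here, not called, because sacct is available only on the cluster.

```shell
#!/bin/bash
# Field list taken from the text above; adjust to taste.
SACCT_FIELDS="JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,ReqMem,MaxRSS,ExitCode"

# job_usage is a hypothetical wrapper: show the selected fields for one job.
job_usage() {
    sacct --format="$SACCT_FIELDS" -j "$1"
}

# On the cluster: job_usage JOBID
```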

Killing a job

LSF               Slurm
bkill [JOBID]     scancel [JOBID]

Environment variables

LSF             Slurm
$LSB_JOBID      $SLURM_JOB_ID
$LSB_SUBCWD     $SLURM_SUBMIT_DIR