LSF to Slurm quick reference
Introduction
The commands for Slurm are similar to the ones used in LSF. You can find a mapping of the relevant commands below.
Job submission
Simple command
LSF | Slurm |
---|---|
bsub command | sbatch --wrap=command |
bsub "command1 ; command2" | sbatch --wrap="command1 ; command2" |
bsub "command1 | command2" | sbatch --wrap="command1 | command2" |
bsub [LSF options] command | sbatch [slurm options] --wrap="command" |
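
For example, a one-line submission with a few common options might look like this in both systems (the program name and resource values are purely illustrative):

    # LSF: 4 cores, 1 hour of wall-clock time
    bsub -n 4 -W 01:00 "./myprogram input.dat"

    # Slurm: the same request; the command is passed via --wrap
    sbatch -n 4 --time=01:00:00 --wrap="./myprogram input.dat"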
Frequently used bsub/sbatch options
Parameter | bsub | sbatch |
---|---|---|
Job name | -J job_name | -J job_name or --job-name=job_name |
Job array consisting of N sub-jobs | -J "job_name[1-N]" | -a 1-N or --array=1-N |
Output file (stdout) | -o file_name (default: lsf.oJOBID) | -o file_name or --output=file_name (default: slurm-JOBID.out) |
Error file (stderr) | -e file_name (default: merged with output file) | -e file_name or --error=file_name (default: merged with output file) |
Wall-clock time (default: 4h) | -W HH:MM | -t DD-HH[:MM] or --time=MM or --time=HH:MM:SS |
Number of cores (default: 1) | -n cores | -n cores or --ntasks=cores for MPI jobs and --ntasks=1 --cpus-per-task=cores for OpenMP jobs |
Number of cores per node | -R "span[ptile=cores_per_node]" | --ntasks-per-node=cores_per_node |
Memory per core (default: 1024 MB) | -R "rusage[mem=MB]" | --mem-per-cpu=MB (can also be expressed in GB using "G" suffix) |
Number of GPUs (default: 0) | -R "rusage[ngpus_excl_p=N]" | -G N or --gpus=N |
Memory per GPU | -R "select[gpu_mtotal0>=MB]" | --gres=gpumem:MB (can also be expressed in GB using "G" suffix) |
Local scratch space per core | -R "rusage[scratch=MB]" | not available |
Local scratch space per node | not available | --tmp=MB (can also be expressed in GB using "G" suffix) |
Run job under a specific shareholder group | -G shareholder_group | -A shareholder_group or --account=shareholder_group |
Notify user by email when job starts | -B | --mail-type=BEGIN |
Notify user by email when job ends | -N | --mail-type=END,FAIL (multiple types can be combined in one option, e.g. --mail-type=BEGIN,END,FAIL) |
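
Putting several of these options together, a typical LSF submission and a Slurm equivalent could look as follows (job name, file names and resource values are illustrative):

    # LSF: 8 cores, 4 h wall-clock time, 2000 MB per core, named job with separate output/error files
    bsub -n 8 -W 04:00 -R "rusage[mem=2000]" -J analysis2 -o analysis2.out -e analysis2.err "./myprogram"

    # Slurm equivalent
    sbatch -n 8 --time=04:00:00 --mem-per-cpu=2000 --job-name=analysis2 \
           --output=analysis2.out --error=analysis2.err --wrap="./myprogram"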
Shell script
LSF | Slurm |
---|---|
bsub [options] < jobscript.sh | sbatch [options] < jobscript.sh or sbatch [options] jobscript.sh [arguments] |
LSF: job parameters can be passed as options to bsub or placed inside jobscript.sh using #BSUB pragmas:

    #!/bin/bash
    #BSUB -n 4
    #BSUB -W 08:00
    #BSUB -R "rusage[mem=2000]"
    #BSUB -R "rusage[scratch=1000]"   # per core
    #BSUB -J analysis1
    #BSUB -o analysis1.out
    #BSUB -e analysis1.err
    module load xyz/123
    command1
    command2
    ...

Slurm: job parameters can be passed as options to sbatch or placed inside jobscript.sh using #SBATCH pragmas:

    #!/bin/bash
    #SBATCH -n 4
    #SBATCH --time=8:00:00
    #SBATCH --mem-per-cpu=2000
    #SBATCH --tmp=4000               # per node!!
    #SBATCH --job-name=analysis1
    #SBATCH --output=analysis1.out
    #SBATCH --error=analysis1.err
    module load xyz/123
    command1
    command2
    ...
Note:
- In LSF, the jobscript.sh must be passed to bsub via the "<" operator
- In LSF, scratch space is expressed per core, while in Slurm it is per node
- In LSF, the default output file is "lsf.oJOBID", while in Slurm it is "slurm-JOBID.out"
Interactive job
LSF | Slurm |
---|---|
bsub -Is [LSF options] bash | srun --pty bash |
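
Resource options can be combined with an interactive request as well; one common pattern is (values are illustrative):

    # LSF: interactive shell with 4 cores and 2000 MB of memory per core
    bsub -Is -n 4 -R "rusage[mem=2000]" bash

    # Slurm: interactive shell with the same resources
    srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=2000 --pty bash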
Parallel job

Shared memory (OpenMP, threads)
LSF | Slurm |
---|---|
bsub -n 128 -R "span[ptile=128]" | sbatch -n 1 --cpus-per-task=128 |
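
A minimal OpenMP job script following this pattern could look like the sketch below (the program name is illustrative); the thread count is taken from Slurm's SLURM_CPUS_PER_TASK variable:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --time=01:00:00
    #SBATCH --mem-per-cpu=2000

    # Use all CPUs allocated to the task as OpenMP threads
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./my_openmp_program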
Distributed memory (MPI, processes)
LSF | Slurm |
---|---|
bsub -n 256 -R "span[ptile=128]" | sbatch -n 256 --ntasks-per-node=128 or sbatch -n 256 --nodes=2 |
The following Slurm options are supported for controlling how tasks and CPUs are distributed across nodes:

- --ntasks-per-core
- --cpus-per-task
- --nodes
- --ntasks-per-node
Please note that for larger parallel MPI jobs that use more than a single node (more than 128 cores), you should add the sbatch option

    -C ib

to make sure that they are dispatched to nodes with the InfiniBand high-speed interconnect, as this results in much better performance.
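
As a sketch, a multi-node MPI submission combining these options might look like this (the program name is illustrative, and the appropriate launcher depends on the MPI module you load):

    # 256 MPI ranks spread over two nodes, restricted to InfiniBand nodes
    sbatch -n 256 --ntasks-per-node=128 -C ib --wrap="srun ./my_mpi_program"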
Job array
LSF | Slurm |
---|---|
bsub -J "jobname[1-N]" | sbatch --array=1-N |
bsub -J "jobname[1-N%step]" | sbatch --array=1-N:step |

Environment variables defined in each sub-job:

LSF:

- index of the current sub-job: LSB_JOBINDEX
- maximum index (N): LSB_JOBINDEX_END
- step: LSB_JOBINDEX_STEP

Slurm:

- number of sub-jobs: SLURM_ARRAY_TASK_COUNT
- index of the current sub-job: SLURM_ARRAY_TASK_ID
- minimum index: SLURM_ARRAY_TASK_MIN
- maximum index: SLURM_ARRAY_TASK_MAX
LSF example:
bsub -J "myarray[1-4]" 'echo "Hello, I am task $LSB_JOBINDEX of $LSB_JOBINDEX_END"'
Slurm example:
sbatch --array=1-4 --wrap='echo "Hello, I am task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"'
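
Inside an array job script, the task index is typically used to select the input for each sub-job; a minimal sketch (the file naming scheme is illustrative):

    #!/bin/bash
    #SBATCH --array=1-4
    #SBATCH --time=01:00:00

    # Each sub-job processes its own input file, e.g. input_1.dat ... input_4.dat
    ./myprogram "input_${SLURM_ARRAY_TASK_ID}.dat"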
GPU job
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" | sbatch --gpus=1 |
For multi-node jobs you need to use the --gpus-per-node option instead.
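
For example (program name illustrative, other resources omitted for brevity):

    # Single-node job with 2 GPUs
    sbatch --gpus=2 --wrap="./my_gpu_program"

    # Two-node job with 4 GPUs on each node
    sbatch --nodes=2 --ntasks-per-node=4 --gpus-per-node=4 --wrap="srun ./my_gpu_program"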
GPU job requiring a specific GPU model
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceGTX1080]" | sbatch --gpus=gtx_1080:1 |
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceRTX3090]" | sbatch --gpus=rtx_3090:1 |
- GPU model strings for LSF are listed on the page Change_of_GPU_specifiers_in_the_batch_system.
- For Slurm, currently the specifiers gtx_1080 and rtx_3090 are supported until more GPU types are added.
GPU job requiring a given amount of GPU memory
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=20480]" | sbatch --gpus=1 --gres=gpumem:20g |
The default unit for gpumem is bytes. You are therefore advised to specify units, for example 20g or 11000m.
Submit a job using a specific share

LSF | Slurm |
---|---|
bsub -G es_example | sbatch -A es_example |
In Slurm, one can define a default share using the command: "echo account=es_example >> $HOME/.slurm/defaults"
Submit a job on a specific CPU model
LSF | Slurm |
---|---|
bsub -R "select[model==EPYC_7H12]" | sbatch --constraint=EPYC_7H12 |
Job chains
LSF | Slurm |
---|---|
bsub -J job_chain (first job), then bsub -J job_chain -w "done(job_chain)" (subsequent jobs) | sbatch -J job_chain -d singleton |
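
Because -d singleton makes a job wait until all previously submitted jobs with the same job name (and user) have finished, a chain can be built simply by reusing the job name (the commands are illustrative):

    # The three steps run one after the other because they share the job name
    sbatch -J job_chain -d singleton --wrap="./step1"
    sbatch -J job_chain -d singleton --wrap="./step2"
    sbatch -J job_chain -d singleton --wrap="./step3"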
Job dependencies
LSF | Slurm |
---|---|
Job #1: bsub -J job1 command1 | Job #1: myjobid=$(sbatch --parsable -J job1 --wrap="command1") |
Job #2: bsub -J job2 -w "done(job1)" command2 | Job #2: sbatch -J job2 -d afterany:$myjobid --wrap="command2" |
In Slurm, sbatch --parsable returns only the job ID of the submitted job, which makes it easy to capture in a shell variable as shown above.
Job control
Job status
LSF | Slurm |
---|---|
bjobs [JOBID] | squeue [-j JOBID] |
bjobs -p | squeue -u USERNAME -t PENDING |
bjobs -r | squeue -u USERNAME -t RUNNING |
Resource usage
LSF | Slurm |
---|---|
bbjobs [JOBID] | myjobs -j JOBID |
 | scontrol show jobid -dd JOBID |
 | sacct -l -j JOBID (for finished jobs) |
 | sstat [--all] JOBID (for running jobs) |
Use --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode instead of -l for a customizable, more readable output.
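
For example, to inspect a finished job with that format string (JOBID is a placeholder for the actual job ID):

    sacct -j JOBID --format=JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,TotalCPU,REQMEM,MaxRSS,ExitCode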
Killing a job
LSF | Slurm |
---|---|
bkill [JOBID] | scancel [JOBID] |
Environment variables
LSF | Slurm |
---|---|
$LSB_JOBID | $SLURM_JOB_ID |
$LSB_SUBCWD | $SLURM_SUBMIT_DIR |
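
These variables can be used inside a job script just like their LSF counterparts; a small sketch:

    #!/bin/bash
    #SBATCH --time=00:10:00

    # Report the job ID and the directory the job was submitted from
    echo "Job $SLURM_JOB_ID was submitted from $SLURM_SUBMIT_DIR"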