Job submission with SLURM
There are three ways to access computing resources:
- Submit a job through command line
- Submit a job through a job bash script using #SBATCH pragmas
- Get an interactive session on a compute node
Basic job submission
A basic job submission command consists of three parts:
sbatch | SLURM options | Job

where

Part | Description
---|---
sbatch | the SLURM submit command
SLURM options | options for requesting resources and defining job-related settings
Job | the computing job to be submitted

Here is an example:

sbatch | -n 1 --time=04:00:00 --mem-per-cpu=4096 | --wrap="python myscript.py"
When the job is submitted, SLURM shows the job's information:

$ sbatch -n 1 --time=04:00:00 --mem-per-cpu=4096 --wrap="python myscript.py"
Submitted batch job 3582287
The output includes a unique JobID, e.g., 3582287.
Note: The JobID is important for monitoring jobs or reporting issues. It is also part of the name of the SLURM output file. Please do not delete the SLURM output file unless you are sure that the job ran fine.
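For example, the JobID can be used to check on a submitted job. A minimal sketch, assuming a SLURM cluster and the JobID 3582287 from the example above:

```shell
# Show the queue entry for a specific job, referenced by its JobID
squeue -j 3582287

# By default, the SLURM output file name contains the JobID
cat slurm-3582287.out
```

These commands only work on a machine where SLURM is installed; replace 3582287 with your own JobID.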
Job
A job can be one of the following:
Job | Command | Example of job submission command
---|---|---
a single Linux command | cmd | sbatch --wrap="ls"
a command or program with its arguments | cmd arg1 arg2 | sbatch --wrap="echo hello"
multiple commands | "cmd1 ; cmd2" | sbatch --wrap="date; pwd; ls -l"
a piped command | "cmd1 \| cmd2" | sbatch --wrap="ls -l \| wc -l"
a command with I/O redirection | "cmd < in > out" | sbatch --wrap="du -sk /scratch > du.out"
a here document, passed via "<<" | << EOF ... EOF | 
a shell script, passed via "<" | < script | sbatch < hello.sh
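The here-document row above has no example. A minimal sketch, assuming a SLURM cluster (the inline script itself is illustrative):

```shell
# Pass an inline job script to sbatch via a here document;
# everything between the EOF markers is read as the job script
sbatch <<EOF
#!/bin/bash
echo "Running on \$(hostname)"
EOF
```

Note the escaped `\$(hostname)`: without the backslash, the submitting shell would expand it on the login node instead of the compute node.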
SLURM options
Requesting resources
Resources | Format | Default values
---|---|---
Maximum run time | --time=HH:MM:SS | 04:00:00 (4 h), max. 360 h
Number of processors | -n nprocs | 1 processor
Memory | --mem-per-cpu=MB | 1024 MB per core
Local scratch space | --tmp=MB | 1024 MB
Note: Users cannot request the full memory of a node, because some of the memory is reserved for the operating system of the compute node, which runs in memory. If a user requests, for instance, 256 GiB of memory, the job will not be dispatched to a node with 256 GiB of memory, but to a node with 512 GiB or more. As a general rule, jobs that request about 3% less memory than a node has can run on that node type. For instance, on a node with 256 GiB of memory, you can request up to 256*0.97 GiB = 248.32 GiB.
Note: Unlike memory, the batch system does not reserve any disk space for this scratch directory by default. If your job is expected to write large amounts of temporary data (say, more than 250 MB) into $TMPDIR, or anywhere in the local /scratch file system, you must request enough scratch space with the --tmp option.
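For instance, a job that writes a few gigabytes of temporary data could request local scratch space as follows. This is a sketch: myprogram and its --workdir flag are placeholders, not real tools.

```shell
# Request 5000 MB of local scratch space alongside CPU time and memory;
# $TMPDIR points to the job's private scratch directory on the compute node
sbatch -n 1 --time=04:00:00 --tmp=5000 --wrap="myprogram --workdir \$TMPDIR"
```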
Other SLURM options
Option | Description
---|---
-o outfile | append the job's standard output to outfile
-e errfile | append the job's error messages to errfile
-J jobname | assign a job name to the job
--mail-type=BEGIN,END,FAIL | send an email when the job begins/ends/fails
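These options can be combined with the resource requests described earlier; for example (myscript.py as in the submission example above):

```shell
# Name the job, redirect its output and errors to dedicated files,
# and request an email notification when it ends or fails
sbatch -J analysis1 -o analysis1.out -e analysis1.err \
       --mail-type=END,FAIL --wrap="python myscript.py"
```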
The SLURM-LSF submission line advisor can assist you in finding the SLURM options you need.
Job script and #SBATCH pragmas
Create a job script called job_script.sbatch:

#!/bin/bash

#SBATCH -n 24                 # 24 cores
#SBATCH --time 08:00:00       # 8-hour run-time
#SBATCH --mem-per-cpu=4000    # 4000 MB per core
#SBATCH -J analysis1          # job name
#SBATCH -o analysis1.out      # output file
#SBATCH -e analysis1.err      # error file
#SBATCH --mail-type=END,FAIL  # send an email when the job ends or fails

source /cluster/apps/local/env2lmod.sh  # Switch to the new software stack
module load gcc/6.3.0 openmpi/4.0.2     # Load modules
cd /path/to/execution/folder            # Change directory
mpirun myprogram arg1                   # Execute the program
Submit a job
$ sbatch < job_script.sbatch
The options specified on the command line take precedence over the options in the job script.
$ sbatch -n 36 < job_script.sbatch
Interactive session on a compute node
To run a quick test or a benchmark, you can request an interactive session on a compute node by using
srun --pty
For example:
[nmarounina@eu-login-43 ~]$ srun -n 4 --time=01:00:00 --pty bash
srun: job 3634650 queued and waiting for resources
srun: job 3634650 has been allocated resources
[nmarounina@eu-a2p-074 ~]$
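Once the prompt changes to a compute node (eu-a2p-074 above), commands run directly on that node. A sketch of a typical test session; the benchmark binary is a placeholder:

```shell
# Now on the compute node:
module load gcc/6.3.0   # load the required toolchain
./my_benchmark          # run the quick test interactively
exit                    # end the session and release the resources
```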
Further reading
- User guide: Using the batch system
- Job arrays
- Job chaining
- X11 forwarding batch interactive jobs
- Multiple shareholder groups
- SLURM mini reference