Slurm
The Slurm Workload Manager is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.
Documentation
Slurm's own documentation: https://slurm.schedmd.com/documentation.html
Usage
For clients, the main commands are
- srun (Get a job allocation and execute an application)
- sbatch (Submit a batch script to Slurm)
- squeue (View the job queue)
- scancel (Remove a job from the queue)
Additionally, there are
- salloc (Get a job allocation)
- sacct (View accounting data)
- sbcast (Broadcast file to a job's compute nodes)
- sinfo (View nodes and partitions)
- sstat (Display the status information of a running job/step)
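A typical workflow with these commands might look as follows (the script name my_job.sbatch and the job ID 1234567 are illustrative):
sbatch my_job.sbatch    # submit a batch script; Slurm prints the assigned job ID
squeue -u $USER         # list your own pending and running jobs
scancel 1234567         # cancel the job with the given ID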
Slurm queues batch jobs, allocates resources, and dispatches jobs to the allocated resources. (The following text is mainly taken from https://slurm.schedmd.com/documentation.html)
sbatch submits a batch script to Slurm. The batch script may be given to sbatch through a file name on the command line
sbatch batch_script.txt
or if no file name is specified, sbatch will read in a script from standard input
sbatch < batch_script_generator.py
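Alternatively, if a program generates the batch script at run time, its output can be piped straight into sbatch, for example (assuming the generator prints a valid batch script to standard output):
python batch_script_generator.py | sbatch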
The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.
#!/bin/bash
#SBATCH --time=1
module load eth_proxy
my_application
sbatch will stop processing further #SBATCH directives once the first non-comment non-whitespace line has been reached in the script.
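For example, in the following sketch (the time limits, given in minutes, are illustrative), the first --time directive is processed, while the second one comes after an executable line and is therefore treated as an ordinary shell comment and ignored:
#!/bin/bash
#SBATCH --time=10
module load eth_proxy
#SBATCH --time=60
my_application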
sbatch also accepts command line arguments, which take precedence over the values specified in the script.
Commonly used arguments are listed below; an example script combining several of them follows the list.
- -n, --ntasks=<number>: sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of 'number' tasks and to provide for sufficient resources.
- -c, --cpus-per-task=<ncpus>: Specify the number of CPUs assigned to each task.
- --mem-per-cpu=<size>[units]: Minimum memory required per usable allocated CPU. Default units are megabytes.
- -N, --nodes=<minnodes>: Request that a minimum of 'minnodes' nodes be allocated to this job.
- --gpus-per-task=<number>: Specify the number of GPUs required for the job on each task.
- --wrap=<command_string>: sbatch will wrap the specified command string in a simple "sh" shell script and submit that script to the Slurm controller.
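A minimal sketch combining these options, with illustrative resource amounts and the program name my_application taken from the earlier example:
#!/bin/bash
# 4 tasks with 2 CPUs each, 2 GB of memory per CPU, one hour of wall-clock time, on a single node
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2G
#SBATCH --time=01:00:00
#SBATCH --nodes=1
# srun launches my_application once per task
srun my_application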
Slurm will assign an ID to the submitted job and keep it in the queue until resources are available. The queue can be observed with squeue.
Once sufficient resources become available, Slurm will allocate them and execute the script.
Both running and pending jobs can be cancelled with scancel.
To run parallel jobs, use srun. If necessary, srun will first create a resource allocation in which to run the parallel job. Note that srun blocks until the work is finished, while sbatch submits the job and returns immediately.
To run an application in 5 parallel tasks with 8 CPUs per task, any of the following equivalent commands can be used.
srun --ntasks=5 --cpus-per-task=8 nproc
srun -n 5 -c 8 nproc
sbatch -n 5 -c 8 --wrap="srun nproc"
sbatch parallel_nproc.sbatch
with parallel_nproc.sbatch containing
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=8
srun nproc
All four commands will allocate resources to match 5 tasks with 8 CPUs each, and then execute 'nproc' in each task, thus printing the number "8" 5 times.
Without "srun"
sbatch -n 5 -c 8 --wrap="nproc"
sbatch will allocate resources to match "-n 5 -c 8" but only execute "nproc" in one task, thus printing the number "8" once.
The current versions of Slurm and OpenMPI support task launch using the srun command.
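For example, an MPI application can be launched with srun from within a batch script; a minimal sketch, assuming a hypothetical MPI program ./my_mpi_app built against an MPI library with Slurm support:
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
# srun starts one MPI rank per task
srun ./my_mpi_app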
Big picture
To use hardware resources efficiently, request what your job actually needs and don't over-specify it.
Generally speaking, the smaller and shorter a job is, the easier it is for Slurm to find an open time slot for it in a timely manner.
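For example, a serial job that needs roughly 2 GB of memory and finishes within half an hour could be submitted with matching requests (the program name ./my_analysis and the numbers are illustrative):
sbatch --ntasks=1 --cpus-per-task=1 --mem-per-cpu=2G --time=00:30:00 --wrap="./my_analysis"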
Custom extensions
For Euler, we have developed some wrappers around Slurm commands for simpler access.
| Program | Explanation |
|---|---|
| myjobs | Displays more human-friendly information than squeue |
| my_share_info | Shows information about your share(s) of the cluster |
| get_inefficient_jobs | Displays inefficient jobs |
There's also a web interface at https://slurm-jobs-webgui.euler.hpc.ethz.ch/