Slurm

The Slurm Workload Manager is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

Documentation

Slurm's own documentation: https://slurm.schedmd.com/documentation.html

Usage

The main user commands are

  • srun (Get a job allocation and execute an application)
  • sbatch (Submit a batch script to Slurm)
  • squeue (View the job queue)
  • scancel (Remove a job from the queue)

Additionally, there are

  • salloc (Get a job allocation)
  • sacct (View accounting data)
  • sbcast (Broadcast file to a job's compute nodes)
  • sinfo (View nodes and partitions)
  • sstat (Display the status information of a running job/step)
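
A quick sketch of these informational commands (the job ID is a placeholder):

sinfo                                            # overview of partitions and node states
sstat -j 1234567 --format=JobID,AveCPU,MaxRSS    # resource usage of a running job
sacct -u $USER                                   # accounting data for your recent jobs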

Slurm queues batch jobs, allocates resources, and dispatches jobs to the allocated resources. (The following text is mainly taken from https://slurm.schedmd.com/documentation.html)
sbatch submits a batch script to Slurm. The batch script may be given to sbatch through a file name on the command line

sbatch batch_script.txt

or, if no file name is specified, sbatch will read the script from standard input, for example the output of a script generator

./batch_script_generator.py | sbatch

The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.

#!/bin/bash
#SBATCH --time=1                # wall-clock limit; a bare number is interpreted as minutes
module load eth_proxy           # Euler-specific module enabling outbound network access via proxy
my_application                  # placeholder for the actual program to run

sbatch stops processing #SBATCH directives once the first non-comment, non-whitespace line of the script has been reached; any later directives are ignored, as in the sketch below.
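
A minimal illustration (the option values are arbitrary):

#!/bin/bash
#SBATCH --time=1                # processed: appears before the first command
echo "starting"                 # first non-comment line; directive processing stops here
#SBATCH --mem-per-cpu=1024      # ignored: appears after the first command
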
sbatch also accepts command-line arguments, which take precedence over the values specified in the script.
Commonly used arguments are:

  • -n, --ntasks=<number>: sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of 'number' tasks and to provide for sufficient resources.
  • -c, --cpus-per-task=<ncpus>: Specify the number of CPUs assigned to each task.
  • --mem-per-cpu=<size>[units]: Minimum memory required per usable allocated CPU. Default units are megabytes.
  • -N, --nodes=<minnodes>: Request that a minimum of 'minnodes' nodes be allocated to this job.
  • --gpus-per-task=<number>: Specify the number of GPUs required for the job on each task.
  • --wrap=<command_string>: Sbatch will wrap the specified command string in a simple "sh" shell script, and submit that script to the slurm controller.
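
These options can also be combined on the command line; a minimal sketch (the script name is a placeholder):

sbatch --nodes=1 --ntasks=4 --cpus-per-task=2 --mem-per-cpu=2048 my_job_script.sh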

Slurm will assign a job ID to the submitted script and keep the job in the queue until resources are available. The queue can be inspected with squeue.
Once sufficient resources become available, Slurm will allocate them and execute the script.
Running and pending jobs can be cancelled with scancel, as shown below.
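
For example (the job ID is a placeholder):

squeue -u $USER          # list your pending and running jobs
scancel 1234567          # cancel the job with the given ID
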
To run parallel jobs, use srun. If necessary, srun will first create a resource allocation in which to run the parallel job. Note that srun blocks until the job has finished, while sbatch returns immediately after submitting the job.
To run an application in 5 parallel tasks with 8 CPUs per task, you can use any of the following equivalent commands.

srun --ntasks=5 --cpus-per-task=8 nproc
srun -n 5 -c 8 nproc
sbatch -n 5 -c 8 --wrap="srun nproc"
sbatch parallel_nproc.sbatch

with parallel_nproc.sbatch containing

#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=8
srun nproc

All four commands will allocate resources for 5 tasks with 8 CPUs each and then execute 'nproc' in each task, thus printing the number "8" five times.
Without srun, for example

sbatch -n 5 -c 8 --wrap="nproc"

sbatch will still allocate resources to match "-n 5 -c 8", but will execute "nproc" in only one task, thus printing the number "8" once.
Current versions of Slurm and Open MPI support launching MPI tasks directly with the srun command.
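
A minimal sketch of such an MPI job script (my_mpi_app is a placeholder for an MPI-enabled binary, and the module name may differ on your cluster):

#!/bin/bash
#SBATCH --ntasks=4                # 4 MPI ranks
#SBATCH --cpus-per-task=1
module load openmpi               # assumed module name
srun my_mpi_app                   # srun launches one rank per task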

Big picture

To use hardware resources efficiently, request what your job actually needs and do not over-specify it.
Generally speaking, the smaller and shorter a job is, the easier it is for Slurm to find a free time slot for it in a timely manner.
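
To see what a finished job actually used, and thus right-size the next submission, sacct can help; a sketch (the job ID is a placeholder):

sacct -j 1234567 --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem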

Custom extensions

For Euler, we developed some wrappers around Slurm commands for simpler access:

Program               Explanation
myjobs                Displays more human-friendly information than squeue
my_share_info         Shows information about your share(s) of the cluster
get_inefficient_jobs  Displays inefficient jobs
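
These wrappers are invoked like ordinary commands on a login node; a minimal sketch (they are Euler-specific, and their options are not covered here):

myjobs            # human-friendly overview of your jobs
my_share_info     # show your share(s) of the cluster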

There's also a web interface at https://slurm-jobs-webgui.euler.hpc.ethz.ch/