Getting started with clusters


Requesting an account

Brutus

Brutus (2007-2016) is no longer in operation.

Euler

Everybody at ETH Zurich can use the Euler cluster. The first login of a new user triggers a process that sends a verification code to the user's ETH email address (USERNAME@ethz.ch, where USERNAME is the ETH account name). The user is then prompted to enter the verification code; once the correct code is entered, the cluster account of the user is created.

Leonhard

Leonhard Open (2017-2021) has been integrated in the Euler cluster.

Access to Leonhard Med 1.0 is restricted to Leonhard Med 1.0 shareholders. Guest users cannot access the Leonhard cluster.

MATLAB Distributed Computing Server (MDCS)

Any member of ETH can use the MATLAB Distributed Computing Server (MDCS) service; the only requirement is a valid ETH account. In order to use this service, you first need to login to the Euler cluster once and accept the usage agreement.

Please note that the MDCS will be phased out by the end of 2022 due to the transition of the batch system from IBM LSF to Slurm.

CLC Genomics Server

The CLC genomics server uses local accounts for authentication. If you would like to use this service, then please contact cluster support to request your CLC account.

Please note that the CLC Genomics Server will be phased out by the end of 2022 due to the transition of the batch system from IBM LSF to Slurm.

Accessing the clusters

Who can access the HPC clusters

The Euler cluster is open to all members of ETH and to external users who collaborate with a research group at ETH Zurich. Members of other institutes who have a collaboration with a research group at ETH may use the HPC clusters for the purpose of said collaboration. Their counterpart ("sponsor") at ETH must ask the local IT support group (ISG) of the corresponding department to create an ETH guest account for them. The account needs to be linked to a valid e-mail address. For external users, the VPN service also needs to be enabled. Once the ETH guest account has been created, they can access the clusters like members of ETH.

Legal compliance

The HPC clusters of ID SIS HPC are subject to ETH's acceptable use policy for IT resources (Benutzungsordnung für Telematik an der ETH Zürich, BOT). In particular:

  • Accounts are strictly personal.
  • You must not share your account (password, ssh keys) with anyone else.
  • You must not use someone else's account, with or without their consent.
  • If you suspect that someone used your account, change your password and contact cluster support.

For changing your ETH password in the Identity and Access Management (IAM) system of ETH, please have a look at the documentation and the video of IT Services.

In case of abuse, the offender's account may be blocked temporarily or closed. System administrators are obliged by law to investigate abusive or illegal activities and report them to the relevant authorities.

Internet Security

Access to the HPC clusters of ID SIS HPC is only possible via secure protocols (ssh, sftp, scp, rsync). The HPC clusters are only accessible from inside the ETH network. If you would like to connect from a computer that is not inside the ETH network, you need to establish a VPN connection first. Outgoing connections to computers inside the ETH network are not blocked. If you would like to connect to an external service, please use the ETH proxy service (http://proxy.service.consul:3128) by loading the eth_proxy module:

module load eth_proxy

WARNING: the ETH proxy service can only handle a small number of requests at the same time and its bandwidth is limited. It should therefore be used with moderation. If you have jobs that rely on the ETH proxy to access some external server, please only execute a small number of such jobs at the same time. If you fail to follow this rule, the proxy may completely block access to that external server! This block will affect not only you, but also all users who need to access this external server.


SSH

You can connect to the HPC clusters via the SSH protocol. For this purpose you need to have an SSH client installed. The information required to connect to an HPC cluster is the hostname of the cluster that you would like to connect to and your ETH account credentials (username, password).

Cluster Hostname
Euler euler.ethz.ch

Linux, Mac OS X

Open a shell (Terminal in OS X) and use the standard ssh command

ssh username@hostname

where username is your ETH username and the hostname can be found in the table shown above. If for instance user sfux would like to access the Euler cluster, then the command would be

samfux@bullvalene:~$ ssh sfux@euler.ethz.ch
sfux@euler.ethz.ch's password: 
Last login: Fri Sep 13 07:33:57 2019 from bullvalene.ethz.ch

      ____________________   ___
     /  ________   ___   /__/  /
    /  _____/  /  /  /  ___   /
   /_______/  /__/  /__/  /__/
   Eidgenoessische Technische Hochschule Zuerich
   Swiss Federal Institute of Technology Zurich
   -------------------------------------------------------------------------
                                        E U L E R  C L U S T E R


                                                     https://scicomp.ethz.ch
                                                http://www.smartdesk.ethz.ch
                                                  cluster-support@id.ethz.ch
 
   =========================================================================


[sfux@eu-login-19 ~]$

Windows

Since Windows does not provide an ssh client as part of the operating system, users need to download third-party software in order to be able to establish ssh connections.

Widely used ssh clients are for instance MobaXterm, PuTTY and Cygwin.


For using MobaXterm, you can either start a local terminal and use the same SSH command as for Linux and Mac OS X, or you can click on the session button, choose SSH and then enter the hostname and username. After clicking on OK, you will be asked to enter your password.

If you use PuTTY, it is sufficient to specify the hostname of the cluster that you would like to access and to click on the Open button. Afterwards, you will be prompted to enter your ETH account credentials. When using Cygwin, you can enter the same command as Linux and Mac OS X users.

 ssh username@hostname

SSH keys

SSH keys allow you to log in to a cluster without having to type a password. This can be useful for file transfer and automated tasks. When used properly, SSH keys are much safer than passwords. Keys always come in pairs: a private key (stored on your local workstation) and a public key (stored on the computer you want to connect to). You can generate as many key pairs as you want. To make the keys even more secure, you should protect them with a passphrase.

Linux, Mac OS X

For a good documentation on SSH please have a look at the SSH website. It contains a general overview on SSH, instructions on how to create SSH keys and instructions on how to copy an SSH key.

On your computer, use ssh-keygen -t ed25519 to generate a key pair with the ed25519 algorithm. By default the private key is stored as $HOME/.ssh/id_ed25519 and the public key as $HOME/.ssh/id_ed25519.pub.

For security reasons, we recommend that you use a different key pair for every computer you want to connect to. For instance, if you are using both Euler and Leonhard:

ssh-keygen -t ed25519 -f $HOME/.ssh/id_ed25519_euler            # please enter a strong, non-empty passphrase when prompted

Once this is done, copy the public key to Euler or Leonhard using the following command:

ssh-copy-id -i $HOME/.ssh/id_ed25519_euler.pub    username@euler.ethz.ch

Where username is your ETH username. You will need to enter your ETH (LDAP) password to connect to Euler / Leonhard.

If you use an SSH agent, then you also need to add the key there (https://www.ssh.com/ssh/add).

Windows

For Windows, third-party software (PuTTYgen, MobaXterm) is required to create SSH keys. For a good documentation on SSH please have a look at the SSH website.

Please either use PuTTYgen or the command (MobaXterm)

ssh-keygen -t ed25519

to generate a key pair with the ed25519 algorithm and store both the public and the private key on your local computer. For security reasons, we recommend that you use a different key pair for every computer you want to connect to.

Afterwards please log in to the cluster and create the hidden directory $HOME/.ssh, which needs to have the unix permission 700.

mkdir -p -m 700 $HOME/.ssh

In order to set up passwordless access to a cluster, copy the public key from your workstation to the $HOME/.ssh directory on the cluster, for instance using WinSCP or MobaXterm (this example uses the Euler cluster; to set up access to another cluster, use the corresponding hostname instead of euler.ethz.ch). The file needs to be stored as

$HOME/.ssh/authorized_keys

on the cluster.
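
Assuming the public key was uploaded under its original name (for instance id_ed25519_euler.pub, as in the Linux example above), a minimal sketch of the commands to run on the cluster to add it to authorized_keys and set safe permissions is:

cd $HOME/.ssh
cat id_ed25519_euler.pub >> authorized_keys    # append the uploaded public key
chmod 600 authorized_keys                      # make the file readable only by you
rm id_ed25519_euler.pub                        # optional: remove the uploaded copy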

Safety rules

  • Always use a (strong) passphrase to protect your SSH key. Do not leave it empty!
  • Never share your private key with somebody else, or copy it to another computer. It must only be stored on your personal computer
  • Use a different key pair for each computer you want to connect to
  • Do not reuse the key pairs for Euler / Leonhard for other systems
  • Do not keep open SSH connections in detached screen sessions
  • Disable the ForwardAgent option in your SSH configuration and do not use ssh -A (or use ssh -a to disable agent forwarding)

How to use keys with non-default names

If you use different key pairs for different computers (as recommended above), you need to specify the right key when you connect, for instance:

ssh -i $HOME/.ssh/id_ed25519_euler username@euler.ethz.ch

To make your life easier, you can configure your ssh client to use this option automatically by adding the following lines in your $HOME/.ssh/config file:

Host euler.ethz.ch
IdentityFile ~/.ssh/id_ed25519_euler
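
If you also want to avoid typing your username and the full hostname, you can extend the entry with a host alias and a User line (a sketch; username is a placeholder for your ETH username):

Host euler
    HostName euler.ethz.ch
    User username
    IdentityFile ~/.ssh/id_ed25519_euler

With such an entry, ssh euler is sufficient to connect.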

First login

On your first login, you need to accept the cluster's usage rules. Afterwards your account is created automatically. Please find below the user agreement for the Euler cluster as an example:

Please note that the Euler cluster is subject to the "Acceptable Use Policy
for Telematics Resources" ("Benutzungsordnung fuer Telematik", BOT) of ETH
Zurich and relevant documents (https://tinyurl.com/eth-bot), in particular:

  * your Euler account (like your ETH account) is *strictly personal*
  * you are responsible for all activities done under your account
  * you must keep your password secure and may not give it to a 3rd party
  * you may not share your account with anyone, including your supervisor
  * you may not use someone else's account, with or without their consent
  * you must comply with all civil and criminal laws (copyright, privacy,
    data protection, etc.)
  * any violation of these rules and policies may lead to administrative
    and/or legal measures

Before you can proceed you must confirm that you have read, understood,
and agree to the rules and policies mentioned above.

On Euler, the first login of a new user triggers a process that sends a verification code to the user's ETH email address (USERNAME@ethz.ch, where USERNAME is the ETH account name). The user is then prompted to enter the verification code; once the correct code is entered, the cluster account of the user is created.

X11

The clusters of ID SIS HPC use the X Window System (X11) to display a program's graphical user interface (GUI) on a user's workstation. You need to install an X11 server on your workstation to display X11 windows. The ports used by X11 are blocked by the cluster's firewall. To circumvent this problem, you must open an SSH tunnel and redirect all X11 communication through that tunnel.

Linux

Xorg (X11) is normally installed by default as part of most Linux distributions. If you are using a version newer than 1.16, then please have a look at the troubleshooting section at the bottom of this wiki page.

ssh -Y username@hostname

Mac OS X

Since X11 is no longer included in OS X, you must install XQuartz. If you are using a version newer than 2.7.8, then please have a look at the troubleshooting section at the bottom of this wiki page.

ssh -Y username@hostname

Windows

X11 is not supported natively by Windows. You need to install a third-party X11 server in order to use X11 forwarding (MobaXterm, for instance, includes a built-in X11 server).

VPN

When connecting from outside of the ETH network to one of our HPC clusters, you first need to establish a VPN connection. To install a VPN client, open https://sslvpn.ethz.ch in your browser. After you log in, the website detects whether a VPN client is already installed on your computer and otherwise installs one automatically. You can find more detailed instructions on the ETH website.

Please note that for establishing a VPN connection, you need to use your network password instead of your main password. If you have not yet set your network password, go to https://password.ethz.ch, log in with your ETH account credentials and click on Passwort ändern. There you can set your network password.


After establishing a VPN connection, you can login to our clusters via SSH.

Troubleshooting

Permission denied

If you enter a wrong password three times, you will get a permission denied error:

sfux@calculus:~$ ssh sfux@euler.ethz.ch
sfux@euler.ethz.ch's password: 
Permission denied, please try again.
sfux@euler.ethz.ch's password: 
Permission denied, please try again.
sfux@euler.ethz.ch's password: 
Permission denied (publickey,password,hostbased).
sfux@calculus:~$

In case you receive a "Permission denied" error, please check if you entered the correct password. If you think that your account has been compromised, then please contact the service desk of IT services of ETH Zurich.

If you enter a wrong password too many times or at a high frequency, we might block access to the clusters for your account, because it could be compromised. If your account has been blocked by the HPC group, please contact cluster support.

Timeout

If you try to login and receive a timeout error, then it is very likely that you tried to connect from outside of the ETH network to one of the HPC clusters.

sfux@calculus:~$ ssh -Y sfux@euler.ethz.ch
ssh: connect to host euler.ethz.ch port 22: Connection timed out

Please either connect from the inside of the ETH network, or establish a VPN connection.

setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

If you are using a Mac, please try to comment out the following lines in /etc/ssh/ssh_config on your workstation:

Host *
       SendEnv LANG LC_*

This should solve the problem.

Too many authentication failures

This error can be triggered if you have more than 6 private SSH keys in your local .ssh directory. In this case, specify the SSH key to use and add the IdentitiesOnly=yes option, for example:

sfux@calculus:~$ ssh -i $HOME/.ssh/id_ed25519 -o IdentitiesOnly=yes sfux@euler.ethz.ch


Indirect GLX rendering error

When using an SSH connection with X11 forwarding enabled, newer versions of the Xorg server show an error message when the graphical user interface of an application is started:

X Error of failed request: BadValue (integer parameter out of range for operation)
  Major opcode of failed request: 153 (GLX)
  Minor opcode of failed request: 3 (X_GLXCreateContext)
  Value in failed request: 0x0
  Serial number of failed request: 27
  Current serial number in output stream: 30

This error is caused by starting your X11 server without enabling the setting for indirect GLX rendering (iglx), which is required for X11 forwarding. Up to version 1.16 of the Xorg server, the iglx setting was enabled by default. With version 1.17, the default changed from +iglx to -iglx. Now the setting needs to be enabled either in the Xorg configuration file or with a command line option when starting the Xorg server manually. For XQuartz versions up to 2.7.8, the iglx setting is enabled by default. If you would like to use XQuartz 2.7.9 or newer, please make sure that you enable the iglx setting when the X server is started.

This problem is described in the following article:

https://www.phoronix.com/scan.php?page=news_item&px=Xorg-IGLX-Potential-Bye-Bye

Please find below some links, which address the problem for specific operating systems.

Operating system Link
Red Hat Enterprise Linux (RHEL) https://elrepo.org/bugs/view.php?id=610
CentOS https://www.centos.org/forums/viewtopic.php?t=57409#p244528
Ubuntu http://askubuntu.com/questions/745135/how-to-enable-indirect-glx-contexts-iglx-in-ubuntu-14-04-lts-with-nvidia-gfx
Mac OS X https://bugs.freedesktop.org/show_bug.cgi?id=96260

Data management

Introduction

On our cluster, we provide multiple storage systems, which are optimized for different purposes. Since the available storage space on our clusters is limited and shared between all users, we set quotas in order to prevent single users from filling up an entire storage system with their data.

A summary of general questions about file systems, storage and file transfer can be found in our FAQ. If you have questions or encounter problems with the storage systems provided on our clusters or file transfer, then please contact cluster support.

Personal storage (everyone)

Home

On our clusters, we provide a home directory (folder) for every user that can be used for safe long-term storage of important and critical data (program sources, scripts, input files, etc.). It is created on your first login to the cluster and accessible through the path

/cluster/home/username

The path is also stored in the variable $HOME. The permissions are set such that only you can access the data in your home directory, and no other user. Your home directory is limited to 50 GB and a maximum of 500'000 files and directories (inodes). The content of your home is saved every hour and there is also a nightly backup (tape).

Scratch

We also provide a personal scratch directory (folder) for every user that can be used for short-term storage of larger amounts of data. It is created when you access it for the first time through the path

/cluster/scratch/username

The path is also stored in the variable $SCRATCH. It is visible (mounted) only when you access it. If you try to access it with a graphical tool, you need to specify the full path, as it might not be visible in the /cluster/scratch top-level directory. Before you use your personal scratch directory, please carefully read the usage rules to avoid misunderstandings. The usage rules can also be displayed directly on the cluster with the following command.

cat $SCRATCH/__USAGE_RULES__

Your personal scratch directory has a disk quota of 2.5 TB and a maximum of 1'000'000 files and directories (inodes). There is no backup for the personal scratch directories and they are purged on a regular basis (see usage rules).

For personal scratch directories, there are two limits (soft and hard quota). When you reach the soft limit (2.5 TB), there is a grace period of one week during which you can use up to 10% more than your allowed capacity (this upper limit is called the hard quota); this applies to both the number of inodes and the space. If the used capacity is still above the soft limit after the grace period, the directory is locked for new writes until it is again below the soft quota.

Group storage (shareholders only)

Project

Shareholder groups have the option to purchase additional storage inside the cluster. The project file system is designed for safe long-term storage of critical data (like the home directory). Shareholder groups can buy as much space as they need. The path for project storage is

/cluster/project/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using ETH groups for this purpose. If you are interested in more information and prices for project storage, please contact cluster support.

Work

Apart from project storage, shareholder groups also have the option to buy so-called work (high-performance) storage. It is optimized for I/O performance and can be used for short- or medium-term storage for large computations (like scratch, but without regular purge). Shareholders can buy as much space as they need. The path for work storage is

/cluster/work/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using ETH groups for this purpose. The directory is visible (mounted) only when accessed. If you are interested in more information and prices for work storage, please contact cluster support.

For /cluster/work directories, there are two limits (soft and hard quota). When you reach the soft limit, there is a grace period of one week during which you can use up to 10% more than your allowed capacity (this upper limit is called the hard quota); this applies to both the number of inodes and the space. If the used capacity is still above the soft limit after the grace period, the directory is locked for new writes until it is again below the soft quota.

Local scratch (on each compute node)

The compute nodes in our HPC clusters also have local hard drives, which can be used for temporarily storing data during a calculation. The main advantage of local scratch is that it is located directly inside the compute nodes and not attached via the network. This is very beneficial for serial, I/O-intensive applications. The path of the local scratch is

/scratch

You can either create a directory in local scratch yourself as part of a batch job, or you can use a directory in local scratch that is automatically created by the batch system. Slurm creates a unique directory in local scratch for every job. At the end of the job, Slurm also takes care of cleaning up this directory. The path of the directory is stored in the environment variable

$TMPDIR

If you use $TMPDIR, then you need to request scratch space from the batch system.
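
As a minimal sketch, a job script using $TMPDIR could look like the following (my_program, input.dat and output.dat are hypothetical names; the --tmp option for requesting scratch space is described in the batch system section below):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --tmp=5000                                     # request 5000 MB of local scratch space

cp input.dat "$TMPDIR"                                 # stage the input data to local scratch
cd "$TMPDIR"
"$SLURM_SUBMIT_DIR"/my_program input.dat output.dat    # run the program from the submission directory
cp output.dat "$SLURM_SUBMIT_DIR"                      # copy the results back before the job ends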

External storage

Please note that external storage is convenient for bringing data into the cluster or for storing data for a longer time. However, we recommend not processing data directly from external storage systems in batch jobs on Euler, as this can be very slow and can put a high load on the external storage system. Please rather copy the data from the external storage system to some cluster storage (home directory, personal scratch directory, project storage, work storage, or local scratch) before you process it in a batch job. After processing the data on a cluster storage system, you can copy the results back to the external storage system.

Central NAS/CDS

Groups who have purchased storage on the central NAS of ETH or CDS can ask the storage group of IT services to export it to our HPC clusters. There are certain requirements that need to be fulfilled in order to use central NAS/CDS shares on our HPC clusters.

  • The NAS/CDS share needs to be mountable via NFS (shares that only support CIFS cannot be mounted on the HPC clusters).
  • The NAS/CDS share needs to be exported to the subnet of our HPC clusters (please contact ID Systemdienste and ask them for an NFS export of your NAS/CDS share).
  • Please carefully set the permissions of the files and directories on your NAS/CDS share if other cluster users should not have read/write access to your data.

NAS/CDS shares are then mounted automatically when you access them. The mount-point of such a NAS/CDS share is

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/16(rw,root_squash,secure) 10.204.0.0/16(rw,root_squash,secure)

If you ask the storage group to export your share to the Euler cluster, then please provide them the above-shown information. If the NAS share is located on the IBM Spectrum Scale storage system, then please also ask for the following options to be set by the storage group:

PriviledgedPort=TRUE
Manage_Gids=TRUE

Please note that these options should only be applied to the Euler subnet. For a general overview on subnets and IP addresses please check the following wiki page. When a NAS share is mounted on our HPC clusters, then it is accessible from all the compute nodes in the cluster.

Local NAS

Groups that operate their own NAS can export a shared file system via NFSv3 to our HPC clusters. In order to use an external NAS on our HPC clusters, the following requirements need to be fulfilled:

  • NAS needs to support NFSv3 (this is currently the only NFS version that is supported from our side).
  • The user and group IDs on the NAS need to be consistent with ETH user names and groups.
  • The NAS needs to be exported to the subnet of our HPC clusters.
  • Please carefully set the permissions of the files and directories on your NAS share if other cluster users should not have read/write access to your data.

We advise you to not use this path directly from your jobs. Rather, you should stage files to and from $SCRATCH.

Your external NAS can then be accessed through the mount-point

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/16(rw,root_squash,secure) 10.204.0.0/16(rw,root_squash,secure)

For a general overview on subnets and IP addresses please check the following wiki page.

The share is automatically mounted, when accessed.

Central LTS (Euler)

Groups who have purchased storage on the central LTS of ETH can ask the ITS SD backup group to export it to the LTS nodes in the Euler cluster. There are certain requirements that need to be fulfilled in order to use central LTS shares on our HPC clusters.

  • The LTS share needs to be mountable via NFS (shares that only support CIFS cannot be mounted on the HPC clusters).
  • The LTS share needs to be exported to the LTS nodes of our HPC clusters (please contact ITS SD Backup group and ask them for an NFS export of your LTS share).
  • Please carefully set the permissions of the files and directories on your LTS share if other cluster users should not have read/write access to your data.

The LTS share needs to be exported to the LTS nodes:

129.132.93.70(rw,root_squash,secure)
129.132.93.71(rw,root_squash,secure)

For accessing your LTS share, you would need to login to the LTS nodes in Euler with

ssh USERNAME@lts.euler.ethz.ch

Where USERNAME needs to be replaced with your ETH account name. LTS shares are then mounted automatically when you access them. The mount-point of such an LTS share is

/nfs/lts11.ethz.ch/shares/sharename(_repl)

or

/nfs/lts21.ethz.ch/shares/sharename(_repl)

depending on whether your share is located on lts11.ethz.ch or lts21.ethz.ch.

Backup

The users' home directories are backed up every night and the backup has a retention time of 90 days. For project and work storage, we provide a weekly backup, also with a 90-day retention time. If you have data that you would like to exclude from the backup, please create a subdirectory nobackup. Data stored in the nobackup directory will then be excluded from the backup. The subdirectory nobackup can be located at any level of the directory hierarchy:

/cluster/work/YOUR_STORAGE_SHARE/nobackup
/cluster/work/YOUR_STORAGE_SHARE/project101/nobackup
/cluster/work/YOUR_STORAGE_SHARE/project101/data/nobackup/filename
/cluster/work/YOUR_STORAGE_SHARE/project101/data/nobackup/subdir/filename

When large, unimportant temporary data that changes a lot is backed up, this increases the size/pool of the backup and hence makes the backup and the restore process slower. We would therefore like to ask you to exclude this kind of data from the backup of your group storage share if possible. Excluding large temporary data from the backup will help you and us restore your important data faster in the case of an event.

Comparison

In the table below, we try to give you an overview of the available storage categories/systems on our HPC clusters as well as a comparison of their features.

| Category      | Mount point               | Life span       | Snapshots    | Backup | Retention time of backup | Purged                         | Max. size | Small files | Large files |
| Home          | /cluster/home             | permanent       | up to 7 days | yes    | 90 days                  | no                             | 50 GB     | +           | o           |
| Scratch       | /cluster/scratch          | 2 weeks         | no           | no     | -                        | yes (files older than 15 days) | 2.5 TB    | o           | ++          |
| Project       | /cluster/project          | 4 years         | optional     | yes    | 90 days                  | no                             | flexible  | +           | +           |
| Work          | /cluster/work             | 4 years         | no           | yes    | 90 days                  | no                             | flexible  | o           | ++          |
| Central NAS   | /nfs/servername/sharename | flexible        | up to 8 days | yes    | 90 days                  | no                             | flexible  | +           | +           |
| Local scratch | /scratch                  | duration of job | no           | no     | -                        | end of job                     | 800 GB    | ++          | +           |

Choosing the optimal storage system

When working on an HPC cluster that provides different storage categories/systems, the choice of system can have a big influence on the performance of your workflow. In the best case you can speed up your workflow considerably, whereas in the worst case the system administrator has to kill all your jobs and limit the number of concurrent jobs that you can run, because your jobs slow down the entire storage system and thereby affect other users' jobs. Please take into account the recommendations listed below.

  • Use local scratch whenever possible. With a few exceptions this will give you the best performance in most cases.
  • For parallel I/O with large files, the high-performance (work) storage will give you the best performance.
  • Don't create a large number of small files (KBs) on project or work storage, as this could slow down the entire storage system.
  • If your application does very bad I/O (opening and closing files multiple times per second and doing small appends on the order of a few bytes), then please don't use project and work storage. The best option for this use-case would be local scratch.

If you need to work with a large number of small files, please keep them grouped in a tar archive. During a job you can then untar the files to the local scratch, process them, and group the results again in a tar archive, which can then be copied back to your home/scratch/work/project space, as in the sketch below.
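
A minimal sketch of such a workflow inside a job script, assuming a hypothetical archive input.tar and a hypothetical script process.sh in your home directory that writes its results into a results/ subdirectory:

cd "$TMPDIR"
tar -xf "$HOME/input.tar"        # unpack the many small files on local scratch
"$HOME/process.sh"               # process them; results end up in ./results
tar -cf results.tar results/     # group the results into a single archive
cp results.tar "$HOME"           # copy one large file back instead of many small ones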

File transfer

In order to run your jobs on an HPC cluster, you need to transfer data or input files to/from the cluster. For small and medium amounts of data, you can use standard command line or graphical tools. If you need to transfer very large amounts of data (on the order of several TB), please contact cluster support and we will help you set up the optimal strategy to transfer your data in a reasonable amount of time.

Command line tools

For transferring files from/to the cluster, we recommend using standard tools like secure copy (scp) or rsync. The general syntax for using scp is

scp [options] source destination

For copying a file from your PC to an HPC cluster (to your home directory), you need to run the following command on your PC:

scp file username@hostname:

Where username is your ETH username and hostname is the hostname of the cluster. Please note the colon after the hostname. For copying a file from the cluster to your PC (current directory), you need to run the following command on your PC:

scp username@hostname:file .

For copying an entire directory, you would need to add the option -r. Therefore you would use the following command to transfer a directory from your PC to an HPC cluster (to your home directory).

scp -r directory username@hostname:

The general syntax for rsync is

rsync [options] source destination

In order to copy the content of a directory from your PC (home directory) to a cluster (home directory), you would use the following command.

rsync -Pav /home/username/directory/ username@hostname:/cluster/home/username/directory

The -P option enables rsync to show the progress of the file transfer. The -a option preserves almost all file attributes and the -v option gives you more verbose output.

Graphical tools

Graphical scp/sftp clients allow you to mount your Euler home directory on your workstation. These clients are available for most operating systems.

  • Linux + Gnome: Connect to server
  • Linux + KDE: Konqueror, Dolphin, Filezilla
  • Mac OS X: MacFUSE, Macfusion, Cyberduck, Filezilla
  • Windows: WinSCP, Filezilla

WinSCP provides a Windows-Explorer-like user interface with a split screen that allows you to transfer files via drag-and-drop. After starting your graphical scp/sftp client, you need to specify the hostname of the cluster that you would like to connect to and then click on the connect button. After entering your ETH username and password, you will be connected to the cluster and can transfer files.


Globus for fast file transfer

see Globus for fast file transfer


Quotas

The home and scratch directories on our clusters are subject to a strict user quota. In your home directory, the soft quota for the amount of storage that you can use is set to 45 GB and the hard quota is set to 50 GB. Furthermore, you can store at most 500'000 files and directories (inodes). The hard quota for your personal scratch directory is set to 2.5 TB, and you can have at most 1'000'000 files and directories (inodes). You can check your current usage with the lquota command.

[sfux@eu-login-13-ng ~]$ lquota
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/home/sfux          | space       |          8.85 GB |         17.18 GB |         21.47 GB |
| /cluster/home/sfux          | files       |            25610 |           160000 |           200000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/shadow             | space       |          4.10 kB |          2.15 GB |          2.15 GB |
| /cluster/shadow             | files       |                2 |            50000 |            50000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/scratch/sfux       | space       |        237.57 kB |          2.50 TB |          2.70 TB |
| /cluster/scratch/sfux       | files       |               29 |          1000000 |          1500000 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-13-ng ~]$ 

If you reach 80% of your quota (number of files or storage) in your personal scratch directory, you will be informed via email and asked to clean up.

Shareholders that own storage in /cluster/work or /cluster/project on Euler or Leonhard can also check their quota using the lquota command:

[sfux@eu-login-11-ng ~]$ lquota /cluster/project/sis
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/project/sis        | space[B]    |          6.17 TB |                - |         10.41 TB |
| /cluster/project/sis        | files       |          1155583 |                - |         30721113 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-11-ng ~]$
[sfux@eu-login-11-ng ~]$ lquota /cluster/work/sis
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/work/sis           | space       |          8.36 TB |         10.00 TB |         11.00 TB |
| /cluster/work/sis           | files       |          1142478 |         10000000 |         11000000 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-11-ng ~]$

The lquota script requires the path to the top-level directory as a parameter.

Setting up the Environment

Introduction

Most applications, compilers and libraries rely on environment variables to function properly. These variables are usually set by the operating system, the administrator, or by the user. Typical examples include:

  • PATH — location of system commands and user programs
  • LD_LIBRARY_PATH — location of the dynamic (=shared) libraries needed by these commands and programs
  • MANPATH — location of man (=manual) pages for these commands
  • Program specific environment variables

The majority of problems encountered by users are caused by incorrect or missing environment variables. People often copy initialization scripts — .profile, .bashrc, .cshrc — from one machine to the next, without verifying that the variables defined in these scripts are correct (or even meaningful!) on the target system.

If setting environment variables is difficult, modifying them at run-time is even more complex and error-prone. Changing the contents of PATH to use a different compiler than the one set by default, for example, is not for the casual user. The situation can quickly become a nightmare when one has to deal with multiple compilers and libraries (e.g. MPI) at the same time.

Environment modules — modules in short — offer an elegant and user-friendly solution to all these problems. Modules allow a user to load all the settings needed by a particular application on demand, and to unload them when they are no longer needed. Switching from one compiler to the other; or between different releases of the same application; or from one MPI library to another can be done in a snap, using just one command — module.

Software stacks

On our clusters we provide multiple software stacks.

When in doubt, please use the most recent one. As of 08/2024, the available options are:

nmarounina@eu-login-18:~$ module avail stack

----------------------------- /cluster/software/lmods --------------------------------
  stack/2024-03-beta    stack/2024-04    stack/2024-05    stack/2024-06 (D)

 Where:
  D:  Default Module

The detail of the software available in each stack can be seen here.

Module commands

Module spider

The module spider command lists all existing modules matching a given string.

nmarounina@eu-login-18:~$ module spider python

--------------------------------------------------
  python:
--------------------------------------------------
     Versions:
        python/3.8.18-c3ikxoi
        python/3.8.18-mcsql52
        python/3.8.18-zv6eekz
        python/3.9.18_cuda
        python/3.9.18_rocm
        python/3.9.18
        python/3.10.13_cuda
        python/3.10.13_rocm
        python/3.10.13
        python/3.11.6_cuda-oe7bpyk
        python/3.11.6_cuda
        python/3.11.6_rocm-oe7bpyk
        python/3.11.6_rocm
        python/3.11.6-m4n2ny4
        python/3.11.6-oe7bpyk
        python/3.11.6
     Other possible modules matches:
        py-python-dateutil  python_cuda  python_rocm

--------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*python.*'

--------------------------------------------------
  For detailed information about a specific "python" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider python/3.11.6
--------------------------------------------------

This is an excellent tool to explore all of the available software on the cluster. As specified at the end of the output, type `module spider` followed by the full version name to get the list of modules that need to be loaded prior to loading the desired module.

When in doubt, please choose a module that does not have a hash after its name.

Module show

The module show command provides information about which environment variables are set or changed by the module file.

nmarounina@eu-login-18:~$ module show matlab/R2024a
------------------------------------------------
  /cluster/software/lmods/matlab/R2024a.lua:
------------------------------------------------
whatis("Name : MATLAB")
whatis("Version : R2024a")
help(MATLAB)
setenv("MATLAB","/cluster/software/commercial/matlab/R2024a")
setenv("MATLAB_BASEDIR","/cluster/software/commercial/matlab/R2024a")
setenv("MKL_DEBUG_CPU_TYPE","5")
prepend_path("PATH","/cluster/software/commercial/matlab/R2024a/bin")
setenv("MATLAB_CLUSTER_PROFILES_LOCATION","/cluster/software/comm[...]
append_path("PATH","/cluster/software/commercial/matlab/support_package[...]
prepend_path("MATLABPATH","/cluster/software/commercial/matlab/support[...]

nmarounina@eu-login-18:~$

Module load

The module load command loads the corresponding module and prepares the environment for using the application or library by applying the instructions that can be shown with the module show command.

nmarounina@eu-login-18:~$ module load stack/2024-06  gcc/12.2.0 python/3.11.6
Many modules are hidden in this stack. Use "module --show_hidden spider SOFTWARE" if you are not able to find the required software
nmarounina@eu-login-18:~$ which python
/cluster/software/stacks/2024-06/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-12.2.0/python-3.11.6-ukhwpjnwzzzizek3pgr75zkbhxros5fq/bin/python
nmarounina@eu-login-18:~$

Module list

The module list command displays the currently loaded module files.

nmarounina@eu-login-18:~$ module list

Currently Loaded Modules:
  1) stack/2024-06   2) gcc/12.2.0   3) python/3.11.6

 

nmarounina@eu-login-18:~$

Module purge

The module purge command unloads all currently loaded modules and cleans up the environment of your shell. In some cases, it might be better to log out and log in again in order to get a really clean shell.

nmarounina@eu-login-18:~$ module list

Currently Loaded Modules:
  1) stack/2024-06   2) gcc/12.2.0   3) python/3.11.6

 

nmarounina@eu-login-18:~$ 
nmarounina@eu-login-18:~$ module purge
nmarounina@eu-login-18:~$ module list
No modules loaded
nmarounina@eu-login-18:~$ 

Naming scheme

Please find the general naming scheme of module files below.

program_name/version(alias[:alias2])

Instead of specifying a version directly, it is also possible to use aliases.

program_name/alias == program_name/version

The special alias default indicates which version is taken by default (if neither version nor alias is specified)

program_name/default == program_name

If no default is specified for a particular software, then the most recent version (i.e. that with the largest number) is taken by default.
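
For example, using the MATLAB module shown later on this page (a sketch; the default version on the cluster may differ):

module load matlab           # loads the default version, e.g. matlab/R2024a
module load matlab/R2024a    # loads this specific version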

Hierarchical modules

LMOD allows defining a hierarchy of modules containing 3 layers (Core, Compiler, MPI). The core layer contains all module files that do not depend on any compiler/MPI. The compiler layer contains all modules that depend on a particular compiler, but not on any MPI library. The MPI layer contains modules that depend on a particular compiler/MPI combination.

When you log in to the Euler cluster, no module is loaded. Running the module avail command displays all available software stacks. Loading a stack will also automatically load a compiler module (gcc), and running module avail again will show all modules available for this compiler. If you would like to see the modules available for a different compiler, you need to load another software stack and run module avail again. For checking the available modules for gcc/12.2.0 and openmpi/4.1.6, you would load the corresponding compiler and MPI modules and run module avail again, as in the sketch below.
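
A sketch of this sequence, using the stack, compiler and MPI versions mentioned above (the module avail output is omitted):

module avail                            # lists the available software stacks
module load stack/2024-06               # loads the stack and its default compiler (gcc)
module avail                            # lists the modules available for this compiler
module load gcc/12.2.0 openmpi/4.1.6    # adds the MPI layer
module avail                            # lists the modules available for this compiler/MPI combination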

As a consequence of the module hierarchy, you can never have two different versions of the same module loaded at the same time. This helps to avoid problems arising due to misconfiguration of the environment.

Application life-cycle

Based on application experience on Brutus we offer, besides the currently supported versions, two new categories of modules for new and legacy versions on the Euler cluster. Due to dependencies between compilers, libraries and applications, changes to the applications and the corresponding modules need to be synchronized.

Life-cycle of an application

Modules for new or experimental versions of an application/library/compiler first appear in the new module category, where we provide a partial support matrix. Specific compiler/library combinations can be requested by shareholders. New modules are not visible by default. If you would like to see which new versions are available or try them out, you will need to load the new module first:

module load new

Applications that have passed all tests and are deemed ready for production (stable, bug-free, compatible with LSF, etc.) will be moved to the supported category in the next quarterly update.

Applications that have become obsolete (no longer supported by the vendor, superseded by new versions with more functionality, etc.) will be moved to the legacy category. For these modules the HPC group can only provide limited support. Legacy modules are not visible by default. If you still need to use them, you will need to load the legacy module first:

module load legacy

Applications that are known to be buggy, have become redundant, or whose license has expired will be removed. If you still need to use them, please contact cluster support.

User notification

The HPC group updates the module categories on a regular basis. The Application life-cycle page therefore contains a table listing all applications that are available on Euler as well as the modifications that we plan to apply at the next change. Users will receive a reminder one week prior to the update of the categories; it will also contain information about the most important changes.

Application tables

We have listed all available modules on the different HPC clusters in separate tables, which also contain special formatting that indicates actions taken at the next change. Please note that the application table for Leonhard does not contain the different module categories, as we have switched from manual installations to using the supercomputing package manager (SPACK) and also switched from environment modules to LMOD modules.

Using the batch system

Command Summary

Please find below a table with commands for job submission, monitoring and control

Command Description
sbatch Submit scripts to Slurm
scancel Kill a job
srun Run a parallel job within Slurm (either creating a new allocation or running within the current one)
squeue View job and job step information for jobs managed by Slurm
scontrol Display information about the resource usage of a job
sstat Display the status information of a running job/step
sacct Displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database
myjobs Job information in human readable format


Introduction

On our HPC cluster, we use the Slurm (Simple Linux Utility for Resource Management) batch system. A basic knowledge of Slurm is required if you would like to work on the HPC clusters of ETH. The present article will show you how to use Slurm to execute simple batch jobs and give you an overview of some advanced features that can dramatically increase your productivity on a cluster.

Using a batch system has numerous advantages:

  • single system image — all computing resources in the cluster can be accessed from a single point
  • load balancing — the workload is automatically distributed across all available processor cores
  • exclusive use — many computations can be executed at the same time without affecting each other
  • prioritization — computing resources can be dedicated to specific applications or people
  • fair share — a fair allocation of those resources among all users is guaranteed

In fact, our HPC clusters contain so many cores (130,000) and are used by so many people (more than 3,200) that it would be impossible to use them efficiently without a batch system.

All computations on our HPC cluster must be submitted to the batch system. Please do not run any job interactively on the login nodes, except for testing or debugging purposes.

If you are a member of multiple shareholder groups, then please have a look at our wiki page about working in multiple shareholder groups

Basic job submission

We provide a helper tool to facilitate setting up submission commands and/or job scripts for Slurm and LSF:

Slurm/LSF Submission Line Advisor

You can specify the resources required by your job and the command, and the tool will output the corresponding Slurm/LSF submission command or job script, depending on your choice.

Slurm provides two different ways of submitting jobs. While we first show the solution with --wrap, we strongly recommend using scripts as described in the section Job scripts. Scripts require a bit more work to run a job but come with some major advantages:

  • Better reproducibility
  • Easier and faster handover (which includes the cluster support when you need our help)
  • Can load the modules directly within the script

Simple commands and programs

Submitting a job to the batch system is as easy as:

sbatch --wrap="command [arguments]"
sbatch --wrap="/path/to/program [arguments]"

Examples:

[sfux@eu-login-03 ~]$ sbatch --wrap="gzip big_file.dat"
Submitted batch job 1010113
[sfux@eu-login-03 ~]$ sbatch --wrap="./hello_world"
Submitted batch job 1010171

Two or more commands can be combined together by enclosing them in quotes:

sbatch --wrap="command1; command2"

Example:

[sfux@eu-login-03 ~]$ sbatch --wrap "configure; make; make install"
Submitted batch job 1010213.

Quotes are also necessary if you want to use I/O redirection (">", "<"), pipes ("|") or conditional operators ("&&", "||"):

sbatch --wrap="command < data.in > data.out"
sbatch --wrap="command1 | command2"

Examples:

[sfux@eu-login-03 ~]$ sbatch --wrap="tr ',' '\n' < comma_separated_list > linebreak_separated_list"
Submitted batch job 1010258
[sfux@eu-login-03 ~]$ sbatch --wrap="cat unsorted_list_with_redundant_entries | sort | uniq > sorted_list"
Submitted batch job 1010272

Shell scripts

More complex commands may be placed in a shell script, which should then be submitted like this:

sbatch < script
sbatch script

Example:

[sfux@eu-login-03 ~]$ sbatch < hello.sh
Submitted batch job 1010279.
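
The content of such a script could look like the following minimal sketch (hello.sh is just an example name; the #SBATCH options correspond to the resource requests described in the sections below):

#!/bin/bash
#SBATCH --ntasks=1           # number of cores
#SBATCH --time=01:00:00      # wall-clock time limit
#SBATCH --mem-per-cpu=2G     # memory per core

module load stack/2024-06 gcc/12.2.0 python/3.11.6    # load the modules your job needs (example)
echo "Hello world from $(hostname)"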

Output file

By default your job's output and error messages (or stdout and stderr, to be precise) are combined and written into a file named slurm-JobID.out in the directory where you executed sbatch, where JobID is the number assigned to your job by Slurm. You can select a different output file using the option:

sbatch --output=output_file --open-mode=append --wrap="command [argument]" 

The option --output output_file in combination with --open-mode=append tells Slurm to append your job's output to output_file. If you want to overwrite this file, use:

sbatch --output output_file --open-mode=truncate --wrap="command [argument]"

Note that this option, like all sbatch options, must be placed before the command that you want to execute in your job. A common mistake is to place sbatch options in the wrong place, like:

sbatch --wrap=command -o output_file        # WRONG!

Error file

It is also possible to store the stderr of a job in a separate file (again, you can choose with the --open-mode parameter whether you would like to append or overwrite):

sbatch --error=error_file --open-mode=append --wrap "command [argument]"

Queue / Queues

Slurm uses different queues to manage the scheduling of jobs. As a user, you don't need to specify which queue to use, as it is automatically picked by Slurm when you submit the job.

Resource requirements

By default, a batch job can use only one core for up to 1 hour. (The job is killed when it reaches its run-time limit.) If your job needs more resources — time, cores, memory or scratch space — you must request them when you submit it.

Wall-clock time

The time limits on our clusters are always based on wall-clock (or elapsed) time. You can specify the amount of time needed by your job with several formats using the option:

sbatch --time=minutes ...                        example:  sbatch --time=10 ...
sbatch --time=minutes:seconds ...                example:  sbatch --time=10:50 ...
sbatch --time=hours:minutes:seconds ...          example:  sbatch --time=5:10:50 ...
sbatch --time=days-hours ...                     example:  sbatch --time=1-5 ...
sbatch --time=days-hours:minutes ...             example:  sbatch --time=1-5:10 ...
sbatch --time=days-hours:minutes:seconds ...     example:  sbatch --time=1-5:10:50 ...

Examples:

[sfux@eu-login-03 ~]$ sbatch --time=20 --wrap="./Riemann_zeta -arg 26"
Submitted batch job 1010305
[sfux@eu-login-03 ~]$ sbatch --time=20:00 --wrap="./solve_Koenigsberg_bridge_problem"
Submitted batch job 1010312.

Since our clusters contain processor cores with different speeds, two similar jobs will not necessarily take the same time to complete. It is therefore safer to request more time than strictly necessary... but not too much, since shorter jobs generally have a higher priority than longer ones.

The maximum run time for jobs that can run on most compute nodes in the cluster is 360 hours. We reserve the right to stop jobs with a run time of more than 5 days in case of an emergency maintenance.

Number of processor cores

If your job requires multiple cores (or threads), you must request them using the option:

sbatch --ntasks=number_of_cores --wrap="..."

or

sbatch --ntasks=1 --cpus-per-task=number_of_cores --wrap="..."

Please make sure to check the paragraph about parallel job submission before requesting multiple cores.

Note that merely requesting multiple cores does not mean that your application will use them.

Memory

By default the batch system allocates 1024 MB (1 GB) of memory per processor core. A single-core job will thus get 1 GB of memory; a 4-core job will get 4 GB; and a 16-core job, 16 GB. If your computation requires more memory, you must request it when you submit your job:

sbatch --mem-per-cpu=XXX ...

where XXX is an integer. The default unit is MB, but you can also specify the value in GB when adding the suffix "G" after the integer value.

Example:

[sfux@eu-login-03 ~]$ sbatch --mem-per-cpu=2G --wrap="./evaluate_gamma -precision 10e-30"
Submitted batch job 1010322

Note: Please note that users cannot request the full memory of a node, as some of the memory is reserved for the operating system of the compute node, which runs in memory. Therefore, if a user requests for instance 256 GiB of memory, the job will not be dispatched to a node with 256 GiB of memory, but to a node with 512 GiB of memory or more. As a general rule, jobs that request ~3% less memory than a node has can run on that node type. For instance, on a node with 256 GiB of memory, you can request up to 256*0.97 GiB = 248.32 GiB.

Scratch space

Slurm automatically creates a local scratch directory when your job starts and deletes it when the job ends. This directory has a unique name, which is passed to your job via the variable $TMPDIR.

Unlike memory, the batch system does not reserve any disk space for this scratch directory by default. If your job is expected to write large amounts of temporary data (say, more than 250 MB) into $TMPDIR — or anywhere in the local /scratch file system — you must request enough scratch space when you submit it:

sbatch --tmp=YYY ...

where YYY is the amount of scratch space needed by your job, in MB per host (there is no setting in Slurm to request it per core). You can also specify the amount in GB by adding the suffix "G" after YYY.

Example:

[sfux@eu-login-03 ~]$ sbatch --tmp=5000 --wrap="./generating_Euler_numbers -num 5000000"
Submitted batch job 1010713

Note that /tmp is reserved for the operating system. Do not write temporary data there! You should either use the directory created by Slurm ($TMPDIR) or create your own temporary directory in the local /scratch file system; in the latter case, do not forget to delete this directory at the end of your job.

GPU

There are GPU nodes in the Euler cluster. The GPU nodes are reserved exclusively for the shareholder groups that invested in them. Guest users and shareholders that purchased CPU nodes but no GPU nodes cannot use the GPU nodes.

All GPUs in Slurm are configured in non-exclusive process mode, such that you can run multiple processes/threads on a single GPU. Please find below the available GPU node types.

Euler

GPU Model | Slurm specifier | GPUs per node | GPU memory per GPU | CPU cores per node | System memory per node | CPU cores per GPU | System memory per GPU | Compute capability | Minimal CUDA version required
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 36 | 384 GiB | 4.5 | 48 GiB | 7.5 | 10.0
NVIDIA GeForce RTX 2080 Ti | rtx_2080_ti | 8 | 11 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0
NVIDIA GeForce RTX 3090 | rtx_3090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.6 | 11.0
NVIDIA GeForce RTX 4090 | rtx_4090 | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 8.9 | 11.8
NVIDIA TITAN RTX | titan_rtx | 8 | 24 GiB | 128 | 512 GiB | 16 | 64 GiB | 7.5 | 10.0
NVIDIA Quadro RTX 6000 | quadro_rtx_6000 | 8 | 24 GiB | 128 | 512 GiB | 8 | 64 GiB | 7.5 | 10.0
NVIDIA Tesla V100-SXM2 (32 GiB) | v100 | 8 | 32 GiB | 48 | 768 GiB | 6 | 96 GiB | 7.0 | 9.0
NVIDIA Tesla V100-SXM2 (32 GiB) | v100 | 8 | 32 GiB | 40 | 512 GiB | 5 | 64 GiB | 7.0 | 9.0
NVIDIA Tesla A100 (40 GiB) | a100-pcie-40gb | 8 | 40 GiB | 48 | 768 GiB | 6 | 96 GiB | 8.0 | 11.0
NVIDIA Tesla A100 (80 GiB) | a100_80gb | 10 | 80 GiB | 48 | 1024 GiB | 4.8 | 96 GiB | 8.0 | 11.0

You can request one or more GPUs with the command

sbatch --gpus=number_of_GPUs ...

To run multi-node GPU jobs, you need to use the option --gpus-per-node:

sbatch --gpus-per-node=2 ...
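
If you need a specific GPU model, the Slurm specifier from the table above can generally be given as a type prefix to --gpus. A minimal sketch (python train.py is just a hypothetical command):

sbatch --gpus=1 --wrap="nvidia-smi"
sbatch --gpus=rtx_3090:2 --wrap="python train.py"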

For advanced settings, please have a look at our getting started with GPUs page.

Interactive jobs

If you just want to run a quick test, you can submit it as a batch interactive job. In this case the job's output is not written into a file, but directly to your terminal, as if it were executed interactively:

srun --pty bash -l

Please note that the bash option -l is required to start a login shell.

Example:

[sfux@eu-login-35 ~]$ srun --pty bash -l
srun: job 2040660 queued and waiting for resources
srun: job 2040660 has been allocated resources
[sfux@eu-a2p-515 ~]$

For interactive jobs with X11 forwarding enabled, you need to make sure that you log in to the cluster with X11 forwarding enabled; then you can run

srun [Slurm options] --x11 --pty bash -l
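
The same resource options as for sbatch (run time, cores, memory, scratch, GPUs) can be combined with srun, for example (a sketch with placeholder values):

srun --ntasks=1 --cpus-per-task=4 --time=1:00:00 --mem-per-cpu=2G --pty bash -l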

Parallel job submission

Before submitting parallel jobs, please make sure that your application can run in parallel at all, in order not to waste resources by requesting multiple cores for a serial application. Furthermore, please do a short scaling analysis to see how well your code scales in parallel before requesting dozens or hundreds of cores.

OpenMP

If your application is parallelized using OpenMP or linked against a library using OpenMP (Intel MKL, OpenBLAS, etc.), the number of processor cores (or threads) that it can use is controlled by the environment variable OMP_NUM_THREADS. This variable must be set before you submit your job:

export OMP_NUM_THREADS=number_of_cores
sbatch --ntasks=1 --cpus-per-task=number_of_cores --wrap="..."

NOTE: if OMP_NUM_THREADS is not set, your application will either use one core only, or will attempt to use all cores that it can find. Since your job is restricted to its allocated resources, all threads will be bound to the cores allocated to your job. Starting more than one thread per core will slow down your application, as the threads will compete for time on the CPU.
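
For example, to run a hypothetical OpenMP program my_openmp_program on 8 cores with 2 GB of memory per core (the program name and resource values are only placeholders):

export OMP_NUM_THREADS=8
sbatch --ntasks=1 --cpus-per-task=8 --time=4:00:00 --mem-per-cpu=2G --wrap="./my_openmp_program"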

MPI

Three kinds of MPI libraries are available on our cluster: Open MPI (recommended), Intel MPI and MVAPICH2. Before you can submit and execute an MPI job, you must load the corresponding modules (compiler + MPI, in that order):

module load compiler
module load mpi_library

The command used to launch an MPI application is mpirun.

Let's assume for example that hello_world was compiled with GCC 6.3.0 and linked with Open MPI 4.1.4. The command to execute this job on 4 cores is:

module load gcc/6.3.0
module load open_mpi/4.1.4
sbatch -n 4 --wrap="mpirun ./hello_world"

Note that mpirun automatically uses all cores allocated to the job by Slurm. It is therefore not necessary to indicate this number again to the mpirun command itself:

sbatch --ntasks=4 --wrap="mpirun -np 4 ./hello_world"      ←  "-np 4" not needed!

Pthreads and other threaded applications

Threaded applications behave similarly to OpenMP applications. It is important to limit the number of threads that the application spawns. There is no standard way to do this, so be sure to check the application's documentation. Usually a program supports at least one of four ways to limit itself to N threads:

  • it understands the OMP_NUM_THREADS=N environment variable,
  • it has its own environment variable, such as GMX_NUM_THREADS=N for Gromacs,
  • it has a command-line option, such as -nt N (for Gromacs), or
  • it has an input-file option, such as num_threads N.

If you are unsure about the program's behavior, please contact us and we will analyze it.

Hybrid jobs

It is possible to run hybrid jobs that mix MPI and OpenMP on our HPC clusters, but this requires a more advanced knowledge of Slurm and the hardware.
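
As a rough sketch (not a definitive recipe), a hybrid job with 4 MPI ranks and 8 OpenMP threads per rank could be submitted with a job script like the following (see the Job scripts section below); hybrid_program is a hypothetical executable and the module versions are taken from the MPI example above:

#!/bin/bash
#SBATCH --ntasks=4                  # number of MPI ranks
#SBATCH --cpus-per-task=8           # OpenMP threads per rank
#SBATCH --time=4:00:00
#SBATCH --mem-per-cpu=2000

module load gcc/6.3.0
module load open_mpi/4.1.4

# let each MPI rank use the cores that Slurm allocated to its task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun ./hybrid_program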

Job scripts

You can also use a job script to specify all sbatch options using #SBATCH pragmas. We strongly recommend loading the modules within the submission script in order to improve reproducibility.

#!/bin/bash

#SBATCH -n 4
#SBATCH --time=8:00
#SBATCH --mem-per-cpu=2000
#SBATCH --tmp=4000                        # per node!!
#SBATCH --job-name=analysis1
#SBATCH --output=analysis1.out
#SBATCH --error=analysis1.err

module load xyz/123
command1
command2

The script can then be submitted as

sbatch < script

or

sbatch script
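
Options given on the sbatch command line take precedence over the #SBATCH pragmas in the script, so the same script can be reused with slightly different resources, for example:

sbatch --time=16:00 --job-name=analysis2 script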

Job monitoring

This section is still work in progress.

squeue

The squeue command allows you to get information about pending, running and recently finished jobs.

[sfux@eu-login-41 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1433323 normal.4h     wrap     sfux  PD      0:04      1 eu-g1-026-2
           1433322 normal.4h     wrap     sfux  R       0:11      1 eu-a2p-483

You can also check only for running jobs (R) or for pending jobs (PD):

[sfux@eu-login-41 ~]$ squeue -t RUNNING
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1433322 normal.4h     wrap     sfux  R       0:28      1 eu-a2p-483
[sfux@eu-login-41 ~]$ squeue -t PENDING
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1433323 normal.4h     wrap     sfux  PD      0:21      1 eu-g1-026-2
[sfux@eu-login-41 ~]$ 

An overview of all squeue options is available in the squeue documentation:

https://slurm.schedmd.com/squeue.html
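
The output can also be filtered and formatted with standard squeue options, for example (a sketch; adjust the format string to your needs):

squeue --user=$USER --states=PENDING --start                     # expected start times of your pending jobs
squeue --user=$USER --format="%.10i %.12P %.20j %.2t %.10M %R"   # custom output format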

scontrol

The scontrol command is one of several commands that allow you to check information about a running job:

[sfux@eu-login-15 ~]$ scontrol show jobid -dd 1498523
JobId=1498523 JobName=wrap
   UserId=sfux(40093) GroupId=sfux-group(104222) MCS_label=N/A
   Priority=1769 Nice=0 Account=normal/es_hpc QOS=es_hpc/normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:38 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2022-10-27T11:44:30 EligibleTime=2022-10-27T11:44:30
   AccrueTime=2022-10-27T11:44:30
   StartTime=2022-10-27T11:44:31 EndTime=2022-10-27T12:44:31 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-27T11:44:31 Scheduler=Main
   Partition=normal.4h AllocNode:Sid=eu-login-15:26645
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=eu-a2p-528
   BatchHost=eu-a2p-528
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=1G,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   JOB_GRES=(null)
     Nodes=eu-a2p-528 CPU_IDs=127 Mem=1024 GRES=
   MinCPUsNode=1 MinMemoryCPU=1G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/cluster/home/sfux
   StdErr=/cluster/home/sfux/slurm-1498523.out
   StdIn=/dev/null
   StdOut=/cluster/home/sfux/slurm-1498523.out
   Power=

Squeue States

  • QOSMaxCpuPerUserLimit: You are using more CPUs than allowed by your share. You can either cancel a running job or wait until some of your jobs have finished.
  • QOSMaxMemoryPerUser: You are using more RAM than allowed by your share. You can either cancel a running job or wait until some of your jobs have finished.
  • QOSMaxGRESPerUser: You are using more generic resources (e.g. GPUs) than allowed by your share. You can either cancel a running job or wait until some of your jobs have finished.
  • PartitionDown: If a maintenance starts soon, some partitions are not available. Otherwise, it might be an issue on our side.
  • Priority: Your job is scheduled, but other jobs with a higher priority (e.g. jobs that have been in the queue for longer) are scheduled before yours.
  • ReqNodeNotAvail: Your job requirements cannot be matched by any available node. Either wait until resources become available or reduce your requirements (e.g. RAM, cores, GPU type, GPU RAM, ...).
  • Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions: Your job requirements cannot be matched by any available node. Either wait until resources become available or reduce your requirements (e.g. RAM, cores, GPU type, GPU RAM, ...).
  • Resources: Your job is waiting for resources to become available.
  • InvalidAccount: If you did not specify an account when submitting your job, you might have been removed from a share. Please log out and log in again to update the share information. Otherwise, please check the account name with `my_share_info`.
  • PartitionTimeLimit: Your job requests more time than is available for the partition.
  • JobArrayTaskLimit: Too many jobs of an array are already running; waiting will resolve the issue.


sstat

You can use the sstat command to display information about your running jobs, for instance resources like CPU time (MinCPU) and memory usage (MaxRSS):

[sfux@eu-login-35 ~]$ sstat --all --format JobID,NTasks,MaxRSS,MinCPU -j 2039738
JobID          NTasks     MaxRSS     MinCPU
------------ -------- ---------- ----------
2039738.ext+        1          0   00:00:00
2039738.bat+        1    886660K   00:07:14

An overview of all available fields for the format option is provided in the sstat documentation:

https://slurm.schedmd.com/sstat.html

sacct

The sacct command allows users to check information on running or finished jobs.

[sfux@eu-login-35 ~]$ sacct  --format JobID,User,State,AllocCPUS,Elapsed,NNodes,NTasks,ReqMem,ExitCode
JobID             User      State  AllocCPUS    Elapsed   NNodes   NTasks     ReqMem ExitCode
------------ --------- ---------- ---------- ---------- -------- -------- ---------- --------
2039738           sfux    RUNNING          4   00:06:01        1                  8G      0:0
2039738.bat+              RUNNING          4   00:06:01        1        1                 0:0
2039738.ext+              RUNNING          4   00:06:01        1        1                 0:0
[sfux@eu-login-35 ~]$

An overview of all format fields for sacct is available in the documentation:

https://slurm.schedmd.com/sacct.html

Please note that the CPU time (TotalCPU) and memory usage (MaxRSS) are only displayed correctly for finished jobs. If you check these properties for running jobs, they will just show 0. For checking the CPU time and memory usage of running jobs, please use sstat.
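
For example, to check the resource usage of a single finished job, or of all your jobs since a given date (the job ID and date are just placeholders):

sacct -j 2039738 --format=JobID,State,Elapsed,TotalCPU,MaxRSS,ExitCode
sacct --starttime=2022-10-01 --format=JobID,JobName,State,Elapsed,ExitCode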

myjobs

We are working on providing a bbjobs-like wrapper for monitoring Slurm jobs. The wrapper script is called myjobs and accepts a single option -j to specify the job ID.

  • Please note that the script only works correctly for simple jobs without additional job steps.
  • Please note that emojis or other special characters in a user command may cause the myjobs script to fail.
  • Please note that the CPU and memory efficiency displayed by myjobs for multi-node jobs is not correct (CPU time and memory usage are reported for only one node of the resource allocation).

The script is still a work in progress and we are continuously improving it.

[sfux@eu-login-39 ~]$ myjobs -j 2647208
Job information
 Job ID                          : 2647208
 Status                          : RUNNING
 Running on node                 : eu-a2p-277
 User                            : sfux
 Shareholder group               : es_hpc
 Slurm partition (queue)         : normal.24h
 Command                         : sbatch --ntasks=4 --time=4:30:00 --mem-per-cpu=2g
 Working directory               : /cluster/home/sfux/testrun/adf/2021_test
Requested resources
 Requested runtime               : 04:30:00
 Requested cores (total)         : 4
 Requested nodes                 : 1
 Requested memory (total)        : 8192 MiB
 Requested scratch (per node)    : #not yet implemented#
Job history
 Submitted at                    : 2022-11-18T11:10:37
 Started at                      : 2022-11-18T11:10:37
 Queue waiting time              : 0 sec
Resource usage
 Wall-clock                      : 00:10:34
 Total CPU time                  : 00:41:47
 CPU utilization                 : 98.85%
 Total resident memory           : 1135.15 MiB
 Resident memory utilization     : 13.85%
[sfux@eu-login-39 ~]$ 

We are still working on implementing some missing features like displaying the requested local scratch and Sys/Kernel time.

If you would like to get the myjobs output for all your jobs in the queue (pending/running), you can omit the job ID parameter:

myjobs

For displaying only information about pending jobs, you can use

myjobs -p

For displaying only information about running jobs, you can use

myjobs -r

Please note that these commands might not work for job arrays.

scancel

You can use the scancel command to cancel jobs.

[sfux@eu-login-15 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1525589 normal.24 sbatch sfux R 0:11 1 eu-a2p-373
[sfux@eu-login-15 ~]$ scancel 1525589
[sfux@eu-login-15 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[sfux@eu-login-15 ~]$
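
scancel also accepts filters, so that several jobs can be cancelled at once (a sketch using standard scancel options; analysis1 is the job name from the job script example above):

scancel --name=analysis1                  # cancel all jobs with a given name
scancel --state=PENDING --user=$USER      # cancel all of your pending jobs
scancel --user=$USER                      # cancel all of your jobs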

bjob_connect

Sometimes it is necessary to monitor a job on the node(s) where it is running. On Euler, compute nodes cannot be accessed directly via ssh. To access a node where one of your jobs is running, use srun to connect to the job:

srun --interactive --jobid JOBID --pty bash

where you need to replace JOBID with the ID of your batch job. For jobs running on multiple nodes, you can use the additional option --nodelist=NODE to pick a specific one.

Applications

We provide a wide range of centrally installed commercial and open source applications and libraries to our cluster users.

Central installations

Applications and libraries that are used by many people from different departments of ETH (e.g. MATLAB, Comsol, Ansys) or that are explicitly requested by a shareholder group will be installed centrally in /cluster/apps. Providing a software stack of centrally installed applications and libraries gives users several advantages:

  • Applications and libraries are visible and accessible to all users via environment modules.
  • They are maintained by the HPC group (or in a few cases also by users).
  • Commercial licenses are provided by the central license administration of ETH (IT shop).

If an application or library is only used by a few people, then we recommend that those users install it locally in their home directories. In case you need help to install an application or library locally, please do not hesitate to contact cluster support.

Software stacks

On our clusters we provide multiple software stacks.

After login, no modules are loaded and you first need to load a stack to access the available software. We recommend using the most recent stack:

module load stack/2024-06 

LMOD modules are organized in a hierarchy with three layers to avoid conflicts when multiple modules are loaded at the same time.

  • The core layer contains software that is independent of compilers and MPI libraries, e.g., commercial software that comes with its own runtime libraries:
$ module load comsol/6.2
  • The compiler layer contains software that depends on a compiler:
$ module load gcc/12.2.0 hdf5/1.14.3
  • The MPI layer contains software that depends on both a compiler and an MPI library:
$ module load gcc/12.2.0 openmpi/4.1.6 hdf5/1.14.3
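
If you are not sure in which layer a package lives or which compiler/MPI modules have to be loaded first, LMOD's spider command can help (shown here with the hdf5 module from the examples above):

$ module spider hdf5              # list all available hdf5 versions across the hierarchy
$ module spider hdf5/1.14.3       # show which modules must be loaded before this version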