Getting started with clusters

From ScientificComputing

Contents

Requesting an account

Brutus

Brutus is no longer in operation.

Euler

We have reduced the administrative procedure to zero: no paperwork, no account request form, nothing. Any member of ETH can use Euler straight away; the only requirement is a valid NETHZ account.

Leonhard

Access is restricted to Leonhard shareholders and groups that want to test it before investing. Guest users cannot access the Leonhard cluster.


Accessing the clusters

Who can access the HPC clusters

The Euler cluster is open to all members of ETH and to external users who collaborate with a research group at ETH Zurich. Members of ETH have immediate access to the clusters and can log in with their NETHZ credentials. Members of other institutions who collaborate with a research group at ETH may use the HPC clusters for the purpose of that collaboration. Their counterpart ("sponsor") at ETH must ask the local IT support group (ISG) of the corresponding department to create a NETHZ guest account for them. The account needs to have the nethz service enabled and must be linked to a valid e-mail address. For external users, the VPN service also needs to be enabled. Once the NETHZ guest account has been created, they can access the clusters like members of ETH.

Legal compliance

The HPC clusters of ID SIS HPC are subject to ETH's acceptable use policy for IT resources (Benutzungsordnung für Telematik an der ETH Zürich, BOT). In particular:

  • Accounts are strictly personal.
  • You must not share your account (password, ssh keys) with anyone else.
  • You must not use someone else's account, with or without their consent.
  • If you suspect that someone used your account, change your password and contact cluster support.

In case of abuse, the offender's account may be blocked temporarily or closed. System administrators are obliged by law to investigate abusive or illegal activities and report them to the relevant authorities.

Security

Access to the HPC clusters of ID SIS HPC is only possible via secure protocols (ssh, sftp, scp, rsync). The HPC clusters are only accessible from inside the ETH network. If you would like to connect from a computer outside the ETH network, you need to establish a VPN connection first. Outgoing connections to computers inside the ETH network are not blocked. If you would like to connect to an external service, please use the ETH proxy service:

http://proxy.ethz.ch:3128
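For command-line tools that honor the conventional proxy environment variables (an assumption; check each tool's documentation, as not every program respects them), the proxy can be set for the current shell session like this:

```shell
# point the conventional proxy variables at the ETH proxy for the current
# shell session (hedged example: tools such as curl and wget honor these
# variables, but not every program does)
export http_proxy=http://proxy.ethz.ch:3128
export https_proxy=http://proxy.ethz.ch:3128
```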

SSH

You can connect to the HPC clusters via the SSH protocol, which requires an SSH client to be installed on your workstation. To connect to an HPC cluster, you need the hostname of the cluster that you would like to connect to and your NETHZ credentials (username, password).

Cluster        Hostname
Brutus         brutus.ethz.ch
Euler          euler.ethz.ch
Leonhard Open  login.leonhard.ethz.ch

Linux, Mac OS X

Open a shell (Terminal in OS X) and use the standard ssh command

ssh username@hostname

where username is your NETHZ username and hostname is taken from the table above. If, for instance, user leonhard would like to access the Euler cluster, the session looks like this:

leonhard@calculus:~$ ssh leonhard@euler.ethz.ch
leonhard@euler.ethz.ch's password: 
Last login: Fri Sep 17 14:17:54 1783 from calculus.ethz.ch

      ____________________   ___
     /  ________   ___   /__/  /
    /  _____/  /  /  /  ___   /
   /_______/  /__/  /__/  /__/
   Eidgenoessische Technische Hochschule Zuerich
   Swiss Federal Institute of Technology Zurich
   -------------------------------------------------------------------------
                                        E U L E R  C L U S T E R    CentOS 6


                http://clusterwiki.ethz.ch/brutus/Getting_started_with_Euler
                                NEW! -->  http://tinyurl.com/cluster-support
                                                  cluster-support@id.ethz.ch


[leonhard@euler04 ~]$

Windows

Since Windows does not provide an ssh client as part of the operating system, users need to download third-party software in order to establish ssh connections.

Widely used ssh clients are for instance PuTTY and Cygwin.


If you use PuTTY, it is sufficient to specify the hostname of the cluster that you would like to access and to click the Open button. You will then be prompted for your NETHZ credentials. If you use Cygwin, you can enter the same command as Linux and Mac OS X users:

 ssh username@hostname

SSH keys

ssh keys allow you to log in to a cluster without having to type a password. This can be useful for file transfer and automated tasks. Used properly, ssh keys are much safer than passwords. Keys always come in pairs: a private key (stored on your local workstation) and a public key (stored on the computer you want to connect to). You can generate as many key pairs as you want. To make the keys even more secure, you should protect them with a passphrase.

On your workstation, use ssh-keygen to generate a key pair. By default the private key is stored as $HOME/.ssh/id_rsa and the public key as $HOME/.ssh/id_rsa.pub. In order to set up passwordless access to a cluster, copy the public key to the .ssh directory on the cluster (for this example, we use the Euler cluster).

cat $HOME/.ssh/id_rsa.pub | ssh username@euler.ethz.ch "cat - >> .ssh/authorized_keys"

Most Linux distributions provide the ssh-copy-id tool for copying keys.
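As a minimal sketch of key generation (using a throwaway directory and an empty passphrase purely for illustration; for real use, keep the key in ~/.ssh and protect it with a passphrase):

```shell
# generate a 4096-bit RSA key pair in a throwaway directory (illustration only;
# -N "" sets an empty passphrase, which you should avoid for a real key)
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N "" -f "$keydir/id_rsa"
# the private key (id_rsa) and public key (id_rsa.pub) now exist in $keydir
ls "$keydir"
```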

First login

On your first login, you need to accept the cluster's usage rules. Afterwards your account is created automatically. Please find below the user agreement for the Euler cluster as an example:

Please note that the Euler cluster is subject to the "Acceptable Use Policy
for Telematics Resources" ("Benutzungsordnung fuer Telematik", BOT) of ETH
Zurich and relevant documents (https://tinyurl.com/eth-bot), in particular:

  * your Euler account (like your NETHZ account) is *strictly personal*
  * you are responsible for all activities done under your account
  * you must keep your password secure and may not give it to a 3rd party
  * you may not share your account with anyone, including your supervisor
  * you may not use someone else's account, with or without their consent
  * you must comply with all civil and criminal laws (copyright, privacy,
    data protection, etc.)
  * any violation of these rules and policies may lead to administrative
    and/or legal measures

Before you can proceed you must confirm that you have read, understood,
and agree to the rules and policies mentioned above.

X11

The clusters of ID SIS HPC use the X Window System (X11) to display a program's graphical user interface (GUI) on a user's workstation. You need to install an X11 server on your workstation to display X11 windows. The ports used by X11 are blocked by the cluster's firewall. To circumvent this problem, you must open an SSH tunnel and redirect all X11 communication through that tunnel.

Linux

Xorg (X11) is normally installed by default as part of most Linux distributions. If you are using a version newer than 1.16, then please have a look at the troubleshooting section at the bottom of this wiki page.

ssh -Y username@hostname

Mac OS X

Since X11 is no longer included in OS X, you must install XQuartz. If you are using a version newer than 2.7.8, then please have a look at the troubleshooting section at the bottom of this wiki page.

ssh -Y username@hostname

Windows

X11 is not supported natively by Windows. You need to install a third-party X11 server in order to use X11 forwarding; commonly used options include Xming and Cygwin/X.

VPN

When connecting from outside of the ETH network to one of the HPC clusters of ID SIS HPC, one first needs to establish a VPN connection. A VPN client can be downloaded from https://sslvpn.ethz.ch. The VPN client is configured to connect to the ETH network.


Troubleshooting

Permission denied

If you enter a wrong password three times, you will get a permission denied error:

leonhard@calculus:~$ ssh leonhard@euler.ethz.ch
leonhard@euler.ethz.ch's password: 
Permission denied, please try again.
leonhard@euler.ethz.ch's password: 
Permission denied, please try again.
leonhard@euler.ethz.ch's password: 
Permission denied (publickey,password,hostbased).
leonhard@calculus:~$

In case you receive a "Permission denied" error, please check that you entered the correct password. If you think that your account has been compromised, please contact the service desk of the IT services of ETH Zurich.

If you enter a wrong password too many times or at a high frequency, we might block access to the clusters for your account, because it could be compromised. If your account has been blocked by the HPC group, please contact cluster support.

Timeout

If you try to login and receive a timeout error, then it is very likely that you tried to connect from outside of the ETH network to one of the HPC clusters.

leonhard@calculus:~$ ssh -Y leonhard@euler.ethz.ch
ssh: connect to host euler.ethz.ch port 22: Connection timed out

Please either connect from the inside of the ETH network, or establish a VPN connection.

setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

If you are using a Mac, please try commenting out the following lines in the /etc/ssh/ssh_config file on your workstation:

Host *
       SendEnv LANG LC_*

This should solve the problem.

Indirect GLX rendering error

When using an SSH connection with X11 forwarding enabled, newer versions of the Xorg server show an error message when the graphical user interface of an application is started:

X Error of failed request: BadValue (integer parameter out of range for operation)
  Major opcode of failed request: 153 (GLX)
  Minor opcode of failed request: 3 (X_GLXCreateContext)
  Value in failed request: 0x0
  Serial number of failed request: 27
  Current serial number in output stream: 30

This error is caused by starting your X11 server without enabling the setting for indirect GLX rendering (iglx), which is required for X11 forwarding. Up to version 1.16 of the Xorg server, the iglx setting was enabled by default. With version 1.17, the default changed from +iglx to -iglx. The setting now needs to be enabled either in the Xorg configuration file or with a command-line option when starting the Xorg server manually. For XQuartz versions up to 2.7.8, the iglx setting is enabled by default. If you would like to use XQuartz 2.7.9 or newer, please make sure that you enable the iglx setting when the X server is started.

This problem is described in the following article:

https://www.phoronix.com/scan.php?page=news_item&px=Xorg-IGLX-Potential-Bye-Bye

Please find below some links, which address the problem for specific operating systems.

Operating system Link
Red Hat Enterprise Linux (RHEL) https://elrepo.org/bugs/view.php?id=610
CentOS https://www.centos.org/forums/viewtopic.php?t=57409#p244528
Ubuntu http://askubuntu.com/questions/745135/how-to-enable-indirect-glx-contexts-iglx-in-ubuntu-14-04-lts-with-nvidia-gfx
Mac OS X https://bugs.freedesktop.org/show_bug.cgi?id=96260

Data management

Introduction

On our cluster, we provide multiple storage systems, which are optimized for different purposes. Since the available storage space on our clusters is limited and shared between all users, we set quotas in order to prevent single users from filling up an entire storage system with their data.

A summary of general questions about file systems, storage and file transfer can be found in our FAQ. If you have questions or encounter problems with the storage systems provided on our clusters or file transfer, then please contact cluster support.

Personal storage (everyone)

Home

On our clusters, we provide a home directory (folder) for every user that can be used for safe long-term storage of important and critical data (program sources, scripts, input files, etc.). It is created on your first login to the cluster and is accessible through the path

/cluster/home/username

The path is also stored in the variable $HOME. The permissions are set such that only you can access the data in your home directory, and no other user. Your home directory is limited to 16 GB and a maximum of 100'000 files and directories (inodes). The content of your home is saved every hour, and there is also a nightly backup (tape).

Scratch

We also provide a personal scratch directory (folder) for every user that can be used for short-term storage of larger amounts of data. It is created when you access it for the first time through the path

/cluster/scratch/username

The path is also stored in the variable $SCRATCH. It is visible (mounted) only when you access it. If you try to access it with a graphical tool, you need to specify the full path, as it might not be visible in the /cluster/scratch top-level directory. Before you use your personal scratch directory, please carefully read the usage rules to avoid misunderstandings. The usage rules can also be displayed directly on the cluster with the following command.

cat $SCRATCH/__USAGE_RULES__

Your personal scratch directory has a disk quota of 2.5 TB and a maximum of 1'000'000 files and directories (inodes). There is no backup for the personal scratch directories and they are purged on a regular basis (see usage rules).

Group storage (shareholders only)

Project

Shareholder groups have the option to purchase additional storage inside the cluster. The project file system is designed for safe long-term storage of critical data (like the home directory). Shareholder groups can buy as much space as they need. The path for project storage is

/cluster/project/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using NETHZ groups for this purpose. Backup (tape) is not included, but can be purchased optionally. If you are interested in more information and prices for project storage, please contact cluster support.

Work

Apart from project storage, shareholder groups also have the option to buy so-called work (high-performance) storage. It is optimized for I/O performance and can be used for short- or medium-term storage for large computations (like scratch, but without regular purge). Shareholders can buy as much space as they need. The path for work storage is

/cluster/work/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using NETHZ groups for this purpose. The directory is visible (mounted) only when accessed. If you are interested in more information and prices for work storage, please contact cluster support.

Local scratch (on each compute node)

The compute nodes in our HPC clusters also have local hard drives, which can be used for temporarily storing data during a calculation. The main advantage of local scratch is that it is located directly inside the compute nodes and not attached via the network. This is very beneficial for serial, I/O-intensive applications. The path of the local scratch is

/scratch

You can either create a directory in local scratch yourself as part of a batch job, or use the directory in local scratch that is automatically created by the batch system. LSF creates a unique directory in local scratch for every job and takes care of cleaning it up at the end of the job. The path of this directory is stored in the environment variable

$TMPDIR

If you use $TMPDIR, then you need to request scratch space from the batch system.
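The typical copy-in/compute/copy-out pattern can be sketched as follows (simulated here with mktemp standing in for the job directory and tr standing in for the real program; on the cluster, LSF sets $TMPDIR for you):

```shell
# simulate the job's working directory and its private local scratch directory
workdir=$(mktemp -d)
jobtmp=$(mktemp -d)                                       # stands in for $TMPDIR
echo "hello" > "$workdir/input.dat"                       # stand-in input file
# copy the input to local scratch, compute there, then copy the result back
cp "$workdir/input.dat" "$jobtmp/"
( cd "$jobtmp" && tr a-z A-Z < input.dat > output.dat )   # stand-in computation
cp "$jobtmp/output.dat" "$workdir/"
cat "$workdir/output.dat"                                 # prints HELLO
```

Running the computation inside the local scratch directory keeps the I/O off the network file systems; only the initial and final copies touch shared storage.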

External storage

Central NAS

Groups who have purchased storage on the central NAS of ETH can ask the storage group of IT services to export it to our HPC clusters. There are certain requirements that need to be fulfilled in order to use central NAS shares on our HPC clusters.

  • The NAS share needs to be mountable via NFS (shares that only support CIFS cannot be mounted on the HPC clusters).
  • The NAS share needs to be exported to the subnet of our HPC clusters (please contact ID Systemdienste and ask them for an NFS export of your NAS share).
  • Please carefully set the permissions of the files and directories on your NAS share if other cluster users should not have read/write access to your data.

NAS shares are then mounted automatically when you access them. The mount-point of such a NAS share is

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.71.59/32(rw,root_squash,secure) 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/19(rw,root_squash,secure) 10.205.96.0/19(rw,root_squash,secure)

A typical NFS export file to export a share to the Leonhard cluster would look like

# cat /etc/exports
/export 129.132.248.224/27(rw,root_squash,secure) 10.204.0.0/19(rw,root_squash,secure)

It is also possible to use a single export for both clusters

# cat /etc/exports
/export 129.132.71.59/32(rw,root_squash,secure) 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/19(rw,root_squash,secure) 10.205.96.0/19(rw,root_squash,secure) 129.132.248.224/27(rw,root_squash,secure) 10.204.0.0/19(rw,root_squash,secure) 

If you ask the storage group to export your share to the Euler cluster, then please provide them the above-shown information. When a NAS share is mounted on our HPC clusters, then it is accessible from all the compute nodes in the cluster.

Local NAS

Groups who operate their own NAS can export a shared file system via NFS to our HPC clusters. In order to use an external NAS on our HPC clusters, the following requirements need to be fulfilled:

  • NAS needs to support NFSv3 (this is currently the only NFS version that is supported from our side).
  • The user and group IDs on the NAS need to be consistent with NETHZ user names and groups.
  • The NAS needs to be exported to the subnet of our HPC clusters.
  • Please carefully set the permissions of the files and directories on your NAS share if other cluster users should not have read/write access to your data.

Your external NAS can then be accessed through the mount-point

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.71.59/32(rw,root_squash,secure) 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/19(rw,root_squash,secure) 10.205.96.0/19(rw,root_squash,secure)

A typical NFS export file to export a share to the Leonhard cluster would look like

# cat /etc/exports
/export 129.132.248.224/27(rw,root_squash,secure) 10.204.0.0/19(rw,root_squash,secure)

It is also possible to use a single export for both clusters

# cat /etc/exports
/export 129.132.71.59/32(rw,root_squash,secure) 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/19(rw,root_squash,secure) 10.205.96.0/19(rw,root_squash,secure) 129.132.248.224/27(rw,root_squash,secure) 10.204.0.0/19(rw,root_squash,secure) 

The share is automatically mounted, when accessed.

Comparison

In the table below, we try to give you an overview of the available storage categories/systems on our HPC clusters as well as a comparison of their features.

Category Mount point Life span Backup Purged Max. size Small files Large files
Home /cluster/home permanent yes no 16 GB + o
Scratch /cluster/scratch 4 years no yes (files older than 15 days) 2.5 TB o ++
Project /cluster/project 4 years optional no flexible + +
Work /cluster/work 4 years no no flexible ++ o
Central NAS /nfs/servername/sharename flexible yes no flexible + +
Local scratch /scratch duration of job no end of job 800 GB ++ +

Choosing the optimal storage system

When working on an HPC cluster that provides different storage categories/systems, the choice of system can have a big influence on the performance of your workflow. In the best case you can speed up your workflow considerably; in the worst case the system administrator has to kill all your jobs and limit the number of concurrent jobs you can run, because your jobs slow down the entire storage system, which can affect other users' jobs. Please take into account the recommendations listed below.

  • Use local scratch whenever possible. With a few exceptions, this will give you the best performance in most cases.
  • For parallel I/O with large files, the high-performance (work) storage will give you the best performance.
  • Do not create a large number of small files (kilobytes) on project or work storage, as this could slow down the entire storage system.
  • If your application performs very inefficient I/O (opening and closing files multiple times per second, or doing small appends on the order of a few bytes), please do not use project or work storage. The best option for this use case is local scratch.

If you need to work with a large amount of small files, then please keep them grouped in a tar archive. During a job you can then untar the files to the local scratch, process them and group the results again in a tar archive, which can then be copied back to your home/scratch/work/project space.
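The pack/unpack workflow described above can be sketched like this (file names and contents are placeholders):

```shell
# pack many small files into a single archive before transferring them
mkdir -p results
echo 1 > results/a.txt
echo 2 > results/b.txt
tar -czf results.tar.gz results
# on the compute node: unpack into local scratch, process, and re-pack
mkdir -p unpack
tar -xzf results.tar.gz -C unpack
ls unpack/results
```

Transferring and storing one archive instead of thousands of tiny files avoids the per-file overhead that slows down the shared file systems.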

File transfer

In order to run your jobs on an HPC cluster, you need to transfer data or input files to/from the cluster. For small and medium amounts of data, you can use standard command-line or graphical tools. If you need to transfer very large amounts of data (on the order of several TB), please contact cluster support and we will help you set up the optimal strategy to transfer your data in a reasonable amount of time.

Command line tools

For transferring files from/to the cluster, we recommend using standard tools like secure copy (scp) or rsync. The general syntax for using scp is

scp [options] source destination

For copying a file from your PC to an HPC cluster (to your home directory), you need to run the following command on your PC:

scp file username@hostname:

where username is your NETHZ username and hostname is the hostname of the cluster. Please note the colon after the hostname. For copying a file from the cluster to your PC (current directory), you need to run the following command on your PC:

scp username@hostname:file .

For copying an entire directory, you would need to add the option -r. Therefore you would use the following command to transfer a directory from your PC to an HPC cluster (to your home directory).

scp -r directory username@hostname:

The general syntax for rsync is

rsync [options] source destination

In order to copy the content of a directory from your PC (home directory) to a cluster (home directory), you would use the following command.

rsync -Pav /home/username/directory/ username@hostname:/cluster/home/username/directory

The -P option enables rsync to show the progress of the file transfer. The -a option preserves almost all file attributes and the -v option gives you more verbose output.

Graphical tools

Graphical scp/sftp clients allow you to mount your Euler home directory on your workstation. These clients are available for most operating systems.

  • Linux + Gnome: Connect to server
  • Linux + KDE: Konqueror, Dolphin, Filezilla
  • Mac OS X: MacFUSE, Macfusion, Cyberduck, Filezilla
  • Windows: WinSCP, Filezilla

WinSCP provides a Windows-Explorer-like user interface with a split screen that allows you to transfer files via drag-and-drop. After starting your graphical scp/sftp client, specify the hostname of the cluster that you would like to connect to and click the connect button. After entering your NETHZ username and password, you will be connected to the cluster and can transfer files.


Quotas

The home and scratch directories on our clusters are subject to a strict user quota. In your home directory, the soft quota for the amount of storage is set to 16 GB and the hard quota to 20 GB. Furthermore, you can have at most 100'000 files and directories (inodes). You can check your current usage with the quota -s command.

[leonhard@euler01 ~]$ quota -s
Disk quotas for user leonhard (uid 27182): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
eu-ne-home6:/home6
                 3141M  16384M  20480M             5926   80000    100k       

The hard quota for your personal scratch directory is set to 2.5 TB. You can have at most 1'000'000 files and directories (inodes).

[leonhard@euler01 ~]$ pan_quota /cluster/scratch/leonhard
  <GB>  <soft>  <hard> : <files> <soft>  <hard> :         <path to volume> <pan_identity(name)>
 45.27 2000.00 2500.00 :      37 800000 1000000 : /cluster/scratch/leonhard uid:27182(leonhard)

If you reach 80% of your quota (number of files or storage) in your personal scratch directory, you will be informed via email to clean up.

Setting up the Environment

Introduction

Most applications, compilers and libraries rely on environment variables to function properly. These variables are usually set by the operating system, the administrator, or by the user. Typical examples include:

  • PATH — location of system commands and user programs
  • LD_LIBRARY_PATH — location of the dynamic (=shared) libraries needed by these commands and programs
  • MANPATH — location of man (=manual) pages for these commands
  • Program specific environment variables

The majority of problems encountered by users are caused by incorrect or missing environment variables. People often copy initialization scripts — .profile, .bashrc, .cshrc — from one machine to the next, without verifying that the variables defined in these scripts are correct (or even meaningful!) on the target system.

If setting environment variables is difficult, modifying them at run-time is even more complex and error-prone. Changing the contents of PATH to use a different compiler than the one set by default, for example, is not for the casual user. The situation can quickly become a nightmare when one has to deal with multiple compilers and libraries (e.g. MPI) at the same time.

Environment modules — modules in short — offer an elegant and user-friendly solution to all these problems. Modules allow a user to load all the settings needed by a particular application on demand, and to unload them when they are no longer needed. Switching from one compiler to the other; or between different releases of the same application; or from one MPI library to another can be done in a snap, using just one command — module.

Module commands

Module avail

The module avail command lists all available modules of the supported module category. If you load the new or the legacy module, it also lists all modules of those categories. It can be used to get a quick overview of all centrally installed software. If you are interested in a particular software package and would like to know which versions are available, you can pass the name of the software as a parameter to the module avail command.

[leonhard@euler01 ~]$ module avail gcc

--------------- /cluster/apps/modules/modulefiles ---------------
gcc/4.4.7(4.4)     gcc/4.8.2(default) gcc/4.9.2
[leonhard@euler01 ~]$ module load legacy new
[leonhard@euler01 ~]$ module avail gcc 

--------------- /cluster/apps/modules/modulefiles ---------------
gcc/4.4.7(4.4)     gcc/4.8.2(default) gcc/4.9.2

----------------- /cluster/apps/modules/legacy ------------------
gcc/4.7.4

------------------- /cluster/apps/modules/new -------------------
gcc/4.8.4 gcc/5.2.0

Module show

The module show command provides information on which environment variables are set or changed by the module file. Furthermore, it contains a short string with the name and version of the application or library.

[leonhard@euler01 ~]$ module show python/2.7.6
-------------------------------------------------------------------
/cluster/apps/modules/modulefiles/python/2.7.6:

module-whatis    Python version 2.7.6 (x86_64) 
prepend-path     PATH /cluster/apps/python/2.7.6/x86_64/bin 
prepend-path     LD_LIBRARY_PATH /cluster/apps/python/2.7.6/x86_64/lib64 
prepend-path     PKG_CONFIG_PATH /cluster/apps/python/2.7.6/x86_64/lib64/pkgconfig 
setenv           PYTHON_ROOT /cluster/apps/python/2.7.6/x86_64 
-------------------------------------------------------------------

Module load

The module load command loads the corresponding module file and prepares the environment for using the application or library, by applying the instructions that can be displayed with the module show command.

[leonhard@euler01 ~]$ module load gcc/4.8.2 python/2.7.6
Autoloading openblas/0.2.13_seq
[leonhard@euler01 ~]$ which python
/cluster/apps/python/2.7.6/x86_64/bin/python

Module list

The module list command displays the currently loaded module files.

[leonhard@euler04 ~]$ module list
Currently Loaded Modulefiles:
  1) modules
[leonhard@euler04 ~]$ module load gcc/4.8.2 python/2.7.6
Autoloading openblas/0.2.13_seq
[leonhard@euler04 ~]$ module list
Currently Loaded Modulefiles:
  1) modules                            3) openblas/0.2.13_seq(default:seq)
  2) gcc/4.8.2(default:4.8)             4) python/2.7.6(2.7)

Module purge

The module purge command unloads all currently loaded modules and cleans up the environment of your shell. In some cases, it might be better to log out and log in again in order to get a really clean shell.

[leonhard@euler04 ~]$ module list
Currently Loaded Modulefiles:
  1) modules                            3) openblas/0.2.13_seq(default:seq)
  2) gcc/4.8.2(default:4.8)             4) python/2.7.6(2.7)
[leonhard@euler04 ~]$ module purge
[leonhard@euler04 ~]$ module list
No Modulefiles Currently Loaded.


Naming scheme

Please find the general naming scheme of module files below.

program_name/version(alias[:alias2])

Instead of specifying a version directly, it is also possible to use aliases.

program_name/alias == program_name/version

The special alias default indicates which version is taken by default (if neither version nor alias is specified).

program_name/default == program_name

If no default is specified for a particular software, then the most recent version (i.e. that with the largest number) is taken by default.

LMOD

For the Leonhard cluster, we decided to switch from the environment modules used on the Euler cluster to Lmod modules, which provide some additional features. You should barely notice the transition, as the commands are mostly the same. Please refer to the Setting up your environment tutorial for general documentation of the module commands.

[leonhard@lo-login-02 ~]$ module avail boost

----------------------------------------- /cluster/apps/lmodules/Compiler/gcc/4.8.5 ------------------------------------------
   boost/1.63.0

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".


[leonhard@lo-login-02 ~]$ module load boost/1.63.0
[leonhard@lo-login-02 ~]$ module list

Currently Loaded Modules:
  1) gcc/4.8.5   2) StdEnv   3) boost/1.63.0

[leonhard@lo-login-02 ~]$ 

Please note that this is work in progress and the module names might change. Currently, the number of software packages provided on Leonhard is not comparable to the software we provide on the Euler cluster, but it will grow over time.

Hierarchical modules

Lmod defines a hierarchy of modules containing three layers (Core, Compiler, MPI). The core layer contains all module files that do not depend on any compiler or MPI library. The compiler layer contains all modules that depend on a particular compiler, but not on any MPI library. The MPI layer contains modules that depend on a particular compiler/MPI combination.

When you login to the Leonhard cluster, the standard compiler gcc/4.8.5 is automatically loaded. Running the module avail command displays all modules that are available for gcc/4.8.5. If you would like to see the modules available for a different compiler, for instance gcc/6.3.0, load that compiler module and run module avail again. To check the modules available for the combination gcc/4.8.5 and openmpi/2.1.0, load the corresponding compiler and MPI modules and run module avail once more.

As a consequence of the module hierarchy, you can never have two different versions of the same module loaded at the same time. This helps avoid problems arising from a misconfigured environment.
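
For example, to explore a particular branch of the hierarchy, you could run the following sketch (using the module versions mentioned in the text above):

```shell
module load gcc/4.8.5        # compiler layer (loaded automatically at login)
module avail                 # lists Core and gcc/4.8.5 modules
module load openmpi/2.1.0    # MPI layer
module avail                 # now also lists gcc/4.8.5 + openmpi/2.1.0 modules
```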

Application life-cycle

Based on application experience on Brutus, we offer, besides the currently supported versions, two additional categories of modules for new and legacy versions. Due to dependencies between compilers, libraries and applications, changes to the applications and the corresponding modules need to be synchronized and are carried out on a quarterly basis.

Life-cycle of an application

Modules for new or experimental versions of an application/library/compiler first appear in the new module category, where we provide a partial support matrix. Specific compiler/library combinations can be requested by shareholders. New modules are not visible by default. If you would like to see which new versions are available or try them out, you will need to load the new module first:

module load new

Applications that have passed all tests and are deemed ready for production (stable, bug-free, compatible with LSF, etc.) will be moved to the supported category in the next quarterly update.

Applications that have become obsolete (no longer supported by the vendor, superseded by new versions with more functionality, etc.) will be moved to the legacy category in the next quarterly update. For these modules the HPC group can only provide limited support. Legacy modules are not visible by default. If you still need to use them, you will need to load the legacy module first:

module load legacy

Applications that are known to be buggy, have become redundant, or whose licenses have expired will be removed in the next quarterly update. If you still need to use them, please contact cluster support.

User notification

The HPC group updates the module categories on a quarterly basis (February, May, August, November). The Application life-cycle page contains a table listing all available applications as well as the modifications that we plan to apply at the next quarterly change. Users will receive a reminder one week prior to the update of the categories, which will also contain information about the most important changes.

Application tables

We have listed all available modules on the different HPC clusters in separate tables, which also use special formatting to indicate actions planned for the next quarterly change.

Using the batch system

Introduction

On our HPC cluster, we use the IBM LSF (Load Sharing Facility) batch system. A basic knowledge of LSF is required if you would like to work on the HPC clusters. The present article will show you how to use LSF to execute simple batch jobs and give you an overview of some advanced features that can dramatically increase your productivity on a cluster.

Using a batch system has numerous advantages:

  • single system image — all computing resources in the cluster can be accessed from a single point
  • load balancing — the workload is automatically distributed across all available processors
  • exclusive use — many computations can be executed at the same time without affecting each other
  • prioritization — computing resources can be dedicated to specific applications or people
  • fair share — a fair allocation of those resources among all users is guaranteed

In fact, our HPC clusters contain so many processors (30,000) and are used by so many people (more than 2,000) that it would be impossible to use them efficiently without a batch system.

All computations on our HPC cluster must be submitted to the batch system. Please do not run any job interactively on the login nodes, except for testing or debugging purposes.

Basic job submission

Simple commands and programs

Submitting a job to the batch system is as easy as:

bsub command [arguments]
bsub /path/to/program [arguments] 

Examples:

[leonhard@euler03 ~]$ bsub gzip big_file.dat
Generic job.
Job <8146539> is submitted to queue <normal.4h>.
[leonhard@euler03 ~]$ bsub ./hello_world
Generic job.
Job <8146540> is submitted to queue <normal.4h>.

Two or more commands can be combined together by enclosing them in quotes:

bsub "command1; command2"

Example:

[leonhard@euler03 ~]$ bsub "configure; make; make install"
Generic job.
Job <8146541> is submitted to queue <normal.4h>.

Quotes are also necessary if you want to use I/O redirection (">", "<"), pipes ("|") or conditional operators ("&&", "||"):

bsub "command < data.in > data.out"
bsub "command1 | command2"

Examples:

[leonhard@euler03 ~]$ bsub "tr ',' '\n' < comma_separated_list > linebreak_separated_list"
Generic job.
Job <8146542> is submitted to queue <normal.4h>.
[leonhard@euler03 ~]$ bsub "cat unsorted_list_with_redundant_entries | sort | uniq > sorted_list"
Generic job.
Job <8146543> is submitted to queue <normal.4h>.

Shell scripts

More complex commands may be placed in a shell script, which should then be submitted like this:

bsub < script

Example:

[leonhard@euler03 ~]$ bsub < hello.sh
Generic job.
Job <8146544> is submitted to queue <normal.4h>.

In principle, it is also possible to submit a script as if it were a program:

bsub /path/to/script      ←  BAD IDEA!

However, this syntax is strongly discouraged on our clusters because it does not allow the batch system to "see" what your script is doing, which may lead to errors in the submission and/or execution of your job.
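
For illustration, a minimal hello.sh might look like the sketch below; the #BSUB lines embed bsub options in the script and are read by LSF when you submit it with bsub < hello.sh (the resource values are only examples):

```shell
#!/bin/bash
#BSUB -n 1            # one core
#BSUB -W 00:10        # 10 minutes of wall-clock time
#BSUB -o hello.out    # append the job's output to hello.out

# The actual computation: print the name of the compute node
echo "Hello from $(hostname)"
```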

Output file

By default your job's output (or standard output, to be precise) is written into a file named lsf.oJobID in the directory where you executed bsub, where JobID is the number assigned to your job by LSF. You can select a different output file using the option:

bsub -o output_file command [argument]

The option -o output_file tells LSF to append your job's output to output_file. If you want to overwrite this file, use:

bsub -oo output_file ...

Note that this option, like all bsub options, must be placed before the command that you want to execute in your job. A common mistake is to place bsub options in the wrong place, like:

bsub command -o output_file      ←  WRONG!

Batch interactive job

If you just want to run a quick test, you can submit it as a batch interactive job. In this case the job's output is not written into a file, but directly to your terminal, as if it were executed interactively:

bsub -I command [arguments]

Example:

[leonhard@euler03 ~]$ bsub -I "env | sort"
Generic job.
Job <8146545> is submitted to queue <normal.4h>.
<<Waiting for dispatch ...>>

Resource requirements

By default, a batch job can use only one processor for up to 4 hours. (The job is killed when it reaches its run-time limit.) If your job needs more resources — time, processors, memory or scratch space — you must request them when you submit it.

Wall-clock time

The time limits on our clusters are always based on wall-clock (or elapsed) time. You can specify the amount of time needed by your job using the option:

bsub -W minutes ...                  example:  bsub -W 90 ...
bsub -W HH:MM ...                    example:  bsub -W 1:30 ...

Examples:

[leonhard@euler03 ~]$ bsub -W 20 ./Riemann_zeta -arg 26
Generic job.
Job <8146546> is submitted to queue <normal.4h>.
[leonhard@euler03 ~]$ bsub -W 20:00 ./solve_Koenigsberg_bridge_problem
Generic job.
Job <8146547> is submitted to queue <normal.24h>.

Since our clusters contain processors with different speeds, two similar jobs will not necessarily take the same time to complete. It is therefore safer to request more time than strictly necessary... but not too much, since shorter jobs generally have a higher priority than longer ones.

The maximum run-time for jobs that can run on most compute nodes in the cluster is 240 hours. We reserve the right to stop jobs with a run time of more than 5 days in case of an emergency maintenance.

Number of processor cores

If your job requires multiple processors (or threads), you must request them using the option:

bsub -n number_of_procs ...

Note that merely requesting multiple processors does not mean that your application will use them.
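
A sketch combining a core count with a longer run time (the program name is hypothetical):

```shell
# request 4 cores and 8 hours of wall-clock time
bsub -n 4 -W 8:00 ./my_parallel_program
```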

Memory

By default the batch system allocates 1024 MB (1 GB) of memory per processor core. A single-core job will thus get 1 GB of memory; a 4-core job will get 4 GB; and a 16-core job, 16 GB. If your computation requires more memory, you must request it when you submit your job:

bsub -R "rusage[mem=XXX]" ...

where XXX is the amount of memory needed by your job, in MB per processor core.

Example:

[leonhard@euler03 ~]$ bsub -R "rusage[mem=2048]" ./evaluate_gamma -precision 10e-30
Generic job.
Job <8146548> is submitted to queue <normal.4h>.

Scratch space

LSF automatically creates a local scratch directory when your job starts and deletes it when the job ends. This directory has a unique name, which is passed to your job via the variable $TMPDIR.

Unlike memory, the batch system does not reserve any disk space for this scratch directory by default. If your job is expected to write large amounts of temporary data (say, more than 250 MB) into $TMPDIR — or anywhere in the local /scratch file system — you must request enough scratch space when you submit it:

bsub -R "rusage[scratch=YYY]" ...

where YYY is the amount of scratch space needed by your job, in MB per processor core.

Example:

[leonhard@euler03 ~]$ bsub -R "rusage[scratch=5000]" ./generating_Euler_numbers -num 5000000
Generic job.
Job <8146548> is submitted to queue <normal.4h>.

Note that /tmp is reserved for the operating system. Do not write temporary data there! You should either use the directory created by LSF ($TMPDIR) or create your own temporary directory in the local /scratch file system; in the latter case, do not forget to delete this directory at the end of your job.
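
The sketch below shows one way a job script might use $TMPDIR; the file names are hypothetical, and the mktemp fallback is only there so the script can be tried outside of LSF:

```shell
#!/bin/bash
#BSUB -R "rusage[scratch=1000]"   # illustrative: 1000 MB of scratch per core

# Use the scratch directory provided by LSF; fall back to mktemp for local tests
WORKDIR="${TMPDIR:-$(mktemp -d)}"

# Write intermediate data into the fast local scratch space
echo "intermediate results" > "$WORKDIR/step1.dat"

# Copy anything you want to keep out of $TMPDIR before the job ends,
# because LSF deletes the directory afterwards
cp "$WORKDIR/step1.dat" ./final_results.dat
```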

Multiple requirements

It is possible to combine memory and scratch requirements:

bsub -R "rusage[mem=XXX]" -R "rusage[scratch=YYY]" ...

is equivalent to:

bsub -R "rusage[mem=XXX,scratch=YYY]" ...
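
Putting the options together, a request for several resources at once might look like this (all values are merely illustrative):

```shell
# 4 cores, 24 h wall-clock time, 2048 MB memory and 5000 MB scratch per core
bsub -n 4 -W 24:00 -R "rusage[mem=2048,scratch=5000]" ./my_program
```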

LSF submission line advisor

For users who are not yet very experienced with a batch system, we provide a small helper tool that simplifies setting up the command for requesting resources from the batch system when submitting a job.

https://scicomp.ethz.ch/lsf_submission_line_advisor

GPU

Please note that currently only the Leonhard cluster contains GPUs; the Euler cluster does not. Unlike Euler, which is open to all members of ETH without restriction, Leonhard is reserved exclusively for the groups who have invested in it (the so-called shareholders). The following information is therefore only relevant for Leonhard shareholders.

All GPUs in Leonhard are configured in Exclusive Process mode. The GPU nodes have 20 cores, 8 GPUs, and 256 GB of RAM (of which only about 210 GB are usable). To run multi-node jobs, you will need to request span[ptile=20].

The LSF batch system has partially integrated support for GPUs. To use the GPUs, a job needs to request the ngpus_excl_p resource, which refers to the number of GPUs per node. This is unlike other resources, which are requested per core.

For example, to run a serial job with one GPU,

bsub -R "rusage[ngpus_excl_p=1]" ./my_cuda_program

or on a full node with all eight GPUs and up to 90 GB of RAM,

bsub -n 20 -R "rusage[mem=4500,ngpus_excl_p=8]" ./my_cuda_program

or on two full nodes:

bsub -n 40 -R "rusage[mem=4500,ngpus_excl_p=8] span[ptile=20]" ./my_cuda_program

While your jobs will see all GPUs, LSF will set the CUDA_VISIBLE_DEVICES environment variable, which is honored by CUDA programs.

For advanced settings, please have a look at our getting started with GPUs page.

Parallel job submission

Before submitting parallel jobs, please make sure that your application can run in parallel at all, so as not to waste resources by requesting multiple cores for a serial application. Furthermore, please do a short scaling analysis to see how well your code scales in parallel before requesting dozens or hundreds of cores.


OpenMP

If your application is parallelized using OpenMP or linked against a library using OpenMP (Intel MKL, OpenBLAS, etc.), the number of processors (or threads) that it can use is controlled by the environment variable OMP_NUM_THREADS. This variable must be set before you submit your job:

export OMP_NUM_THREADS=number_of_processors
bsub -R "span[ptile=number_of_processors]" -n number_of_processors ...

NOTE: if OMP_NUM_THREADS is not set, your application will either use one processor only, or will attempt to use all processors that it can find, stealing them from other jobs if needed. In other words, your job will either use too few or too many processors.

MPI

Three kinds of MPI libraries are available on our cluster: Open MPI (recommended), MVAPICH2 and Intel MPI. Before you can submit and execute an MPI job, you must load the corresponding modules (compiler + MPI, in that order):

module load compiler
module load mpi_library

The command used to launch an MPI application is mpirun.

Let's assume for example that hello_world was compiled with PGI 15.1 and linked with Open MPI 1.6.5. The command to execute this job on 4 processors is:

module load pgi/15.1
module load open_mpi/1.6.5
bsub -n 4 mpirun ./hello_world

Note that mpirun automatically uses all processors allocated to the job by LSF. It is therefore not necessary to indicate this number again to the mpirun command itself:

bsub -n 4 mpirun -np 4 ./hello_world      ←  "-np 4" not needed!

Euler III nodes are targeted to serial and shared-memory parallel jobs, but multi-node parallel jobs are still accepted.

To run a multi-node job on these nodes, you first need to tell the system that InfiniBand is not available,

module load interconnect/ethernet

before loading the MPI module. Then you need to request at most four cores per node:

bsub -R "span[ptile=4] select[maxslots==4]" [other bsub options] ./my_command

Open MPI

Open MPI 1.6.5 has been tested to work with acceptable performance. Open MPI 2.0.2 has also been tested to work.

MVAPICH2

MVAPICH2 2.1 works, but preliminary results show low scalability. You need to load the interconnect/ethernet module.

Intel MPI

Intel MPI 5.1.3 has been tested.

Pthreads and other threaded applications

Their behavior is similar to that of OpenMP applications. It is important to limit the number of threads that the application spawns. There is no standard way to do this, so be sure to check the application's documentation. Usually a program supports at least one of four ways to limit itself to N threads:

  • it understands the OMP_NUM_THREADS=N environment variable,
  • it has its own environment variable, such as GMX_NUM_THREADS=N for Gromacs,
  • it has a command-line option, such as -nt N (for Gromacs), or
  • it has an input-file option, such as num_threads N.
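
As a sketch, the first three variants might look as follows for a hypothetical program and N=4:

```shell
export OMP_NUM_THREADS=4                           # variant 1: OpenMP-style variable
bsub -n 4 ./my_threaded_app
bsub -n 4 "GMX_NUM_THREADS=4 ./my_threaded_app"    # variant 2: program-specific variable
bsub -n 4 "./my_threaded_app -nt 4"                # variant 3: command-line option
```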

If you are unsure about the program's behavior, please contact us and we will analyze it.

Hybrid jobs

It is possible to run hybrid jobs that mix MPI and OpenMP on our HPC clusters, but we strongly recommend not submitting this kind of job.

Full-node jobs

(Only on Euler) If you need to run a job that will use full nodes, request the fullnode resource together with a number of cores that is a multiple of 24 or 36:

bsub -n 48 -R fullnode ./my_job

Such a job will only run on two 24-core nodes.

Job monitoring

Please find below a table with commands for job monitoring and job control.

Command Description
busers user limits, number of pending and running jobs
bqueues queues status (open/closed; active/inactive)
bjobs more or less detailed information about pending, running and recently finished jobs
bbjobs better bjobs (bjobs with human readable output)
bhist information about jobs that finished in the last hours/days
bpeek display the standard output of a given job
lsf_load show the CPU load of all nodes used by a job
bjob_connect login to a node where one of your jobs is running
bkill kill a job

For an overview on the most common options for the LSF commands, please have a look at the LSF mini reference.

bjobs

The bjobs command gives you information about pending, running and recently finished jobs.

bbjobs

The command bbjobs can be used to see the resource request and usage (cpu, memory, swap, etc.) of any specific job.

bbjobs [-u username -r -a -s -d -p -f -l -P] JOBID
Option Description
(no option) List your jobs — information, requested resources and usage.
-u username Show the jobs of user username.
-r Show only running jobs.
-a Show all jobs.
-s Show only suspended jobs.
-d Show only jobs that ended recently (done).
-p Show only pending jobs.
-f Show job cpu affinity, i.e. on which cores it is running.
-l Show job information in log format.

Example of output for bbjobs:

[leonhard@euler08 ~]$ bbjobs 31989961
Job information
 Job ID                          : 31989961
 Status                          : RUNNING
 Running on node                 : e1268 
 User                            : leonhard
 Queue                           : normal.4h
 Command                         : compute_pq.py
 Working directory               : $HOME/testruns
Requested resources
 Requested cores                 : 1
 Requested memory                : 1024 MB per core
 Requested scratch               : not specified
 Dependency                      : -
Job history
 Submitted at                    : 08:45 2016-11-15
 Started at                      : 08:48 2016-11-15
 Queue wait time                 : 140 sec
Resource usage
 Updated at                      : 08:48 2016-11-15
 Wall-clock                      : 34 sec
 Tasks                           : 4
 Total CPU time                  : 5 sec
 CPU utilization                 : 80.0 %
 Sys/Kernel time                 : 0.0 %
 Total resident memory           : 2 MB
 Resident memory utilization     : 0.2 %

bjob_connect

Sometimes it is necessary to monitor a job on the node(s) where it is running. On Euler, compute nodes cannot be accessed directly via ssh. To access a node where one of your jobs is running, use the tool bjob_connect.

bjob_connect JOBID [SSH OPTIONS]

The tool connects you directly to the node where the job is running. For multi-node jobs, a list of nodes is printed and you should choose one to access.

Connections to nodes created via bjob_connect must be ended explicitly by the user (exit from the terminal) when done with job monitoring.
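
A typical session might look like this (the job ID and the commands run on the node are hypothetical):

```shell
bjob_connect 31989961    # open a shell on the node running job 31989961
top -u "$USER"           # e.g. inspect the CPU usage of your processes
exit                     # end the connection when you are done
```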

Troubleshooting

Bsub rejects my job

If the error message is not self-explanatory, then please report it to the cluster support.

My job has been stuck in the queue for XXX hours/days

Please try to find out why the job is pending. You can do this with the following command:

bjobs -p

"Individual host-based reasons" means that the resources requested by your job are not available at this time. Some resources may never become available (e.g. mem=10000000). Some resource requirements may be mutually exclusive.

My job was sent to the purgatory queue

The purgatory queue is designed to catch jobs that were not submitted properly, either due to a user error or a bug in the batch system. Please always report this type of problem to the cluster support.

Applications

We provide a wide range of centrally installed commercial and open source applications and libraries to our cluster users.

Central installations

Applications and libraries that are used by many people from different departments of ETH (e.g. MATLAB, Comsol, Ansys) or that are explicitly requested by a shareholder group are installed centrally in /cluster/apps. Providing a software stack of centrally installed applications and libraries gives users certain advantages:

  • Applications and libraries are visible and accessible to all users via environment modules.
  • They are maintained by the ID SIS HPC group (or in a few cases also by users).
  • Commercial licenses are provided by the central license administration of ETH (IT shop).

If an application or library is only used by a few people, then we recommend the users to install it locally in their home directory. In case you need help to install an application or library locally, please do not hesitate to contact cluster support.

Application life-cycle management

In order to deal with a software stack of more than 200 applications and libraries (in many cases with multiple versions per application), we use a set of environment module categories for application life-cycle management. Versions that are installed will never be deleted (unless the cluster as a whole reaches end-of-life). When they become obsolete, they are moved to the legacy module category, but they remain available. This way users can reproduce calculations done years ago (provided the software itself allows results to be reproduced exactly when the same version is used at a later stage). More information about our application life-cycle management can be found on the corresponding wiki page.

Commercial/Open source

Please find below a selection of centrally installed applications covering a wide range of research fields:

Bioinformatics
Bioconductor, BLAST, Bowtie, CLC Genomics Server, FSL, RAxML, TopHat
Computational fluid dynamics
Ansys CFX, Ansys FLUENT, OpenFOAM, STAR-CD, STAR-CCM+
Finite element methods
Ansys, Abaqus, Deal.II, FENiCS, FreeFem++, MSC Marc
Multi-physics phenomena
Ansoft Maxwell, COMSOL multiphysics, Trilinos
Quantum chemistry and molecular dynamics
ADF, CP2K, Gaussian, NWChem, Orca, QChem, Quantum Espresso, Turbomole
Symbolic, numerical and statistical mathematics
Gurobi, Maple, Mathematica, MATLAB, R, Stata
Visualization
Ffmpeg, ParaView, VisIT, VTK

For a complete list of applications and libraries available on our HPC clusters, have a look at our application tables.

Development

We also provide a range of programs and libraries for software development, allowing users to write their own codes and to run and profile them on the cluster.

Compiler
GCC, Intel, LLVM, PGI
Scientific libraries
ACML, BOOST, Deal.II, Eigen, FFTW, GMP, GSL, HDF5, MKL, MPFR, NetCDF, NumPy, OpenBLAS, PETSc, SciPy
MPI libraries
Open MPI, MVAPICH2
Build systems
GNU Autotools, CMake, QMake, Make, SCons
Version control
SVN, Git, Mercurial, CVS