Storage systems


Introduction

On our cluster, we provide multiple storage systems, which are optimized for different purposes. Since the available storage space on our clusters is limited and shared between all users, we set quotas in order to prevent single users from filling up an entire storage system with their data.

A summary of general questions about file systems, storage and file transfer can be found in our FAQ. If you have questions or encounter problems with the storage systems provided on our clusters or file transfer, then please contact cluster support.

Personal storage (everyone)

Home

On our clusters, we provide a home directory (folder) for every user that can be used for safe, long-term storage of important and critical data (program source code, scripts, input files, etc.). It is created on your first login to the cluster and is accessible through the path

/cluster/home/username

The path is also saved in the variable $HOME. The permissions are set such that only you, and no other user, can access the data in your home directory. Your home directory is limited to 16 GB and a maximum of 160'000 files and directories (inodes). The content of your home is saved every hour, and there is also a nightly backup (to tape).

Scratch

We also provide a personal scratch directory (folder) for every user, which can be used for short-term storage of larger amounts of data. It is created the first time you access it through the path

/cluster/scratch/username

The path is also saved in the variable $SCRATCH. It is visible (mounted) only when you access it. If you try to access it with a graphical tool, you need to specify the full path, as it might not be visible in the /cluster/scratch top-level directory. Before you use your personal scratch directory, please carefully read the usage rules to avoid misunderstandings. The usage rules can also be displayed directly on the cluster with the following command.

cat $SCRATCH/__USAGE_RULES__

Your personal scratch directory has a disk quota of 2.5 TB and a maximum of 1'000'000 files and directories (inodes). There is no backup for the personal scratch directories and they are purged on a regular basis (see usage rules).

For personal scratch directories, there are two limits (a soft and a hard quota). When you reach the soft limit (2.5 TB), there is a grace period of one week during which you can use up to 10% more than your allowed capacity (this upper limit is called the hard quota); this applies to both the number of inodes and the space used. If the usage is still above the soft limit after the grace period, the directory is locked for new writes until the usage drops below the soft quota again.
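
If you want to check which of your files might be affected by the next purge, a simple find command can help. This is only a sketch and assumes that the purge is based on the modification time of the files; the exact criteria are described in the usage rules:

# list files in your personal scratch directory that have not been modified for more than 15 days
find $SCRATCH -type f -mtime +15 -ls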

Group storage (shareholders only)

Project

Shareholder groups have the option to purchase additional storage inside the cluster. The project file system is designed for safe long-term storage of critical data (like the home directory). Shareholder groups can buy as much space as they need. The path for project storage is

/cluster/project/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using ETH groups for this purpose. If you are interested in more information about project storage and its prices, please contact cluster support.

Work

Apart from project storage, shareholder groups also have the option to buy so-called work (high-performance) storage. It is optimized for I/O performance and can be used for short- or medium-term storage for large computations (like scratch, but without regular purge). Shareholders can buy as much space as they need. The path for work storage is

/cluster/work/groupname

Access rights and restrictions are managed by the shareholder group. We recommend using ETH groups for this purpose. The directory is visible (mounted) only when accessed. If you are interested in more information about work storage and its prices, please contact cluster support.

For /cluster/work directories, there are two limits (a soft and a hard quota). When you reach the soft limit, there is a grace period of one week during which you can use up to 10% more than your allowed capacity (this upper limit is called the hard quota); this applies to both the number of inodes and the space used. If the usage is still above the soft limit after the grace period, the directory is locked for new writes until the usage drops below the soft quota again.

Local scratch (on each compute node)

The compute nodes in our HPC clusters also have local hard drives, which can be used to store temporary data during a calculation. The main advantage of local scratch is that it is located directly inside the compute nodes and not attached via the network. This is very beneficial for serial, I/O-intensive applications. The path of the local scratch is

/scratch

You can either create a directory in local scratch yourself, as part of a batch job, or use a directory in local scratch that is automatically created by the batch system. Slurm creates a unique directory in local scratch for every job and also takes care of cleaning up this directory at the end of the job. The path of the directory is stored in the environment variable

$TMPDIR

If you use $TMPDIR, then you need to request local scratch space from the batch system.
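
The following job script is a minimal sketch of this pattern; it assumes that local scratch space is requested with the Slurm option --tmp (please check the batch system documentation for the exact syntax), and input.dat and my_program are placeholders for your own data and application:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --tmp=20G                  # request 20 GB of local scratch space for this job

# copy the input data into the job-specific local scratch directory
cp $SCRATCH/input.dat $TMPDIR/

# run the calculation inside local scratch (my_program is a placeholder)
cd $TMPDIR
my_program input.dat > output.dat

# copy the results back to permanent storage before the job ends and $TMPDIR is cleaned up
cp output.dat $SCRATCH/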

External storage

Please note that external storage is convenient for bringing data into the cluster or for storing data for a longer time, but we recommend not processing data directly from external storage systems in batch jobs on Euler, as this can be very slow and can put a high load on the external storage system. Please rather copy the data from the external storage system to a cluster storage system (home directory, personal scratch directory, project storage, work storage, or local scratch) before you process it in a batch job. After processing the data on a cluster storage system, you can copy the results back to the external storage system.
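
As an illustration, such a staging workflow could look like the following sketch, where the directory names below the external mount-point /nfs/servername/sharename and in $SCRATCH are placeholders:

# stage in: copy the input data from the external share to your personal scratch directory
rsync -av /nfs/servername/sharename/input/ $SCRATCH/input/

# ... run your batch job on the data in $SCRATCH/input ...

# stage out: copy the results from scratch back to the external share
rsync -av $SCRATCH/results/ /nfs/servername/sharename/results/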

Central NAS/CDS

Groups who have purchased storage on the central NAS of ETH or CDS can ask the storage group of IT services to export it to our HPC clusters. There are certain requirements that need to be fulfilled in order to use central NAS/CDS shares on our HPC clusters.

  • The NAS/CDS share needs to be mountable via NFS (shares that only support CIFS cannot be mounted on the HPC clusters).
  • The NAS/CDS share needs to be exported to the subnet of our HPC clusters (please contact ID Systemdienste and ask them for an NFS export of your NAS/CDS share).
  • Please carefully set the permissions of the files and directories on your NAS/CDS share if other cluster users should not have read/write access to your data (see the example below).
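
For example, you could remove all permissions for other users on a directory of your share. This is only a sketch; mydata is a placeholder for a directory on your NAS/CDS share:

# remove read/write/execute permissions for all other users on a directory and its content
chmod -R o-rwx /nfs/servername/sharename/mydata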

NAS/CDS shares are then mounted automatically when you access them. The mount-point of such a NAS/CDS share is

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/16(rw,root_squash,secure) 10.204.0.0/16(rw,root_squash,secure)

If you ask the storage group to export your share to the Euler cluster, then please provide them with the information shown above. If the NAS share is located on the IBM Spectrum Scale storage system, then please also ask the storage group to set the following options:

PriviledgedPort=TRUE
Manage_Gids=TRUE

Please note that these options should only be applied to the Euler subnet. For a general overview of subnets and IP addresses, please check the following wiki page. When a NAS share is mounted on our HPC clusters, it is accessible from all the compute nodes in the cluster.

Local NAS

Groups that operate their own NAS can export a shared file system via NFSv3 to our HPC clusters. In order to use an external NAS on our HPC clusters, the following requirements need to be fulfilled:

  • The NAS needs to support NFSv3 (this is currently the only NFS version that is supported from our side).
  • The user and group IDs on the NAS need to be consistent with ETH usernames and groups.
  • The NAS needs to be exported to the subnet of our HPC clusters.
  • Please carefully set the permissions of the files and directories on your NAS share if other cluster users should not have read/write access to your data.

We advise you to not use this path directly from your jobs. Rather, you should stage files to and from $SCRATCH.

Your external NAS can then be accessed through the mount-point

/nfs/servername/sharename

A typical NFS export file to export a share to the Euler cluster would look like

# cat /etc/exports
/export 129.132.93.64/26(rw,root_squash,secure) 10.205.0.0/16(rw,root_squash,secure) 10.204.0.0/16(rw,root_squash,secure)

For a general overview of subnets and IP addresses, please check the following wiki page.

The share is automatically mounted when accessed.

Central LTS (Euler)

Groups who have purchased storage on the central LTS of ETH can ask the ITS SD backup group to export it to the LTS nodes in the Euler cluster. There are certain requirements that need to be fulfilled in order to use central LTS shares on our HPC clusters.

  • The LTS share needs to be mountable via NFS (shares that only support CIFS cannot be mounted on the HPC clusters).
  • The LTS share needs to be exported to the LTS nodes of our HPC clusters (please contact ITS SD Backup group and ask them for an NFS export of your LTS share).
  • Please carefully set the permissions of the files and directories on your LTS share if other cluster users should not have read/write access to your data.

The LTS share needs to be exported to the LTS nodes:

129.132.93.70(rw,root_squash,secure)
129.132.93.71(rw,root_squash,secure)

For accessing your LTS share, you need to log in to the LTS nodes in Euler with

ssh USERNAME@lts.euler.ethz.ch

Where USERNAME needs to be replaced with your ETH account name. LTS shares are then mounted automatically when you access them. The mount-point of such an LTS share is

/nfs/lts11.ethz.ch/shares/sharename(_repl)

or

/nfs/lts21.ethz.ch/shares/sharename(_repl)

depending on whether your share is located on lts11.ethz.ch or lts21.ethz.ch.
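
As an example, after logging in to an LTS node you could archive results from your group's work storage to the LTS share with rsync. This is only a sketch; groupname, sharename, and the results directory are placeholders:

# run on lts.euler.ethz.ch: copy results from work storage to the LTS share
rsync -av /cluster/work/groupname/results/ /nfs/lts11.ethz.ch/shares/sharename/results/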

Backup

The users' home directories are backed up every night, and the backup has a retention time of 90 days. For project and work storage, we provide a weekly backup, also with a retention time of 90 days. If you have data that you would like to exclude from the backup, please create a subdirectory named nobackup. Data stored in a nobackup directory is then excluded from the backup. The nobackup subdirectory can be located at any level in the directory hierarchy:

/cluster/work/YOUR_STORAGE_SHARE/nobackup
/cluster/work/YOUR_STORAGE_SHARE/project101/nobackup
/cluster/work/YOUR_STORAGE_SHARE/project101/data/nobackup/filename
/cluster/work/YOUR_STORAGE_SHARE/project101/data/nobackup/subdir/filename
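
For example, to keep temporary data of a project out of the backup, you could create a nobackup subdirectory and move the temporary files there (a sketch; tmp_results is a placeholder):

# create a nobackup subdirectory inside the project directory
mkdir -p /cluster/work/YOUR_STORAGE_SHARE/project101/nobackup

# move temporary data there so that it is excluded from the backup
mv /cluster/work/YOUR_STORAGE_SHARE/project101/tmp_results /cluster/work/YOUR_STORAGE_SHARE/project101/nobackup/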

Backing up large, unimportant temporary data that changes frequently increases the size of the backup pool and makes both the backup and the restore process slower. We therefore ask you to exclude this kind of data from the backup of your group storage share if possible. Excluding large temporary data from the backup helps you and us to restore your important data faster in the case of an incident.

Comparison

In the table below, we try to give you an overview of the available storage categories/systems on our HPC clusters as well as a comparison of their features.

| Category      | Mount point                | Life span       | Snapshots    | Backup | Retention time of backup | Purged                         | Max. size | Small files | Large files |
| Home          | /cluster/home              | permanent       | up to 7 days | yes    | 90 days                  | no                             | 16 GB     | +           | o           |
| Scratch       | /cluster/scratch           | 2 weeks         | no           | no     | -                        | yes (files older than 15 days) | 2.5 TB    | o           | ++          |
| Project       | /cluster/project           | 4 years         | optional     | yes    | 90 days                  | no                             | flexible  | +           | +           |
| Work          | /cluster/work              | 4 years         | no           | yes    | 90 days                  | no                             | flexible  | o           | ++          |
| Central NAS   | /nfs/servername/sharename  | flexible        | up to 8 days | yes    | 90 days                  | no                             | flexible  | +           | +           |
| Local scratch | /scratch                   | duration of job | no           | no     | -                        | end of job                     | 800 GB    | ++          | +           |

Choosing the optimal storage system

When working on an HPC cluster that provides different storage categories/systems, the choice of which system to use can have a big influence on the performance of your workflow. In the best case, you can speed up your workflow considerably, whereas in the worst case the system administrators have to kill all your jobs and limit the number of concurrent jobs that you can run, because your jobs slow down the entire storage system and thereby affect other users' jobs. Please take into account the recommendations listed below.

  • Use local scratch whenever possible. With a few exceptions, this gives you the best performance in most cases.
  • For parallel I/O with large files, the high-performance (work) storage gives you the best performance.
  • Don't create a large number of small files (a few KB each) on project or work storage, as this can slow down the entire storage system.
  • If your application performs very inefficient I/O (opening and closing files multiple times per second and doing small appends on the order of a few bytes), then please don't use project or work storage. The best option for this use case is local scratch.

If you need to work with a large number of small files, then please keep them grouped in a tar archive. During a job you can untar the files to local scratch, process them, and group the results again in a tar archive, which can then be copied back to your home/scratch/work/project space.
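
A batch job following this pattern could look like the sketch below; it assumes that local scratch space is requested with the Slurm option --tmp, and input.tar, my_program, and the results directory are placeholders:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=04:00:00
#SBATCH --tmp=50G                  # request local scratch space for the unpacked files

# unpack the archive with the many small input files into local scratch
tar -xf $SCRATCH/input.tar -C $TMPDIR

# process the files in local scratch (my_program is a placeholder)
cd $TMPDIR
my_program

# pack the results into a single archive and copy it back to personal scratch
tar -cf results.tar results/
cp results.tar $SCRATCH/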

File transfer

In order to run your jobs on an HPC cluster, you need to transfer data or input files to/from the cluster. For small and medium amounts of data, you can use standard command-line or graphical tools. If you need to transfer very large amounts of data (on the order of several TB), then please contact cluster support and we will help you set up an optimal strategy to transfer your data in a reasonable amount of time.

Command line tools

For transferring files from/to the cluster, we recommend using standard tools like secure copy (scp) or rsync. The general syntax for using scp is

scp [options] source destination

For copying a file from your PC to an HPC cluster (to your home directory), you need to run the following command on your PC:

scp file username@hostname:

Where username is your ETH username and hostname is the hostname of the cluster. Please note the colon after the hostname. For copying a file from the cluster to your PC (current directory), you need to run the following command on your PC:

scp username@hostname:file .

For copying an entire directory, you need to add the option -r. For example, you would use the following command to transfer a directory from your PC to an HPC cluster (to your home directory):

scp -r directory username@hostname:

The general syntax for rsync is

rsync [options] source destination

In order to copy the content of a directory from your PC (home directory) to a cluster (home directory), you would use the following command.

rsync -Pav /home/username/directory/ username@hostname:/cluster/home/username/directory

The -P option makes rsync show the progress of the file transfer, the -a option preserves almost all file attributes, and the -v option gives you more verbose output.
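
The same syntax works in the other direction. For example, to copy results from the cluster back to your PC, you would run the following command on your PC (the paths are placeholders):

rsync -Pav username@hostname:/cluster/scratch/username/results/ /home/username/results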

Graphical tools

Graphical scp/sftp clients allow you to mount your Euler home directory on your workstation. These clients are available for most operating systems.

  • Linux + Gnome: Connect to server
  • Linux + KDE: Konqueror, Dolphin, Filezilla
  • Mac OS X: MacFUSE, Macfusion, Cyberduck, Filezilla
  • Windows: WinSCP, Filezilla

WinSCP provides a Windows-Explorer-like user interface with a split screen that allows you to transfer files via drag-and-drop. After starting your graphical scp/sftp client, specify the hostname of the cluster that you would like to connect to and click the connect button. After entering your ETH username and password, you are connected to the cluster and can transfer files.

Screenshots: WinSCP (Winscp1.png, Winscp2.png) and Filezilla (Filezilla1.png, Filezilla2.png)

Globus for fast file transfer

See Globus for fast file transfer.


Quotas

The home and scratch directories on our clusters are subject to a strict user quota. In your home directory, the soft quota for the amount of storage that you can use is set to 16 GiB (17.18 GB) and the hard quota to 20 GiB (21.47 GB). Furthermore, you can store at most 200'000 files and directories (inodes). For your personal scratch directory, the soft quota is set to 2.5 TB, and you can have at most 1'000'000 files and directories (inodes). You can check your current usage with the lquota command.

[sfux@eu-login-13-ng ~]$ lquota
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/home/sfux          | space       |          8.85 GB |         17.18 GB |         21.47 GB |
| /cluster/home/sfux          | files       |            25610 |           160000 |           200000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/shadow             | space       |          4.10 kB |          2.15 GB |          2.15 GB |
| /cluster/shadow             | files       |                2 |            50000 |            50000 |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/scratch/sfux       | space       |        237.57 kB |          2.50 TB |          2.70 TB |
| /cluster/scratch/sfux       | files       |               29 |          1000000 |          1500000 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-13-ng ~]$ 

If you reach 80% of your quota (number of files or storage space) in your personal scratch directory, you will be informed via email and asked to clean up.
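
To find out which directories consume most of the space or most of the inodes in your personal scratch directory, standard tools like du and find can be used, for example (a sketch):

# show the space used by each top-level directory in your personal scratch directory
du -sh $SCRATCH/*

# count the files and directories (inodes) below each top-level directory
for d in $SCRATCH/*/; do echo -n "$d: "; find "$d" | wc -l; done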

Shareholders that own storage in /cluster/work or /cluster/project on Euler or Leonhard can also check their quota by using the lquota command:

[sfux@eu-login-11-ng ~]$ lquota /cluster/project/sis
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/project/sis        | space[B]    |          6.17 TB |                - |         10.41 TB |
| /cluster/project/sis        | files       |          1155583 |                - |         30721113 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-11-ng ~]$
[sfux@eu-login-11-ng ~]$ lquota /cluster/work/sis
+-----------------------------+-------------+------------------+------------------+------------------+
| Storage location:           | Quota type: | Used:            | Soft quota:      | Hard quota:      |
+-----------------------------+-------------+------------------+------------------+------------------+
| /cluster/work/sis           | space       |          8.36 TB |         10.00 TB |         11.00 TB |
| /cluster/work/sis           | files       |          1142478 |         10000000 |         11000000 |
+-----------------------------+-------------+------------------+------------------+------------------+
[sfux@eu-login-11-ng ~]$

The lquota script requires the path to the top-level directory as a parameter.