Storage and data transfer



[Image: storage.png]

Once you can log in to the cluster, you can start setting up your calculation jobs, and for that you need your data. Two questions therefore arise:

1. Where to store data?
2. How to transfer data?

Here, we explain the storage systems on the cluster and give examples of how to transfer data between your local computer and the cluster.

Quick examples

Upload a directory from your local computer to /cluster/scratch/username ($SCRATCH) on Euler

$ scp -r dummy_dir username@euler.ethz.ch:/cluster/scratch/username/

Log in to the cluster and check your disk space quota

$ lquota
+-----------------------+-------------+------------+---------------+---------------+
| Storage location:     | Quota type: | Used:      | Soft quota:   | Hard quota:   |
+-----------------------+-------------+------------+---------------+---------------+
| /cluster/home/sfux    | space       |    8.85 GB |      17.18 GB |      21.47 GB |
| /cluster/home/sfux    | files       |      25610 |        160000 |        200000 |
+-----------------------+-------------+------------+---------------+---------------+
| /cluster/shadow       | space       |    4.10 kB |       2.15 GB |       2.15 GB |
| /cluster/shadow       | files       |          2 |         50000 |         50000 |
+-----------------------+-------------+------------+---------------+---------------+
| /cluster/scratch/sfux | space       |  237.57 kB |       2.50 TB |       2.70 TB |
| /cluster/scratch/sfux | files       |         29 |       1000000 |       1500000 |
+-----------------------+-------------+------------+---------------+---------------+

Personal storage for all users

$HOME

$ cd $HOME
$ pwd
/cluster/home/username 
  • $HOME is safe, long-term storage for critical data (program source, scripts, etc.) and is accessible only by the user (owner). This means other people cannot read its contents.
  • There is a disk quota of 16/20 GB and a maximum of 160’000/200’000 files (soft/hard quota). You can check the quota with the command lquota.
  • Its content is saved every hour/day using snapshots, which are stored in the hidden .snapshot directory.
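
If you accidentally delete or overwrite a file in $HOME, you can usually recover an earlier version from the hidden .snapshot directory. A minimal sketch (the snapshot name and dummy_file are placeholders; list the directory to see which snapshots actually exist):

$ ls $HOME/.snapshot                                   # list the available snapshots
$ cp $HOME/.snapshot/snapshot_name/dummy_file $HOME/   # restore a file from one of them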

Global Scratch

$ cd $SCRATCH
$ pwd
/cluster/scratch/username
  • $SCRATCH is fast, short-term storage for computations running on the cluster. It is created automatically upon first access (cd $SCRATCH) and is visible (mounted) only when accessed.
  • It has strict usage rules (see $SCRATCH/__USAGE_RULES__ for details) and has no backup.
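
The usage rules can be read directly on the cluster once the scratch directory has been created, for example:

$ cat $SCRATCH/__USAGE_RULES__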

Local Scratch

/scratch on each compute node ($TMPDIR)

  • The local scratch is intended for serial, I/O-intensive applications. It has a very short life span: data are deleted automatically when the job ends.
  • Scratch space must be requested by the job (see the batch system documentation) and has no backup.

See how to use local scratch
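
A typical pattern is to stage data into $TMPDIR at the start of a job, run the I/O-intensive part there, and copy the results back before the job ends. Below is a minimal sketch of the relevant job script lines; input.dat, my_program and the results directory are placeholders, and requesting the local scratch space itself depends on the batch system (see the page linked above).

cp $HOME/input.dat $TMPDIR/      # stage input onto the node-local scratch
cd $TMPDIR
./my_program input.dat           # placeholder for the I/O-intensive application
cp -r results $HOME/             # copy results back; $TMPDIR is wiped when the job ends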

Group storage for shareholders

Shareholders can buy as much space on Project and Work as they need, and can manage access rights themselves. Quota can be checked with lquota. The content is backed up multiple times per week.
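
As an illustration, a group might check its quota and grant a colleague read access to a shared subdirectory roughly as follows. This is only a sketch: groupname, otheruser and the subdirectory are placeholders, and passing a path to lquota as well as managing permissions with POSIX ACLs (setfacl) are assumptions about the setup, not a documented interface.

$ lquota /cluster/project/groupname                                # assumed: lquota with an explicit path
$ setfacl -R -m u:otheruser:rX /cluster/project/groupname/shared   # assumed: ACLs enabled on the file system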

Project

$ cd /cluster/project/groupname

Similar to $HOME, but for groups, it is a safe, long-term storage for critical data.

Work

$ cd /cluster/work/groupname

Similar to global scratch, but without purge, it is fast, short- or medium-term storage for large computations.

The folder is visible only when accessed.

External Storage

Please note that external storage is convenient for bringing data to the cluster or for storing data for a longer time. However, we recommend not to process data directly from external storage systems in batch jobs on Euler, as this can be very slow and can put a high load on the external storage system. Instead, please copy the data from the external storage system to cluster storage (home directory, personal scratch directory, project storage, work storage, or local scratch) before processing it in a batch job. After processing the data from a cluster storage system, you can copy the results back to the external storage system.

Central NAS/CDS

Groups who have purchased storage on the central NAS/CDS of ETH provided by ID Systemdienste can access it on our clusters.

Other NAS

Groups who operate their own NAS can export a shared file system via NFS to Euler. The user and group IDs on the NAS need to be consistent with ETH user names and groups.

The NAS share needs to be mountable via NFSv3 (shares that only support CIFS cannot be mounted on the HPC clusters), and exported to the subnet of our HPC clusters. The NAS is then mounted automatically on our clusters under

/nfs/servername/sharename
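
Since the share is automounted, simply accessing the full path makes it appear. In line with the recommendation above, a typical workflow is to stage input data from the NAS to cluster storage before running a batch job and to copy the results back afterwards (servername, sharename and the directory names are placeholders):

$ ls /nfs/servername/sharename                                     # accessing the path triggers the automount
$ rsync -av /nfs/servername/sharename/input/ $SCRATCH/input/       # stage input data to personal scratch
$ rsync -av $SCRATCH/results/ /nfs/servername/sharename/results/   # copy results back after the job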

File system comparison

File system        Life span         Max size   Snapshots   Backup   Small files   Large files
$HOME              permanent         16 GB      ✓           ✓        ✓             o
$SCRATCH           2 weeks           2.5 TB     -           -        o             ✓✓
/cluster/project   4 years           flexible   optional    ✓        ✓             ✓
/cluster/work      4 years           flexible   -           ✓        o             ✓✓
Local /scratch     duration of job   800 GB     -           -        ✓✓            o
Central NAS        flexible          flexible   ✓           ✓        ✓             ✓

Retention time

  • Snapshots: up to 7 days
  • Backup: up to 90 days

Data transfer with command line tools

Using the scp command

Upload dummy_file from your workstation to your home directory on Euler

$ scp dummy_file username@euler.ethz.ch:

Download dummy_file from Euler to the current directory on your workstation

$ scp username@euler.ethz.ch:dummy_file .

Copy a directory to Euler

$ scp -r dummy_dir username@euler.ethz.ch:
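
Downloading works the same way in the other direction; for example, copy a whole directory from Euler to the current directory on your workstation

$ scp -r username@euler.ethz.ch:dummy_dir .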

Example: upload a directory with rsync

Create two files in the dummy directory and use rsync to transfer the folder

$ mkdir dummy_dir
$ touch dummy_dir/dummy_file1 dummy_dir/dummy_file2
$ rsync -av dummy_dir username@euler.ethz.ch:dummy_dir
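
rsync only transfers files that have changed, which makes it convenient for repeated transfers, and it works in both directions. Note that a trailing slash on the source matters: dummy_dir copies the directory itself, while dummy_dir/ copies only its contents.

$ rsync -av username@euler.ethz.ch:dummy_dir .               # download the directory from Euler
$ rsync -av dummy_dir/ username@euler.ethz.ch:dummy_dir/     # upload only the contents of dummy_dir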


Data transfer with graphical tools

Table: Graphical file transfer programs

Linux       macOS       Windows
FileZilla   FileZilla   WinSCP
            Cyberduck   PSCP
                        FileZilla
                        Cyberduck

WinSCP

[Screenshots: Winscp1.png, Winscp2.png]

Globus for fast file transfer


[Infographic: the Globus universe (Infographic-Globus-Universe-2020.png)]


See Globus for fast file transfer.

Further reading

  • User guide: Storage systems
  • Unified quota wrapper
  • Too much space is used by your output files
  • Best practices guide for Lustre file systems