Globus for fast file transfer
Globus for fast file transfer around the globe
- Globus lets you move large amounts of data quickly, conveniently and reliably.
- Globus is a fast and reliable data transfer system designed for scientific data.
- The infrastructure behind it (the Globus Service), which maintains the worldwide certification authority and the web service, is provided by IT Services at the University of Chicago.
Globus on Euler is available to users who have access to the project file system (and later also the work file system), that is, to members of units that have invested in Euler storage.
Globus ecosystem
Globus is a data management tool for research data based on GridFTP and a certificate-based authentication service. The Globus service is managed by the University of Chicago on a subscription basis.
- For users, Globus provides a fast data transfer system within ETH and between ETH and other research institutions. You can use Globus to exchange data efficiently between Euler and your workstation or notebook. Globus can also transfer data directly from your working group or department server to Euler. However, this requires some software installation and configuration on the server. Globus is efficient and reliable enough to transfer hundreds of terabytes within a relatively short time span.
- Globus also provides the possibility to share data with collaborators at other institutions, with a whole research community or the general public.
- Software developers can also use the Globus API to build their own data sharing app on top of Globus.
Data transfer with Globus
Prerequisite
You have to be a member of the group ID-HPC-EULER in order to use Globus on Euler. If you are a long-time Euler user, you should already be a member of this group. To check whether you are in this group, type the command
id
in an Euler shell (terminal). You will get a long list of the groups you belong to. If ID-HPC-EULER is among them, proceed with the web interface.
Otherwise, open a ticket at Globus Support asking to become a member of this group.
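Because the output of id can be long, the check can also be scripted. The following is a minimal sketch: the helper name has_group is hypothetical, and it assumes that group names contain no spaces, so that the space-separated output of id -Gn can be matched word by word.

```shell
# Hypothetical helper: succeed if group $1 appears as a whole word
# in the space-separated group list $2.
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -Gn prints the names of all groups of the current user on one line.
if has_group ID-HPC-EULER "$(id -Gn)"; then
  echo "ID-HPC-EULER: member, continue with the web interface"
else
  echo "ID-HPC-EULER: not a member, open a ticket at Globus Support"
fi
```

Matching on whole words avoids a false positive from a group whose name merely starts with ID-HPC-EULER.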
Web interface
The easiest way to use Globus is to log in to the web GUI (app.globus.org). There, choose "use your existing organizational login" and select "ETHZ - ETH Zürich" in the selection field.
You will then be redirected to the official ETH login page, where you can log in with your ETH username and password for web applications (LDAP). The authentication is done by ETH, so no passwords are passed to Globus. Instead, a short-lived certificate is created, which serves as temporary user credentials for Globus.
After a successful login you will reach the Globus file manager. It is more or less self-explanatory. A short how-to can be found in the Globus documentation.
Search for "ETH Zurich#Euler" in the "Collection" search field to access your files on Euler. In some cases, you may need to re-authenticate before you can access your files.
You can of course also search for other collections you have access to, and for public collections. For instance, "Shared EMBL-EBI public endpoint" gives you access to various genetic databases, which you can copy directly to your Euler share or to your local machine.
To fine-tune your file transfer, have a look at the Transfer and Timer Options. These let you schedule your file transfer and give you the following options:
- sync: works like rsync, so only changed files are transferred
- delete files: mostly used together with sync. This deletes files on the destination that do not exist on the source side.
- preserve source file modification times: self-explanatory
- do NOT verify file integrity after transfer: this option skips the checksum verification. Enabling it is unwise in almost all cases.
- encrypt transfer: encrypts your transfer. Strongly recommended for file transfers up to the low terabyte range. Transfer speed will be somewhat reduced.
- fail on quota errors: strongly recommended when you upload to Euler
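For scripted transfers, the same choices can be expressed as flags of the globus transfer command of the Globus CLI. The following is a minimal sketch, assuming the flag names of a recent globus-cli 3.x release (verify them with globus transfer --help). The wrapper name globus_mirror and the label text are hypothetical. The delete option is left out deliberately because it is destructive.

```shell
# Hypothetical wrapper: copy SRC to DST with the options discussed above.
# SRC and DST use the CLI notation COLLECTION_UUID:/path/.
globus_mirror() {
  # --sync-level mtime:      transfer only missing, resized or newer files
  # --preserve-timestamp:    keep the source modification times
  # --verify-checksum:       keep the integrity check switched on
  # --encrypt-data:          encrypt the data channel
  # --fail-on-quota-errors:  abort instead of retrying when the quota is hit
  globus transfer \
    --recursive \
    --sync-level mtime \
    --preserve-timestamp \
    --verify-checksum \
    --encrypt-data \
    --fail-on-quota-errors \
    --label "mirror to Euler" \
    "$1" "$2"
}
```

A call would look like globus_mirror "LOCAL_UUID:/data/run1/" "EULER_UUID:/path/on/euler/run1/", where both UUIDs and both paths are placeholders for your own collections and directories.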
Globus Connect Personal
To improve your user experience and increase the transfer speed from your local machine, it is recommended to install Globus Connect Personal. This adds your machine to the Globus ecosystem and lets you define directories that you share with others. Globus Connect Personal is available for Linux, macOS and Windows. To transfer data from and to Euler, you will usually use a split view in the GUI, showing the Euler file tree and your local file tree next to each other.
Command-line interface
The Globus command-line interface (CLI) can be installed on your local machine using pip or pipx. Simply use:
pip install globus-cli
or
pipx install globus-cli
To install the globus-cli into your Euler home use:
module load python/3.6.0
pip install --upgrade --user globus-cli
and then add the path of the local executables to your .bashrc
export PATH="$HOME/.local/bin:$PATH"
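A slightly more defensive variant for .bashrc is sketched below. It uses an absolute path based on $HOME, so that the globus executable is found from any working directory, and it avoids adding the same entry again each time .bashrc is sourced. The function name path_prepend is hypothetical.

```shell
# Hypothetical helper: prepend directory $1 to PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

# pip install --user places executables such as globus in ~/.local/bin.
path_prepend "$HOME/.local/bin"
export PATH
```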
To use the CLI, you first have to log in:
globus login
and then copy the URL it prints into the address bar of your browser.
For further information consult the Globus CLI documentation.
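Once logged in, a typical CLI session looks up a collection, lists a directory, submits a transfer and waits for it to finish. The following is a minimal sketch of that sequence, assuming the subcommands and output options of a recent globus-cli 3.x release. The function name euler_upload is hypothetical, and the collection ID and paths passed to it are placeholders.

```shell
# Hypothetical helper: upload a directory from a local collection to Euler.
# $1 = UUID of the source collection, $2 = source path, $3 = destination path.
euler_upload() {
  local_id="$1"; src_path="$2"; dst_path="$3"

  # Take the ID of the first search hit. Check by hand that this is really
  # the intended collection before relying on it in scripts.
  euler_id=$(globus endpoint search "ETH Zurich#Euler" \
               --jq "DATA[0].id" --format unix) || return 1

  # Listing the destination confirms that the collection is accessible.
  globus ls "$euler_id:$dst_path" || return 1

  # Submit the transfer and keep only the task ID from the response.
  task_id=$(globus transfer --recursive \
              "$local_id:$src_path" "$euler_id:$dst_path" \
              --jq "task_id" --format unix) || return 1

  # Block until the transfer task has finished.
  globus task wait "$task_id"
}
```

A call would look like euler_upload "LOCAL_UUID" "/data/run1/" "/path/on/euler/run1/". The UUID of a Globus Connect Personal installation is shown in its collection details in the web interface.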
Advanced users
Globus is very well documented and it is recommended that you have a look at the various Globus how-tos.
Premium features for paying customers
Research groups at ETH who contribute to the Globus subscription have access to several exclusive features, such as:
- create endpoints inside their own institution
- share data with non-Globus users from their own institution
Please contact Globus Support if you are interested in these features.