Globus for fast file transfer

Globus for fast file transfer around the globe

  • Globus lets you move large amounts of data quickly, conveniently and reliably.
  • Globus is a fast and reliable data transfer system designed for scientific data.
  • The infrastructure (Globus service) that maintains the worldwide certification authority and the web service is provided by the IT services of the University of Chicago.

Globus on Euler is available to users with access to the project file system (and later also to the work file system), that is, to members of units that have invested in Euler storage.

Globus ecosystem

Globus is a data management tool for research data, based on GridFTP and a certificate-based authentication service. The Globus service is operated by the University of Chicago on a subscription basis.

Infographic Globus universe


  • For users, Globus provides a fast data transfer system within ETH and between ETH and other research institutions. You can use Globus to exchange data efficiently between Euler and your workstation or notebook. Globus can also transfer data directly from your working group or department server to Euler; however, this requires some software installation and configuration on the server. Globus is efficient and reliable enough to transfer hundreds of terabytes within a relatively short time span.
  • Globus also lets you share data with collaborators at other institutions, with a whole research community or with the general public.
  • Software developers can use the Globus API to build their own data sharing apps on top of Globus.

Data transfer with Globus

Prerequisite

You have to be a member of the group ID-HPC-EULER in order to use Globus on Euler. If you are a long-time Euler user, you should already be a member of this group. To check, type the command id in an Euler shell (terminal); it prints a long list of the groups you belong to. If ID-HPC-EULER is among them, proceed with the web interface. Otherwise, open a ticket at Globus Support asking to become a member of this group.
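The membership check can also be scripted instead of reading the list by eye. The following is a minimal sketch; has_group is a hypothetical helper introduced only for this illustration, and it assumes that group names contain no spaces.

```shell
# Return success if group $1 appears in the space-separated list $2.
# (has_group is a hypothetical helper, not part of Euler or Globus.)
has_group() {
    printf '%s\n' "$2" | tr ' ' '\n' | grep -qxF -- "$1"
}

# id -Gn prints the names of all groups of the current user
if has_group "ID-HPC-EULER" "$(id -Gn)"; then
    echo "ID-HPC-EULER found: proceed with the web interface"
else
    echo "ID-HPC-EULER missing: open a ticket at Globus Support"
fi
```

The exact-line match (grep -x) avoids false positives from groups whose names merely contain ID-HPC-EULER as a substring.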

Web interface

The easiest way to use Globus is to log in to the web GUI (app.globus.org). There, select "use your existing organizational login" and choose "ETHZ - ETH Zürich" in the selection field.

Globus Login

You will then be redirected to the official ETH login page, where you can log in with your ETH username and password for web applications (LDAP). Authentication is done by ETH, so no passwords are passed to Globus. Instead, a short-lived certificate is created, which serves as temporary user credentials for Globus.


Globus File Manager

After a successful login you reach the Globus file manager. It is largely self-explanatory. A short how-to can be found in the Globus documentation.

Search for "ETH Zurich#Euler" in the "Collection" search field to access your files on Euler. In some cases, you may need to re-authenticate before you can access your files.

You can of course also search for other collections you have access to, and for public collections. For instance, "Shared EMBL-EBI public endpoint" gives you access to various genetic databases, which you can copy directly to your Euler share or to your local machine.


Globus Transfer and Timer options

To fine-tune your file transfer, have a look at the Transfer and Timer Options. They let you schedule your file transfer and offer the following options:

  • sync: works like rsync; only changed files are transferred.
  • delete files: mostly used in combination with sync. Deletes files on the destination that do not exist on the source.
  • preserve source file modification times: self-explanatory.
  • do NOT verify file integrity after transfer: this option skips the checksum verification; enabling it is unwise in almost all cases.
  • encrypt transfer: encrypts your transfer. Strongly recommended for file transfers up to the lower TB range. Transfer speed will be somewhat reduced.
  • Fail on quota errors: strongly recommended when you upload to Euler.


Globus Connect Personal

To improve your user experience and increase the transfer speed from your local machine, it is recommended to install Globus Connect Personal. This adds your machine to the Globus ecosystem and lets you define directories that you share with others. Globus Connect Personal is available for Linux, macOS and Windows. To transfer data to and from Euler, you will usually use a split view in the GUI, showing the Euler file tree and your local file tree next to each other.

Command-line interface

The Globus command-line interface (CLI) can be installed on your local machine using pip or pipx. Simply use:

pip install globus-cli

or

pipx install globus-cli

To install globus-cli into your Euler home directory, use:

module load python/3.6.0

pip install --upgrade --user globus-cli

and then add the path of the local executables to your .bashrc:

export PATH=$HOME/.local/bin:$PATH

To use the CLI, you first have to log in:

globus login

and then copy the URL that is printed into the address bar of your browser.

For further information consult the Globus CLI documentation.
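After globus login, a transfer with options comparable to those of the web interface can be submitted from the shell. The following is a minimal sketch, assuming a recent globus-cli; the flag names should be checked against globus transfer --help, as they may differ between versions. The collection UUIDs, the paths and the label euler-sync are placeholders chosen for this illustration.

```shell
# Placeholders (assumptions): replace with real collection UUIDs and paths.
# UUIDs can be looked up with: globus endpoint search "ETH Zurich#Euler"
SRC="${SRC:-SOURCE_COLLECTION_UUID:/path/on/source/}"
DST="${DST:-DEST_COLLECTION_UUID:/path/on/destination/}"

# CLI counterparts of the web interface options:
#   --sync-level checksum     transfer only files whose checksum differs
#   --preserve-timestamp      keep source file modification times
#   --verify-checksum         verify file integrity after the transfer
#   --encrypt-data            encrypt the data channel
#   --fail-on-quota-errors    fail the task on destination quota errors
OPTS="--recursive --sync-level checksum --preserve-timestamp"
OPTS="$OPTS --verify-checksum --encrypt-data --fail-on-quota-errors"

# Note: paths containing spaces would need explicit quoting instead
CMD="globus transfer $OPTS --label euler-sync $SRC $DST"

# Print the command for review; run it only when RUN_TRANSFER=1 is set
echo "$CMD"
if [ "${RUN_TRANSFER:-0}" = "1" ]; then
    $CMD
fi
```

Printing the command first makes it easy to review the options before anything is submitted; setting RUN_TRANSFER=1 in the environment then runs the same command unchanged.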

Advanced users

Globus is very well documented and it is recommended that you have a look at the various Globus how-tos.

Premium features for paying customers

Research groups at ETH who contribute to the Globus subscription have access to several exclusive features, such as:

  • create endpoints inside their own institution
  • share data with non-Globus users from their own institution

Please contact Globus Support if you are interested in these features.