Lftp
Synopsis
lftp is a file transfer program designed for transferring large amounts of data (big files).
lftp works on top of file transfer protocols such as ftp, http, sftp, fish and torrent. It can parallelize the data streams.
See also the lftp webpage (https://lftp.yar.ru/).
Warning
Use lftp carefully: don't overload the connections, and respect the bandwidth needs of other users! Set your parameters carefully, especially the number of threads (-P <threads>).
Installation
If you want to use lftp from your workstation, you have to install it there. If your workstation only acts as the server in the data transfer, no installation is necessary. Using at least lftp version 4.5.3 is recommended.
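The version recommendation above can be checked with a small script. This is only a sketch: the helper names version_ge and extract_version are made up for illustration, it assumes a sort that supports -V (version sort, e.g. GNU coreutils), and it assumes that the output of lftp --version contains the version number as the first x.y.z string.

```shell
#!/bin/sh
# Minimum version recommended in the text above.
MIN_VERSION="4.5.3"

# version_ge A B: succeeds if version A is greater than or equal to version B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: read a version banner on stdin, print the first x.y(.z) number.
extract_version() {
    grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

if command -v lftp >/dev/null 2>&1; then
    installed=$(lftp --version | extract_version)
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "lftp $installed is recent enough (>= $MIN_VERSION)"
    else
        echo "lftp $installed is older than $MIN_VERSION, consider upgrading"
    fi
else
    echo "lftp is not installed"
fi
```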
Linux
Most distributions ship an lftp package. However, it is recommended to install a recent version from: https://lftp.yar.ru/get.html
Mac OS X
Install lftp using Homebrew, a package manager for Mac OS X. It does not require sudo, as it installs everything into /usr/local.
brew install lftp
Alternatively, install lftp using MacPorts, another popular package manager, which also contains a recent version of lftp:
port install lftp
Windows
NOT TESTED! There is a Windows version available at https://nwgat.ninja/lftp-for-windows/. Since OpenSSH is officially available for Windows 10, it is strongly recommended to use Windows 10 only.
Mirror data
lftp has a built-in mirror command which can download or update a whole directory tree. There is also a reverse mirror (mirror -R) which uploads or updates a directory tree on the server (copied from the man page). The mirror command is used in some of the examples below.
Initiate transfer from ID-HPC clusters (inside out)
For smaller and bigger files
To copy (download) the content of a folder from a remote system to a local path inside Leonhard with 8 parallel file sessions:
lftp sftp://username@hostname.ethz.ch -e "mirror -P 8 --use-pget-n=1 /data/smallfiles /cluster/scratch/username/test1/; exit"
Here, -P <N> is the number of files transferred in parallel, and --use-pget-n=<n> is the number of chunks per file (default 1).
To copy (download) the content of a folder from a remote system to a local path in Leonhard with 12 parallel file sessions:
lftp sftp://username@hostname.ethz.ch -e "mirror -P 12 --use-pget-n=1 /data/smallfiles /cluster/scratch/username/test2/; exit"
For bigger files
"Bigger files" here means files of at least 20 GB (the size used in test cases); real use cases involve files larger than 100 GB.
To copy (download) 4 files in parallel and additionally split each file into 4 chunks during the transfer (16 flows in total):
lftp sftp://username@hostname.ethz.ch -e "mirror -P 4 --use-pget-n=4 /data/bigfiles /cluster/scratch/username/test3/; exit"
To copy (download) 8 files in parallel and additionally split each file into 4 chunks during the transfer (32 flows in total):
lftp sftp://username@hostname.ethz.ch -e "mirror -P 8 --use-pget-n=4 /data/bigfiles /cluster/scratch/username/test4/; exit"
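The total number of flows in the examples above is the number of parallel files (-P) multiplied by the number of chunks per file (--use-pget-n). The following sketch computes that product and only prints the resulting lftp command, without running it. The function names and the MAX_FLOWS limit are hypothetical and not part of lftp; choose a limit that suits your network.

```shell
#!/bin/sh
# total_flows FILES CHUNKS: parallel files (-P) times chunks per file (--use-pget-n).
total_flows() {
    echo $(( $1 * $2 ))
}

# mirror_command FILES CHUNKS SRC DST: print the lftp command string (does not run it).
mirror_command() {
    printf 'mirror -P %d --use-pget-n=%d %s %s; exit\n' "$1" "$2" "$3" "$4"
}

MAX_FLOWS=32   # hypothetical upper limit, adapt to your site
FILES=8        # value for -P
CHUNKS=4       # value for --use-pget-n

flows=$(total_flows "$FILES" "$CHUNKS")
if [ "$flows" -le "$MAX_FLOWS" ]; then
    echo "$flows flows: lftp sftp://username@hostname.ethz.ch -e \"$(mirror_command "$FILES" "$CHUNKS" /data/bigfiles /cluster/scratch/username/test4/)\""
else
    echo "Too many flows ($flows > $MAX_FLOWS): reduce -P or --use-pget-n" >&2
fi
```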
Initiate transfer from your local machine (outside in)
Prerequisites
The lftp client must be installed on your local machine (see Installation).
Mirror local directory to cluster
To mirror a local directory to your scratch directory on Euler you can use:
lftp sftp://<username>:@login.euler.ethz.ch -e "mirror -v -R -P 16 /Users/<localuser>/source_dir /cluster/scratch/<username>/ ; exit"
Important detail: the colon (:) after <username> sends an empty password to sftp. In this case sftp uses the SSH key for authentication. If you want to type your password at the prompt, omit the ':'.
Options used:
mirror: mirrors a directory; no action is taken if the data is already at the destination. Files that were changed on the destination are replaced; unchanged files are not retransferred
-v: verbose, not necessary but useful
-R: reverse mirror, i.e. mirror your local directory to the server
-P 16: upload 16 files in parallel (adapt the number to your needs, and be careful not to overload your office/lab network)
<username>: user name on the cluster
<localuser>: user name on your local machine (the path shown is for a Mac; on Linux it would usually be something like /home/<localuser>/Scratch/source_dir)
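Before starting a large upload it can help to preview what mirror would do. The sketch below assembles the reverse-mirror command from a few variables and prints a preview variant using mirror's --dry-run option (listed in the lftp man page as an alias of --just-print; please check man lftp for your version). The variable names, their default values and the reverse_mirror helper are placeholders chosen for illustration.

```shell
#!/bin/sh
# Placeholders: override them in the environment or edit the defaults.
CLUSTER_USER="${CLUSTER_USER:-username}"       # user name on the cluster
SOURCE_DIR="${SOURCE_DIR:-$HOME/source_dir}"   # local directory to upload
PARALLEL="${PARALLEL:-4}"                      # value for -P

# reverse_mirror [EXTRA_OPTION]: print the lftp command string for a reverse mirror.
reverse_mirror() {
    echo "mirror -v -R ${1:+$1 }-P $PARALLEL $SOURCE_DIR /cluster/scratch/$CLUSTER_USER/ ; exit"
}

# Only print the commands; copy and run them once the placeholders are correct.
echo "Preview:  lftp sftp://${CLUSTER_USER}:@login.euler.ethz.ch -e \"$(reverse_mirror --dry-run)\""
echo "Transfer: lftp sftp://${CLUSTER_USER}:@login.euler.ethz.ch -e \"$(reverse_mirror)\""
```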
Upload a single file to the cluster
To upload a file to your scratch directory on Euler you can use put, as known from (s)ftp:
lftp sftp://<username>:@login.euler.ethz.ch/cluster/scratch/<username>/dest_dir/ -e "put /home/<localuser>/Scratch/ENCFF284YOU.bam ; exit"
For bigger files you can use put -c (or reput), which continues an upload that has partly failed.
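A resumable upload can also be kept in a small lftp command file and run with lftp -f. The sketch below writes such a file and prints how to run it. The host and paths are the placeholders from the example above and must be replaced before use; the retry count of 5 for net:max-retries is an arbitrary example value.

```shell
#!/bin/sh
# Write an lftp command file for a resumable upload (placeholders must be replaced).
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
# give up after 5 attempts per operation (example value)
set net:max-retries 5
open sftp://<username>:@login.euler.ethz.ch
cd /cluster/scratch/<username>/dest_dir/
# -c continues a partly transferred upload instead of starting over
put -c /home/<localuser>/Scratch/ENCFF284YOU.bam
exit
EOF

echo "Command file written to $script_file"
echo "Run it with: lftp -f $script_file"
```

If the transfer is interrupted, running the same command file again continues the upload from where it stopped.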
Download a single file from the cluster
When downloading, you can split the file into chunks with pget -n <n> (n = number of chunks) instead of the well-known get. Be careful with this option: don't overload the directory.
lftp sftp://<username>:@login.euler.ethz.ch/cluster/scratch/<username>/dest_dir/ -e "pget -n 16 ENCFF284YOU.bam ; exit"
Transferring multiple files
To transfer multiple files (directories, file names with wildcards) you can use either mput or mget. The easier way is to use mirror as shown above.
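As a rough sketch of the mget/mput variant, the script below prints one download and one upload command that use a wildcard. The *.bam pattern, the -P 4 value and the paths are example values only. The patterns are kept in single quotes so that lftp, not the local shell, expands them.

```shell
#!/bin/sh
# Placeholders from the examples above; replace them before use.
REMOTE='sftp://<username>:@login.euler.ethz.ch/cluster/scratch/<username>/dest_dir/'

# Single quotes keep the wildcards literal so that lftp expands them itself.
download_cmd='mget -P 4 *.bam'                         # remote files -> current local directory
upload_cmd='mput -P 4 /home/<localuser>/Scratch/*.bam' # local files -> remote directory

# Only print the commands; copy and run them once the placeholders are correct.
printf 'lftp %s -e "%s ; exit"\n' "$REMOTE" "$download_cmd"
printf 'lftp %s -e "%s ; exit"\n' "$REMOTE" "$upload_cmd"
```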
Interactive examples
https://www.cyberciti.biz/faq/lftp-mirror-example/
Further instructions
Man pages are your friend: man lftp serves as the reference manual. A web version of the man page is available on the lftp website.