Conda

From ScientificComputing

Introduction

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

What is a good use case for conda?

Conda is a neat solution when you would like to install software on your local computer that is otherwise difficult to install. On a fast local hard drive or SSD, you won't notice any performance issues when using conda.

Reasons to not use conda on an HPC file system

  • HPC file systems are attached to the cluster via the network and therefore have a latency that is about two orders of magnitude larger than that of a local SSD
    • Starting an application that was installed with conda reads many more files than the same installation without conda, and because of the much larger latency, this process is very slow
  • Parallel high-performance file systems such as Lustre (/cluster/scratch, /cluster/work) are optimized for large files (>4 MB); see best practices on Lustre
    • Because starting an application that was installed with conda reads many small files, it creates unnecessary load on the metadata servers, which slows down the entire file system and therefore also affects other users' computations
    • A typical Anaconda installation easily contains 100k-200k small files with an average file size of a few KB; Miniconda installations still contain around 60k small files
    • Lustre reads/writes data in blocks of 4 MB
    • Lustre uses striping to distribute files across multiple OSTs (object storage targets) for better performance, but striping files smaller than 4 MB in many cases leads to noticeably slower read/write performance due to server contention
  • We provide a large amount of centrally installed software to all cluster users (Euler, Leonhard). Conda cannot use this software and will therefore install many packages that are already available on the cluster
  • Conda applications that require MPI cannot use the centrally provided MPI installations, which are compiled with support for the LSF batch system
  • Conda is designed for local single-user installations, and according to the conda documentation, a system-wide installation requires administrator privileges
  • Cluster support does not provide any support for users' local conda installations
  • Your home directory has a quota of 100,000 files/directories, which often conflicts with conda installations
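A quick way to see how close a conda installation comes to the files/directories quota is to count its entries. The following is a minimal sketch with two hypothetical helper functions (count_entries, check_entry_limit); it assumes GNU find, and the default limit of 100,000 is simply the home-directory quota stated above. Keep in mind that the quota applies to your whole home directory, not to a single installation.

```shell
# Hypothetical helper: count all files and directories below a path
# (hidden entries included), i.e. roughly what a files/directories quota counts.
count_entries() {
  find "$1" -mindepth 1 | wc -l
}

# Hypothetical helper: warn and return non-zero if a path holds more entries
# than a limit (default 100000, the home-directory quota mentioned above).
check_entry_limit() {
  n=$(( $(count_entries "$1") ))   # arithmetic expansion strips any padding from wc
  limit=${2:-100000}
  if [ "$n" -gt "$limit" ]; then
    echo "WARNING: $1 contains $n files/directories (limit $limit)" >&2
    return 1
  fi
  echo "$1 contains $n files/directories"
}
```

For example, check_entry_limit "$HOME/miniconda3" would report the number of entries in a Miniconda installation at that (hypothetical) location.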

Using conda on Euler

If you use conda on Euler, the only two suitable storage locations are your home directory and project storage.

Don't use conda on the work file system: it would not only slow down your own computations but also decrease the performance for all other users of the file system, which is unfair towards them.
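If you script your conda setup, a simple guard can stop an installation from landing on the Lustre file systems listed above. This is a hypothetical sketch: is_lustre_path only compares the path against the /cluster/scratch and /cluster/work prefixes named on this page, so symbolic links and relative paths should be resolved first (for example with readlink -f).

```shell
# Hypothetical helper: succeed if a path lies on one of the Lustre file systems
# named on this page (/cluster/scratch, /cluster/work). The prefix list is an
# assumption taken from this page; pass absolute, symlink-free paths.
is_lustre_path() {
  case "$1" in
    /cluster/scratch|/cluster/scratch/*|/cluster/work|/cluster/work/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: refuse a conda target directory on Lustre.
check_conda_target() {
  if is_lustre_path "$1"; then
    echo "ERROR: $1 is on a Lustre file system; use your home directory or project storage" >&2
    return 1
  fi
}
```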

Alternatives to conda

Pure Python installations

For pure Python installations, please use the centrally provided Python installation. If packages are missing, you can, for instance, create a virtual environment that is aware of the centrally provided packages and install only the packages that are not provided centrally:

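# load the centrally provided Python installation
module load stack/2024-06 python/3.11.6
# create a virtual environment in your home directory that can also see the central packages
python -m venv --system-site-packages $HOME/my_venv
source $HOME/my_venv/bin/activate
# install only the missing package(s)
pip3 install package_name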
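To double-check that a virtual environment was created with --system-site-packages, you can look at the pyvenv.cfg file that Python's venv module writes into the environment directory. The helpers below are a hypothetical sketch: venv_sees_system_packages reads that file, and in_venv asks the python3 found on PATH whether it runs inside a virtual environment (in which case sys.prefix differs from sys.base_prefix).

```shell
# Hypothetical helper: succeed if the virtual environment in directory $1 was
# created with --system-site-packages (venv records this in pyvenv.cfg).
venv_sees_system_packages() {
  grep -q '^include-system-site-packages = true' "$1/pyvenv.cfg"
}

# Hypothetical helper: succeed if the python3 on PATH runs inside a virtual environment.
in_venv() {
  python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}
```

For the environment created above, venv_sees_system_packages "$HOME/my_venv" should succeed, and in_venv should succeed after sourcing its activate script.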

If you need a different version of a package that is already provided centrally, you can also install it in your virtual environment. For instance, to install a specific numpy version:

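module load stack/2024-06 python/3.11.6
python -m venv --system-site-packages $HOME/my_venv
source $HOME/my_venv/bin/activate
# --ignore-installed: install even though numpy is already visible from the central installation
# --no-deps: do not install or upgrade numpy's dependencies
OPENBLAS=$OPENBLAS_ROOT/lib/libopenblas.so pip3 install --ignore-installed --no-deps numpy==1.20.0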

This installs the requested numpy version into your virtual environment. If you would like to replace a package with a different version and also install all of its dependencies, omit the --no-deps option.
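After replacing a package, it is worth checking which copy Python actually imports, since the centrally provided version is still visible through the system site packages. The helper below is a hypothetical sketch that prints the module's __version__ (if it defines one) and the file it was loaded from.

```shell
# Hypothetical helper: print the version (or "n/a") and the file location of
# the module that the python3 on PATH imports for the given module name.
which_module() {
  python3 -c 'import importlib, sys
m = importlib.import_module(sys.argv[1])
print(getattr(m, "__version__", "n/a"), m.__file__)' "$1"
}
```

With the virtual environment activated, which_module numpy should report a path below $HOME/my_venv if the locally installed version is the one being imported.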

Installations with non-Python dependencies

Installing applications with non-Python dependencies is a bit trickier than installing pure Python packages. First, check whether a module is already provided for the non-Python dependency:

https://scicomp.ethz.ch/wiki/Euler_applications_and_libraries
https://scicomp.ethz.ch/wiki/Leonhard_applications_and_libraries

If the non-Python dependency is already provided centrally, load the corresponding module and follow the approach described above to install the missing Python packages. If the non-Python dependency is not yet provided centrally, you can either install it locally in your home directory or ask cluster support whether it can be installed centrally.
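If you install a non-Python dependency locally, the shell, the compiler and the dynamic linker all need to find it. The following is a hypothetical sketch that prepends an installation prefix such as $HOME/.local to the usual search paths; it assumes the package installs into bin, lib, include and lib/pkgconfig below that prefix (some packages use lib64 instead).

```shell
# Example build of an autotools-based package into a prefix in your home directory
# (run inside the unpacked source tree; not every package uses autotools):
#   ./configure --prefix="$HOME/.local" && make && make install

# Hypothetical helper: prepend an installation prefix to the search paths used by
# the shell (PATH), the dynamic linker (LD_LIBRARY_PATH), the compiler and linker
# (CPATH, LIBRARY_PATH) and pkg-config (PKG_CONFIG_PATH).
use_local_prefix() {
  local prefix="$1"
  export PATH="$prefix/bin${PATH:+:$PATH}"
  export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LIBRARY_PATH="$prefix/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
  export CPATH="$prefix/include${CPATH:+:$CPATH}"
  export PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
}
```

Calling use_local_prefix "$HOME/.local" in your shell (or job script) before building the Python packages lets both the build and the running program locate the library.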

Singularity containers

If you create a Singularity container with a conda installation inside, everything is stored in a single image file, so using conda through Singularity containers can avoid the issues described above. Please note that to use Singularity, you need to be a member of the Singularity user group:

  • The Singularity user group is restricted to members of shareholder groups; guest users don't have access to Singularity
  • If you are a member of a shareholder group, you can request to be added to the Singularity user group
  • Basic documentation is available on this wiki page
  • Please note that we provide the infrastructure for running Singularity containers, but no support for issues with the container itself
  • Creating custom Singularity containers and debugging or changing existing Singularity containers requires an SIS subscription for expert services
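As an illustration of how a conda installation ends up inside a single image file, here is a hypothetical sketch that writes a minimal Singularity definition file. It assumes the public continuumio/miniconda3 Docker image as the base, which provides conda under /opt/conda; the helper name and the package names are placeholders.

```shell
# Hypothetical helper: write a Singularity definition file ($1) that starts from
# the continuumio/miniconda3 Docker image and installs the given conda packages
# at build time, so that all of their files end up inside one image file.
write_conda_def() {
  deffile="$1"
  shift
  cat > "$deffile" <<EOF
Bootstrap: docker
From: continuumio/miniconda3

%post
    /opt/conda/bin/conda install -y $*
    /opt/conda/bin/conda clean -a -y
EOF
}
```

For example, write_conda_def conda.def numpy scipy writes the file; the image would then be built on a machine where you have the required privileges, e.g. with singularity build conda.sif conda.def (as root, or with --fakeroot where that is enabled), and typically run with singularity exec conda.sif python your_script.py.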

Links