From ScientificComputing
Revision as of 08:41, 21 October 2022 by Sfux (talk | contribs)


We provide a wide range of centrally installed commercial and open source applications and libraries to our cluster users.

Central installations

Applications and libraries that are used by many people from different departments of ETH (e.g. MATLAB, Comsol, Ansys) or that are explicitly requested by a shareholder group are installed centrally in /cluster/apps. A software stack of centrally installed applications and libraries has several advantages for users:

  • Applications and libraries are visible and accessible to all users via environment modules.
  • They are maintained by the ID SIS HPC group (or in a few cases also by users).
  • Commercial licenses are provided by the central license administration of ETH (IT shop).

If an application or library is used by only a few people, we recommend that users install it locally in their home directory. If you need help installing an application or library locally, please do not hesitate to contact cluster support.
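As an illustration of a local installation, the sketch below creates a private prefix under the home directory and puts it on PATH. The directory layout and the package name mytool are placeholders chosen for this example, not a cluster convention, and the configure/make steps in the comment apply only to packages that ship such a build system.

```shell
# Hypothetical install prefix in the home directory ("mytool" is a placeholder)
PREFIX="$HOME/software/mytool"
mkdir -p "$PREFIX/bin"

# For a package with a configure script, the build would typically be:
#   ./configure --prefix="$PREFIX" && make && make install

# Make the locally installed executables visible in the current shell
export PATH="$PREFIX/bin:$PATH"
```

Adding the export line to a shell startup file would make the setting persistent across logins.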

Software stacks

On our clusters we provide multiple software stacks.

On the Euler cluster, we are in a transition phase from the old software stack to the new one. When users log in, the old software stack is currently still the default. It is kept on an as-is basis so that older results can be reproduced. New software is installed only in the new software stack, which is built with the package manager Spack and uses the Lmod module system. The new software stack has 5 basic toolchains (GCC 4.8.5, GCC 6.3.0, GCC 8.2.0, Intel 18.0.1 and Intel 19.1.0). A toolchain is a combination of a compiler, an MPI library and a BLAS/LAPACK library. For each toolchain, several hundred packages are available.

Lmod modules are organized in a hierarchy with three layers to avoid conflicts when multiple modules are loaded at the same time.

  • The core layer contains software that is independent of compilers and MPI libraries, e.g., commercial software that comes with its own runtime libraries.
$ module load comsol/5.6
  • The compiler layer contains software that depends on a compiler.
$ module load gcc/6.3.0 hdf5/1.10.1
  • The MPI layer contains software that depends on both a compiler and an MPI library.
$ module load gcc/6.3.0 openmpi/4.0.2 openblas
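
The effect of the hierarchy can be observed by listing a package before and after a compiler is loaded. The helper below is only a sketch: the function name show_layer is made up for this example, and it assumes that the module command of the new software stack is available (on Euler, after running env2lmod). Because a default compiler may already be loaded, the first listing can show builds for that compiler.

```shell
# show_layer PACKAGE COMPILER
# List PACKAGE before and after loading COMPILER, to see the listing change
# with the compiler layer.
show_layer() {
    pkg="$1"
    compiler="$2"
    echo "== before loading $compiler =="
    # Lmod writes listings to stderr by default, hence 2>&1
    module avail "$pkg" 2>&1
    module load "$compiler"
    echo "== after loading $compiler =="
    module avail "$pkg" 2>&1
}
```

For example, show_layer hdf5 gcc/6.3.0 prints both listings. Lmod also provides module spider hdf5, which searches all layers and reports which modules have to be loaded first.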

[Figure: Lmod toolchains]

There are four main toolchains.

The compilers in these toolchains can be combined with OpenMPI 3.0.1 or 4.0.2 and with OpenBLAS.

In the new software stack, we have installed a Python module that is linked with the CUDA, cuDNN and NCCL libraries and contains machine learning / deep learning packages such as scikit-learn, TensorFlow and PyTorch. This module can be loaded with the following commands:

$ env2lmod
$ module load gcc/6.3.0 python_gpu/3.8.5

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

$ module list

Currently Loaded Modules:
  1) StdEnv            4) cuda/11.0.3    7) python_gpu/3.8.5
  2) gcc/6.3.0         5) cudnn/8.0.5
  3) openblas/0.2.20   6) nccl/2.7.8-1
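
In a script it can be useful to confirm that the expected modules were actually loaded. The following is a minimal sketch; has_module and report_missing are hypothetical helper names, and the code assumes that module -t list prints one name/version entry per line on stderr, which is Lmod's default behaviour.

```shell
# has_module NAME: succeed if a module called NAME (any version) is loaded
has_module() {
    # -t requests terse output; the listing goes to stderr, hence 2>&1
    module -t list 2>&1 | grep -q "^$1/"
}

# report_missing NAME...: print every requested module that is not loaded
report_missing() {
    for name in "$@"; do
        has_module "$name" || echo "missing: $name"
    done
}
```

After the module load command above, report_missing cuda cudnn nccl python_gpu should print nothing if all four modules are loaded.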

You can also list the CUDA, cuDNN and NCCL versions available on the cluster and load the ones that match your needs:

$ env2lmod
$ module load gcc/6.3.0
$ module avail cuda

------------- /cluster/apps/lmodules/Compiler/gcc/6.3.0 -------------
  cuda/8.0.61     cuda/9.2.88      cuda/11.0.3 (L)
  cuda/9.0.176    cuda/10.0.130    cuda/11.1.1
  cuda/9.1.85     cuda/10.1.243    cuda/11.2.2 (D)
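
In this listing, (L) marks the currently loaded version and (D) the default version. To pick a version in a script, the available versions can be extracted from the terse listing. The sketch below uses a hypothetical helper named list_versions and assumes that module -t avail prints one module name per line on stderr, along with directory headers and bare name/ entries, which are filtered out.

```shell
# list_versions NAME: print the available versions of module NAME,
# one per line, sorted from oldest to newest.
# The grep keeps only "NAME/<version>" lines; sort -V is the GNU version sort.
list_versions() {
    module -t avail "$1" 2>&1 |
        grep "^$1/." |
        sed "s|^$1/||" |
        sort -V
}
```

With gcc/6.3.0 loaded, list_versions cuda would list the versions shown above, and a specific one can then be loaded instead of the default, for example with module load gcc/6.3.0 cuda/11.1.1.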