JupyterHub

Introduction

JupyterLab and Jupyter notebooks are widely used in the scientific community at ETH because they provide an easy way to run Python code (or code in other programming languages) in a browser window. We therefore developed a service that lets users start a JupyterLab session in their browser without having to log in to Euler with an SSH client. It provides easy access to the computational resources of the Euler cluster, and you can use it to work interactively and to develop and test your code.

Prerequisites

The only prerequisite for using this service is a local computer with a web browser. Like the Euler cluster itself, the service can only be used from within the ETH network. If you are working from home, you first need to establish a VPN connection to the ETH network.

Please note that if you have never logged in to the Euler cluster before, you first need to log in once with an SSH client to verify your ETH account and to accept the cluster's usage rules. Please check our wiki page about accessing the cluster, which contains all the information required to log in to the Euler cluster with your SSH client. When you log in for the first time, an access code is sent to your ETH email address; you need to enter this code and then accept the cluster's usage rules. After this initial procedure you can use the Jupyter service.

Starting a session

You can start a session by opening your favorite browser and entering the URL (FIXME: put URL here once the service is productive). You will then be asked to log in with your ETH credentials. After entering your ETH credentials and clicking the Sign in button, you can choose the amount of resources that you request for your session. Please only request multiple cores if you plan to run code that can make use of them. Clicking the Start button submits a batch job for your session. It might take some time until the batch job starts; once it is running, JupyterLab opens in your browser window.

Please note that the service currently uses our Python 3.10.4 (GCC 8.2.0) installation. It provides several hundred preinstalled packages that you can use right away when you start a Python kernel in your session:

https://scicomp.ethz.ch/wiki/Python_on_Euler#python_gpu.2F3.10.4

You can find a comprehensive tutorial about JupyterLab at

https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html

Stopping your job

Please don't forget to end your job when you are done with JupyterLab, either by stopping your server in JupyterHub or by killing the batch job.

If you only stop the current kernel or close the browser window, the batch job on Euler continues to run and wastes resources that could be used by other cluster users. To properly stop a Jupyter session, open the File menu, choose the entry Hub Control Panel (see picture) and then click the Stop my server button. Afterwards you can close the browser window; your session is terminated.

If you don't have access to this menu (e.g. in TensorBoard or other services), you can also reach the hub by editing the URL: replace everything from /user onwards (including /user itself) with /hub/home.

(Figure: How to open the hub control panel)
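The URL edit described above can be scripted. This is a minimal sketch; the function name hub_home_url and the host name in the example are made up, since the real service URL is not given on this page.

```shell
# Build the Hub Control Panel URL from the URL of a running session by
# replacing everything from "/user" onwards with "/hub/home".
hub_home_url() {
  local url="$1"
  case "$url" in
    */user*)
      # ${url%%/user*} removes the longest suffix that starts with "/user",
      # i.e. everything from the first "/user" to the end of the URL
      printf '%s/hub/home\n' "${url%%/user*}"
      ;;
    *)
      echo "no /user part found in: $url" >&2
      return 1
      ;;
  esac
}

# Example with a made-up host name:
hub_home_url "https://hub.example.org/user/jdoe/lab/tree/analysis.ipynb"
```

The example prints https://hub.example.org/hub/home. Opening that address shows the Stop my server button.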

Debugging

Before opening a ticket, please check the logs of your JupyterLab session. They are stored in your home directory in files named ~/jupyterhub_slurmspawner*. If you cannot solve the problem yourself, please attach this file to the ticket.
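If several of these log files exist, the most recently modified one usually belongs to the session you are debugging. A small sketch, assuming the files are in your home directory as described above; the function name latest_jupyter_log is hypothetical.

```shell
# Print the newest file matching jupyterhub_slurmspawner* in a directory
# (default: $HOME). Prints nothing if there is no such file.
latest_jupyter_log() {
  local dir="${1:-$HOME}"
  # ls -t sorts by modification time, newest first; errors for a
  # non-matching pattern are discarded
  ls -t "$dir"/jupyterhub_slurmspawner* 2>/dev/null | head -n 1
}

# Usage on Euler: show the last 50 lines of the newest log
#   tail -n 50 "$(latest_jupyter_log)"
```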

Installing an Extension

You can extend the basic functionality of JupyterLab with extensions. We provide some preinstalled extensions, but some useful extensions are probably still missing. You cannot use the JupyterLab extension manager directly, as this would require write permission in the central installation directory of JupyterLab, which users don't have, and there is no easy way to configure JupyterLab to store extensions in a user-writable directory. Some extensions can, however, be installed with pip.

For example, to install jupyterlab-slurm, run the following commands:

 # replace REQUIRED_MODULES with the modules used by your JupyterLab session
 module load REQUIRED_MODULES
 pip install --user jupyterlab_slurm
 jupyter labextension enable jupyterlab_slurm

where REQUIRED_MODULES are the modules that JupyterLab itself uses. To find the current configuration, look at the top of your log files (~/jupyterhub_slurmspawner*).
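To pull the relevant lines out of a log file, a filter like the one below can help. This is a sketch under an assumption: that the module setup is printed near the top of the log and mentions the word "module". The function name log_modules is hypothetical; adjust the line count or the pattern if your log looks different.

```shell
# Print the lines mentioning "module" among the first N lines (default 50)
# of a JupyterHub log file.
log_modules() {
  local log="$1" lines="${2:-50}"
  head -n "$lines" "$log" | grep -i 'module'
}

# Usage on Euler (uses the most recently modified log):
#   log_modules "$(ls -t ~/jupyterhub_slurmspawner* | head -n 1)"
```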

If an extension for JupyterLab is useful for many users, you can also ask cluster support whether the extension can be installed centrally.

Disabling an Extension

If you are unhappy with an extension, you can disable it with:

 jupyter labextension disable my-extension

Other services

By using a proxy on the server, we can provide other services within JupyterHub. Unfortunately, depending on the service, it might only run as the main server and not as a named server (which means that you cannot run more than one non-Jupyter service at a time). Therefore, if you plan to use another service in parallel with JupyterLab, please start JupyterLab as a named server.

Feel free to copy the settings of TensorBoard to create your own web services.

TensorBoard

TensorBoard can be selected under the option Software from SIS when starting the server. It loads the data in $HOME/tensorboard_logs. You can either move or copy your logs to this directory, create a symbolic link with this name that points to the correct directory, or write a bash script $HOME/.config_tensorboard in which you set the variable LOGDIR to the directory you wish to load.
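The link and LOGDIR options can be set up as sketched below. The function names are hypothetical, and the sketch assumes that $HOME/.config_tensorboard only needs to assign LOGDIR, as described above.

```shell
# Option 1: make $HOME/tensorboard_logs a symbolic link to your log directory.
link_tensorboard_logs() {
  local target="$1"
  # refuse to overwrite an existing directory, file or (dangling) link
  if [ -e "$HOME/tensorboard_logs" ] || [ -L "$HOME/tensorboard_logs" ]; then
    echo "$HOME/tensorboard_logs already exists; not touching it" >&2
    return 1
  fi
  ln -s "$target" "$HOME/tensorboard_logs"
}

# Option 2: leave $HOME/tensorboard_logs alone and set LOGDIR in the
# $HOME/.config_tensorboard script instead (this overwrites that file).
set_tensorboard_logdir() {
  printf 'LOGDIR="%s"\n' "$1" > "$HOME/.config_tensorboard"
}

# Example with a hypothetical path:
#   set_tensorboard_logdir "$HOME/projects/my_model/runs"
```

The link keeps the default path working for other tools, while the LOGDIR script lets you switch directories without touching the file system layout.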

WARNING: TensorBoard cannot use HTTPS, so anyone sharing the same compute node as you could extract all the data contained in TensorBoard.

Known Issues

  1. Currently, extensions cannot be installed directly from the UI. Please use the command line to install them.

FAQ

I cannot login to the Jupyter service

If this is the first time you are using Euler, you first need to connect with SSH. Please read this page for more information on how to do this.

My server is too slow to start

We rely on the Slurm batch system to provide the JupyterLab instances, so a slow start is due either to a shortage of available resources on Euler or to a low priority of your job (because you have already used many resources or have too many jobs running at the same time).

My server has been killed before starting

JupyterHub uses a timeout (currently around 10 minutes) for starting jobs. If your job takes longer than that to start, it is killed automatically. If you still cannot get a session after multiple tries, please log in to Euler with SSH and check your queue with squeue.
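To see why a session is not starting, you can list your jobs on Euler together with the reason Slurm gives for pending ones. The sketch below wraps the standard squeue command; the function name my_jobs is hypothetical. The %R field shows the pending reason (for example Resources or Priority) or, for a running job, its node list.

```shell
# Hypothetical helper: list your Slurm jobs with state and pending reason.
# Run it on an Euler login node; squeue is only available on the cluster.
my_jobs() {
  if ! command -v squeue >/dev/null 2>&1; then
    echo "squeue not found: log in to Euler with ssh first" >&2
    return 1
  fi
  # job ID, partition, name, state, elapsed time, reason or node list
  squeue -u "$USER" -o "%.18i %.9P %.30j %.8T %.10M %R"
}
```

A session that stays stuck in the queue can be removed with scancel followed by its job ID.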

I cannot request a JupyterLab for more than 24h

This service is aimed at cluster beginners, so we chose to allow only short sessions of up to 24 hours. For jobs that run longer than 24 hours, we recommend submitting them directly to the batch system instead of using JupyterLab.
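A minimal Slurm job script for such a long run might look like the sketch below. The file name long_job.sh, the resource values and my_script.py are placeholders, and the time limits that actually apply to you depend on Euler's configuration. Submit it from a login node with sbatch long_job.sh.

```shell
#!/bin/bash
# long_job.sh: hypothetical job script for work that needs more than 24 hours
#SBATCH --job-name=long_run
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G

# load the modules your program needs, e.g.: module load REQUIRED_MODULES
# my_script.py is a placeholder for your own program
python my_script.py
```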

My service crashes when starting (e.g. tensorboard)

Unfortunately, only JupyterLab can run as a named server. All other services need to run as the main server.

"Build recommended", but the build fails

JupyterLab tries to build its files in the system directories, which users are of course not allowed to write to. You don't need to worry about this message: we try to keep JupyterLab up to date, but we will not rebuild it for every minor release of a plugin.

My JupyterLab with 1 GPU does not start

GPUs are only available to shareholders that purchased GPU resources in Euler. Please ensure that you indeed have access to GPUs on Euler before submitting a ticket to cluster support.

I lost all my settings when migrating from the script to the hub

With JupyterHub, we use the directory ~/.jupyterlab instead of ~/.jupyter to store all configuration. Replacing the content of the new directory with that of the old one should be sufficient.
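The replacement can be done as sketched below, keeping a backup of the current ~/.jupyterlab in case something goes wrong. The function name and the backup naming scheme are hypothetical.

```shell
# Replace ~/.jupyterlab with a copy of ~/.jupyter; the existing
# ~/.jupyterlab is first moved to a timestamped backup directory.
migrate_jupyter_settings() {
  local old="$HOME/.jupyter" new="$HOME/.jupyterlab"
  if [ ! -d "$old" ]; then
    echo "nothing to migrate: $old does not exist" >&2
    return 1
  fi
  if [ -d "$new" ]; then
    mv "$new" "$new.bak.$(date +%Y%m%d%H%M%S)"
  fi
  # -a preserves permissions and timestamps of the copied files
  cp -a "$old" "$new"
}
```

Stop your server before running migrate_jupyter_settings, so that the running session does not write into the directory while it is being replaced.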

I want to load a cluster module / I want to activate a virtualenv / JupyterLab is missing some features

You can add your own instructions by writing a bash script ~/.jupyterlabrc.

This script is sourced (. ~/.jupyterlabrc) before JupyterLab starts, so you can use it to load modules, update environment variables, or replace JupyterLab with another service (advanced usage: see how TensorBoard is done).
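A ~/.jupyterlabrc could look like the sketch below. MY_MODULE, the virtualenv path and MY_PROJECT_DIR are placeholders; the guards make the script skip steps whose prerequisites are missing instead of failing.

```shell
# Example ~/.jupyterlabrc, sourced before JupyterLab starts.
# MY_MODULE, the virtualenv path and MY_PROJECT_DIR are placeholders.

# Load a cluster module (only if the module command is available).
if command -v module >/dev/null 2>&1; then
  module load MY_MODULE
fi

# Activate a Python virtual environment if it exists.
VENV="$HOME/venvs/my-env"
if [ -f "$VENV/bin/activate" ]; then
  . "$VENV/bin/activate"
fi

# Set an environment variable for the session.
export MY_PROJECT_DIR="$HOME/projects/my_project"
```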

I wish to use custom arguments to jupyterhub-singleuser

A few environment variables can be defined in your ~/.jupyterlabrc file:

- JUPYTER_DIR: directory made available to you in the session
- JUPYTER_HOME: default directory
- JUPYTER_EXTRA_ARGS: any additional arguments (e.g. '--debug')
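For example, the following lines in ~/.jupyterlabrc set all three variables. The directory values are hypothetical, and the sketch assumes JUPYTER_HOME should lie inside JUPYTER_DIR.

```shell
# In ~/.jupyterlabrc: settings read when jupyterhub-singleuser is started.
# The directory values below are hypothetical examples.
export JUPYTER_DIR="$HOME"                  # directory available to you
export JUPYTER_HOME="$HOME/projects"        # default directory
export JUPYTER_EXTRA_ARGS="--debug"         # additional arguments
```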