Difference between revisions of "Jupyter on Euler and Leonhard Open"

From ScientificComputing

Revision as of 12:18, 12 October 2021

Introduction

Since Jupyter notebooks are becoming more widely used among the scientific community, the HPC group developed a script that you can run on your local computer. This shell script then starts a Jupyter notebook in a batch job on Euler/Leonhard Open (depending on which cluster you choose) and connects your local browser with it.

At the moment, the script can be used on Linux and Mac computers. Windows is not supported. Windows users may be able to run the script using the Windows Subsystem for Linux (WSL), but this has not been tested yet.

Please note that this script is aimed at beginners who are starting to use Jupyter notebooks on the cluster. It is not aimed at advanced users who need a wide range of additional features beyond simple Jupyter notebooks. Advanced users can take the script and adapt it, for instance to use other Python versions (centrally installed or local installations), to add GPU support, or to add new kernels.

Please note that the script uses the old software stack on Euler and, in its current state, cannot be used with tools from the new software stack.

Installation

Prerequisites

In order to use this script, users need to make sure that they have set up SSH keys for passwordless access to the cluster:

https://scicomp.ethz.ch/wiki/Accessing_the_clusters#SSH_keys

Please note that the example on the wiki refers to the Euler cluster; for Leonhard Open, the hostname needs to be changed from

euler.ethz.ch

to

login.leonhard.ethz.ch

Please make sure that xdg-open is installed. It is used to automatically start your default browser and is provided by the xdg-utils package, which you can install with the following command:

CentOS:

yum install xdg-utils

Ubuntu:

apt-get install xdg-utils

Furthermore, the script requires a Python installation, which is usually included in the Linux distribution or Mac OS.

Download and setup

The script is available on the Gitlab instance of ETH Zurich:

https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open

Download the repository with the command

git clone https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open

Mac OS X:

git clone https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open.git

After downloading the script from gitlab.ethz.ch, you need to change its permissions to make it executable

chmod 755 start_jupyter_nb.sh

Updating the script

  • 01 Oct 2019: The script has been updated so that, in addition to the Python 3.6 kernel, the Jupyter notebooks also offer a bash kernel and an R kernel (3.6.0 on Euler, 3.5.1 on Leonhard Open). If you use an older version of the script and would like to use the newly added kernels, update your script from the GitLab repository with the command git pull origin master:

samfux@bullvalene:~/Jupyter-on-Euler-or-Leonhard-Open$ git pull origin master
warning: redirecting to https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open.git/
From https://gitlab.ethz.ch/sfux/Jupyter-on-Euler-or-Leonhard-Open
 * branch            master     -> FETCH_HEAD
Already up to date.
samfux@bullvalene:~/Jupyter-on-Euler-or-Leonhard-Open$ 

Using SSH keys with non-default names

Since the reopening of Euler and Leonhard Open after the cyber attack in May 2020, we recommend that cluster users use SSH keys. We recommend using different keys for Euler and Leonhard Open, named accordingly:

$HOME/.ssh/id_ed25519_euler
$HOME/.ssh/id_ed25519_leonhard

There are two options for using those keys with the Jupyter script.

You can configure your SSH client to use the keys automatically by adding the following lines to the $HOME/.ssh/config file on your local computer:

Host login.leonhard.ethz.ch
IdentityFile ~/.ssh/id_ed25519_leonhard

Host euler.ethz.ch
IdentityFile ~/.ssh/id_ed25519_euler

Alternatively, edit the following section at the beginning of the script and add the path to your SSH key. The example below shows how this looks for Euler:

#########################
# Configuration options #
#########################

# SSH key location is the path to your SSH key. Please specify the path if you are using a non-standard name for your SSH key
SSH_KEY_LOCATION="$HOME/.ssh/id_ed25519_euler"

# Waiting time interval after starting the jupyter notebook. Check every $WAITING_TIME_INTERVAL seconds if the job already started
WAITING_TIME_INTERVAL=60

#############################
# End configuration options #
#############################

Both options work and are equivalent.
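With either option, a mistyped key path is easy to overlook. The following is a minimal sketch of a check you could run on your local computer before starting the script. The helper check_ssh_key is hypothetical and not part of start_jupyter_nb.sh, and the key names are assumed to follow the naming recommended above.

```shell
#!/usr/bin/env bash
# check_ssh_key is a hypothetical helper, not part of start_jupyter_nb.sh.
# It reports whether the given private key file exists and is readable.
check_ssh_key() {
    local key="$1"
    if [ -r "$key" ]; then
        echo "ok: $key"
    else
        echo "missing or unreadable: $key" >&2
        return 1
    fi
}

# Assumed key names (taken from the recommendation above); adjust if yours differ.
for key in "$HOME/.ssh/id_ed25519_euler" "$HOME/.ssh/id_ed25519_leonhard"; do
    check_ssh_key "$key" || true
done
```

If a key is reported as missing, compare the path with the IdentityFile entry or with SSH_KEY_LOCATION in the script.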

Installing additional Python and R packages locally

A Jupyter notebook started with this script uses a central Python and R installation:

  • Euler: python/3.6.1, r/3.6.0
  • Leonhard Open: python_cpu/3.6.4, r/3.5.1

Out of the box, you can therefore only use packages that are centrally installed. However, you can install additional packages locally in your home directory and use them afterwards.

For installing a Python package from inside a Jupyter notebook, you would need to run the following command:

!pip install --user package_name

This will install package_name into $HOME/.local, as described on our wiki page about Python:

https://scicomp.ethz.ch/wiki/Python#Installing_a_Python_package.2C_using_PIP

The command to locally install an R package:

install.packages("package_name")

Then follow the instructions provided on our wiki:

https://scicomp.ethz.ch/wiki/R#Extensions


Running the script

The start_jupyter_nb.sh script needs to be executed on your local computer:

./start_jupyter_nb.sh CLUSTER ETH_USERNAME NUM_CORES RUN_TIME MEM_PER_CORE

Parameter      Description
CLUSTER        Name of the cluster (Euler or LeoOpen)
ETH_USERNAME   ETH username for which the notebook should be started
NUM_CORES      Number of cores to be used on the cluster (maximum: 36)
RUN_TIME       Run time limit for the Jupyter notebook on the cluster (HH:MM)
MEM_PER_CORE   Memory limit in MB per core

Example:

./start_jupyter_nb.sh Euler sfux 4 01:20 2048
Example for running a Jupyter notebook on the Euler cluster
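The parameter constraints listed above can be checked locally before a batch job is submitted. The following is only a sketch: validate_args is a hypothetical helper, the real script may validate its arguments differently, and the two-digit hour pattern is an assumption based on the documented HH:MM form.

```shell
#!/usr/bin/env bash
# validate_args is a hypothetical pre-flight check, not part of start_jupyter_nb.sh.
# Arguments mirror the script: CLUSTER ETH_USERNAME NUM_CORES RUN_TIME MEM_PER_CORE
validate_args() {
    local cluster="$1" user="$2" cores="$3" runtime="$4" mem="$5"

    # CLUSTER must be one of the two documented names.
    case "$cluster" in
        Euler|LeoOpen) ;;
        *) echo "CLUSTER must be Euler or LeoOpen" >&2; return 1 ;;
    esac

    if [ -z "$user" ]; then
        echo "ETH_USERNAME must not be empty" >&2; return 1
    fi

    # NUM_CORES: positive integer, at most 36 (the documented maximum).
    if ! [[ "$cores" =~ ^[1-9][0-9]?$ ]] || [ "$cores" -gt 36 ]; then
        echo "NUM_CORES must be an integer between 1 and 36" >&2; return 1
    fi

    # RUN_TIME: HH:MM (assumption: two-digit hours, minutes 00-59).
    if ! [[ "$runtime" =~ ^[0-9]{2}:[0-5][0-9]$ ]]; then
        echo "RUN_TIME must have the form HH:MM" >&2; return 1
    fi

    # MEM_PER_CORE: positive integer number of MB.
    if ! [[ "$mem" =~ ^[1-9][0-9]*$ ]]; then
        echo "MEM_PER_CORE must be a positive integer (MB)" >&2; return 1
    fi
}

validate_args Euler sfux 4 01:20 2048 && echo "arguments look valid"
```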

Reconnect to a Jupyter notebook

When the script runs, it creates a local file called reconnect_info in the installation directory. This file contains the ports used, the remote IP address, the command for the SSH tunnel, and the URL for the browser. This information should be sufficient to reconnect to a Jupyter notebook if the connection is lost.
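As an illustration of how these pieces fit together, the sketch below assembles an SSH tunnel command of the same shape as the one shown in the termination example on this page. The helper build_tunnel_cmd is hypothetical, and the exact layout of reconnect_info is not described here, so the values have to be taken from your own file.

```shell
#!/usr/bin/env bash
# build_tunnel_cmd is a hypothetical helper, not part of start_jupyter_nb.sh.
# It prints (but does not run) an SSH port-forwarding command:
#   -L LOCAL_PORT:REMOTE_IP:REMOTE_PORT forwards a local port to the notebook
#   -N opens the tunnel without running a remote command
build_tunnel_cmd() {
    local user="$1" host="$2" local_port="$3" remote_ip="$4" remote_port="$5"
    echo "ssh ${user}@${host} -L ${local_port}:${remote_ip}:${remote_port} -N"
}

# Example values copied from the process listing shown on this page.
build_tunnel_cmd sfux euler.ethz.ch 51339 10.205.4.122 8888
```

Once the tunnel is running again, the browser URL stored in reconnect_info can be reopened.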

Running multiple notebooks in a single Jupyter instance

If you run Jupyter on the Leonhard cluster using GPUs, you need to make sure a notebook is correctly terminated before you start another one. (The default version of the script uses a python_cpu module, which does not support GPU usage; you would need to change the Python version in the script to enable it.)

If you don't properly close the first notebook and then run a second one, the first notebook will still occupy GPU memory and keep processes running, which causes errors when executing the second notebook.

Therefore, please stop running kernels in the "Running" tab in the browser before starting a new notebook.

Terminate the Jupyter session

When you finish working with the Jupyter notebook, click the "Quit" or "Logout" button in your browser. "Quit" stops the batch job running on Euler. "Logout" only logs you out of the session and does not stop the batch job; in this case you need to log in to the cluster, identify the job with bjobs, and kill it with the bkill command, passing the job ID as parameter. Afterwards you also need to clean up the SSH tunnel that is running in the background.

Example:

samfux@bullvalene:~/Jupyter-on-Euler-or-Leonhard-Open$ ps -u | grep -m1 -- "-L" | grep -- "-N"
samfux    8729  0.0  0.0  59404  6636 pts/5    S    13:46   0:00 ssh sfux@euler.ethz.ch -L 51339:10.205.4.122:8888 -N
samfux@bullvalene:~/jupyter-on-Euler-or-Leonhard-Open$ kill 8729
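
The lookup in the example above can also be written as a small filter. This is a sketch only: find_tunnel_pids is a hypothetical helper that assumes the process ID is in the second column, as in the listing above, and it only prints candidates so that you can inspect them before calling kill.

```shell
#!/usr/bin/env bash
# find_tunnel_pids is a hypothetical helper, not part of start_jupyter_nb.sh.
# It reads process listing lines on stdin and prints the PID (second column)
# of every ssh command line that contains both -L and -N.
find_tunnel_pids() {
    awk '/ssh / && / -L / && / -N/ && !/grep/ && !/awk/ { print $2 }'
}

# Demonstration on the sample line from the example above.
printf '%s\n' \
    'samfux    8729  0.0  0.0  59404  6636 pts/5    S    13:46   0:00 ssh sfux@euler.ethz.ch -L 51339:10.205.4.122:8888 -N' \
    | find_tunnel_pids
```

On a live system the input would come from the process listing, for example ps -u | find_tunnel_pids, followed by kill with the PID you have verified.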

Modifications of the script

Starting in a different location than your home directory

By default, the Jupyter notebook will start in your home directory. It is also possible to start in a different location. For this you would need to change line 122 in the script from

jupyter notebook --no-browser --ip "\$IP_REMOTE" &> /cluster/home/$USERNAME/jnbinfo

to

jupyter notebook --no-browser --ip "\$IP_REMOTE" --notebook-dir PATH &> /cluster/home/$USERNAME/jnbinfo

where PATH needs to be replaced with the path in which the Jupyter notebook should start.

Using a different Python installation, e.g., one from the new software stack

For changing the Python environment used by the script, it is sufficient to change the string in PCOMMAND:

if [ "$CLUSTERNAME" == "Euler" ]; then
    CHOSTNAME="euler.ethz.ch"
    PCOMMAND="new gcc/4.8.2 r/3.6.0 python/3.6.1 eth_proxy"
elif [ "$CLUSTERNAME" == "LeoOpen" ]; then
    CHOSTNAME="login.leonhard.ethz.ch"
    PCOMMAND="r/3.5.1 python_cpu/3.6.4 eth_proxy"
else
    echo -e "Incorrect cluster name. Please specify Euler or LeoOpen as cluster and and try again.\n"
    print_usage
    exit
fi

You can replace it with a different set of modules. Currently the script uses the old software stack, but you can also change this. For using the new software stack, you would need to make it the default software stack that is initialized upon login. Then you can replace the modules in PCOMMAND with modules from the new software stack, e.g.,

if [ "$CLUSTERNAME" == "Euler" ]; then
    CHOSTNAME="euler.ethz.ch"
    PCOMMAND="gcc/6.3.0 python/3.8.5 eth_proxy"
elif [ "$CLUSTERNAME" == "LeoOpen" ]; then
    CHOSTNAME="login.leonhard.ethz.ch"
    PCOMMAND="r/3.5.1 python_cpu/3.6.4 eth_proxy"
else
    echo -e "Incorrect cluster name. Please specify Euler or LeoOpen as cluster and and try again.\n"
    print_usage
    exit
fi

We recommend keeping the eth_proxy module, as it can be helpful when installing new packages that need to be downloaded.

Troubleshooting

Logs

If the script does not work, we recommend checking the logs. There are two sources: the file $HOME/jnbinfo, and the lsf.o* log file of the batch job that ran the Jupyter notebook.
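
A minimal sketch for locating the most recent job log on the cluster is shown below. The helper newest_lsf_log is hypothetical, and the directory that holds the lsf.o* files is an assumption: pass the directory in which they appear on your account.

```shell
#!/usr/bin/env bash
# newest_lsf_log is a hypothetical helper, not part of start_jupyter_nb.sh.
# It prints the most recently modified lsf.o* file in the given directory
# (default: current directory), or nothing if no such file exists.
newest_lsf_log() {
    local dir="${1:-.}"
    ls -t "$dir"/lsf.o* 2>/dev/null | head -n 1
}

# Show the end of the newest job log, if there is one.
log="$(newest_lsf_log "$PWD")"
if [ -n "$log" ]; then
    tail -n 20 "$log"
fi
```

The same tail command can be applied to $HOME/jnbinfo to see the last messages written by the notebook server.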

Mismatch of software stacks

In many cases it turns out that there are issues when the software stack used in the script does not match the default software stack that you have set. You can check which software stack is set as default by running the command

set_software_stack.sh -i

The modules

PCOMMAND="new gcc/4.8.2 r/3.6.0 python/3.6.1 eth_proxy"

are from the old software stack, whereas

PCOMMAND="gcc/6.3.0 python/3.8.5 eth_proxy"

are from the new software stack.

Conflict with locally installed Python packages

If jnbinfo contains a Python error with a backtrace, some of the Python packages that you installed locally in $HOME/.local may be conflicting with the centrally provided Jupyter installation.
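
To see which packages are candidates for such a conflict, you can list what is installed under $HOME/.local. The sketch below uses a hypothetical helper, list_local_packages, and assumes the usual per-user layout lib/pythonX.Y/site-packages below that directory.

```shell
#!/usr/bin/env bash
# list_local_packages is a hypothetical helper, not part of start_jupyter_nb.sh.
# It lists the contents of every lib/python*/site-packages directory below
# the given base directory (default: $HOME/.local).
list_local_packages() {
    local base="${1:-$HOME/.local}"
    local dir
    for dir in "$base"/lib/python*/site-packages; do
        # Skip the unexpanded pattern when no such directory exists.
        [ -d "$dir" ] || continue
        echo "== $dir"
        ls -1 "$dir"
    done
}

list_local_packages
```

Package names that also appear in the backtrace in jnbinfo are the first ones to look at.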