Euler III Beta Testing
The Euler III extension to the Euler cluster is available to all interested beta testers. We welcome your feedback about these nodes.
Serial jobs and single-node parallel jobs that use 1 to 4 cores and up to 30 GB of total memory per node are good candidates for these nodes. Initially, only jobs that request a run time of up to 24 hours will run.
Before starting, familiarize yourself with the known issues listed below. Be aware that these nodes are provided without guarantee and you should not rely on them for production purposes. Especially during the beta phase we reserve the right to reboot nodes or otherwise terminate jobs for diagnostic, troubleshooting, operational, or other reasons.
Known issues
- Missing libraries
  - Euler III nodes run CentOS 7, unlike the rest of Euler, which runs CentOS 6. Some libraries may be missing. See Missing libraries below if you encounter problems.
- Infiniband and MPI
  - Euler III nodes do not have an Infiniband network, but they do have a fast, low-latency Ethernet interconnect. See Submitting parallel jobs below for details.
- NAS NFS mounts
  - Euler III nodes are in a different IP range than the rest of the Euler nodes. If you use your own NAS, then you need to change the export rules and/or update your firewall to include the new IP addresses. The NAS shares provided by the Storage Group of the IT Services have been automatically changed to include the new IP ranges.
- Back connections (from Euler to external server)
  - If your job connects to your workstation or another external server, you will need to change your firewall and/or access rules because Euler III nodes are in a different IP range than the rest of the Euler nodes.
- Jobs submitted from Euler III nodes will not start
  - We are looking into this.
Submitting beta jobs
To submit a job to run on the beta Euler III nodes, you must request the beta resource, e.g.,
bsub -R beta [other bsub options] ./my_command
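As a minimal sketch, the limits stated at the top of this page (1 to 4 cores, up to 24 hours) can be checked before a job is submitted. The function name beta_cmd is hypothetical and not something provided on Euler; -n and -W are the standard LSF options for the number of cores and the run-time limit. The function only prints the bsub command so that it can be inspected first.

```shell
#!/bin/bash
# Hypothetical helper (not part of Euler): print a bsub command for a
# serial or shared-memory beta job after checking the limits from this page.
# Usage: beta_cmd CORES HOURS COMMAND [ARGS...]
beta_cmd() {
    local cores=$1 hours=$2
    shift 2
    if [ "$cores" -lt 1 ] || [ "$cores" -gt 4 ]; then
        echo "beta_cmd: request 1 to 4 cores (got $cores)" >&2
        return 1
    fi
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 24 ]; then
        echo "beta_cmd: request 1 to 24 hours (got $hours)" >&2
        return 1
    fi
    # -n: number of cores, -W: run-time limit as HH:MM (standard LSF options)
    echo "bsub -n $cores -W ${hours}:00 -R beta $*"
}
```

For example, beta_cmd 4 24 ./my_command prints bsub -n 4 -W 24:00 -R beta ./my_command, while beta_cmd 8 24 ./my_command prints an error and returns a non-zero status.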
Submitting parallel jobs
Although the Euler III nodes are targeted at serial and shared-memory parallel jobs, multi-node parallel jobs are still accepted. You must request at most four cores per node:
bsub -R beta -R "span[ptile=4]" [other bsub options] ./my_command
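To illustrate what span[ptile=4] implies: LSF places at most four of the requested cores on each node, so a job that asks for N cores is spread over roughly N/4 nodes, rounded up. The sketch below computes that node count and prints a matching bsub command. The function name beta_mpi_cmd and the use of mpirun as the launcher are assumptions made for illustration.

```shell
# Hypothetical helper: show how many Euler III nodes a multi-node job needs
# when at most 4 cores per node are requested (span[ptile=4]).
# Usage: beta_mpi_cmd TOTAL_CORES COMMAND [ARGS...]
beta_mpi_cmd() {
    local total=$1
    shift
    local ptile=4
    # Integer ceiling division: nodes = ceil(total / ptile)
    local nodes=$(( (total + ptile - 1) / ptile ))
    echo "# $total cores at $ptile per node: at least $nodes node(s)"
    # Print the command instead of running it; mpirun is an assumed launcher.
    echo "bsub -n $total -R beta -R \"span[ptile=$ptile]\" mpirun $*"
}
```

Calling beta_mpi_cmd 10 ./my_command reports at least 3 nodes, because ten cores at four per node do not fit on two.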
For MVAPICH2, you need to tell the system that Infiniband is not available by running
module load interconnect/ethernet
before loading the MPI module.
- Open MPI
  - Open MPI 1.6.5 has been tested to work with acceptable performance.
- MVAPICH2
  - MVAPICH2 2.1 works but preliminary results show low scalability. You need to load the interconnect/ethernet module.
- Intel MPI
  - Intel MPI 5.1.3 has been tested.
Troubleshooting
Missing libraries
Euler III nodes run CentOS 7, which includes many updated libraries. We have included as many backward-compatible libraries as possible in the default system. However, due to stability and operational concerns, some of them had to be installed as separate modules.
If your program aborts with an error message such as
[leonhard@eu-ms-001-01 ~]$ ./some_program
some_program: error while loading shared libraries: libpython2.6.so.1.0: cannot open shared object file: No such file or directory
but it works on the other, older Euler nodes, then a library that the program needs is not available on the Euler III nodes.
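One way to find out which libraries are affected is to run ldd on the program on an Euler III node, for example from within a beta job. The sketch below wraps this in a small function. The name missing_libs is hypothetical, and the sketch assumes that ldd marks unresolved libraries with "not found", as the glibc version of ldd does.

```shell
# Hypothetical helper: list shared libraries that the dynamic linker cannot
# resolve for a given program. Relies on ldd printing "NAME => not found".
# Usage: missing_libs PROGRAM
missing_libs() {
    # Keep only the unresolved entries and print the library name (field 1).
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

For the error shown above, missing_libs ./some_program would be expected to list libpython2.6.so.1.0. An empty result means that ldd resolved every library it checked.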
Self-compiled programs
If you have built your program yourself, then for now it is advisable to load the “legacy” and “centos_cruft/6” modules before submitting your beta job (or before calling your program within your job shell script). For example,
[leonhard@euler00 ~]$ module load legacy centos_cruft/6
[leonhard@euler00 ~]$ bsub -R beta ./my_program
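If the modules are loaded inside the job shell script instead, the script might look like the following sketch. The file name beta_job.sh is hypothetical, and the sketch assumes that the module command is available in the shell that runs the job. The here-document only writes the script; the submission command is shown as a comment.

```shell
# Write a hypothetical job script (beta_job.sh) that loads the compatibility
# modules before starting the self-compiled program.
cat > beta_job.sh <<'EOF'
#!/bin/bash
# Load the CentOS 6 compatibility libraries, then run the program.
module load legacy centos_cruft/6
./my_program
EOF

# Submit it to the beta nodes by passing the script on standard input:
#   bsub -R beta < beta_job.sh
```

Loading the modules inside the script keeps the job independent of whatever modules happen to be loaded in the login shell at submission time.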
Euler-provided programs and modules
Let us know if you encounter this problem when using a program provided by us so we can fix it for all users. Please include the error message in your report.