Documentation

Elysium is the central HPC Cluster at the Ruhr-Universität Bochum. See the overview of its resources.

To use Elysium, you need to be a member of an active HPC project and have two-factor authentication enabled for your RUB LoginID.

Please read about the basic concept of using Elysium first.

The login process combines SSH key-based authentication with web-based two-factor authentication.

After login, you can use available software modules or build your own software.

Read about submitting jobs and allocating resources in the SLURM section.

Subsections of Documentation

Basics

Elysium provides four login nodes: login1.elysium.hpc.rub.de, …, login4.elysium.hpc.rub.de. These are your entry points to the cluster.

After login, you typically use them to prepare your software and copy your data to the appropriate locations.
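For example, to copy input data from your workstation to your home directory on the cluster, you can use rsync or scp over SSH. A minimal sketch (LOGINID and the paths are placeholders you need to adapt):

rsync -avP ./my_dataset/ LOGINID@login1.elysium.hpc.rub.de:~/my_dataset/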

You can then allocate resources on the cluster using the Slurm workload manager.

After submitting your request, Slurm will grant you the resources as soon as they are free and your priority is higher than the priority of other jobs that might be waiting for some of the same resources.

Your priority depends on your waiting time and your remaining FairShare.
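You can inspect both factors yourself with standard Slurm commands (a sketch; see the SLURM section below for details and example output):

sprio -l    # per-job priority components of your pending jobs, including age and fair-share
sshare      # your remaining FairShare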

Login

Login to Elysium combines the common SSH key-based authentication with web-based two-factor authentication. You need to enable two-factor authentication for your RUB LoginID at rub.de/login.

The additional web-based authentication is cached for 14 hours so that you typically only have to do it once per work day, per login node, and per IP address you connect from. After that, your normal key-based SSH workflow will work as expected.

Follow these steps:

  1. Start ssh with the correct private key, your RUB LoginID, and one of the four login hosts, e.g.
    ssh -i ~/.ssh/elysium LOGINID@login1.elysium.hpc.ruhr-uni-bochum.de.
    Available login nodes are login1 to login4.

  2. Open the URL in a browser (or scan the QR code with your smartphone) to start web-based two-factor authentication.

  3. Enter the second factor for two-factor authentication.

  4. After successful login, you get a four-digit verification code.

  5. Enter this code at your ssh prompt to finish login.

For the next 14 hours, only step 1 (classic key-based authentication) will be necessary on the chosen login node for the IP address you connected from.
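If you do not want to pass the key and LoginID on every call, you can put them into your ~/.ssh/config. A minimal sketch, using the key path and host names from step 1 (adapt LOGINID and the key path to your setup):

# ~/.ssh/config
Host login?.elysium.hpc.ruhr-uni-bochum.de
    User LOGINID
    IdentityFile ~/.ssh/elysium

Afterwards, ssh login1.elysium.hpc.ruhr-uni-bochum.de is sufficient for step 1.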

Login will fail if:

  • You use the wrong private key (“Permission denied (publickey)”)
  • You are not a member of an active HPC project (“Permission denied (publickey)”)
  • You did not enable two-factor authentication for your LoginID (“Two-factor authentication is required”)
  • Web-based login fails
  • You enter the wrong verification code (“Verification failed”)
  • A timeout happens between starting the SSH session and finalizing web-based login (“session_id not found”); just start the process again to get a new session ID.

Software

We provide a basic set of toolchains and some common libraries via modules.

To build common HPC software packages, we provide a central installation of the Spack package manager.

Subsections of Software

Modules

We use the Lmod module system:

  • module avail (shortcut ml av) lists available modules
  • module load loads selected modules
  • module list shows the modules currently loaded in your environment

There are also hidden modules that are generally less relevant to users but can be viewed with ml --show_hidden av.
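A typical session could look like this (module names and versions are taken from the lists below and may change over time):

$ ml av                    # list available modules
$ module load gcc/13.2.0   # load a specific compiler; the exact module name may differ
$ module list              # show what is currently loaded
$ module purge             # unload all modules again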

We are committed to providing the tools you need for your research and development efforts. If you require modules that are not listed here or need different versions, please contact our support team, and we will be happy to assist you.

Compilers

  • GCC 11.4.1, default on the system
  • GCC 13.2.0
  • AOCC 4.2.0, AMD Optimizing C/C++ Compiler
  • Intel OneAPI Compilers 2024.1.0
  • Intel Classic Compilers 2021.10.0
  • NVHPC 24.7, NVIDIA HPC Compilers

MPI Libraries

  • OpenMPI 4.1.6
  • OpenMPI 5.0.3 (default)
  • MPICH 4.2.1
  • Intel OneAPI MPI 2021.12.1

Mathematical Libraries

  • AMD Math Libraries:
      • AMD BLIS, BLAS-like libraries
      • AMD FFTW, a fast Fourier transform library
      • AMD libFLAME, a library for dense matrix computations (LAPACK)
      • AMD ScaLAPACK
  • HDF5: Version 1.14.3 (built with MPI)
  • Boost 1.85.0

Programming Languages

  • Julia 1.10.2, a high-level, high-performance dynamic programming language.
  • R 4.4.0, for statistical computing.
  • Python 3.11.7

Tools and Utilities

  • CUDA Toolkit 12.6.1
  • GDB (GNU Debugger)
  • Apptainer

Spack

We use the Spack package manager to build a set of common HPC software packages.

This section describes how to use and extend the central installation.

Alternatively, you can use a fully independent Spack installation in your home directory, or use EasyBuild.

Using the Central Installation

Activate the central Spack installation with source /cluster/spack/0.22.2/share/spack/setup-env.sh.

You can use this as a starting point for your own software builds, without the need to rebuild everything from scratch.

For this purpose, add the following three files to ~/.spack:

  • ~/.spack/upstreams.yaml
    upstreams:
      central-spack:
        install_tree: /cluster/spack/0.22.2/opt/spack
  • ~/.spack/config.yaml
    config:
      install_tree:
        root: $HOME/spack/opt/spack
      source_cache: $HOME/spack/cache
  • ~/.spack/modules.yaml
    modules:
      default:
        roots:
          lmod: $HOME/spack/share/spack/lmod
        enable: [lmod]
        lmod:
          all:
            autoload: direct
          hide_implicits: true
          hierarchy: []

Also, put these two lines into your ~/.bashrc:

export MODULEPATH=$MODULEPATH:$HOME/spack/share/spack/lmod/linux-almalinux9-x86_64/Core
. /cluster/spack/0.22.2/share/spack/setup-env.sh

You can then use the central Spack installation, with local additions added in ~/spack.

Run spack compiler find to add the system compiler to your compiler list. Also run it after loading other compilers via module load to add those, too.
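A sketch of a typical workflow (the module name and compiler version are examples and may differ):

$ spack compiler find               # register the system compiler
$ module load gcc/13.2.0            # load an additional compiler
$ spack compiler find               # register the newly loaded compiler as well
$ spack compilers                   # list all compilers known to Spack
$ spack install zlib %gcc@13.2.0    # build a package with that compiler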

Overriding Package Definitions

If you need to override selected package definitions, create an additional file ~/.spack/repos.yaml:

repos:
  - $HOME/spack/var/spack/repos

and create a description for your local repo in ~/spack/var/spack/repos/repo.yaml:

repo:
  namespace: overrides

You can then copy the package definition you need to override, and edit it locally. Example for ffmpeg:

$ cd ~/spack/var/spack/repos/
$ mkdir -p packages/ffmpeg
$ cp /cluster/spack/0.22.2/var/spack/repos/builtin/packages/ffmpeg/package.py packages/ffmpeg
$ vim packages/ffmpeg/package.py
... edit as necessary (e.g. disable the patch for version 6.1.1) ...

When running spack install ffmpeg, your local override will take precedence over the central version.
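To check that the override is picked up, you can list the repositories Spack knows about and inspect where the package would come from (a quick sanity check; output details vary by Spack version):

$ spack repo list       # your 'overrides' repo should be listed before 'builtin'
$ spack spec -N ffmpeg  # the -N flag shows the namespace each package is taken from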

SLURM

The Elysium HPC system uses SLURM as a resource manager, scheduler, and accounting system in order to guarantee a fair share of the computing resources.

If you are looking for technical details regarding the usage and underlying mechanisms of SLURM, we recommend participating in the Introduction to HPC training course.

Examples of job scripts for different job types that are tailored to the Elysium cluster can be found in the Training Section.

List of Partitions

All nodes in the Elysium cluster are grouped by hardware type and job submission type. This way, users can request specific computing hardware, and multi-node jobs are guaranteed to run on nodes with the same setup.

In order to get a list of the available partitions, their current state, and available nodes, the sinfo command can be used.

[login_id@login001 ~]$ sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu               up 7-00:00:00      4  alloc cpu[033-034,037-038]
cpu               up 7-00:00:00    280   idle cpu[001-032,035-036,039-284]
cpu_filler        up    3:00:00      4  alloc cpu[033-034,037-038]
cpu_filler        up    3:00:00    280   idle cpu[001-032,035-036,039-284]
fat_cpu           up 2-00:00:00     13   idle fatcpu[001-013]
fat_cpu_filler    up    3:00:00     13   idle fatcpu[001-013]
gpu               up 2-00:00:00     20   idle gpu[001-020]
gpu_filler        up    1:00:00     20   idle gpu[001-020]
fat_gpu           up 2-00:00:00      1 drain* fatgpu005
fat_gpu           up 2-00:00:00      5    mix fatgpu[001,003-004,006-007]
fat_gpu           up 2-00:00:00      1   idle fatgpu002
fat_gpu_filler    up    1:00:00      1 drain* fatgpu005
fat_gpu_filler    up    1:00:00      5    mix fatgpu[001,003-004,006-007]
fat_gpu_filler    up    1:00:00      1   idle fatgpu002
vis               up 1-00:00:00      3   idle vis[001-003]

Requesting Nodes of a Partition

SLURM provides two commands to request resources. srun is used to start an interactive session.

[login_id@login001 ~]$ srun --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 --pty bash
[login_id@cpu001 ~]$

sbatch is used to request resources that will execute a job script.

[login_id@login001 ~]$ sbatch --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 myscript.sh
Submitted batch job 10290

For sbatch, the submission flags can also be incorporated into the job script itself, as shown in the sketch below. More information about job scripts and the required and optional flags can be found in the Training/SLURM Header section.
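A minimal sketch of such a job script (account, partition, and resource values are placeholders; adapt them to your project and workload):

#!/bin/bash
#SBATCH --partition=cpu
#SBATCH --job-name=test
#SBATCH --time=00:05:00
#SBATCH --account=testproj_0000
#SBATCH --nodes=1

# load the required modules and start the actual computation
module load gcc/13.2.0   # example module; adapt to your software
srun ./my_program        # placeholder for your executable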

Use spredict myscript.sh to estimate the start time of your job.

List of Currently Running and Pending Jobs

If the requested resources are currently not available, jobs are queued and will start as soon as the resources become available. To check which jobs are currently running, which are pending, and for what reason, use the squeue command. For privacy reasons, only the user’s own jobs are displayed.

[login_id@login001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             10290       cpu     test login_id  R       2:51      1 cpu001

List of Computing Resources Share

Users, projects, groups, and institutes are billed for the computing resources they use. To check how many resources a user is entitled to and how many they have already used, use the sshare command. For privacy reasons, only the user’s own shares are displayed.

[login_id@login001 ~]$ sshare
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
testproj_0000          login_id       1000    0.166667    20450435      0.163985   0.681818

List of Project Accounts

For technical reasons, project accounts on Elysium have rather cryptic names based on the LoginID of the project manager and a number. To make it easier to select a project account for the --account flag of srun or sbatch, and to check the share and usage of projects, the RUB-exclusive rub-acclist command can be used.

[login_id@login001 ~]$ rub-acclist
Project ID    | Project Description
--------------+--------------------------------------------------
testproj_0000 | The fundamental interconnectedness of all things
testproj_0001 | The translated quaternion for optimal pivoting

Visualization

We provide visualization via VirtualGL on the visualization nodes of elysium.hpc.ruhr-uni-bochum.de.

Requirements:

An X11 server with 24-bit or 32-bit visuals, and VirtualGL version > 3.0.2 installed.

You can check support for your operating system at https://virtualgl.org/Documentation/OSSupport. You can download VirtualGL at https://github.com/VirtualGL/virtualgl/releases.

To use VirtualGL on Elysium, you only need the VirtualGL client; it is not necessary to configure a VirtualGL server.

Resource allocation:

Allocate resources in the vis partition.

salloc -p vis -N1 --time=02:00:00 --account=$ACCOUNT

This will allocate one entire vis node for 2 hours. Wait until a slot in the vis partition is available. You can check whether your resources have already been allocated using the squeue command.

Establish Virtual GL connection:

Connect directly from your computer to the visualization node via ssh with vglconnect -s, using one of the login servers login[001-004] as a jump host.

vglconnect -s $HPCUSER@vis001.elysium.hpc.rub.de -J $HPCUSER@login001.elysium.hpc.rub.de

If you don’t like long commands, you can configure one of the login nodes as a jump host in your ~/.ssh/config for the vis[001-003] hosts, as sketched below. The command vglconnect -s accepts nearly the same syntax as ssh.
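A minimal sketch of such a configuration (LOGINID and the key path are placeholders; the host names are the ones used above):

# ~/.ssh/config
Host vis00?.elysium.hpc.rub.de
    User LOGINID
    IdentityFile ~/.ssh/elysium
    ProxyJump LOGINID@login001.elysium.hpc.rub.de

With this in place, vglconnect -s vis001.elysium.hpc.rub.de should be enough.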

Run your Software:

Load a module if required. Start your application using vglrun, please remember to use useful command line options like -fps .

module load vmd
vglrun +pr -fps 60 vmd

Please remember to cancel the resource allocation once you are done with your interactive session.

scancel $jobID