Introduction to Linux
Why Linux?
Linux-based operating systems are the de facto standard for HPC systems.
Thus it is vital to have a solid understanding of how to work with Linux.
Linux Introductory Training
We offer an in-person course that combines a lecture and interactive exercises.
The course covers the following topics:
- Why Linux?
- Directory Structure
- The Terminal
- Navigating the Directory Structure
- Modifying the Directory Structure
- Handling Files
- Permission Denied
- Editing Files in the Terminal
- Workflow and Pipelines
- Automation and Scripting
- Environment Variables
- Monitoring System Resources
Registration
Dates for the courses are announced via the
tier3-hpc mailing list.
Registration for the next course can be done via
Moodle.
We expect everyone who registered for the course to participate in the next course.
If you change your mind about participating, please deregister from the course
to free up one of the limited spots for others.
Do I Need This Course?
If you are already proficient in the topics listed above, you may skip the course.
Attending the course is not a requirement for getting access to the cluster.
In the Moodle course we provide a quiz where you can check your proficiency with Linux.
Slides
Here you may download the slides for the course:
Introduction to Linux.
Introduction to HPC
Why HPC Training?
Using HPC resources differs significantly from working with a regular desktop computer.
Thus it is vital to have a solid understanding of how to work with HPC systems.
HPC Introductory Training
We offer an in-person course that combines a lecture and interactive exercises.
The course covers the following topics:
- What is High Performance Computing?
- HPC-Cluster Components
- How to Access a Cluster?
- SLURM - Requesting Resources
- SLURM - How Resources are Scheduled
- SLURM - Accounting and Sharing of Compute Time
- Environment Modules
- Parallelization Models
- Scaling of Parallel Applications
- Code of Conduct
Registration
Dates for the courses are announced via the
tier3-hpc mailing list.
Registration for the next course can be done via
Moodle.
We expect everyone who registered for the course to participate in the next course.
If you change your mind about participating, please deregister from the course
to free up one of the limited spots for others.
Do I Need This Course?
If you are already proficient in the topics listed above, you may skip the course.
Attending the course is not a requirement for getting access to the cluster.
In the Moodle course we provide a quiz where you can check your proficiency with HPC systems.
Slides
Here you may download the slides for the course:
Introduction to HPC.
Job Scripts
A SLURM job script usually consists of the following steps (a minimal skeleton combining them is sketched after the list):
- SLURM Header
- Create temporary folder on local disk
- Copy input data to temporary folder
- Load required modules
- Perform actual calculation
- Copy output file back to global file system
- Tidy up local storage
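The sections below explain each of these steps in detail. As a rough orientation, a minimal sketch of how the pieces fit together in one script could look as follows; the partition, account, module, and program names are placeholders and need to be adapted:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=cpu               # placeholder partition
#SBATCH --time=00-01:00:00            # placeholder runtime
#SBATCH --account=your_project_0000   # placeholder account (check with rub-acclist)

# create a temporary working directory on the node-local disk
HDIR=$(pwd)
WDIR=/tmp/${SLURM_JOB_ID}
mkdir -p ${WDIR}
cd ${WDIR}

# copy the input data to the temporary folder
cp ${HDIR}/inputdata/* ${WDIR}

# load the required modules
module purge
module load the_modules_name_and_version

# perform the actual calculation (placeholder executable)
${HDIR}/my_program input > output

# copy the output back to the global file system and tidy up the local storage
cp output ${HDIR}/outputs/
rm -rf ${WDIR}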
The SLURM header is a section of the script directly after the shebang.
Every line in it begins with #SBATCH.
#SBATCH --nodes=1                # Request 1 node
#SBATCH --partition=gpu          # Run in the gpu partition
#SBATCH --job-name=minimal_gpu   # Name of the job in squeue
#SBATCH --gpus=1                 # Number of GPUs to reserve
#SBATCH --time=00:05:00          # Estimated runtime (dd-hh:mm:ss)
#SBATCH --account=lambem64_0000  # Project ID (check with rub-acclist)
This way the bash interpreter ignores these lines, but SLURM can pick them out and parse their contents.
Each line contains one of the sbatch flags.
On Elysium the flags --partition, --time, and --account are required.
For GPU jobs the --gpus flag must additionally be specified and set to at least 1.
Mandatory Flags
| Flag | Example | Note |
| --- | --- | --- |
| --partition=<partition> | --partition=cpu | list of partitions with sinfo |
| --time=<dd-hh:mm:ss> | --time=00-02:30:00 | maximum time the job will run |
| --account=<account> | --account=snublaew_0001 | project the used computing time shall be billed to; list of project accounts with rub-acclist |
| --gpus=<n> | --gpus=1 | number of GPUs; must be at least 1 for GPU partitions |
Optional Flags
| Flag | Example | Note |
| --- | --- | --- |
| --job-name=<name> | --job-name="mysim" | job name that is shown in squeue for the job |
| --exclusive | --exclusive | nodes are not shared with other jobs (default on cpu, fat_cpu, gpu) |
| --output=<filename> | --output=%x-%j.out | file name for stdout (%x = job name, %j = job ID) |
| --error=<filename> | --error=%x-%j.err | file name for stderr (%x = job name, %j = job ID) |
| --mail-type=<TYPE> | --mail-type=ALL | notify the user by email when certain event types occur; if specified, --mail-user must also be set |
| --mail-user=<rub-mail> | --mail-user=max.muster@rub.de | address to which job notifications of type --mail-type are sent |
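For illustration, a header that combines several of the optional flags from the table could look like this; the job name and mail address are placeholders:

#SBATCH --job-name=mysim
#SBATCH --output=%x-%j.out             # stdout goes to <job name>-<job id>.out
#SBATCH --error=%x-%j.err              # stderr goes to <job name>-<job id>.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=max.muster@rub.de

Once the header is complete, the finished script (here assumed to be saved as job.sh) is handed to the scheduler with sbatch, and squeue shows the state of your pending and running jobs:

# submit the job script
sbatch job.sh

# check the state of your own jobs
squeue -u ${USER}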
Temporary Folder
If your code reads input or writes output,
its performance can depend strongly on where the data is located.
If the data resides in your home directory or on the Lustre file system,
the read/write performance is limited by the bandwidth of the interconnect.
In addition, a parallel file system by design struggles with many small read/write operations;
its performance shines when reading/writing big chunks.
It is therefore advisable to create a folder on the local disks in the /tmp/ directory
and to perform all read/write operations in there.
At the beginning of the job any input data is copied there in one go,
and at the end all output data is copied from the /tmp/ directory to its final location in one go.
# obtain the current location
HDIR=$(pwd)

# create a temporary working directory on the node's local disk
WDIR=/tmp/${SLURM_JOB_ID}
mkdir -p ${WDIR}
cd ${WDIR}

# copy the set of input files to the working directory
cp ${HDIR}/inputdata/* ${WDIR}

...

# copy the set of output files back to the original folder
cp outputdata ${HDIR}/outputs/

# tidy up the local storage
rm -rf ${WDIR}
Loading Modules
If your program was built with certain versions of libraries,
it may be required to provide the same libraries at runtime.
Since everybody's needs regarding library versions differ,
Elysium utilizes environment modules to manage software versions.
# unload all previously loaded modules
module purge

# show all modules that are available
module avail

# load a specific module
module load the_modules_name_and_version

# list all loaded modules
module list
How to perform your calculation strongly depends on your specific software and inputs.
In general there are four typical ways to run HPC jobs.
Farming
Farming jobs are used if the program is not parallelized,
or scales in a way that allows it to utilize only a few CPU cores efficiently.
In that case multiple instances of the same program are started, each with a different input;
this works well as long as the instances have roughly the same runtime.
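The loop below assumes that a few shell variables have already been defined; hypothetical definitions, to be adapted to your program and the node in use, could look like this:

# hypothetical values for the farming loop below
myexe=./my_program            # serial executable (placeholder)
ncores=${SLURM_CPUS_ON_NODE}  # number of CPU cores available on the node
stride=1                      # start one instance per core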
for irun in $(seq 1 ${stride} ${ncores})
do
    # the core numbering starts at 0 and goes up to ncores-1
    taskset -c $(bc <<< "${irun}-1") ${myexe} inp.${irun} > out.${irun} &
done
wait
Shared Memory
Programs that incorporate thread spawning (usually via OpenMP) can
make use of multiple cores.
export OMP_NUM_THREADS=${SLURM_TASKS_PER_NODE}
${myexe} input
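Since the thread count above is taken from SLURM_TASKS_PER_NODE, the resource request in the header has to match it. One possibility, assuming a 48-core node as in the distributed-memory example below, is:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48   # SLURM_TASKS_PER_NODE then evaluates to 48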
Distributed Memory
If programs require more resources than can be provided by one node
it is necessary to pass information between the processes running on different nodes.
This is usually done via MPI (the Message Passing Interface).
A program must be specifically written to utilize MPI.
ncorespernode=48
nnodes=${SLURM_JOB_NUM_NODES}
ncorestotal=$(bc <<< "${ncorespernode}*${nnodes}")
mpirun -np ${ncorestotal} -ppn ${ncorespernode} ${myexe} input
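The -np and -ppn flags above are specific to the mpirun launcher of some MPI implementations; other implementations use different flags. If the loaded MPI library was built with SLURM support, the processes can often be started with SLURM's own launcher instead. This is a sketch, not a statement about the MPI modules installed on Elysium:

# let SLURM start and place one MPI process per allocated task
srun ${myexe} input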
Hybrid Memory (Shared and Distributed Memory)
In programs that utilize distributed memory parallelization via MPI
it is possible to spawn threads within each process to make use of
shared memory parallelization.
nthreadsperproc=2
ncorespernode=$(bc <<< "48/${nthreadsperproc}")
nnodes=${SLURM_JOB_NUM_NODES}
ncorestotal=$(bc <<< "${ncorespernode}*${nnodes}")
export OMP_NUM_THREADS=${nthreadsperproc}
mpirun -np ${ncorestotal} -ppn ${ncorespernode} ${myexe} input
GPU
Support for offloading tasks to GPUs needs to be incorporated into the program.
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
${myexe} input
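Whether the job actually has access to the GPUs can be checked from within the script, for example by printing the visible devices; the nvidia-smi call assumes NVIDIA GPUs:

# print which GPUs are visible to this job
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
nvidia-smi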
Examples
The following example scripts are ready to use on the Elysium cluster.
The only change you need to make is to specify a valid account for the --account flag.
You can use the rub-acclist command to get a list of your available project accounts.
The executed programs do not produce any load and will finish in a few seconds.
The generated output shows where each process/thread ran and whether it had access to a GPU.
Minimal CPU Job Script Example
Farming Job Script Example
Shared Memory Job Script Example
Distributed Memory Job Script Example
Hybrid Memory Job Script Example
Minimal GPU Job Script Example
GPU Job Script Example
Distributed Memory with GPU Job Script Example