Job Scripts
A SLURM job script usually consists of the following steps:
- SLURM Header
- Create temporary folder on local disk
- Copy input data to temporary folder
- Load required modules
- Perform actual calculation
- Copy output file back to global file system
- Tidy up local storage
SLURM Header
The SLURM header is a section in the script after the shebang.
Every line in it begins with #SBATCH.
#SBATCH --nodes=1 # request 1 node
#SBATCH --partition=gpu # run in partition gpu
#SBATCH --job-name=minimal_gpu # name of the job in squeue
#SBATCH --gpus=1 # number of GPUs to reserve
#SBATCH --time=00:05:00 # maximum runtime (hh:mm:ss or dd-hh:mm:ss)
#SBATCH --account=lambem64_0000 # project ID (check with rub-acclist)
Because every line starts with #, the bash interpreter treats these lines as comments and ignores them,
but SLURM can pick them out and parse their contents.
Each line contains one of the sbatch flags.
On Elysium the flags --partition, --time, and --account are required.
For GPU jobs the additional --gpus flag must be specified with a value of at least 1.
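Because these flags are required, it can be handy to check a script before submitting it. The following is a minimal sketch, not an official tool: check_required_flags is a hypothetical helper name, it only inspects #SBATCH lines in long-option form, and it does not see flags passed on the sbatch command line.

```shell
#!/bin/bash
# Hypothetical helper, not part of SLURM or the Elysium tooling.
# Prints every required flag that has no "#SBATCH --flag=value" line in the
# given job script and returns non-zero if any are missing.
# Limitation: short options such as -p, -t, -A are not recognized.
check_required_flags() {
    local script="$1" missing=0 flag
    for flag in --partition --time --account; do
        if ! grep -Eq -- "^#SBATCH[[:space:]]+${flag}[=[:space:]]" "${script}"; then
            echo "missing ${flag}"
            missing=1
        fi
    done
    return "${missing}"
}
```

A possible use is `check_required_flags job.sh && sbatch job.sh`, where job.sh stands for your own script.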
Mandatory Flags
Flag | Example | Note |
---|---|---|
--partition=<partition> | --partition=cpu | list partitions with sinfo |
--time=<dd-hh:mm:ss> | --time=00-02:30:00 | maximum time the job will run |
--account=<account> | --account=snublaew_0001 | project the used computing time is billed to; list project accounts with rub-acclist |
--gpus=<n> | --gpus=1 | number of GPUs; must be at least 1 for GPU partitions |
Optional Flags
Flag | Example | Note |
---|---|---|
--job-name=<name> | --job-name="mysim" | job name that is shown in squeue for the job |
--exclusive | --exclusive | nodes are not shared with other jobs (default on cpu, fat_cpu, gpu) |
--output=<filename> | --output=%x-%j.out | file to contain stdout (%x=job name, %j=job ID) |
--error=<filename> | --error=%x-%j.err | file to contain stderr (%x=job name, %j=job ID) |
--mail-type=<TYPE> | --mail-type=ALL | notify the user by email when certain event types occur; if specified, --mail-user needs to be set |
--mail-user=<rub-mail> | --mail-user=max.muster@rub.de | address to which job notifications of type --mail-type are sent |
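As an illustration, a header that combines the required flags with some of the optional ones might look like the sketch below. The account, job name, and mail address are the placeholder values from the tables above and need to be replaced with your own; --error is the sbatch option for the stderr file.

```shell
#!/bin/bash
#SBATCH --partition=cpu                  # required: partition to run in
#SBATCH --time=00-02:30:00               # required: maximum runtime
#SBATCH --account=snublaew_0001          # required: placeholder project account
#SBATCH --job-name="mysim"               # optional: name shown in squeue
#SBATCH --output=%x-%j.out               # optional: stdout file, e.g. mysim-<job ID>.out
#SBATCH --error=%x-%j.err                # optional: stderr file, e.g. mysim-<job ID>.err
#SBATCH --mail-type=ALL                  # optional: send mail on all event types
#SBATCH --mail-user=max.muster@rub.de    # placeholder address, needed with --mail-type
```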
Temporary Folder
If your code reads input or writes output,
its performance can depend strongly on where the data is located.
If the data is in your home directory or on the Lustre file system,
the read/write performance is limited by the bandwidth of the interconnect.
In addition, a parallel file system by design handles many small read/write operations poorly.
Its performance shines when reading or writing big chunks.
It is therefore advisable to create a folder on the local disk in the /tmp/ directory
and perform all read/write operations in there.
At the beginning of the job all input data is copied there in one go,
and at the end all output data is copied from the /tmp/ directory to its final location in one go.
# remember the current location
HDIR=$(pwd)

# create a temporary working directory on the node-local disk
WDIR=$(mktemp -d "/tmp/job_${SLURM_JOB_ID}.XXXXXX")
cd "${WDIR}"

# copy the set of input files to the working directory
cp "${HDIR}"/inputdata/* "${WDIR}"

...

# copy the set of output files back to the original folder
cp -r outputdata "${HDIR}/outputs/"

# tidy up local files (only the job's own directory, never /tmp itself)
cd "${HDIR}"
rm -rf "${WDIR}"
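The tidy-up step in the listing above only runs if the script reaches its last lines. One way to make the cleanup unconditional is a trap on EXIT, sketched here under some assumptions: run_in_scratch is a hypothetical helper name, the directory name pattern is arbitrary, and copying input and output data is left to the command that is passed in.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a fresh directory under /tmp and
# remove that directory afterwards, whether the command succeeds or fails.
run_in_scratch() {
    (
        # SLURM_JOB_ID is set inside a job; "manual" is a fallback for tests
        wdir=$(mktemp -d "/tmp/job_${SLURM_JOB_ID:-manual}.XXXXXX") || exit 1
        # the trap fires when this subshell exits, for any reason
        trap 'rm -rf "${wdir}"' EXIT
        cd "${wdir}" || exit 1
        "$@"
        status=$?
        exit "${status}"
    )
}
```

For example, `run_in_scratch ./my_job_steps.sh` would execute a script of your own (the name is a placeholder) in the scratch directory and pass its exit status through.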
Loading Modules
If your program was built with certain versions of libraries, it may be required to provide the same libraries at runtime. Since everybody's needs regarding library versions are different, Elysium utilizes environment modules to manage software versions.
# unload all previously loaded modules
module purge

# show all modules that are available
module avail

# load a specific module
module load the_modules_name_and_version

# list all loaded modules
module list
Perform Calculation
How to perform your calculation strongly depends on your specific software and inputs. In general there are five typical ways to run HPC jobs.
Farming
Farming jobs are used if the program is not parallelized, or scales in a way that it can only utilize a few CPU cores efficiently. In that case multiple instances of the same program are started, each with a different input. This works well as long as the instances have roughly the same runtime.
# stride, ncores, and myexe must be set earlier in the script
for irun in $(seq 1 ${stride} ${ncores})
do
    # core IDs start at 0 and go up to ncores-1
    taskset -c $((irun - 1)) ${myexe} inp.${irun} > out.${irun} &
done
# wait until all background instances have finished
wait
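If each instance should get more than one core, the single core ID given to taskset becomes a core range. The sketch below only prints the mapping instead of starting programs. It assumes that stride means "cores per instance" and that ncores is divisible by stride; core_ranges is a hypothetical helper name and the values are for illustration only.

```shell
#!/bin/bash
# Assumed values for illustration only.
ncores=8    # cores available on the node
stride=2    # cores given to each instance (assumption)

# Print the core range each instance would be pinned to.
core_ranges() {
    local ncores="$1" stride="$2" irun first last
    for irun in $(seq 1 "${stride}" "${ncores}"); do
        first=$((irun - 1))            # core IDs start at 0
        last=$((irun + stride - 2))    # last core of this instance
        echo "inp.${irun}: cores ${first}-${last}"
    done
}

core_ranges "${ncores}" "${stride}"
# prints:
# inp.1: cores 0-1
# inp.3: cores 2-3
# inp.5: cores 4-5
# inp.7: cores 6-7
```

A range such as 0-1 can be passed to taskset as `taskset -c 0-1`.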
Shared Memory
Programs that incorporate thread spawning (usually via OpenMP) can make use of multiple cores.
# one OpenMP thread per task slot allocated on the node
export OMP_NUM_THREADS=${SLURM_TASKS_PER_NODE}
${myexe} input
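SLURM_TASKS_PER_NODE counts tasks, and in multi-node jobs SLURM may write it in a compact form such as 2(x3), which is not a valid thread count. An alternative sketch, assuming the job was submitted with --cpus-per-task=<n> (an assumption, not something the listing above requires), reads SLURM_CPUS_PER_TASK instead and falls back to one thread when it is unset. pick_omp_threads is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: choose the OpenMP thread count from the CPUs SLURM assigned to the task.
# Assumption: the job was submitted with --cpus-per-task=<n>; otherwise
# SLURM_CPUS_PER_TASK is unset and the fallback of 1 thread is used.
pick_omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS=$(pick_omp_threads)
echo "running with ${OMP_NUM_THREADS} OpenMP thread(s)"
```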
Distributed Memory
If programs require more resources than one node can provide, it is necessary to pass information between the processes running on different nodes. This is usually done via MPI (Message Passing Interface). A program must be specifically written to utilize MPI.
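# number of MPI processes per node
ncorespernode=48
# number of nodes allocated to the job
nnodes=${SLURM_JOB_NUM_NODES}
# total number of MPI processes
ncorestotal=$(bc <<< "${ncorespernode}*${nnodes}")
mpirun -np ${ncorestotal} -ppn ${ncorespernode} ${myexe} input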
Hybrid Memory (Shared and Distributed Memory)
In programs that utilize distributed memory parallelization via MPI it is possible to spawn threads within each process to make use of shared memory parallelization.
# OpenMP threads per MPI process
nthreadsperproc=2
# MPI processes per node, so that processes x threads fill 48 cores
ncorespernode=$(bc <<< "48/${nthreadsperproc}")
nnodes=${SLURM_JOB_NUM_NODES}
# total number of MPI processes
ncorestotal=$(bc <<< "${ncorespernode}*${nnodes}")
export OMP_NUM_THREADS=${nthreadsperproc}
mpirun -np ${ncorestotal} -ppn ${ncorespernode} ${myexe} input
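The arithmetic in this listing only works out if the thread count divides the number of cores per node. A small sketch that computes the layout and rejects other combinations follows; hybrid_layout is a hypothetical helper name, and the numbers in the usage line are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: given cores per node, threads per MPI process, and the
# number of nodes, print "<processes per node> <total processes>".
# Returns non-zero if the threads do not evenly divide the cores.
hybrid_layout() {
    local corespernode="$1" nthreads="$2" nnodes="$3"
    if (( corespernode % nthreads != 0 )); then
        echo "error: ${nthreads} threads do not divide ${corespernode} cores" >&2
        return 1
    fi
    local procspernode=$(( corespernode / nthreads ))
    echo "${procspernode} $(( procspernode * nnodes ))"
}

hybrid_layout 48 2 3    # prints "24 72"
```

With 48 cores per node, 2 threads per process, and 3 nodes this gives 24 processes per node and 72 processes in total.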
GPU
Support for offloading tasks to GPUs needs to be incorporated into the program.
# make GPUs 0 to 7 visible to the program
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
${myexe} input
Examples
The following example scripts are ready to use on the Elysium cluster.
The only change you need to make is to specify a valid account for the --account flag.
You can use the rub-acclist command to get a list of your available project accounts.
The executed programs do not produce any load and will finish in a few seconds.
The generated output shows where each process/thread ran, and if it had access to a GPU.
Minimal CPU Job Script Example
Shared Memory Job Script Example
Distributed Memory Job Script Example
Hybrid Memory Job Script Example