This page contains all the information you need to submit GPU jobs successfully on Ubelix.
Important Information on GPU Usage
Privileged vs. Regular Users
Ubelix distinguishes two categories of users with respect to GPU usage: privileged and regular users. Privileged users are users who have invested money in GPUs. Jobs of privileged users can preempt running jobs of regular users on a certain number of GPUs. A preempted job is automatically requeued, unless it was submitted with the option --no-requeue, in which case it is canceled. A requeued job can start on different resources.
This behavior is enforced by job QoSs: whether a job is privileged depends on the QoS with which it was submitted. Regular users always submit their jobs with the unprivileged QoS 'job_gpu', while privileged users submit their jobs by default with the privileged QoS 'job_gpu_<name_of_head>'. Privileged users can also submit jobs with the unprivileged QoS. A privileged job will preempt a running unprivileged job when the following two criteria are met:
- There are no free GPU resources of the requested GPU type available.
- The QoS of the privileged user has not yet reached the maximum number of GPUs that may be used with this QoS.
If an unprivileged job needs to be preempted to make resources available for a privileged job, Slurm will always preempt the youngest (most recently started) running job in the partition.
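Because a preempted job is requeued by default and may start again on different resources, it can help to make the job script restart-aware. The following is a minimal sketch, not an Ubelix-specific recipe. It relies on the SLURM_RESTART_COUNT environment variable, which Slurm sets for jobs that have been requeued or restarted. The checkpoint file name and the application call are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1080ti:1

# Hypothetical checkpoint file; replace with whatever your application writes.
CHECKPOINT="${CHECKPOINT:-checkpoint.dat}"

# Slurm sets SLURM_RESTART_COUNT when a job has been requeued or restarted.
# It is unset on the first run, so default to 0.
restart_count() {
    echo "${SLURM_RESTART_COUNT:-0}"
}

# Resume only if the job was requeued AND a checkpoint file exists;
# otherwise start from scratch.
start_mode() {
    if [ "$(restart_count)" -gt 0 ] && [ -f "$CHECKPOINT" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

echo "Run number $(( $(restart_count) + 1 )), starting in mode: $(start_mode)"

# Hypothetical application call (assumption, not part of Ubelix):
# ./my_gpu_app --mode "$(start_mode)" --checkpoint "$CHECKPOINT"
```

Whether resuming is possible at all depends on the application writing checkpoints regularly. Jobs that cannot resume safely may be better served by --no-requeue.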
Access to the 'gpu' Partition
While the 'gpu' partition is open to everybody, regular users must explicitly request access to this partition before they can submit jobs. You have to request access only once. To do so, write an email to email@example.com and briefly describe your application.
Ubelix currently features two types of GPUs:
- 48x Nvidia Geforce GTX 1080 Ti
- 6x Nvidia Tesla P100
You must request a GPU type using the --gres option:
--gres=gpu:1080ti:<number_of_gpus> or --gres=gpu:teslaP100:<number_of_gpus>
Use the following options to submit a job to the 'gpu' partition using the default job QoS:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:<type>:<number_of_gpus>
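Put together, a complete job script using these options might look like the sketch below. The job name and time limit are illustrative assumptions, not Ubelix defaults. The script also assumes that Slurm exports CUDA_VISIBLE_DEVICES for jobs that request GPUs via --gres, which is the usual behavior but depends on the cluster configuration.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_example      # hypothetical job name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1080ti:1         # one GTX 1080 Ti; use gpu:teslaP100:1 for a P100
#SBATCH --time=00:10:00             # assumed time limit, adjust to your job

# Report which GPU indices are visible to this job.
# Prints "none" if CUDA_VISIBLE_DEVICES is not set (e.g. outside a GPU job).
gpu_report() {
    echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
}

gpu_report

# List the allocated GPUs if nvidia-smi is available on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --list-gpus || echo "nvidia-smi could not query any GPU"
fi
```

Save the script under a name of your choice (for example gpu_job.sh, a hypothetical name) and submit it with sbatch gpu_job.sh.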
Privileged users only: Use the following options to submit a job using the unprivileged QoS:
#SBATCH --partition=gpu
#SBATCH --qos=job_gpu
#SBATCH --gres=gpu:<type>:<number_of_gpus>
Use the following option to ensure that the job, if preempted, is canceled instead of requeued:
#SBATCH --no-requeue
CUDA C/C++ Basics: http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf
Nvidia Geforce GTX 1080 Ti: https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti/
Nvidia Tesla P100: http://www.nvidia.com/object/tesla-p100.html