Support for Nvidia GPU devices was added to OAR and should ship with future versions (newer than 2.5.7).
Meanwhile, for those who cannot wait: since the feature only involves some configuration of the resources and the latest version of the job resource manager script taken from the sources (the job resource manager is part of the configuration files of OAR, which the administrator can modify), it can be backported to released versions (e.g. 2.5.7).
Future releases of OAR might provide tools to help set up GPU resources, but for now, here are some explanations for the setup (for advanced OAR admins only).
In order to enable the mechanism, you have to:
1. install the latest version of the job resource manager script from the OAR sources;
2. enable the devices cgroup mechanism ($ENABLE_DEVICESCG = "YES";);
3. create the new resource property (oarproperty -a gpudevice);
4. set the gpudevice property on the resources.
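Assuming a standard OAR installation, steps 2 and 3 could look as follows on the server (the exact location of the $ENABLE_DEVICESCG variable, in the job resource manager script or the configuration, depends on your version, so treat the comment below as indicative):

# step 2: enable the devices cgroup in the job resource manager script:
#   $ENABLE_DEVICESCG = "YES";
# step 3: declare the new resource property:
$ oarproperty -a gpudevice
# optionally check that the property now exists:
$ oarproperty -l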
For step 4, several scenarios are possible. Let's consider here the scenario where you have 2 GPUs per node: one attached to the first CPU's PCI-e bus, and the second to the 2nd CPU's PCI-e bus (see lstopo's output, for instance, to check that). You can then use oarnodesetting to set the gpudevice property to 0 (matching the /dev/nvidia0 Linux device) for the resources associated with the first CPU, and to 1 (matching the /dev/nvidia1 Linux device) for those associated with the 2nd CPU, on every host.
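For instance, using oarnodesetting's SQL selector (the host name and cpu property values below are assumptions for illustration; adapt them to your resource hierarchy):

# map GPU 0 (/dev/nvidia0) to the resources of the first CPU of node-1,
# and GPU 1 (/dev/nvidia1) to those of the second CPU:
$ oarnodesetting --sql "host = 'node-1' AND cpu = '1'" -p gpudevice=0
$ oarnodesetting --sql "host = 'node-1' AND cpu = '2'" -p gpudevice=1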
Users would then get access to only 1 GPU on each of N hosts when requesting a single CPU per host, e.g.:
$ oarsub -l /host=N/cpu=1 ...
or could even request GPUs directly, e.g.:
$ oarsub -l /host=N/gpudevice=1 ...
The result would be equivalent (1 GPU = 1 CPU).
(Warning: giving host=N is mandatory in the second variant of the oarsub command; otherwise one would reserve all hosts of the cluster that have a resource with the same GPU id.)
The nvidia-smi command run within the job should show which GPU is available.
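For example, in an interactive job requesting the 2nd GPU (a quick sanity check, assuming nvidia-smi is installed on the nodes):

$ oarsub -I -l /host=1/gpudevice=1
# inside the job, only the granted device (/dev/nvidia1 here) should be visible:
$ nvidia-smi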
Also, if some nodes do not have any GPU, you can set the gpudevice property to -1 for the corresponding resources, and let users add -p "gpudevice >= 0" to the oarsub command in order to get only resources with GPUs.
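Putting it together, a job requesting one GPU-equipped CPU on each of two hosts could then look like this (the script name my_gpu_job.sh is a hypothetical placeholder):

$ oarsub -p "gpudevice >= 0" -l /host=2/cpu=1 ./my_gpu_job.sh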
Many thanks to Nicolas Niclausse for this contribution.