Support for NVIDIA GPU devices has been added to OAR and should ship with releases newer than 2.5.7.

Meanwhile, those who cannot wait can backport the feature to released versions (e.g. 2.5.7). This only involves configuring the resources and using the latest version of the job resource manager script, taken from the sources (the job resource manager is one of OAR's configuration files, which the administrator can modify).

See:

Future releases of OAR might provide tools to help set up GPU resources. For now, here are some explanations of the setup (for advanced OAR administrators only).

In order to enable the mechanism, you have to:

  1. use the latest job resource manager (see above)
  2. enable the device cgroup mechanism in it ($ENABLE_DEVICESCG = "YES";)
  3. add a resource property for the GPU devices (oarproperty -a gpudevice)
  4. set the value of this new resource property for all resources.
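Step 2 is easy to get wrong when copying the line from a web page (typographic quotes are not valid Perl), so it can be worth checking before going further. The sketch below is a hypothetical pre-flight helper, devicescg_enabled, which is not part of OAR: it only greps the job resource manager file you pass it for an uncommented assignment of "YES" to $ENABLE_DEVICESCG, optionally declared with Perl's `my`. Where that file lives depends on your installation.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of OAR): succeeds if the given
# job resource manager script contains an uncommented line assigning "YES"
# to $ENABLE_DEVICESCG (step 2), with or without a leading "my".
devicescg_enabled() {
  # $1: path to the job resource manager script used by your installation.
  # Lines starting with "#" (comments) cannot match because of the anchor.
  grep -Eq '^[[:space:]]*(my[[:space:]]+)?\$ENABLE_DEVICESCG[[:space:]]*=[[:space:]]*"YES"' "$1"
}
```

For instance, `devicescg_enabled /path/to/job_resource_manager && oarproperty -a gpudevice` only adds the property (step 3) once the flag is set; the path is a placeholder.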

For step 4, several scenarios are possible. Consider here the case where each node has 2 GPUs, one attached to the first CPU's PCI-e bus and the other to the second CPU's PCI-e bus (lstopo's output, for instance, shows this). On every host, you can then use oarnodesetting to set the gpudevice property to 0 (matching the /dev/nvidia0 Linux device) for the resources associated with the first CPU, and to 1 (matching the /dev/nvidia1 Linux device) for those associated with the second CPU.
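Doing this by hand for every resource is tedious, so the mapping can be scripted. The sketch below is a hypothetical helper, gpudevice_commands, not part of OAR. It assumes you have prepared one line per resource in the form `resource_id host cpu` (for instance from oarnodes output), and that on every host the CPU with the lowest OAR cpu id is the one whose PCI-e bus carries /dev/nvidia0; check the latter with lstopo. It only prints oarnodesetting commands (using its `-r` resource and `-p` property options) so they can be reviewed before being applied.

```shell
#!/bin/sh
# Hypothetical helper (not part of OAR). Reads lines of the form
#   resource_id host cpu
# on stdin and prints one oarnodesetting command per resource, setting
# gpudevice to the rank of the resource's CPU within its host:
#   lowest cpu id on a host -> gpudevice=0 (/dev/nvidia0)
#   next cpu id             -> gpudevice=1 (/dev/nvidia1), and so on.
# Assumption: on every host, GPU i is attached to the PCI-e bus of the
# CPU with the i-th lowest cpu id (check with lstopo).
gpudevice_commands() {
  # Group lines by host, then order CPUs numerically within each host.
  sort -k2,2 -k3,3n | awk '
    # Skip blank or malformed lines.
    NF < 3 { next }
    # New host: restart the CPU rank.
    $2 != host { host = $2; rank = -1; last = "" }
    # New CPU on the same host: next GPU id.
    $3 != last { rank++; last = $3 }
    # Print the command instead of running it (dry run).
    { printf "oarnodesetting -r %s -p gpudevice=%d\n", $1, rank }'
}
```

Once the printed commands look right, they can be applied with e.g. `gpudevice_commands < resources.txt | sh`, where resources.txt is a file you prepared in the format above.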

Users requesting a single CPU per host would then get access to one GPU on each of the N hosts, e.g.:

$ oarsub -l /host=N/cpu=1 ...

or could even request GPUs directly, e.g.:

$ oarsub -l /host=N/gpudevice=1 ...

The result is equivalent (1 GPU = 1 CPU). Warning: giving host=N is mandatory in the second variant of the oarsub command; otherwise, one would reserve the resources of all hosts of the cluster that share the same GPU id.

Running the nvidia-smi command in the job should show which GPU is available.
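To check this automatically in a job script, one option is to count the GPUs that nvidia-smi lists. The sketch below is a hypothetical helper, count_visible_gpus, not part of OAR; it parses the output of `nvidia-smi -L`, which prints one line per GPU starting with `GPU <index>:`, and assumes, as stated above, that nvidia-smi only reports the GPU(s) the job is allowed to use.

```shell
#!/bin/sh
# Hypothetical job-side check (not part of OAR): counts the GPUs listed
# by "nvidia-smi -L", which prints one line per GPU starting with
# "GPU <index>:".
count_visible_gpus() {
  # Reads "nvidia-smi -L" output on stdin. grep -c prints 0 when nothing
  # matches (its exit status is then 1, so avoid "set -e" around it).
  grep -c '^GPU [0-9]*:'
}
```

In a job submitted with e.g. `oarsub -l /host=1/gpudevice=1 ./job.sh`, the script job.sh (a placeholder name) could run `n=$(nvidia-smi -L | count_visible_gpus)` and expect `n` to be 1.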

Also, if some nodes do not have any GPU, you can set gpudevice=-1 for their resources and let users add -p "gpudevice >= 0" to the oarsub command in order to get resources with GPUs.
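The GPU-less hosts can be marked in the same dry-run style. The sketch below is a hypothetical helper, nogpu_commands, not part of OAR; it assumes that oarnodesetting's `-h` option, combined with `-p`, updates the property on all resources of the given host, which is worth confirming against your OAR version before applying the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of OAR): prints, for each host given as an
# argument, the command marking all of its resources as GPU-less
# (gpudevice=-1), so that -p "gpudevice >= 0" excludes them.
nogpu_commands() {
  for host in "$@"; do
    # Dry run: print the command; pipe the output to sh to apply it.
    printf 'oarnodesetting -h %s -p gpudevice=-1\n' "$host"
  done
}
```

Users would then submit with, for example, `oarsub -p "gpudevice >= 0" -l /host=2/gpudevice=1 ...` to get one GPU on each of 2 GPU-equipped hosts.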

Many thanks to Nicolas Niclausse for this contribution.

wiki/managing_gpu_resources.txt · Last modified: 2016/04/14 19:54 by neyron