This shows you the differences between two versions of the page.
wiki:managing_resources_cpu_gpu [2019/11/19 17:56] – [Second scenario, more complex] neyron | wiki:managing_resources_cpu_gpu [2020/03/03 14:03] – [First scenario, simple] neyron | ||
---|---|---|---|
Line 11: | Line 11: | ||
In the database, these 4 kinds of resource properties are all stored as **columns** of the '' | In the database, these 4 kinds of resource properties are all stored as **columns** of the '' | ||
- | **Given a hierarchy** (chosen by the administrator for its cluster setup, for instance cluster/ | + | **Given a hierarchy** (chosen by the administrator for his cluster setup, for instance cluster/ |
| | ||
One rule must be kept in mind: **any unique object in the resources hierarchy must have a unique id among its set of objects**. For example: | One rule must be kept in mind: **any unique object in the resources hierarchy must have a unique id among its set of objects**. For example: | ||
Line 69: | Line 69: | ||
Also, if some nodes do not have any GPU, you could set the value of the property for the corresponding resources to '' | Also, if some nodes do not have any GPU, you could set the value of the property for the corresponding resources to '' | ||
- | ===== Second scenario, | + | |
+ | But be **warned** that the following commands will generally not provide what a user would expect: | ||
+ | <code bash> | ||
+ | $ oarsub -l gpudevice=1 | ||
+ | </ | ||
+ | will give all resources matching a single gpudevice identifier, which means all nodes limited to their first GPUs ('' | ||
+ | |||
+ | <code bash> | ||
+ | $ oarsub -l gpudevice=N | ||
+ | </ | ||
+ | with N > 1 makes even less sense. See the setup proposed in the section below if you want to let your users request N gpus like that (using '' | ||
+ | |||
+ | **We strongly suggest setting up the second scenario below, which defines the '' | ||
+ | |||
+ | |||
+ | |||
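The warning above can be illustrated with a minimal sketch that does not call OAR at all. It assumes a hypothetical cluster of 3 nodes (''node1'' to ''node3'') with 2 GPUs each, where gpudevice is the node-local device index (0 or 1). The table layout and the ''match_gpudevice'' helper are invented for illustration and are not OAR's actual matching code.

```shell
#!/bin/sh
# Toy resource table: "host gpudevice", one row per GPU.
# Hypothetical hosts; gpudevice is assumed to be the node-local device index.
resources='node1 0
node1 1
node2 0
node2 1
node3 0
node3 1'

# Hypothetical helper: print the host of every row whose gpudevice equals
# the given identifier, mimicking a selection on one value of that property.
match_gpudevice() {
    printf '%s\n' "$resources" | awk -v id="$1" '$2 == id { print $1 }'
}

# A single gpudevice identifier matches one row on every node,
# so it does not designate a single GPU.
matched_hosts=$(match_gpudevice 0 | sort -u | wc -l | tr -d ' ')
echo "hosts matched by one gpudevice identifier: $matched_hosts"
```

Because the same device index exists on every node, one gpudevice value spans the whole cluster, which is why a cluster-wide unique GPU identifier is needed to let users ask for N GPUs.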
+ | ===== Second scenario, | ||
Let's assume now that you have a cluster of 3 nodes with 32 GB of RAM and per node: | Let's assume now that you have a cluster of 3 nodes with 32 GB of RAM and per node: | ||
* 2 CPUs of 6 cores each | * 2 CPUs of 6 cores each | ||
Line 81: | Line 97: | ||
Let's translate that into technical terms: | Let's translate that into technical terms: | ||
- | * first we have to define the resources hierarchy levels: cluster, host, cpu, core, gpu | + | * first we have to define the resources hierarchy levels: cluster, host, cpu, gpu, core |
- | * then we have to define the gpu resource for the system mapping: gpudevice (i.e. the GPU equivalent of cpuset | + | * then we have to define the GPU resource for the system mapping: gpudevice (i.e. the GPU' |
* finally any additional resource property can be added, like mem for host memory, cpumodel for the CPU model, gpumodel, etc. | * finally any additional resource property can be added, like mem for host memory, cpumodel for the CPU model, gpumodel, etc. | ||
Line 168: | Line 184: | ||
When reserving 1 GPU, the user obviously gets the 3 cores associated with that GPU. | When reserving 1 GPU, the user obviously gets the 3 cores associated with that GPU. | ||
- | GPU job can be tied to GPU resources (where '' | + | Finally, |
<code perl> | <code perl> | ||
foreach my $mold (@{$ref_resource_list}){ | foreach my $mold (@{$ref_resource_list}){ | ||
Line 192: | Line 208: | ||
Warning: make sure to look at the lstopo output in order to correctly associate cpuset and gpudevices, e.g. do not associate cores and GPUs that are not attached to the same CPU. | Warning: make sure to look at the lstopo output in order to correctly associate cpuset and gpudevices, e.g. do not associate cores and GPUs that are not attached to the same CPU. | ||
+ | Warning: mind the fact that with the defined hierarchy '' | ||
+ | <code bash> | ||
+ | $ oarsub -l host=1/ | ||
+ | </ | ||
+ | In that case, you select 1 host, with 8 of its cores, and 2 GPUs for each core. But since each core has at most 1 gpu value, that makes no sense. | ||
+ | |||
+ | Also, that: | ||
+ | <code bash> | ||
+ | $ oarsub -l host=1/ | ||
+ | </ | ||
+ | is equivalent to: | ||
+ | <code bash> | ||
+ | $ oarsub -l host=1/ | ||
+ | </ | ||
+ | The user will get 1 host with 8 of its cores. Nothing is said about which or how many GPUs will be available in the job. | ||
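The first warning above can be checked on a toy table, again without calling OAR. This is only a sketch under assumptions: one host with 12 cores numbered 1 to 12 and 4 GPUs numbered 1 to 4, each GPU tied to 3 consecutive cores (the numbering and the GPU count are assumed for illustration).

```shell
#!/bin/sh
# Toy table for one host: "core gpu", one row per core.
# Assumed layout: 12 cores, 3 consecutive cores tied to each of 4 GPUs.
table=$(awk 'BEGIN { for (c = 1; c <= 12; c++) print c, int((c - 1) / 3) + 1 }')

# Largest number of distinct gpu values found under any single core.
max_gpus_per_core=$(printf '%s\n' "$table" |
    awk '!seen[$1 FS $2]++ { n[$1]++ }
         END { m = 0; for (c in n) if (n[c] > m) m = n[c]; print m }')

# Number of cores attached to GPU 1.
cores_of_gpu1=$(printf '%s\n' "$table" | awk '$2 == 1' | wc -l | tr -d ' ')

echo "max distinct GPUs under one core: $max_gpus_per_core"
echo "cores attached to GPU 1: $cores_of_gpu1"
```

Since no core ever carries more than one gpu value, asking for several GPUs underneath a core cannot be satisfied, whereas asking for cores underneath a GPU is well defined.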
====== The oar_resource_add command ====== | ====== The oar_resource_add command ====== | ||
In OAR 2.5.8 (starting from 2.5.8 RC6), the oar_resource_add command provides some support to create GPU resources in OAR as well as CPU-Core resources with relevant topologies. | In OAR 2.5.8 (starting from 2.5.8 RC6), the oar_resource_add command provides some support to create GPU resources in OAR as well as CPU-Core resources with relevant topologies. | ||