Demonstration of the use of GNU parallel with oarsh, so that a user can execute batch (parallel) jobs using subsets of the resources allocated to a bigger (OAR) job.
E.g. one batch task per core (or GPU), in a job which has many nodes/cores/GPUs allocated.
This requires OAR 2.5.9, which is only available as a beta version for now (oar-2.5.9+g5k6).
PoC in oar-docker
docker@frontend ~$ oarsub -l nodes=2 "sleep 4h"
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=1
docker@frontend ~$ oarstat -j 1 -p | oarprint -f - core -P cpuset,host -F "1/OAR_JOB_ID=1 OAR_USER_CPUSET=% oarsh %" | tee .parallel/cores
1/OAR_JOB_ID=1 OAR_USER_CPUSET=3 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=1 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=2 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=0 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=3 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=1 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=2 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=0 oarsh node1
We force the 1/ at the beginning of each line so that GNU parallel runs only 1 job at a time per sshlogin. Otherwise, parallel complains that it cannot determine by itself the number of CPUs (i.e. of simultaneous runs) per remote; and even if it could, the value might be completely wrong, since parallel is not aware of the cgroups/cpusets.
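Each line of the sshlogin file is a complete remote login command that GNU parallel uses verbatim, so a line can be tested by hand. For instance (a sketch, assuming OAR job 1 is still running):

# Manual test of one sshlogin line: the OAR_USER_CPUSET variable points oarsh
# at the per-core cpuset, so the command runs confined to core 2 of node2.
OAR_JOB_ID=1 OAR_USER_CPUSET=2 oarsh node2 cat /proc/self/cpuset
# The printed cgroup path should end in /2, i.e. the requested core.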
docker@frontend ~$ cat <<'EOF' > script.sh
#!/bin/bash
HOSTNAME=$(hostname)
CPUSET=$(< /proc/self/cpuset)
CPUSET_CPUS=$(< /sys/fs/cgroup/cpuset/$CPUSET/cpuset.cpus)
echo "$HOSTNAME:$CPUSET:$CPUSET_CPUS> job $@"
EOF
docker@frontend ~$ chmod 755 script.sh
docker@frontend ~$ seq 10 | parallel --slf cores ./script.sh
node2:/oardocker/node2/oar/docker_1/2:2> job 1
node1:/oardocker/node1/oar/docker_1/3:3> job 4
node1:/oardocker/node1/oar/docker_1/1:1> job 2
node2:/oardocker/node2/oar/docker_1/3:3> job 3
node2:/oardocker/node2/oar/docker_1/0:0> job 5
node1:/oardocker/node1/oar/docker_1/0:0> job 8
node2:/oardocker/node2/oar/docker_1/1:1> job 7
node1:/oardocker/node1/oar/docker_1/2:2> job 6
node2:/oardocker/node2/oar/docker_1/2:2> job 9
node1:/oardocker/node1/oar/docker_1/3:3> job 10
As we can see, every job indeed runs in a cpuset with only 1 logical CPU available for its execution!
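Another quick check can be run from the frontend (a sketch: nproc honors the CPU affinity inherited from the cpuset, and parallel's -n0 option reads one input argument but passes none to the command):

# Every line of output should be 1: a single logical CPU per run.
seq 8 | parallel --slf cores -n0 nproc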
The same can be done with GPUs: run a batch of jobs, each executing on a single GPU only.
Here we have 2 nodes (chifflet-3 and chifflet-7) with 2 GeForce GPUs each.
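Such a job can presumably be obtained with a GPU-aware submission; a hypothetical example, as the exact resource hierarchy may differ (requires the gpu resource type of OAR 2.5.9):

# Hypothetical reservation of 2 hosts with their 2 GPUs each:
oarsub -I -l host=2/gpu=2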
From the head node of the OAR job, chifflet-3:
[pneyron@chifflet-3 ~](1733271-->56mn)$ oarprint gpu -P gpudevice,cpuset,host -C+ -F "1/OAR_USER_GPUDEVICE=% OAR_USER_CPUSET=% oarsh %" | tee ~/.parallel/gpus
1/OAR_USER_GPUDEVICE=0 OAR_USER_CPUSET=20+18+14+12+16+0+6+26+24+22+10+4+2+8 oarsh chifflet-3.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=0 OAR_USER_CPUSET=20+18+12+14+16+0+26+24+22+6+10+4+8+2 oarsh chifflet-7.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=1 OAR_USER_CPUSET=25+13+27+11+9+15+23+19+1+21+17+5+7+3 oarsh chifflet-3.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=1 OAR_USER_CPUSET=13+25+27+11+23+19+15+9+1+21+17+7+5+3 oarsh chifflet-7.lille.grid5000.fr
Here we use the -C+ option of oarprint, because GNU parallel does not like , as the separator for the OAR_USER_CPUSET values in the sshlogin file. oarsh accepts + just like , . or : indifferently.
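To illustrate, both commands below should behave the same when run by hand from within the job, but only the first form is safe in a GNU parallel sshlogin file (a sketch, with made-up logical CPU numbers):

# oarsh treats the separators alike; GNU parallel would choke on the comma form.
OAR_USER_CPUSET=0+2+4 oarsh chifflet-7.lille.grid5000.fr cat /proc/self/cpuset
OAR_USER_CPUSET=0,2,4 oarsh chifflet-7.lille.grid5000.fr cat /proc/self/cpuset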
[pneyron@chifflet-3 ~](1733271-->56mn)$ cat <<'EOF' > ~/script.sh
#!/bin/bash
echo ===============================================================================
echo "JOB: $@"
echo -n "BEGIN: "; date
echo -n "HOSTNAME: "; hostname
echo -n "CGROUPS CPUSET: "; grep -o -e "cpuset:.*" /proc/self/cgroup
echo -n "CPUs: "; cat /sys/fs/cgroup/cpuset/$(< /proc/self/cpuset)/cpuset.cpus
echo -n "CGROUPS DEVICES: "; grep -o -e "devices:.*" /proc/self/cgroup
echo -n "GPUs: "; nvidia-smi | grep -io -e " \(tesla\|geforce\)[^|]\+|[^|]\+" | sed 's/|/=/' | paste -sd+ -
sleep 3
echo -n "END: "; date
EOF
[pneyron@chifflet-3 ~](1733271-->56mn)$ chmod 755 ~/script.sh
[pneyron@chifflet-3 ~](1733271-->56mn)$ seq 5 | parallel --slf gpus ~/script.sh
===============================================================================
JOB: 1
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/3,31,25,53,17,45,7,35,5,33,13,41,11,39,15,43,19,47,21,49,9,37,1,29,23,51,27,55
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 3
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/14,42,8,36,26,54,2,30,12,40,10,38,0,28,4,32,6,34,22,50,20,48,16,44,24,52,18,46
CPUs: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
CGROUPS DEVICES: devices:/oar/pneyron_1733271/0
GPUs: GeForce GTX 108... Off = 00000000:03:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 2
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-7.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/19,47,15,43,23,51,9,37,1,29,21,49,27,55,25,53,3,31,7,35,5,33,17,45,11,39,13,41
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 4
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-7.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/24,52,16,44,6,34,20,48,22,50,18,46,10,38,12,40,2,30,26,54,8,36,14,42,4,32,0,28
CPUs: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
CGROUPS DEVICES: devices:/oar/pneyron_1733271/0
GPUs: GeForce GTX 108... Off = 00000000:03:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 5
BEGIN: Thu 27 Feb 2020 10:01:56 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/3,31,25,53,17,45,7,35,5,33,13,41,11,39,15,43,19,47,21,49,9,37,1,29,23,51,27,55
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:59 PM CET
As expected, every job only has access to a single GPU.
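A shorter check is also possible (a sketch: nvidia-smi -L prints one line per visible GPU, and parallel's -n0 option passes no argument to it):

# Each of the 4 sshlogins should list exactly one GPU.
seq 4 | parallel --slf gpus -n0 nvidia-smi -L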
Regarding the logical CPUs, we see that we got those given by OAR along with their thread siblings. This is because OAR in Grid'5000 does not define the siblings in its resources (using a thread resource, or giving all siblings in the cpuset resource property), but uses the "COMPUTE_THREAD_SIBLINGS" option to compute them at execution time.
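For the record, the siblings of a logical CPU can be read from sysfs, which is presumably what COMPUTE_THREAD_SIBLINGS relies on (a sketch, assuming the standard Linux topology files):

# Hyper-thread siblings of logical CPU 1; on chifflet this should print "1,29",
# matching the pairs (3,31), (25,53), etc. visible in the cgroup paths above.
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list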