Demonstration of the use of ''GNU parallel'' with ''oarsh'', so that a user can execute batches of (''parallel'') jobs using subsets of the resources allocated to a bigger (''OAR'') job: e.g. one batch job per core (or GPU) within an OAR job that has many nodes/cores/GPUs allocated.

This requires OAR 2.5.9, which is only available as a beta version for now (oar-2.5.9+g5k6).

===== PoC with cores =====

PoC in [[oar-docker]].

==== Create a job with 2 nodes (and all their cores, here 4) ====

<code>
docker@frontend ~$ oarsub -l nodes=2 "sleep 4h"
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=1
</code>

==== Create the parallel sshloginfile that defines the connector to each core ====

<code>
docker@frontend ~$ oarstat -j 1 -p | oarprint -f - core -P cpuset,host -F "1/OAR_JOB_ID=1 OAR_USER_CPUSET=% oarsh %" | tee .parallel/cores
1/OAR_JOB_ID=1 OAR_USER_CPUSET=3 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=1 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=2 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=0 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=3 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=1 oarsh node2
1/OAR_JOB_ID=1 OAR_USER_CPUSET=2 oarsh node1
1/OAR_JOB_ID=1 OAR_USER_CPUSET=0 oarsh node1
</code>

We force the ''1/'' at the beginning of each line, so that ''parallel'' runs only one job at a time per connector. Without it, ''parallel'' complains that it cannot determine by itself the number of CPUs (hence of concurrent runs) per remote; and even if it could, the value would likely be wrong, since ''parallel'' is not aware of the cgroups/cpusets.
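As a side note on the sshloginfile format: ''GNU parallel'' reads each line as an optional number of job slots, a ''/'', and the command used to reach the remote. A minimal sketch (using a sample line from above) of how such a line decomposes:

```shell
#!/bin/sh
# One sshloginfile line as generated above. In GNU parallel's sshlogin
# syntax, the part before the first "/" is the number of job slots to
# use on that remote; the rest is the connector command.
line='1/OAR_JOB_ID=1 OAR_USER_CPUSET=3 oarsh node2'

slots=${line%%/*}      # number of concurrent jobs on this connector
connector=${line#*/}   # command parallel uses to reach the remote

echo "slots=$slots"
echo "connector=$connector"
```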
==== Create a sample script ====

<code>
docker@frontend ~$ cat <<'EOF' > script.sh
#!/bin/bash
HOSTNAME=$(hostname)
CPUSET=$(< /proc/self/cpuset)
CPUSET_CPUS=$(< /sys/fs/cgroup/cpuset/$CPUSET/cpuset.cpus)
echo "$HOSTNAME:$CPUSET:$CPUSET_CPUS> job $@"
EOF
docker@frontend ~$ chmod 755 script.sh
</code>

==== Test a run with a batch of 10 inputs ====

<code>
docker@frontend ~$ seq 10 | parallel --slf cores ./script.sh
node2:/oardocker/node2/oar/docker_1/2:2> job 1
node1:/oardocker/node1/oar/docker_1/3:3> job 4
node1:/oardocker/node1/oar/docker_1/1:1> job 2
node2:/oardocker/node2/oar/docker_1/3:3> job 3
node2:/oardocker/node2/oar/docker_1/0:0> job 5
node1:/oardocker/node1/oar/docker_1/0:0> job 8
node2:/oardocker/node2/oar/docker_1/1:1> job 7
node1:/oardocker/node1/oar/docker_1/2:2> job 6
node2:/oardocker/node2/oar/docker_1/2:2> job 9
node1:/oardocker/node1/oar/docker_1/3:3> job 10
</code>

As we can see, every job indeed runs in a cpuset with only 1 logical CPU available for its execution!

===== PoC with GPUs in Grid'5000 =====

The same can be done with GPUs: run a batch of jobs, each executing on a single GPU only. Here we have 2 nodes (chifflet-3 and chifflet-7) with 2 GeForce GPUs each.
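Note that ''script.sh'' above reads the allowed CPUs through the cgroup v1 cpuset hierarchy. As an alternative sketch (not part of the original demo): on any Linux kernel, including cgroup v2 setups where the ''/sys/fs/cgroup/cpuset'' path does not exist, the same information is exposed in ''/proc/self/status'':

```shell
#!/bin/sh
# Alternative sketch: read the CPUs the current process may run on from
# /proc/self/status (a tab-separated "Cpus_allowed_list:" line), which
# works regardless of the cgroup version in use.
cpus=$(grep '^Cpus_allowed_list:' /proc/self/status | cut -f2)
echo "CPUs allowed: $cpus"
```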
==== Generate the parallel sshlogin file to execute on each GPU ====

From the head node of the OAR job, chifflet-3:

<code>
[pneyron@chifflet-3 ~](1733271-->56mn)$ oarprint gpu -P gpudevice,cpuset,host -C+ -F "1/OAR_USER_GPUDEVICE=% OAR_USER_CPUSET=% oarsh %" | tee ~/.parallel/gpus
1/OAR_USER_GPUDEVICE=0 OAR_USER_CPUSET=20+18+14+12+16+0+6+26+24+22+10+4+2+8 oarsh chifflet-3.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=0 OAR_USER_CPUSET=20+18+12+14+16+0+26+24+22+6+10+4+8+2 oarsh chifflet-7.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=1 OAR_USER_CPUSET=25+13+27+11+9+15+23+19+1+21+17+5+7+3 oarsh chifflet-3.lille.grid5000.fr
1/OAR_USER_GPUDEVICE=1 OAR_USER_CPUSET=13+25+27+11+23+19+15+9+1+21+17+7+5+3 oarsh chifflet-7.lille.grid5000.fr
</code>

Here we use the ''-C+'' option of ''oarprint'', because ''GNU parallel'' does not accept '','' as a separator for the ''OAR_USER_CPUSET'' values in the sshlogin file. ''oarsh'' accepts ''+'', '','', ''.'' or '':'' indifferently.

==== Create a new sample script ====

<code>
[pneyron@chifflet-3 ~](1733271-->56mn)$ cat <<'EOF' > ~/script.sh
#!/bin/bash
echo ===============================================================================
echo "JOB: $@"
echo -n "BEGIN: "; date
echo -n "HOSTNAME: "; hostname
echo -n "CGROUPS CPUSET: "; grep -o -e "cpuset:.*" /proc/self/cgroup
echo -n "CPUs: "; cat /sys/fs/cgroup/cpuset/$(< /proc/self/cpuset)/cpuset.cpus
echo -n "CGROUPS DEVICES: "; grep -o -e "devices:.*" /proc/self/cgroup
echo -n "GPUs: "; nvidia-smi | grep -io -e " \(tesla\|geforce\)[^|]\+|[^|]\+" | sed 's/|/=/' | paste -sd+ -
sleep 3
echo -n "END: "; date
EOF
[pneyron@chifflet-3 ~](1733271-->56mn)$ chmod 755 ~/script.sh
</code>

==== Run parallel ====

<code>
[pneyron@chifflet-3 ~](1733271-->56mn)$ seq 5 | parallel --slf gpus ~/script.sh
===============================================================================
JOB: 1
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/3,31,25,53,17,45,7,35,5,33,13,41,11,39,15,43,19,47,21,49,9,37,1,29,23,51,27,55
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 3
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/14,42,8,36,26,54,2,30,12,40,10,38,0,28,4,32,6,34,22,50,20,48,16,44,24,52,18,46
CPUs: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
CGROUPS DEVICES: devices:/oar/pneyron_1733271/0
GPUs: GeForce GTX 108... Off = 00000000:03:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 2
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-7.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/19,47,15,43,23,51,9,37,1,29,21,49,27,55,25,53,3,31,7,35,5,33,17,45,11,39,13,41
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 4
BEGIN: Thu 27 Feb 2020 10:01:53 PM CET
HOSTNAME: chifflet-7.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/24,52,16,44,6,34,20,48,22,50,18,46,10,38,12,40,2,30,26,54,8,36,14,42,4,32,0,28
CPUs: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
CGROUPS DEVICES: devices:/oar/pneyron_1733271/0
GPUs: GeForce GTX 108... Off = 00000000:03:00.0 Off
END: Thu 27 Feb 2020 10:01:56 PM CET
===============================================================================
JOB: 5
BEGIN: Thu 27 Feb 2020 10:01:56 PM CET
HOSTNAME: chifflet-3.lille.grid5000.fr
CGROUPS CPUSET: cpuset:/oar/pneyron_1733271/3,31,25,53,17,45,7,35,5,33,13,41,11,39,15,43,19,47,21,49,9,37,1,29,23,51,27,55
CPUs: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
CGROUPS DEVICES: devices:/oar/pneyron_1733271/1
GPUs: GeForce GTX 108... Off = 00000000:82:00.0 Off
END: Thu 27 Feb 2020 10:01:59 PM CET
</code>

As expected, every job only has access to a single GPU. Regarding the logical CPUs, we see that each job got those given by OAR along with their thread siblings. This is because OAR in Grid'5000 does not define the siblings in its resources (using a thread resource, or giving all siblings in the ''cpuset'' resource property), but uses the "COMPUTE_THREAD_SIBLINGS" option to compute them at execution time.
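Related to the separator discussion above: since ''oarsh'' accepts ''+'' as well as '','', the ''+''-separated ''OAR_USER_CPUSET'' value can be normalized back to a comma-separated list if another tool (e.g. ''taskset -c'') needs it. A small sketch using a sample value from the sshlogin file above:

```shell
#!/bin/sh
# Sample '+'-separated cpuset, as produced with oarprint -C+ above.
cpuset='20+18+14+12+16+0+6+26+24+22+10+4+2+8'

# Normalize to the comma-separated form expected by tools like taskset -c.
cpus=$(printf '%s' "$cpuset" | tr '+' ',')
echo "$cpus"
```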