Differences

--- wiki:coupling_oarsh_with_gnu_parallel_to_mimic_salloc_sbatch [2020/02/27 22:04] – neyron
+++ wiki:coupling_oarsh_with_gnu_parallel_to_mimic_salloc_srun [2020/03/03 13:28] – [PoC with cores] neyron
@@ Line 3: / Line 3: @@
 E.g. 1 batch per core (or GPU), in a job that has many nodes/cores/GPUs allocated.
-This requires the change made in this commit:
+This requires the changes made in this [[https://github.com/oar-team/oar/tree/2.5_oarsh_on_a_sub_set_of_resources_a_la_srun|branch]], especially this
-https://github.com/oar-team/oar/commit/72a6cce952a3fa736872d69c6e72fb2d8ffac5de
+[[https://github.com/oar-team/oar/commit/72a6cce952a3fa736872d69c6e72fb2d8ffac5de|commit]]
-(not merged yet)
+(not merged yet).
-==== PoC with cores ====
+===== PoC with cores =====
-== Create a job with 2 nodes (and all their cores, here 4) ==
+PoC in [[oar-docker]]
+==== Create a job with 2 nodes (and all their cores, here 4) ====
 <code bash>
@@ Line 17: / Line 18: @@
 </code>
-== Create the parallel sshloginfile that defines the connector to each core ==
+==== Create the parallel sshloginfile that defines the connector to each core ====
 <code bash>
 docker@frontend ~$ oarstat -j 1 -p | oarprint -f - core -P cpuset,host -F "1/OAR_JOB_ID=1 OAR_USER_CPUSET=% oarsh %" | tee .parallel/cores
@@ Line 31: / Line 32: @@
 We force the ''1/'' prefix at the beginning of each line, so that parallel starts only 1 run on each remote. Without it, parallel complains that it cannot determine by itself the number of CPUs (i.e. the number of runs per remote); and even if it could, the result might be completely wrong, because parallel is not aware of the cgroups/cpusets.
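To make the layout of these sshlogin lines explicit, here is a rough sketch that rebuilds the same format from "cpuset host" pairs. The helper name ''make_sshlogin_lines'' and the node names are hypothetical, used for illustration only; the real file should still be generated with ''oarprint'' as shown above.

```shell
#!/bin/sh
# Sketch only: mimic the line format given to "oarprint -F" above.
# make_sshlogin_lines is a hypothetical helper, not part of OAR or GNU parallel.
# It reads "cpuset host" pairs on stdin and takes the OAR job id as $1.
make_sshlogin_lines() {
  job_id=$1
  while read -r cpuset host; do
    # "1/" limits parallel to a single run per sshlogin line;
    # the rest is the oarsh-based connector, ending with the host.
    printf '1/OAR_JOB_ID=%s OAR_USER_CPUSET=%s oarsh %s\n' "$job_id" "$cpuset" "$host"
  done
}

# Hypothetical input: two cores on node1, one core on node2, for job 1.
printf '%s\n' '0 node1' '1 node1' '0 node2' | make_sshlogin_lines 1
```

Each generated line reads as: the ''1/'' slot count, then the ''OAR_JOB_ID=... OAR_USER_CPUSET=... oarsh'' connector, then the host, which is the same layout as the ''oarprint -F'' pattern.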
-== Create a sample script ==
+==== Create a sample script ====
 <code bash>
 docker@frontend ~$ cat <<'EOF' > script.sh
@@ Line 43: / Line 44: @@
 </code>
-== Test a run with a batch of 10 inputs ==
+==== Test a run with a batch of 10 inputs ====
 <code bash>
 docker@frontend ~$ seq 10 | parallel --slf cores ./script.sh
@@ Line 59: / Line 60: @@
 As we can see, every job indeed runs in a cpuset with only 1 logical CPU available for execution!
-==== PoC with GPUs in Grid'5000====
+===== PoC with GPUs in Grid'5000 =====
 The same can be done with GPUs: run a batch of jobs, each of which executes on a single GPU only.
 Here we have 2 nodes (chifflet-3 and chifflet-7) with 2 GeForce GPUs each.
-== Generate the parallel sshlogin file to execute on each GPU ==
+==== Generate the parallel sshlogin file to execute on each GPU ====
 From the head node of the OAR job, chifflet-3:
 <code bash>
@@ Line 75: / Line 78: @@
 Here we use the ''-C+'' option of ''oarprint'', because ''GNU parallel'' does not accept '','' as a separator for the ''OAR_USER_CPUSET'' values in the sshlogin file. ''oarsh'' accepts ''+'', '','', ''.'' or '':'' interchangeably.
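As a small illustration of what the separator change amounts to, the sketch below converts a comma-separated cpuset list into the ''+''-separated form. The helper name ''to_plus_separated'' and the cpuset values are hypothetical; with ''oarprint'', the ''-C+'' option already produces this form directly.

```shell
#!/bin/sh
# Sketch only: rewrite a comma-separated cpuset list with "+" as separator.
# to_plus_separated is a hypothetical helper, not part of OAR.
to_plus_separated() {
  printf '%s\n' "$1" | tr ',' '+'
}

# Hypothetical cpuset list for one GPU's cores.
to_plus_separated '0,12,24,36'
```

This can be handy when an existing sshlogin file was generated with '','' and needs fixing before it is handed to ''parallel''.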
-== Create a new sample script ==
+==== Create a new sample script ====
 <code bash>
 [pneyron@chifflet-3 ~](1733271-->56mn)$ cat <<'EOF' > ~/script.sh
@@ Line 93: / Line 96: @@
 </code>
-== Run parallel ==
+==== Run parallel ====
 <code bash>
 [pneyron@chifflet-3 ~](1733271-->-56mn)$ seq 5 | parallel --slf gpus ~/script.sh