This shows you the differences between two versions of the page.
Previous revisionLast revision | |||
— | wiki:use_cases_and_user_tips [2021/07/28 15:59] – [Many small jobs grouping] neyron | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Use cases ====== | ||
+ | ===== OpenMPI + affinity ===== | ||
+ | |||
+ | We saw that the Linux kernel seems to be incapable of using correctly all the CPUs from the cpusets. | ||
+ | |||
+ | Indeed, reserving 2 out of 8 cores on a node and running a code that uses 2 | ||
+ | process, these 2 process where not well assigned to each cpu. | ||
+ | We had to give the CPU MAP to OpenMPI to do cpu_affinity: | ||
+ | |||
+ | <code bash> | ||
+ | i=0 ; oarprint core -P host,cpuset -F "% slot=%" | ||
+ | |||
+ | | ||
+ | </ | ||
+ | |||
+ | ===== NUMA topology optimization ===== | ||
+ | In this use case, we've got a numa host (an Altix 450) having a " | ||
+ | |||
+ | {{: | ||
+ | In yellow, " | ||
+ | |||
+ | Routers interconnect IRUS (chassis) on which the nodes are plugged (4 or 5 nodes per IRU). | ||
+ | |||
+ | What we want is that for jobs that can enter into 2 IRUS or less, minimize the distance between the resources (ie use IRUS that have only one router interconnexion between them). The topology may be siplified as follows: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | The idea is to use moldable jobs and an admission rule that converts automatically the user requests to a moldable job. This job uses 2 resource properties: **numa_x** and **numa_y** that may be analogue to the square coordinates. What we want in fact, is the job that ends the soonest between a job running on an X or on a Y coordinate (we only want vertical or horizontal placed jobs). | ||
+ | |||
+ | The numa_x and numa_y properties are set up this way (pnode is a property corresponding to physical nodes): | ||
+ | |||
+ | {|border=" | ||
+ | !pnode | ||
+ | !iru | ||
+ | !numa_x | ||
+ | !numa_y | ||
+ | |- | ||
+ | |itanium1 | ||
+ | |1 | ||
+ | |0 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium2 | ||
+ | |1 | ||
+ | |0 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium3 | ||
+ | |1 | ||
+ | |0 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium4 | ||
+ | |1 | ||
+ | |0 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium5 | ||
+ | |2 | ||
+ | |1 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium6 | ||
+ | |2 | ||
+ | |1 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium7 | ||
+ | |2 | ||
+ | |1 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium8 | ||
+ | |2 | ||
+ | |1 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium9 | ||
+ | |2 | ||
+ | |1 | ||
+ | |1 | ||
+ | |- | ||
+ | |itanium10 | ||
+ | |3 | ||
+ | |0 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium11 | ||
+ | |3 | ||
+ | |0 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium12 | ||
+ | |3 | ||
+ | |0 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium13 | ||
+ | |3 | ||
+ | |0 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium14 | ||
+ | |3 | ||
+ | |0 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium15 | ||
+ | |4 | ||
+ | |1 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium16 | ||
+ | |4 | ||
+ | |1 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium17 | ||
+ | |4 | ||
+ | |1 | ||
+ | |0 | ||
+ | |- | ||
+ | |itanium18 | ||
+ | |4 | ||
+ | |1 | ||
+ | |0 | ||
+ | |} | ||
+ | |||
+ | For example, the following requested ressources: | ||
+ | < | ||
+ | -l /core=16 | ||
+ | </ | ||
+ | will result into: | ||
+ | < | ||
+ | -l / | ||
+ | </ | ||
+ | |||
+ | Here is the admission rule making that optimization: | ||
+ | |||
+ | <code bash> | ||
+ | # Title : Numa optimization | ||
+ | # Description : Creates a moldable job to take into account the " | ||
+ | my $n_core_per_cpus=2; | ||
+ | my $n_cpu_per_pnode=2; | ||
+ | if (grep(/ | ||
+ | print " | ||
+ | my $resources_def=$ref_resource_list-> | ||
+ | my $core=0; | ||
+ | my $cpu=0; | ||
+ | my $pnode=0; | ||
+ | | ||
+ | | ||
+ | if ($resource-> | ||
+ | if ($resource-> | ||
+ | if ($resource-> | ||
+ | } | ||
+ | } | ||
+ | # Now, calculate the number of total cores | ||
+ | my $n_cores=0; | ||
+ | if ($pnode == 0 && $cpu != 0 && $core == 0) { | ||
+ | | ||
+ | } | ||
+ | elsif ($pnode != 0 && $cpu == 0 && $core == 0) { | ||
+ | | ||
+ | } | ||
+ | elsif ($pnode != 0 && $cpu == 0 && $core != 0) { | ||
+ | | ||
+ | } | ||
+ | elsif ($pnode == 0 && $cpu != 0 && $core != 0) { | ||
+ | | ||
+ | } | ||
+ | elsif ($pnode == 0 && $cpu == 0 && $core != 0) { | ||
+ | | ||
+ | } | ||
+ | else { $n_cores = $pnode*$cpu*$core; | ||
+ | print " | ||
+ | if ($n_cores > 32) { | ||
+ | print " | ||
+ | | ||
+ | print " | ||
+ | / | ||
+ | |||
+ | my @newarray=eval(Dumper(@{$ref_resource_list}-> | ||
+ | push (@{$ref_resource_list}, | ||
+ | | ||
+ | | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | ====== Users tips ====== | ||
+ | ===== oarsh completion ===== | ||
+ | //Tip based on an idea from Jerome Reybert// | ||
+ | |||
+ | In order to complete nodes names in a oarsh command, add these lines in your .bashrc | ||
+ | |||
+ | <code bash> | ||
+ | function _oarsh_complete_() { | ||
+ | if [ -n " | ||
+ | local word=${comp_words[comp_cword]} | ||
+ | local list=$(cat $OAR_NODEFILE | uniq | tr ' | ||
+ | COMPREPLY=($(compgen -W " | ||
+ | fi | ||
+ | } | ||
+ | complete -o default -F _oarsh_complete_ oarsh | ||
+ | </ | ||
+ | |||
+ | Then try oarsh <TAB> | ||
+ | ===== OAR aware shell prompt for Interactive jobs ===== | ||
+ | If you want to have a bash prompt with your job id and the remaining walltime then you can add in your ~/.bashrc: | ||
+ | |||
+ | <code bash> | ||
+ | if [ " | ||
+ | __oar_ps1_remaining_time(){ | ||
+ | if [ -n " | ||
+ | DATE_NOW=$(date +%s) | ||
+ | DATE_JOB_START=$(stat -c %Y $OAR_NODE_FILE) | ||
+ | DATE_TMP=$OAR_JOB_WALLTIME_SECONDS | ||
+ | ((DATE_TMP = (DATE_TMP - DATE_NOW + DATE_JOB_START) / 60)) | ||
+ | echo -n " | ||
+ | fi | ||
+ | } | ||
+ | PS1=' | ||
+ | if [ -n " | ||
+ | echo "[OAR] OAR_JOB_ID=$OAR_JOB_ID" | ||
+ | echo "[OAR] Your nodes are:" | ||
+ | sort $OAR_NODE_FILE | uniq -c | awk ' | ||
+ | fi | ||
+ | fi | ||
+ | </ | ||
+ | |||
+ | |||
+ | Then the prompt inside an Interactive job will be like: | ||
+ | |||
+ | <code bash> | ||
+ | [capitn@node006~](3101--> | ||
+ | </ | ||
+ | |||
+ | ===== Many small jobs grouping ===== | ||
+ | Many small jobs of a few seconds may be painful for the OAR system. OAR may spend more time scheduling, allocating and launching than the actual computation time for each job. | ||
+ | |||
+ | Gabriel Moreau developed a script that may be useful when you have a large set of small jobs. It groups you jobs into a unique bigger OAR job: | ||
+ | * http:// | ||
+ | You can download it from this page: | ||
+ | * http:// | ||
+ | |||
+ | For a more generic approach you can use Cigri, a grid middleware running onto OAR cluster(s) that is able to automatically group parametric jobs. Cigri is currently in a re-writing process and a new public release is planned for the end of 2012. | ||
+ | |||
+ | Please contact Bruno.Bzeznik@imag.fr for more informations. | ||
+ | |||
+ | ===== Overcoming quoting issue in oarsub ===== | ||
+ | One may get a bit of a weird behavior when trying to submit a series of jobs through a bash script. For instance: | ||
+ | <code bash> | ||
+ | #!/bin/bash | ||
+ | |||
+ | for d in ../test/*/ ; do | ||
+ | CMD=" | ||
+ | echo $CMD | ||
+ | $CMD | ||
+ | done | ||
+ | </ | ||
+ | This does not work, due to quoting issues. | ||
+ | |||
+ | A solution is to use a bash arrays to define the command, as follows: | ||
+ | <code bash> | ||
+ | #!/bin/bash | ||
+ | |||
+ | for d in ../test/*/ ; do | ||
+ | declare -a CMD | ||
+ | CMD=(oarsub -p " | ||
+ | " | ||
+ | done | ||
+ | </ | ||
+ | This works. | ||
+ | ===== Environment variables through oarsh ===== | ||
+ | |||
+ | * http:// | ||
+ | * http:// | ||