Differences

This shows you the differences between two versions of the page.

Link to this comparison view

wiki:use_cases_and_user_tips [2020/03/25 15:23] (current)
neyron created
Line 1: Line 1:
 +====== Use cases ======
 +===== OpenMPI + affinity =====
 +
 +We saw that the Linux kernel seems to be incapable of using correctly all the CPUs from the cpusets.
 +
 +Indeed, reserving 2 out of 8 cores on a node and running a code that uses 2
 +process, these 2 process where not well assigned to each cpu.
 +We had to give the CPU MAP to OpenMPI to do cpu_affinity:​
 +
 +<code bash>
 + i=0 ; oarprint core -P host,cpuset -F "% slot=%"​ | while read line ; do echo "rank $i=$line";​ ((i++)); done > affinity.txt
 +
 + ​[user@node12 tmp]$ mpirun -np 8 --mca btl openib,self -v -display-allocation -display-map ​ --machinefile $OAR_NODEFILE -rf affinity.txt /​home/​user/​espresso-4.0.4/​PW/​pw.x < BeO_100.inp
 +</​code>​
 +
 +===== NUMA topology optimization =====
 +In this use case, we've got a numa host (an Altix 450) having a "​squared"​ topology: nodes are interconnected by routers like in this view:
 +
 +{{:​wiki:​lfi_topology.png?​nolink&​300|}}
 +In yellow, "​routers",​ in magenta, "​nodes"​ (2 dual-core processors per node)
 +
 +Routers interconnect IRUS (chassis) on which the nodes are plugged (4 or 5 nodes per IRU). 
 +
 +What we want is that for jobs that can enter into 2 IRUS or less, minimize the distance between the resources (ie use IRUS that have only one router interconnexion between them). The topology may be siplified as follows:
 +
 +{{:​wiki:​lfi_topology_square.png?​nolink&​300|}}
 +
 +The idea is to use moldable jobs and an admission rule that converts automatically the user requests to a moldable job. This job uses 2 resource properties: **numa_x** and **numa_y** that may be analogue to the square coordinates. What we want in fact, is the job that ends the soonest between a job running on an X or on a Y coordinate (we only want vertical or horizontal placed jobs).
 +
 +The numa_x and numa_y properties are set up this way (pnode is a property corresponding to physical nodes):
 +
 +{|border="​1"​
 +!pnode
 +!iru
 +!numa_x
 +!numa_y
 +|-
 +|itanium1
 +|1
 +|0
 +|1
 +|-
 +|itanium2
 +|1
 +|0
 +|1
 +|-
 +|itanium3
 +|1
 +|0
 +|1
 +|-
 +|itanium4
 +|1
 +|0
 +|1
 +|-
 +|itanium5
 +|2
 +|1
 +|1
 +|-
 +|itanium6
 +|2
 +|1
 +|1
 +|-
 +|itanium7
 +|2
 +|1
 +|1
 +|-
 +|itanium8
 +|2
 +|1
 +|1
 +|-
 +|itanium9
 +|2
 +|1
 +|1
 +|-
 +|itanium10
 +|3
 +|0
 +|0
 +|-
 +|itanium11
 +|3
 +|0
 +|0
 +|-
 +|itanium12
 +|3
 +|0
 +|0
 +|-
 +|itanium13
 +|3
 +|0
 +|0
 +|-
 +|itanium14
 +|3
 +|0
 +|0
 +|-
 +|itanium15
 +|4
 +|1
 +|0
 +|-
 +|itanium16
 +|4
 +|1
 +|0
 +|-
 +|itanium17
 +|4
 +|1
 +|0
 +|-
 +|itanium18
 +|4
 +|1
 +|0
 +|}
 +
 +For example, the following requested ressources:
 +<​code>​
 + -l /core=16
 +</​code>​
 +will result into:
 +<​code>​
 + -l /​numa_x=1/​pnode=4/​cpu=2/​core=2 -l /​numa_y=1/​pnode=4/​cpu=2/​core=2
 +</​code>​
 +
 +Here is the admission rule making that optimization:​
 +
 +<code bash>
 + # Title : Numa optimization
 + # Description : Creates a moldable job to take into account the "​squared"​ topology of an Altix 450
 + my $n_core_per_cpus=2;​
 + my $n_cpu_per_pnode=2;​
 + if (grep(/​^itanium$/,​ @{$type_list}) && (grep(/​^manual$/,​ @{$type_list}) == ""​) && $#​$ref_resource_list == 0){
 +   print "​[ADMINSSION RULE] Optimizing for numa architecture (use \\"-t manual\\"​ to disable)";​
 +   my $resources_def=$ref_resource_list->​[0];​
 +   my $core=0;
 +   my $cpu=0;
 +   my $pnode=0;
 +   ​foreach my $r (@{$resources_def->​[0]}) {
 +     ​foreach my $resource (@{$r->​{resources}}) {
 +       if ($resource->​{resource} eq "​core"​) {$core=$resource->​{value};​}
 +       if ($resource->​{resource} eq "​cpu"​) {$cpu=$resource->​{value};​}
 +       if ($resource->​{resource} eq "​pnode"​) {$pnode=$resource->​{value};​}
 +     }
 +   }
 +   # Now, calculate the number of total cores
 +   my $n_cores=0;
 +   if ($pnode == 0 && $cpu != 0 && $core == 0) {
 +     ​$n_cores = $cpu*$n_core_per_cpus;​
 +   }
 +   elsif ($pnode != 0 && $cpu == 0 && $core == 0) {
 +     ​$n_cores = $pnode*$n_cpu_per_pnode*$n_core_per_cpus;​
 +   }
 +   elsif ($pnode != 0 && $cpu == 0 && $core != 0) {
 +     ​$n_cores = $pnode*$core;​
 +   }
 +   elsif ($pnode == 0 && $cpu != 0 && $core != 0) {
 +     ​$n_cores = $cpu*$core;
 +   }
 +   elsif ($pnode == 0 && $cpu == 0 && $core != 0) {
 +     ​$n_cores = $core;
 +   }
 +   else { $n_cores = $pnode*$cpu*$core;​ }
 +   print "​[ADMINSSION RULE] You requested $n_cores cores\";​
 +   if ($n_cores > 32) {
 +     print "​[ADMISSION RULE] Big job (>32 cores), no optimization is possible";​
 +   ​}else{
 +     print "​[ADMISSION RULE] Optimization produces: /​numa_x=1/​$pnode/​$cpu/​$core
 +                                         /​numa_y=1/​$pnode/​$cpu/​$core";​
 + 
 +     my @newarray=eval(Dumper(@{$ref_resource_list}->​[0]));​
 +     push (@{$ref_resource_list},​@newarray);​
 +     ​unshift(@{%{@{@{@{$ref_resource_list}->​[0]}->​[0]}->​[0]}->​{resources}},​{'​resource'​ => '​numa_x','​value'​ => '​1'​});​
 +     ​unshift(@{%{@{@{@{$ref_resource_list}->​[1]}->​[0]}->​[0]}->​{resources}},​{'​resource'​ => '​numa_y','​value'​ => '​1'​});​
 +   }
 + }
 +</​code>​
 +
 +====== Users tips ======
 +===== oarsh completion =====
 +//Tip based on an idea from Jerome Reybert//
 +
 +In order to complete nodes names in a oarsh command, add these lines in your .bashrc
 +
 +<code bash>
 +function _oarsh_complete_() {
 +  if [ -n "​$OAR_NODEFILE"​ -a "​$COMP_CWORD"​ -eq 1 ]; then
 +    local word=${comp_words[comp_cword]}
 +    local list=$(cat $OAR_NODEFILE | uniq | tr '​\n'​ ' ')
 +    COMPREPLY=($(compgen -W "​$list"​ -- "​${word}"​))
 +  fi
 +}
 +complete -o default -F _oarsh_complete_ oarsh
 +</​code>​
 +
 +Then try oarsh <TAB>
 +===== OAR aware shell prompt for Interactive jobs =====
 +If you want to have a bash prompt with your job id and the remaining walltime then you can add in your ~/.bashrc:
 +
 +<code bash>
 +if [ "​$PS1"​ ]; then
 +    __oar_ps1_remaining_time(){
 +        if [ -n "​$OAR_JOB_WALLTIME_SECONDS"​ -a -n "​$OAR_NODE_FILE"​ -a -r "​$OAR_NODE_FILE"​ ]; then
 +            DATE_NOW=$(date +%s)
 +            DATE_JOB_START=$(stat -c %Y $OAR_NODE_FILE)
 +            DATE_TMP=$OAR_JOB_WALLTIME_SECONDS
 +            ((DATE_TMP = (DATE_TMP - DATE_NOW + DATE_JOB_START) / 60))
 +            echo -n "​$DATE_TMP"​
 +        fi
 +    }
 +    PS1='​[\u@\h|\W]$([ -n "​$OAR_NODE_FILE"​ ] && echo -n "​(\[\e[1;​32m\]$OAR_JOB_ID\[\e[0m\]-->​\[\e[1;​34m\]$(__oar_ps1_remaining_time)mn\[\e[0m\])"​)\$ '
 +    if [ -n "​$OAR_NODE_FILE"​ ]; then
 +        echo "[OAR] OAR_JOB_ID=$OAR_JOB_ID"​
 +        echo "[OAR] Your nodes are:"
 +        sort $OAR_NODE_FILE | uniq -c | awk '​{printf(" ​     %s*%d",​ $2, $1)}END{printf("​\n"​)}'​ | sed -e '​s/,​$//'​
 +    fi
 +fi
 +</​code>​
 +
 +
 +Then the prompt inside an Interactive job will be like:
 +
 +<code bash>
 +  [capitn@node006~](3101-->​29mn)$
 +</​code>​
 +
 +===== Many small jobs grouping =====
 +Many small jobs of a few seconds may be painful for the OAR system. OAR may spend more time scheduling, allocating and launching than the actual computation time for each job.
 +
 +Gabriel Moreau developed a script that may be useful when you have a large set of small jobs. It groups you jobs into a unique bigger OAR job:
 +  * http://​servforge.legi.grenoble-inp.fr/​pub/​soft-trokata/​oarutils/​oar-parexec.html
 +You can download it from this page:
 +  * http://​servforge.legi.grenoble-inp.fr/​projects/​soft-trokata/​wiki/​SoftWare/​OarUtils
 +
 +For a more generic approach you can use Cigri, a grid middleware running onto OAR cluster(s) that is able to automatically group parametric jobs. Cigri is currently in a re-writing process and a new public release is planned for the end of 2012.
 +
 +Please contact Bruno.Bzeznik@imag.fr for more informations.
 +
 +===== Environment variables through oarsh =====
 +
 +  * http://​servforge.legi.grenoble-inp.fr/​pub/​soft-trokata/​oarutils/​oar-envsh.html
 +  * http://​servforge.legi.grenoble-inp.fr/​projects/​soft-trokata/​wiki/​SoftWare/​OarUtils
  
wiki/use_cases_and_user_tips.txt ยท Last modified: 2020/03/25 15:23 by neyron
Recent changes RSS feed GNU Free Documentation License 1.3 Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki