====== Useful commands and administration tasks ======
//Here, you'll find useful commands, sometimes a bit tricky, to put into your scripts or administration tasks.//

===== List suspected nodes without running jobs =====
You may need this list of nodes if you want to automatically reboot them, because you don't know why they have been suspected and you think that a reboot is a simple way to clean things up:
<code bash>
# One possible approach (the exact filtering is site dependent):
# list the nodes holding Suspected resources...
oarnodes --sql "state = 'Suspected'" | grep network_address | sort -u
# ...then check with oarstat which of them still run jobs before rebooting
</code>
- | |||
- | ===== List alive nodes without running jobs ===== | ||
- | < | ||
- | | ||
- | | ||
- | </ | ||
- | |||
- | ===== Oarstat display without best-effort jobs ===== | ||
- | |||
- | < | ||
- | | ||
- | </ | ||
- | |||
- | ===== Setting some nodes in maintenance mode only when they are free ===== | ||
- | |||
- | You may need to plan some maintenance operations on some particular nodes (for example add somme memory, upgrade bios,...) but you don't want to interrupt currently running or planned users jobs. To do so, you can simply run a " | ||
- | <code bash> | ||
- | | ||
- | </ | ||
- | This uses the " | ||
- | |||
- | The example above will disable 2 free nodes, but you may want to add a //-p// option to specify the nodes you want to disable, for example: '' | ||
- | |||
- | **Note:** you can't simply do that within a " | ||
- | |||
- | ===== Optimizing and re-initializing the database with Postgres ===== | ||
- | Sometimes, the database contains so much jobs that you need to optimize it. Normally, you should have a **vacuumdb** running daily fron cron. You can do manually a **vacuumdb -a -f -z ; reindexdb oar** but don't forget to stop OAR before, and be aware that it may take some time. But the DB still may be very big and it may be a problem for backups or the nightly vaccum takes too much time. A more radical solution is to start again with a new database, but keep the old one so that you can still connect to it for jobs history. You can do this once a year for example, and you only have to backup the current database. Here is a way to do this: | ||
- | |||
- | * First of all, make a backup of your database! With postgres, it is as easy as: | ||
- | < | ||
- | | ||
- | </ | ||
- | It will create an exact copy of the " | ||
- | * You should plan a maintenance and be sure there' | ||
- | * Make a dump of your " | ||
- | * Stop the oar server, drop the oar database and re-create it. | ||
- | * Finally, restore the " | ||
- | * And restart the server. | ||
- | |||
- | ====== Green computing ====== | ||
- | //In this section, you'll find tips for optimizing the fluids consumptions of your clusters// | ||
- | ===== Activating the dynamic on/off of nodes but keeping a few nodes always ready ===== | ||
- | **Warning: | ||
- | |||
- | First of all, you have to set up the ecological feature as told into the FAQ: [[http:// | ||
- | |||
- | **Note:** if you have an ordinary cluster with nodes that are always available, you may set the cm_availability property to 2147483646 (infinite minus 1) | ||
- | |||
- | **Note: ** once this feature has been activated, the **absent** status may not always really mean absent, but **standby** as oar may want to automatically power on the node. to put a node into a real absent status, you have to set the cm_availability property to **0** | ||
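For example, to put a node into a real absent state (the node name is hypothetical):
<code bash>
# Mark node13 Absent and prevent OAR from waking it up automatically:
oarnodesetting -h node13 -s Absent -p cm_availability=0
</code>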
- | |||
- | This tip supposes that you have set up your nodes to automatically set them to the Alive state when they boot and to the Absent state when they shutdown. You may refer to the FAQ for this: [[http:// | ||
- | |||
- | Here, we provide 3 scripts that you may customize and that make your ecological configuration a bit smarter than the default as it will be aware of keeping powered on a few nodes (4 in this example) that will be ready for incoming jobs: | ||
- | |||
== wake_up_nodes.sh ==
<code bash>
#!/bin/bash

# Host from which the power-on command is issued, and the command itself
# (the values below are examples for an SGI Altix Ice; adapt them)
IPMI_HOST="admin"
POWER_ON_CMD="cpower --up"

# The list of nodes to wake up is given on stdin
NODES=`cat`

for NODE in $NODES
do
  ssh $IPMI_HOST $POWER_ON_CMD $NODE
done
</code>

Very simple script containing the command that powers on your nodes. In this example, suitable for an SGI Altix Ice, we do a **cpower** from an **admin** host. You'll probably have to customize this. This script is to be put in front of the SCHEDULER_NODE_MANAGER_WAKE_UP_CMD option of the oar.conf file, like this:
<code>
SCHEDULER_NODE_MANAGER_WAKE_UP_CMD="/path/to/wake_up_nodes.sh"
</code>
- | |||
- | == set_standby_nodes.sh == | ||
- | <code bash> | ||
- | #!/bin/bash | ||
- | set -e | ||
- | |||
- | # This script is intended to be used from the SCHEDULER_NODE_MANAGER_SLEEP_CMD | ||
- | # variable of the oar.conf file. | ||
- | # It halts the nodes given in the stdin, but refuses to stop nodes if this | ||
- | # results in less than # | ||
- | # want to have some nodes ready for treating immediately some jobs. | ||
- | |||
- | NODES_KEEP_ALIVE=4 | ||
- | |||
- | NODES=`cat` | ||
- | |||
- | ALIVE_NODES=`oarnodes | ||
- | |||
- | NODES_TO_SHUTDOWN="" | ||
- | |||
- | for NODE in $NODES | ||
- | do | ||
- | if [ $ALIVE_NODES -gt $NODES_KEEP_ALIVE ] | ||
- | then | ||
- | NODES_TO_SHUTDOWN=" | ||
- | let ALIVE_NODES=ALIVE_NODES-1 | ||
- | else | ||
- | echo "Not halting $NODE because I need to keep $NODES_KEEP_ALIVE alive nodes" | ||
- | fi | ||
- | done | ||
- | |||
- | if [ " | ||
- | then | ||
- | echo -e " | ||
- | fi | ||
- | </ | ||
- | |||
- | This is the script for shutting down nodes. It uses **sentinelle** to send the **halt** command to the nodes, as suggested by the default configuration, | ||
- | |||
- | < | ||
- | | ||
- | </ | ||
- | |||
- | ==nodes_keepalive.sh== | ||
- | <code bash> | ||
- | #!/bin/bash | ||
- | set -e | ||
- | |||
- | # This script is intended to be ran every 5 minutes from the crontab | ||
- | # It ensures that # | ||
- | # are always alive and not shut down. It wakes up the nodes by submiting | ||
- | # a dummy job. It does not submit jobs if all the resources are used or | ||
- | # not available (cm_availability set to a low value) | ||
- | |||
- | NODES_KEEP_ALIVE=4 | ||
- | ADMIN_USER=bzeznik | ||
- | |||
- | # Locking | ||
- | LOCK=/ | ||
- | ### Locking for Debian (using lockfile-progs): | ||
- | # | ||
- | # | ||
- | # | ||
- | ### Locking for others (using sendmail lockfile) | ||
- | lockfile -r3 -l 43200 $LOCK | ||
- | |||
- | if [ " | ||
- | then | ||
- | |||
- | # Get the number of Alive nodes with at least 1 free resource | ||
- | | ||
- | |||
- | # Get the number of nodes in standby | ||
- | let AVAIL_DATE=`date +%s`+3600 | ||
- | | ||
- | |||
- | if [ $ALIVE_NODES -lt $NODES_KEEP_ALIVE ] | ||
- | then | ||
- | if [ $WAKEABLE_NODES -gt 0 ] | ||
- | then | ||
- | if [ $NODES_KEEP_ALIVE -gt $WAKEABLE_NODES ] | ||
- | then | ||
- | | ||
- | fi | ||
- | su - $ADMIN_USER -c " | ||
- | fi | ||
- | fi | ||
- | fi | ||
- | |||
- | ### Unlocking for Debian: | ||
- | #kill " | ||
- | # | ||
- | ### Unlocking for others: | ||
- | rm -f $LOCK | ||
- | </ | ||
- | |||
- | This script is responsible of waking up (power on) some nodes if there' | ||
- | |||
- | < | ||
- | */5 * * * * | ||
- | </ | ||
- | |||
- | ====== Admission rules ====== | ||
- | //OAR offers a powerful system letting you customize the way that jobs enter into queues (or are rejected from queues) called " | ||
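Admission rules live in the ''admission_rules'' table of the OAR database. Assuming a PostgreSQL setup (the database name may differ on your site), you can quickly review the existing rules with:
<code bash>
psql oar -c "SELECT id, rule FROM admission_rules ORDER BY id"
</code>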
- | |||
- | ===== Cluster routing depending on the name of the queue ===== | ||
- | <code perl> | ||
- | # Title : Cluster routing | ||
- | # Description : Send to the corresponding cluster | ||
- | my $cluster=$queue_name; | ||
- | if ($jobproperties ne "" | ||
- | | ||
- | } | ||
- | else{ | ||
- | | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Cluster routing depending on the name of the submission host ===== | ||
- | <code perl> | ||
- | # Title : Cluster routing | ||
- | # Description : Send to the corresponding cluster and queue depending on the submission host | ||
- | use Sys:: | ||
- | my @h = split(' | ||
- | my $cluster; | ||
- | if ($h[0] eq " | ||
- | | ||
- | print " | ||
- | }else { | ||
- | | ||
- | print " | ||
- | } | ||
- | if ($queue_name eq " | ||
- | | ||
- | } | ||
- | if ($jobproperties ne "" | ||
- | | ||
- | } | ||
- | else{ | ||
- | | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Best-effort automatic routing for some unprivileged users ===== | ||
- | |||
- | Description : Users that are not members of a given group are automatically directed to the besteffort queue | ||
- | |||
- | <code perl> | ||
- | my $GROUP=" | ||
- | | ||
- | if ($? != 0){ | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | push (@{$type_list}," | ||
- | if ($jobproperties ne "" | ||
- | | ||
- | } | ||
- | | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Automatic licence assignment by job type ===== | ||
- | Description : Creates a **mathlab** job type that automatically assigns a mathlab licence | ||
- | |||
- | <code perl> | ||
- | if (grep(/ | ||
- | print " | ||
- | | ||
- | | ||
- | {' | ||
- | | ||
- | ' | ||
- | ); | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Walltime limit ===== | ||
- | Description : By default, an admission rule limits the walltime of interactiv jobs to 2 hours. This modified rule also set up a walltime for passive jobs. | ||
- | |||
- | <code perl> | ||
- | my $max_interactive_walltime = OAR:: | ||
- | # 7 days = 168 hours | ||
- | my $max_batch_walltime = OAR:: | ||
- | | ||
- | if (defined($mold-> | ||
- | if (($jobType eq " | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | } | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | Thanks to Nicolas Capit | ||
- | |||
- | ===== Cpu time limit ===== | ||
- | Description : Rejects jobs asking for more than a cpu*walltime limit. Current limit is set to 384 hours (16 days of cpu-time) | ||
- | |||
- | Note: This rule is for an SMP host on which we only have a " | ||
- | |||
- | <code perl> | ||
- | my $cpu_walltime=iolib:: | ||
- | my $msg=""; | ||
- | | ||
- | foreach my $r (@{$mold-> | ||
- | my $cpus=0; | ||
- | my $pnodes=0; | ||
- | # Catch the cpu and pnode resources | ||
- | foreach my $resource (@{$r-> | ||
- | if ($resource-> | ||
- | $cpus=$resource-> | ||
- | } | ||
- | if ($resource-> | ||
- | $pnodes=$resource-> | ||
- | } | ||
- | } | ||
- | # Calculate the number of cpus | ||
- | if ($pnodes == 0 && $cpus == 0) { $cpus=1; } | ||
- | if ($pnodes != 0) { | ||
- | if ($cpus == 0) { $cpus=$pnodes*2; | ||
- | else {$cpus=$pnodes*$cpus; | ||
- | } | ||
- | # Reject if walltime*cpus is too big | ||
- | if ($cpus * $mold-> | ||
- | $msg=" | ||
- | | ||
- | $msg.= $cpu_walltime / $cpus / 3600; | ||
- | $msg.= " hours."; | ||
- | die($msg); | ||
- | } | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Jobs number limit ===== | ||
- | Description : Limits the maximum number of simultaneous jobs allowed for each user on the cluster. Default is 50 jobs maximum per user.< | ||
- | It is possible to specify users having unlimited jobs number in // | ||
- | You can also configure the max_nb_jobs by setting your value in // | ||
- | Note : Array jobs are also limited by this rule. | ||
- | |||
- | <code perl> | ||
- | # Title : Limit the number of jobs per user to max_nb_jobs | ||
- | # Description : If user is not listed in unlimited users file, it checks if current number of jobs is well under $max_nb_jobs, | ||
- | my $unlimited=0; | ||
- | if (open(FILE, "< $ENV{home}/ | ||
- | while (< | ||
- | if (m/ | ||
- | | ||
- | } | ||
- | } | ||
- | | ||
- | } | ||
- | if ($unlimited == 0) { | ||
- | my $max_nb_jobs = 50; | ||
- | if (open(FILE, "< $ENV{home}/ | ||
- | while (< | ||
- | | ||
- | | ||
- | } | ||
- | | ||
- | } | ||
- | my $nb_jobs = $dbh-> | ||
- | qq{ select count(job_id) | ||
- | FROM jobs | ||
- | WHERE job_user = ? | ||
- | AND (state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | OR state = \\' | ||
- | | ||
- | | ||
- | if (($nb_jobs + $array_job_nb) > $max_nb_jobs) { | ||
- | | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Project assignment ===== | ||
- | If you want to automatically assign a project to users submissions (replacing --project oarsub option), you simply have to set the **$project** variable to what you want inside an admission rule. | ||
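For example, a minimal rule forcing every submission into a single project (the project name is just an example) could be:
<code perl>
# Override any --project option given to oarsub:
$project = "default_project";
print "[ADMISSION RULE] Project set to $project\n";
</code>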
- | |||
- | ===== Restricts access to a user list for a set of resources ===== | ||
- | For example, if you defined with the command " | ||
- | then you can enforce some properties constraints for some users. | ||
- | |||
- | <code perl> | ||
- | # Title : Restricts the use of resources for some users | ||
- | # Description : think to change the user list in this admission rule | ||
- | my %allowed_users = ( | ||
- | " | ||
- | " | ||
- | " | ||
- | ); | ||
- | if (!defined($allowed_users{$user}) or ($allowed_users{$user} == 0)){ | ||
- | if ($jobproperties ne "" | ||
- | $jobproperties = " | ||
- | }else{ | ||
- | $jobproperties = "model != ' | ||
- | } | ||
- | print(" | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Limit the number of interactive jobs per user ===== | ||
- | <code perl> | ||
- | # Title : Limit the number of interactive jobs per user | ||
- | # Description : Limit the number of interactive jobs per user | ||
- | my $max_interactive_jobs = 2; | ||
- | if (($jobType eq " | ||
- | my $nb_jobs = $dbh-> | ||
- | FROM jobs | ||
- | WHERE | ||
- | job_user = ' | ||
- | reservation = ' | ||
- | job_type = ' | ||
- | (state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | OR state = ' | ||
- | " | ||
- | if ($nb_jobs >= $max_interactive_jobs){ | ||
- | die(" | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Auto property restriction for specific user groups ===== | ||
- | |||
- | <code perl> | ||
- | # Title : Infiniband user restrictions | ||
- | # Description : put the ib property restriction depending of the groups of the user | ||
- | if ((! grep(/ | ||
- | print(" | ||
- | my ($user_name, | ||
- | my ($primary_group, | ||
- | my ($seiscope_name, | ||
- | my %seiscope_hash = map { $_ => 1 } split(/ | ||
- | my ($globalseis_name, | ||
- | my %globalseis_hash = map { $_ => 1 } split(/ | ||
- | my ($tohoku_name, | ||
- | my %tohoku_hash = map { $_ => 1 } split(/ | ||
- | my $sql_str = "ib = \\' | ||
- | if (($primary_group eq " | ||
- | print(" | ||
- | $sql_str .= " OR ib = \\' | ||
- | } | ||
- | if (($primary_group eq " | ||
- | print(" | ||
- | $sql_str .= " OR ib = \\' | ||
- | } | ||
- | if ($jobproperties ne "" | ||
- | $jobproperties = " | ||
- | }else{ | ||
- | $jobproperties = " | ||
- | } | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Debug admission rule ===== | ||
- | When you play with admission rules, you can dump some data structures with YAML to have a readable output of the submission requests for example: | ||
- | |||
- | < | ||
- | print " | ||
- | print YAML:: | ||
- | </ | ||
===== NUMA topology =====
See the [[#numa_topology_optimization|NUMA topology optimization]] section above.
===== Short, medium and long queues =====

Description: route the jobs into a short, medium or long queue depending on the requested walltime, and apply per-queue restrictions.

Queues creation:
<code bash>
# Priorities and scheduler name are examples; adapt them to your setup
oarnotify --add_queue short,9,oar_sched_gantt_with_timesharing
oarnotify --add_queue medium,5,oar_sched_gantt_with_timesharing
oarnotify --add_queue long,3,oar_sched_gantt_with_timesharing
</code>

Rules:
<code perl>

Rule : 20
# Title: Automatic routing into the short queue
# Description: if the job walltime fits, route it into the short queue
# (the 6 hour threshold is an example)
my $max_walltime="6:00:00";
my $walltime=0;
# Search for the max walltime of the moldable jobs
foreach my $mold (@{$ref_resource_list}){
    if (defined($mold->[1])){
        if ($mold->[1] > $walltime){
            $walltime = $mold->[1];
        }
    }
}
# Put into the short queue if the job is short
if ($walltime <= OAR::IO::sql_to_duration($max_walltime)
    && $queue_name eq "default"){
    print "[ADMISSION RULE] Short job: routing into the short queue\n";
    $queue_name = "short";
}

Rule : 21
# Title: Automatic routing into the medium queue
# Description: if the job walltime fits, route it into the medium queue
# (the thresholds are examples)
my $max_walltime="24:00:00";
my $min_walltime="6:00:00";
my $walltime=0;
# Search for the max walltime of the moldable jobs
foreach my $mold (@{$ref_resource_list}){
    if (defined($mold->[1])){
        if ($mold->[1] > $walltime){
            $walltime = $mold->[1];
        }
    }
}
# Put into the medium queue if the job is medium
if ($walltime <= OAR::IO::sql_to_duration($max_walltime)
    && $walltime > OAR::IO::sql_to_duration($min_walltime)
    && $queue_name eq "default"){
    print "[ADMISSION RULE] Medium job: routing into the medium queue\n";
    $queue_name = "medium";
}

Rule : 22
# Title: Automatic routing into the long queue
# Description: if the job walltime is above the medium threshold, route it
# into the long queue and clamp its walltime (the bounds are examples)
my $max_walltime="120:00:00";
my $min_walltime="24:00:00";
my $walltime=0;
# Search for the max walltime of the moldable jobs
foreach my $mold (@{$ref_resource_list}){
    if (defined($mold->[1])){
        if ($mold->[1] > $walltime){
            $walltime = $mold->[1];
        }
    }
}
# Put into the long queue if the job is long
if ($walltime > OAR::IO::sql_to_duration($min_walltime)
    && $queue_name eq "default"){
    print "[ADMISSION RULE] Long job: routing into the long queue\n";
    $queue_name = "long";
}
# Limit the walltime of the "long" queue jobs
if ($queue_name eq "long"){
    foreach my $mold (@{$ref_resource_list}){
        if ($mold->[1] > OAR::IO::sql_to_duration($max_walltime)){
            print "\n[ADMISSION RULE] Walltime too big for the long queue: reduced to $max_walltime\n";
            $mold->[1] = OAR::IO::sql_to_duration($max_walltime);
        }
        if ($mold->[1] < OAR::IO::sql_to_duration($min_walltime)){
            print "\n[ADMISSION RULE] Walltime too small for the long queue: increased to $min_walltime\n";
            $mold->[1] = OAR::IO::sql_to_duration($min_walltime);
        }
    }
}

Rule : 23
# Title : Core number restrictions
# Description : Count the number of cores requested and reject the job if
# the queue does not allow this
# Check the resources
my $resources_def=$ref_resource_list->[0];
my $n_core_per_cpu=6;
my $n_cpu_per_node=2;
my $core=0;
my $cpu=0;
my $node=0;
foreach my $r (@{$resources_def->[0]}){
    foreach my $resource (@{$r->{resources}}){
        if ($resource->{resource} eq "core") { $core=$resource->{value}; }
        if ($resource->{resource} eq "cpu") { $cpu=$resource->{value}; }
        if ($resource->{resource} eq "nodes") { $node=$resource->{value}; }
        if ($resource->{resource} eq "network_address") { $node=$resource->{value}; }
    }
}
# Now, calculate the total number of cores
my $n_cores=0;
if ($node == 0 && $cpu != 0 && $core == 0) {
    $n_cores = $cpu*$n_core_per_cpu;
}
elsif ($node == 0 && $cpu == 0 && $core != 0) {
    $n_cores = $core;
}
elsif ($node != 0 && $cpu == 0 && $core == 0) {
    $n_cores = $node*$n_cpu_per_node*$n_core_per_cpu;
}
elsif ($node != 0 && $cpu != 0 && $core == 0) {
    $n_cores = $node*$cpu*$n_core_per_cpu;
}
elsif ($node == 0 && $cpu != 0 && $core != 0) {
    $n_cores = $cpu*$core;
}
elsif ($node != 0 && $cpu == 0 && $core != 0) {
    $n_cores = $node*$core;
}
else { $n_cores = $node*$cpu*$core; }
print "[ADMISSION RULE] Number of requested cores: $n_cores\n";

# Now the restrictions:
my $short=132;  # 132 cores = 11 nodes
my $medium=132; # 132 cores = 11 nodes
my $long=132;   # 132 cores = 11 nodes
if ("$queue_name" eq "short" && $n_cores > $short){
    print "\n[ADMISSION RULE] Too many cores for the short queue (max $short)\n";
    die("Too many cores requested\n");
}
if ("$queue_name" eq "medium" && $n_cores > $medium){
    print "\n[ADMISSION RULE] Too many cores for the medium queue (max $medium)\n";
    die("Too many cores requested\n");
}
if ("$queue_name" eq "long" && $n_cores > $long){
    print "\n[ADMISSION RULE] Too many cores for the long queue (max $long)\n";
    die("Too many cores requested\n");
}

Rule : 24
# Title : Restriction of the long and medium jobs
# Description : long and medium jobs cannot run on resources having the
# property long=NO
if (("$queue_name" eq "long") or ("$queue_name" eq "medium")){
    if ($jobproperties ne ""){
        $jobproperties = "($jobproperties) AND long != 'NO'";
    }else{
        $jobproperties = "long != 'NO'";
    }
    print "[ADMISSION RULE] Adding the long != 'NO' restriction\n";
}
</code>
- | |||
- | ===== Naming interactive jobs by default ===== | ||
- | Description: | ||
- | |||
- | <code perl> | ||
- | if (($jobType eq " | ||
- | $job_name = ' | ||
- | } | ||
- | </ | ||
- | |||
- | ===== Filter resources by job walltime ===== | ||
- | Description: | ||
- | |||
- | First we define the '' | ||
- | <code bash> | ||
- | oarproperty -a max_walltime | ||
- | for node in <set 1>; do | ||
- | oarnodesetting -h node -p max_walltime=< | ||
- | done | ||
- | for node in <set 2>; do | ||
- | oarnodesetting -h node -p max_walltime=< | ||
- | done | ||
- | ... | ||
- | </ | ||
- | |||
- | A node with the '' | ||
- | <code perl> | ||
- | if ((($jobType eq " | ||
- | foreach my $mold (@{$ref_resource_list}) { | ||
- | if (defined($mold-> | ||
- | foreach my $r (@{$mold-> | ||
- | my $resource = $r-> | ||
- | if ($resource =~ / | ||
- | my $max_walltime = $mold-> | ||
- | my $current_properties = $r-> | ||
- | |||
- | if ($current_properties ne "" | ||
- | $r-> | ||
- | } else { | ||
- | $r-> | ||
- | } | ||
- | } | ||
- | } | ||
- | } | ||
- | } | ||
- | } | ||
</ | </ | ||