OAR CHANGELOG

next version

  • Bugfix: Fix a regression (only for PostgreSQL) in Drawgantt introduced in version 2.4.6 (thanks to Yann Genevois).

version 2.4.7:

  • Backport: Debug checkpoint feature with cosystem or deploy jobs.

version 2.4.6:

  • Fix the user variable used in oarsh. When using oarsh from the frontend, the variable OAR_USER was not defined in the environment, which made oarsh unable to read the user's private key file.
  • Backport: Bugfix #13434: reservations were not handled correctly with the energy saving feature
  • Draw-Gantt: Do not display Absent nodes in the future that are in the standby "sub-state".
  • Bugfix: spelling error (network_addess > network_address)

version 2.4.5:

  • backport: node_change_state: do not Suspect the first node of a job which was EXTERMINATED by Leon if the cpuset feature is configured (let the cpuset do the job)
  • backport: OAREXEC: ESRF detected that sometimes oarexec thinks that it notified the Almighty with its exit code but nothing was seen on the server. So oarexec now tries to resend the exit code until it is killed.
  • backport: oar_Tools: add in notify_almighty a check on the print and on the close of the socket connected to Almighty.
  • backport: switch to /bin/bash as default (some scripts currently need bash).

version 2.4.4:

  • Bug 10999: memory leak in Hulot when used with PostgreSQL. The leak has been minimized, but it is still there (DBD::Pg bug)
  • Almighty cleans ipcs used by oar on exit
  • Bugfix #10641 and #10999 : Hulot is automatically and periodically restarted
  • oar_resource_init: bad awk delimiter. There's a space and if the property is the first one then there is not a ','.
  • job suspend: oardo has not existed for a long time. Replace it with oardodo.
  • Bugfix: oaradmin rules edition/add was broken
  • Bug #11599: missing pingchecker line in Leon
  • Bug #10567: allow bypassing the window mechanism of Hulot (Backport from 2.5)
  • Bug #10568: Wake up timeout changing with the number of nodes (Backport from 2.5)
  • oarsub: when an admission rule dies, micheline returns an integer and not an array ref. Now oarsub ends nicely.
  • Monika: add a link on each jobid on the node display area.
  • sshd_config: on nodes with a lot of cores, 10 parallel connections could be too few

version 2.4.3:

  • Hulot module now has customizable keepalive feature (backport from 2.5)
  • Added a hook to launch a healing command when nodes are suspected (backport from 2.5)
  • Bugfix #9995: oaraccounting script doesn't freeze anymore when db is unreachable.
  • Bugfix #9990: prevent inserting jobs with an invalid username (like an empty username)
  • Oarnodecheck improvements: node is not checked if a job is already running
  • New oaradmin option: --auto-offset
  • Feature request #10565: add the possibility to check the aliveness of the nodes of a job at the end of this one (pingchecker)

version 2.4.2:

  • New "Hulot" module for intelligent and configurable energy saving
  • Bug #9906: fix bad optimization in the gantt lib (so bad scheduling decisions).

version 2.4.1:

  • Bug #9038: Security flaw in oarsub --notify option
  • Bug #9601: Cosystem jobs are no longer killed when a resource is set to Absent
  • Fixed some packaging bugs
  • API bug fixes in job submission parsing
  • Added standby info into `oarnodes -s` and available_upto info into /resources uri of the API
  • Bug Grid'5000 #2687 Fix possible crashes of the scheduler.
  • Bug fix: with MySQL DB Finaud suspected resources which are not of the "default" type.
  • Signed debian packages (install oar-keyring package)

version 2.4.0:

  • Fix bug in oarnodesetting command generated by oar_resources_init (detect_resources)
  • Added a --state option to oarstat to only get the status of specified jobs (optimized query, to allow scripting)
  • Added a REST API for OAR and OARGRID
  • Added JSON support into oarnodes, oarstat and oarsub
  • New Makefile adapted to build packages as non-root user
  • add the command "oar_resources_init" to easily detect and initialize the whole resources of a cluster.
  • "oaradmin version" : now retrieve the most recent database schema number
  • Fix rights on the "schema" table in postgresql.
  • Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
  • Ctrl-C was not working anymore in oarsub. It seems that the signal handler does not handle the previous syntax ($SIG = 'qdel')
  • Fix bug in oarsh with the "-l" option
  • Bug #7487: bad initialisation of the Gantt for the container jobs.
  • Scheduler: move the "delete_unnecessary_subtrees" directly into "find_first_hole". Thus it is possible to submit a job like:

    oarsub -I -l nodes=1/core=1+nodes=4/core=2
    (no hard separation between each group)

    For the same behaviour as before, you can use:

    oarsub -I -l {prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2

  • Bug #7634: test if the resource property value is effectively defined otherwise print a ''

  • Optional script to take into account cpu/core topology of the nodes at boot time (to activate inside oarnodesetting_ssh)
  • Bug #7174: Cleaned default PATH from "./" into oardodo
  • Bug #7674: remove the computation of the scheduler_priority field for besteffort jobs from the asynchronous OAR part. Now the value is set when the jobs are turned into toLaunch state and in Error/Terminated.
  • Bug #7691: add --array and --array-param-file options parsing into the submitted script. Fix also some parsing errors.
  • Bug #7962: enable resource property "cm_availability" to be manipulated by the oarnodesetting command
  • Added the (standby) information to a node state in oarnodes when its state is Absent and cm_availability != 0
  • Changed the name of cm_availability to available_upto which is more relevant
  • add a --maintenance option to oarnodesetting that sets the state of a resource to Absent and its available_upto to 0 if maintenance is on and resets previous values if maintenance is off.
  • added a --signal option to oardel that allows a user to send a signal to one of their jobs
  • added a name field in the schema table that will refer to the OAR version name
  • added a table containing scheduler name, script and description
  • Bug #8559: Almighty: Moved OAREXEC_XXXX management code out of the queue for immediate action, to prevent potential problems in case of scheduler timeouts.
  • oarnodes, oarstat and the REST API no longer retry database connections in case of failure, but exit with an error instead. The retry behavior is left for daemons.
  • improved packaging (try to install files in more standard places)
  • improved init script for Almighty (into deb and rpm packages)
  • fixed performance issue on oarstat (array_id index missing)
  • fixed performance issue (job_id index missing in event_log table)
  • fixed a performance issue at job submission (optimized a query and added an index on challenges table).
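
The resource request syntax shown in the scheduler entry above (groups joined by "+", hierarchy levels joined by "/", optional "{...}" property filter) can be illustrated with a small parser. This is only a sketch to show how such a string decomposes, not OAR's actual parser; the function name parse_resource_request is hypothetical, and it assumes that property filters contain neither "+" nor "}".

```python
import re


def parse_resource_request(request):
    """Split an `oarsub -l` style string into groups of hierarchy levels.

    Hypothetical helper for illustration only; not part of OAR.
    """
    groups = []
    for chunk in request.split("+"):
        # Optional leading {property filter}, then a /-separated hierarchy.
        match = re.fullmatch(r"(?:\{([^}]*)\})?/?(.*)", chunk)
        prop_filter, hierarchy = match.group(1), match.group(2)
        levels = []
        for level in hierarchy.split("/"):
            name, _, count = level.partition("=")
            levels.append((name, int(count)))
        groups.append({"filter": prop_filter, "levels": levels})
    return groups


if __name__ == "__main__":
    for group in parse_resource_request(
        "{prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2"
    ):
        print(group)
```

Each "+"-separated group becomes one entry, so the two forms quoted in the entry differ only in whether the "filter" field is set.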

version 2.3.5:

  • Bug #8139: Drawgantt nil error (Add condition to test the presence of nil value in resources table.)
  • Bug #8416: When the automatic halt/wakeup feature was enabled, there was a problem determining idle nodes.
  • Debug a mis-initialization of the Gantt with running jobs in the metascheduler (concurrency access to PG database)

version 2.3.4:

  • add the command "oar_resources_init" to easily detect and initialize the whole resources of a cluster.
  • "oaradmin version" : now retrieve the most recent database schema number
  • Fix rights on the "schema" table in postgresql.
  • Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
  • Ctrl-C was not working anymore in oarsub. It seems that the signal handler does not handle the previous syntax ($SIG = 'qdel')
  • Bug #7487: bad initialisation of the Gantt for the container jobs.
  • Fix bug in oarsh with the "-l" option
  • Bug #7634: test if the resource property value is effectively defined otherwise print a ''
  • Bug #7674: remove the computation of the scheduler_priority field for besteffort jobs from the asynchronous OAR part. Now the value is set when the jobs are turned into toLaunch state and in Error/Terminated.
  • Bug #7691: add --array and --array-param-file options parsing into the submitted script. Fix also some parsing errors.
  • Bug #7962: enable resource property "cm_availability" to be manipulated by the oarnodesetting command

version 2.3.3:

  • Fix default admission rules: case-insensitive check for properties used in oarsub
  • Add new oaradmin subcommand : oaradmin conf. Useful to edit conf files and keep changes in a Subversion repository.
  • Kill correctly each taktuk command children in case of a timeout.
  • New feature: array jobs (option --array) (on oarsub, oarstat, oardel, oarhold and oarresume) and file-based parametric array jobs (oarsub --array-param-file). /!\ In this version the DB scheme has changed. If you want to upgrade your installation from a previous 2.3 release then you have to execute one of these SQL scripts in your database (stop OAR before):

    mysql:
        DB/mysql_structure_upgrade_2.3.1-2.3.3.sql
    
    
    postgres:
        DB/pg_structure_upgrade_2.3.1-2.3.3.sql
    

version 2.3.2:

  • Change scheduler timeout implementation to schedule the maximum of jobs.
  • Bug #5879: do not show initial_request in oarstat when it is not a job of the user who launched the oarstat command (oar or root).
  • Add a --event option to oarnodes and oarstat to display events recorded for a job or node
  • Display reserved resources for a validated waiting reservation, with a hint in their state
  • Fix oarproperty: property names are lowercase
  • Fix OAR_JOB_PROPERTIES_FILE: do not display system properties
  • Add a new user command: oarprint, which allows pretty-printing the resource properties of a job
  • Debug temporary job UID feature
  • Add 'kill -9' on subprocesses that reached a timeout (prevents Perl from waiting on something)
  • desktop computing feature is now available again. (ex: oarsub -t desktop_computing date)
  • Add versioning feature for admission rules with Subversion

version 2.3.1:

  • Add new oarmonitor command. This allows monitoring OAR jobs on compute nodes.
  • Remove sudo dependency and replace it by the commands "oardo" and "oardodo".
  • Add possibility to create a temporary user for each job on compute nodes. So you can perform very strong restrictions for each job (ex: bandwidth restrictions with iptables, memory management, ... everything that can be handled with a user id)
  • Debian packaging: Run OAR specific sshd with root privileges (under heavy load, kernel may be more responsive for root processes...)
  • Remove ALLOWED_NETWORKS tag in oar.conf (it added more complexity than it solved problems)
  • /!\ change database scheme for the field exit_code in the table jobs. Now oarstat exit_code line reflects the right exit code of the user passive job (before, even when the user script was not launched the exit_code was 0 which was BAD)
  • /!\ add DB field initial_request in the table jobs that stores the oarsub line of the user
  • Feature Request #4868: Add a parameter to specify what the "nodes" resource is a synonym for. Network_address must be seen as an internal data and not used.
  • Scheduler: add timeout for each job == 1/4 of the remaining scheduler timeout.
  • Bug #4866: now the whole node is Suspected instead of just the part on which there is no job. So it is possible to have a job on Suspected nodes.
  • Add job walltime (in seconds) in parameter of prologue and epilogue on compute nodes.
  • oarnodes does not show system properties anymore.
  • New feature: container job type now allows to submit inner jobs for a scheduling within the container job
  • Monika refactoring and now in the oar packaging.
  • Added a table schema in the db with the field version, representing the version of the db schema.
  • Added a field DB_PORT in the oar config file.
  • Bug #5518: add right initialization of the job user name.
  • Add new oaradmin command. This makes it easier to create resources and manage admission rules.
  • Bug #5692: change source code into a right Perl 5.10 syntax.
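
The per-job scheduler timeout listed above (1/4 of the remaining scheduler timeout) gives each successive job a geometrically shrinking budget. The sketch below only works through that arithmetic; it is not taken from the OAR scheduler code, the function name per_job_timeouts is hypothetical, and it assumes the worst case where every job uses its whole budget.

```python
def per_job_timeouts(scheduler_timeout, job_count):
    """Return the time budget of each job, in scheduling order.

    Hypothetical illustration: each job gets 1/4 of what remains,
    and is assumed to consume all of it (worst case).
    """
    remaining = float(scheduler_timeout)
    budgets = []
    for _ in range(job_count):
        budget = remaining / 4
        budgets.append(budget)
        remaining -= budget
    return budgets


if __name__ == "__main__":
    # With a 64 second scheduler timeout: 16.0, then 12.0, then 9.0 seconds.
    print(per_job_timeouts(64, 3))
```

Because each budget is a fraction of what remains, a single slow job cannot exhaust the whole scheduler timeout.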

version 2.2.12:

  • Bug #5239: fix the bug if there are spaces into job name or project
  • Fix the bug in Iolib if DEAD_SWITCH_TIME >0
  • Fix a bug in bipbip when calling the cpuset_manager to clean jobs in error
  • Bug #5469: fix the bug with reservations and Dead resources
  • Bug #5535: checks for reservations made at a same time was wrong.
  • New feature: local checks on nodes can be plugged in the oarnodecheck mechanism. Results can be asynchronously checked from the server (taktuk ping checker)
  • Add 2 new tables to keep track of the scheduling decisions (gantt_jobs_predictions_log and gantt_jobs_resources_log). This will help debugging scheduling troubles (see SCHEDULER_LOG_DECISIONS in oar.conf)
  • Now reservations are scheduled only once (at submission time). Resources allocated to a reservation are definitively set once the validation is done and won't change in the scheduler's next passes.
  • Fix DrawGantt to not display besteffort jobs in the future, which is meaningless.

version 2.2.11:

  • Fix Debian package dependency on a CGI web server.
  • Fix little bug: remove notification (scheduled start time) for Interactive reservation.
  • Fix bug in reservation: take care of the SCHEDULER_JOB_SECURITY_TIME for reservations to check.
  • Fix bug: add a lock around the section which creates and feeds the OAR cpuset.
  • Taktuk command line API has changed (we need taktuk >= 3.6).
  • Fix extra ' in the name of output files when using a job name.
  • Bug #4740: open the file in oarsub with user privileges (-S option)
  • Bug #4787: check if the remote socket is defined (problem of timing with nmap)
  • Feature Request #4874: check system names when renaming properties
  • DrawGantt can export charts to be reused to build a global multi-OAR view (e.g. DrawGridGantt).
  • Bug #4990: DrawGantt now uses the database localtime as its time reference.

version 2.2.10:

  • Job dependencies: if the required jobs are not in the Terminated state with an exit code == 0, then the schedulers refuse to schedule this job.
  • Add the possibility to disable the halt command on nodes with cm_availability value.
  • Enhance oarsub "-S" option (more #OAR parsed).
  • Add the possibility to use oarsh without configuring the CPUSETs (can be useful for users that don't want to configure their ssh keys)
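
The job dependency rule in this version can be read as a simple predicate over the required jobs. The following is a minimal sketch of that rule as stated in the entry, not the schedulers' real code; the function name and the dictionary keys "state" and "exit_code" are assumptions made for the example.

```python
def dependencies_satisfied(required_jobs):
    """Return True only if every required job is Terminated with exit code 0.

    Hypothetical illustration; each job is a dict with assumed keys
    "state" and "exit_code".
    """
    return all(
        job["state"] == "Terminated" and job["exit_code"] == 0
        for job in required_jobs
    )


if __name__ == "__main__":
    done = {"state": "Terminated", "exit_code": 0}
    failed = {"state": "Terminated", "exit_code": 1}
    print(dependencies_satisfied([done]))
    print(dependencies_satisfied([done, failed]))
```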

version 2.2.9:

  • Bug 4225: Dump only 1 data structure when using -X or -Y or -D.
  • Bug fix in Finishing sequence (Suspect right nodes).

version 2.2.8:

  • Bug 4159: remove unneeded Dump print from oarstat.
  • Bug 4158: replace XML::Simple module by XML::Dumper one.
  • Bug fix for reservation (recalculate the right walltime).
  • Print job dependencies in oarstat.

version 2.2.7:

  • Bug 4106: fix oarsh and oarcp issue with some options (erroneous leading space).
  • Bug 4125: remove exit_code data when it is not relevant.
  • Fix potential bug when changing asynchronously the state of the jobs into "Terminated" or "Error".

version 2.2.6:

  • Bug fix: job types were no longer sent to the cpuset manager script (side effect of the bug 4069 resolution).

version 2.2.5:

  • Bug fix: remove user command when OAR executes the epilogue script on the nodes.
  • Clean debug and mail messages format.
  • Remove bad oarsub syntax from oarsub doc.
  • Debug xauth path.
  • bug 3995: set project correctly when resubmitting a job
  • debug 'bash -c' on Fedora
  • bug 4069: reservations with CPUSET_ERROR (remove bad hosts and continue with a right integrity in the database)
  • bug 4044: fix free resources query for reservation (get the nearest hole from the beginning of the reservation)
  • bug 4013: now Dead, Suspected and Absent resources have different colors in drawgantt with a popup on them.

version 2.2.4:

  • Redirect third party commands into oar.log (easier to debug).
  • Add user info into drawgantt interface.
  • Some bug fixes.

version 2.2.3:

  • Debug prologue and epilogue when oarexec receives a signal.

version 2.2.2:

  • Switch nice value of the user processes into 0 in oarsh_shell (in case of sshd was launched with a different priority).
  • debug taktuk zombies in pingchecker and oar_Tools

version 2.2.1:

  • install the "allow_classic_ssh" feature by default
  • debug DB installer

version 2.2:

  • oar_server_proepilogue.pl: can be used for server prologue and epilogue to authorize users to access nodes that are completely allocated by OAR. If the whole node is assigned (all cpus), then it kills all jobs from the user.
  • the same thing can be done with cpuset_manager_PAM.pl as the script used to configure the cpuset. More efficient if cpusets are configured.
  • debug cm_availability feature to switch on and off nodes automatically depending on waiting jobs.
  • reservations now take care of cm_availability field

version 2.1.0:

  • add "oarcp" command to help the users to copy files using oarsh.
  • add sudo configuration to deal with bash. Now oarsub and oarsh have the same behaviour as ssh (the bash configuration files are loaded correctly)
  • bug fix in drawgantt (jobs were lost after submission of a moldable one)
  • add SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE into oar.conf. Thus admin can add some resources to each job (like the frontend node)
  • add possibility to use taktuk to check the aliveness of the nodes
  • %jobid% is now replaced in stdout and stderr file names by the effective job id
  • change interface to shut down or wake up nodes automatically (now the node list is read on STDIN)
  • add OARSUB_FORCE_JOB_KEY in oar.conf. It says to create a job ssh key by default for each job.
  • %jobid% is now replaced in the ssh job key name (oarsub -k ...).
  • add NODE_FILE_DB_FIELD_DISTINCT_VALUES in oar.conf that enables the admin to configure the generated content of the OAR_NODE_FILE
  • change ssh job key oarsub options behaviour
  • add options "--reinitialize" and "--delete-before" to the oaraccounting command
  • cpuset are now stored in /dev/cpuset/oar
  • debian packaging: configure and launch a specific sshd for the user oar
  • use a file descriptor to send the node list --> able to handle a very large amount of nodes
  • all config files are now in /etc/oar/
  • oardel can add a besteffort type to jobs and vice versa
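
The %jobid% placeholder mentioned in two entries above (stdout/stderr file names and the ssh job key name) amounts to a plain string substitution. The sketch below illustrates the idea only; expand_jobid is a hypothetical helper, the file name template is made up for the example, and OAR's own implementation is not shown in this changelog.

```python
def expand_jobid(template, job_id):
    """Replace every %jobid% placeholder with the effective job id.

    Hypothetical helper for illustration only; not part of OAR.
    """
    return template.replace("%jobid%", str(job_id))


if __name__ == "__main__":
    # The template below is an invented example file name.
    print(expand_jobid("myrun.%jobid%.stdout", 42))  # myrun.42.stdout
```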

version 2.0.2:

  • add warnings and exit code to oarnodesetting when there is a bad node name or resource number
  • change package version
  • change default behaviour for the cpuset_manager.pl (more portable)
  • enable a user to use the same ssh key for several jobs (at his own risk!)
  • add node hostnames in oarstat -f
  • add --accounting and -u options in oarstat
  • bug fix on index fields in the database (syncro): bug 2020
  • bug fix about server pro/epilogue: bug 2022
  • change the default output of oarstat. Now it is usable: bug 1875
  • remove keys in authorized_keys of oar (on the nodes) that do not correspond to an active cpuset (clean after a reboot)
  • reread oar.conf after each database connection attempt
  • add support for X11 forwarding in oarsub -I and -C
  • debug mysql initialization script in debian package
  • add a variable in oarsh for the default options of ssh to use (more useful to change if the ssh version installed does not handle one of these options)
  • read oar.conf in oarsh (so admin can more easily change options in this script)
  • add support for X11 forwarding via oarsh
  • change variable for oarsh: OARSH_JOB_ID --> OAR_JOB_ID

version 2.0.0:

  • Now, with the ability to declare any type of resources like licences, VLAN, IP range, computing resources must have the type default and a network_address not null.
  • Possibility to declare associated resources like licences, IP ranges, ... and to reserve them like others.
  • Now you can connect to your jobs (not only for reservations).
  • Add "cosystem" job type (execute and do nothing for these jobs).
  • New scheduler: "oar_sched_gantt_with_timesharing". You can specify jobs with the type "timesharing", which indicates that this scheduler can launch more than 1 job on a resource at a time. It is possible to restrict this feature with the words "user" and "name". For example, '-t timesharing=user,name' indicates that only a job from the same user with the same name can be launched at the same time as it.
  • Add PostgreSQL support. So there is a choice to make between MySQL and PostgreSQL.
  • New approach for the scheduling: administrators have to insert into the database descriptions of resources and not nodes. Resources have a network address (physical node) and properties. For example, if you have dual-processor nodes, then you can create 2 different resources with the same network address but with 2 different processor names.
  • The scheduler can now handle resource properties in a hierarchical manner. Thus, for example, you can do "oarsub -l /switch=1/cpu=5" which submits a job on 5 processors on the same switch.
  • Add a signal handler in oarexec and propagate this signal to the user process.
  • Support '#OAR -p ...' options in user script.
  • Add in oar.conf:
      - DB_BASE_PASSWD_RO : for security issues, it is possible to execute requests with parts specified by users with a read-only account (like the "-p" option).
      - OARSUB_DEFAULT_RESOURCES : when nothing is specified with the oarsub command then OAR takes this default resource description.
      - OAREXEC_DEBUG_MODE : turn on or off debug mode in oarexec (create /tmp/oar/oar.log on nodes).
      - FINAUD_FREQUENCY : indicates the frequency at which OAR launches Finaud (search dead nodes).
      - SCHEDULER_TIMEOUT : indicates to the scheduler the amount of time after which it must end itself.
      - SCHEDULER_JOB_SECURITY_TIME : time between each job.
      - DEAD_SWITCH_TIME : after this time Absent and Suspected resources are switched to the Dead state.
      - PROLOGUE_EPILOGUE_TIMEOUT : the possibility to specify a different timeout for prologue and epilogue.
      - PROLOGUE_EXEC_FILE : you can specify the path of the prologue script executed on nodes.
      - EPILOGUE_EXEC_FILE : you can specify the path of the epilogue script executed on nodes.
      - GENERIC_COMMAND : a specific script may be used instead of ping to check aliveness of nodes. The script must return bad nodes on STDERR (one line per bad node, using exactly the same name that OAR gave as argument to the command).
      - JOBDEL_SOFTWALLTIME : time after a normal frag that the system waits to retry to frag the job.
      - JOBDEL_WALLTIME : time after a normal frag that the system waits before deleting the job arbitrarily and suspecting nodes.
      - LOG_FILE : specify the path of OAR log file (default : /var/log/oar.log).

  • Add wait() in pingchecker to avoid zombies.

  • Better code modularization.
  • Remove node install part to launch jobs. So it is easier to upgrade from one version to another (oarnodesetting must already be installed on each node if we want to use it).
  • Users can specify a method to be notified (mail or script).
  • Add cpuset support
  • Add prologue and epilogue script to be executed on the OAR server before and after launching a job.
  • Add dependency support between jobs ("-a" option in oarsub).
  • In oarsub you can specify the launching directory ("-d" option).
  • In oarsub you can specify a job name ("-n" option).
  • In oarsub you can specify stdout and stderr file names.
  • User can resubmit a job (option "--resubmit" in oarsub).
  • It is possible to specify a read-only database account, and it will be used to evaluate SQL properties given by the user with the oarsub command (more secure).
  • Add possibility for the scheduler to order assigned resources by their properties. So you can favor some resources over others (SCHEDULER_RESOURCE_ORDER tag in oar.conf file)
  • a command can be specified to switch off idle nodes (SCHEDULER_NODE_MANAGER_SLEEP_CMD, SCHEDULER_NODE_MANAGER_IDLE_TIME, SCHEDULER_NODE_MANAGER_SLEEP_TIME in oar.conf)
  • a command can be specified to switch on nodes in the Absent state according to the resource property cm_availability in the table resources (SCHEDULER_NODE_MANAGER_WAKE_UP_CMD in oar.conf).
  • if a job goes into the Error state and this is not its fault, then OAR will resubmit it.
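
The "timesharing" restriction described in the scheduler entry of this version ('-t timesharing=user,name') can be pictured as a field-by-field comparison between two jobs. This is a sketch of the rule as worded in the entry, not the scheduler's real logic; the function name, the dictionary keys and the sample values are assumptions, and both jobs are assumed to already carry the timesharing type.

```python
def may_run_together(running_job, candidate_job, restrictions):
    """Return True if candidate_job may share a resource with running_job.

    Hypothetical illustration: `restrictions` holds the words given to
    '-t timesharing=...', for example ("user", "name"). Each job is a
    dict with assumed keys "user" and "name".
    """
    return all(
        running_job[field] == candidate_job[field] for field in restrictions
    )


if __name__ == "__main__":
    running = {"user": "alice", "name": "simulation"}
    same = {"user": "alice", "name": "simulation"}
    other = {"user": "bob", "name": "simulation"}
    print(may_run_together(running, same, ("user", "name")))
    print(may_run_together(running, other, ("user", "name")))
```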