Note
For information about OAR release versions with changelogs along with errata and known bugs, see https://oar.imag.fr/oar_versions. This page only shows changelogs.
OAR CHANGELOG
version 2.5.9:
- [scheduler] add the SCHEDULER_RESOURCE_ORDER_ADV_RESERVATIONS option, so that the scheduling of advance reservations is not impacted by the current state of the resources (e.g. nodes in standby, current besteffort jobs)
- [admission rules] add an admission rule to restrict advance reservation inner jobs to use container jobs that are advance reservations as well
- [schedulers] fix issues with the scheduling of an inner job before its container
- [schedulers] waiting inner jobs in a container that vanished are set to error
- [oarexec] add an option to have inner jobs killed along with their container
- [oarexec] do not run inner jobs before their container is running
- [oarwalltime] make walltime change respect a possibly defined job deadline
- [oarwalltime] add an option to disable the walltime reduction
- [oarwalltime] fix the oarwalltime per-queue configuration
- [oarsub/scheduler] fix a bug with the recent Perl max recursion depth limit
- [drawgantt] show the timezone in the dates
- [oarexec] fix oarsub shell termination when the job is killed
- [database] add an index to the resource_log table
- [oar_resource_add] add support for repairing the resource properties
- [oarexec] add support to disable the auto-repair of suspected nodes
- [job_resource_manager/oarsh] add the COMPUTE_THREAD_SIBLINGS option, to let OAR automatically set the HT thread siblings if not set in the resources hierarchy with a thread resource, or in the resource cpuset field
- [job_resource_manager] rework code, support more cgroup subsystems
- [oarsh] add support to let oarsh create a sub cgroup with either a subset of the cpuset or of the devices in the shell opened on the node. See an example of usage with GNU Parallel in the website documentation
- [oarsub] add the OARSUB_NODE_EXEC_FILE configuration to run a custom command on the head node of the job before the job shell
- [oarsub] make oarsub accept the submission of a noop job with no script
- [oarstat] fix JSON/YAML/XML output when no job to display
- [oarstat] oarstat -j can now use the OAR_JOB_ID environment variable
- [oarstat] fix YAML display with the YAML::Syck library
version 2.5.8:
- [job_resource_manager] manage nvidia gpu with the cgroup devices
- [oarwalltime] add functionality to allow changing the walltime of a running job. See the oarwalltime command and oar.conf
- [scheduler] fix the besteffort + deploy VS adv. reservation case
- [scheduler] add the state=permissive job type, allowing jobs to be scheduled and run regardless of the aliveness of resources (only if they are also noop or cosystem jobs)
- [oarsub/scheduler] fix warning “Use of uninitialized value $resource_value”
- [oarsub] fix unknown error message in case of job termination + typos
- [oarnodesetting] do not kill noop jobs using resources changed to Dead or Absent
- [finaud] fix: make pingchecker run only on resources of type default
- [oar-database] fix the privileges of OAR’s read-only user in PostgreSQL for new installations. For an existing database, the following command applies the fix: oar-database --fix-ro-user-priv …
- [api] some improvement in the Apache configuration and tests
- [api] added POST /media/force to overwrite a file
- [finaud] bugfix: make pingchecker run only on resources of type default
- [api] hardening on the syntax of the URIs (should not impact good URIs!)
- [drawgantt-svg] add a mark next to the label of the resources pointed by the mouse
- [drawgantt-svg] fix possible SQL injection with the filters
- [drawgantt-svg] improve the label_display_regex text replacement mechanism
- [drawgantt-svg/oarstat] fix past and current moldable jobs display
- [drawgantt-svg] fix drain display
- [drawgantt-svg] fix nav_filter with only one option
- [oar.conf] update SSH options to those of OpenSSH 7.6p1
- [oar-database] support --db-is-local (UNIX socket) for MySQL (MariaDB)
- [oar-node] fix warnings with OAR’s sshd configuration
- [oar-resource-add] fix the auto-offset option
- [oar-resource-add] add support for creating GPU resources
- [oar-resource-add] add support to handle the CPU and GPU topologies
version 2.5.7:
This version mainly brings a security fix for the oarsh command. It is highly recommended to upgrade (server, frontend(s) and nodes), since all previous versions of OAR are affected.
- [oarsh] fix a security hole when passing options to OpenSSH. See oar.conf to adapt settings to your setup, if required (OARSH_* variables)
- [oarsh] dropped the mechanism to select whether to use oarsh or fall back to ssh, given a list of hostname patterns
- [oarsub] fix the job-key information of the manual page
- [oarsub] handle cases where trailing spaces were breaking oarsub script directives
- [api] added an example of Apache configuration for the authentication
- [documentation] improve the SSH keys setup explanations for OAR installation
version 2.5.6:
- [oar.conf] add the SCHEDULER_MIN_TIME_BETWEEN_2_CALLS option
- [metascheduler] fix a bug with advance reservations when predicted resources must be recomputed
- [metascheduler] fix a bug with advance reservations with standby start job types (noop/cosystem/deploy=standby)
- [oar-node init] create /var/run/sshd if needed
- [oarsub] fix several bugs with the array job submission
- [oarstat] allow using Perl’s YAML::Syck for a quicker YAML output
- [oarstat] improve performance and information for the --gantt option
- [oarstat] prettier print of job events
- [oarnodesetting] optimize grouped operations on resources and add a lock around property changes
- [oaradmissionrules] fix bug: changing a rule priority does not enable it
- [oar_resources_init] fix node read from standard input
- [oarnodecheck] use /var/lib/oar instead of /etc/oar for working files
- [logs] several cosmetic fixes
- [api] add colmet extraction function
- [api] proposed apache configuration now uses a virtual host on port 6668
- [drawgantt-svg] fix the possibly very long delay when zooming
- [drawgantt-svg] add forecast buttons + relative start/stop url arguments
- [drawgantt-svg] rework configuration for the default display
- [drawgantt-svg] allow displaying resources of type != default
- [drawgantt-svg] improve support for use as a widget in custom HTML pages (multisite, etc)
- [monika] fix bugs with recent Perl/Perl CGI versions
- [monika] fix harmless bug in configuration
- [visualization] remove overlib.js (license issue), this breaks the legacy drawgantt (which is not supported anymore)
- [misc] remove some old development code from sources
- [misc] fix inconsistent copyrights and licenses
- [doc] update the installation documentation
version 2.5.5:
- [iolib] fix deadlock with TRUNCATE in postgresql
- [almighty] add SCHEDULER_MIN_TIME_BETWEEN_2_CALLS: the scheduler is launched at most once every t seconds (t=5 by default), which prevents the scheduler from starving the other modules
- [scheduler] fix some memory leaks.
- [scheduler] add a cache to the resources tree computation: improve the scheduler speed by reducing the number of SQL queries.
- [scheduler] backport the expire/postpone/deadline job types.
- [scheduler] rename the placeholder job types: placeholder/allowed.
- [scheduler] fix timesharing (adv reservation and *_placeholder schedulers).
- [scheduler] allow noop/cosystem/deploy jobs to start on resources in standby, no wake-up is triggered (requires activating energy saving).
- [oarsub] use jobkey (-k) if the OAR_JOB_KEY_FILE env variable is set.
- [oarstat] fix accounting display
- [oar_resources_init] fix HyperThreading bug + improve CLI
- [oar_resources_add] make HyperThreading optional + fix long options + make nicer warning outputs for auto-offset
- [admission rules] rewrite the job type check rule
- [admission rules] fix oaradmissionrules bug with MySQL when modifying a rule
- [oar-node] fix pid in init script.
- [api] some optimizations + rework authentication configuration (apache).
- [api][drawgantt-svg][monika] fix apache config (apache 2.4).
- [drawgantt-svg] new version with aggregation of resources and more.
- [monika] add thread to the hidden properties.
- [api] fastcgi config now using suexec
- [api] now using apache environment variables when headers are not available
- [api] optimization of /jobs query response time (especially efficient for mysql based installations)
- [api] security fix: HTML outputs which did not break on errors
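To illustrate the SCHEDULER_MIN_TIME_BETWEEN_2_CALLS entry above, here is a minimal Python sketch of a minimum-interval gate. It is not OAR's code (the Almighty is not written in Python): the class name MinIntervalGate and its methods are hypothetical, and only the 5-second default comes from the changelog.

```python
import time


class MinIntervalGate:
    """Allow an action at most once every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable clock, handy for testing
        self.last_call = None       # time of the last recorded call, if any

    def seconds_to_wait(self):
        """Return how long the caller should still wait before the next call."""
        if self.last_call is None:
            return 0.0
        elapsed = self.clock() - self.last_call
        return max(0.0, self.min_interval - elapsed)

    def mark_called(self):
        """Record that the action has just been launched."""
        self.last_call = self.clock()
```

A caller would check seconds_to_wait() before launching the scheduler and serve other work in the meantime, which is the starvation-avoidance idea described in the entry.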
version 2.5.4:
- [api] Implemented GET /resources/<property>/jobs to get jobs running on resources, grouped by a given property.
- [api] Implemented HTTP_X_API_PATH_PREFIX header variable to prefix all returned URIs.
- [api] Added GET /jobs/<id>/details support.
- [api] Implemented the ability to get a set of jobs at once with GET /jobs?ids=<id1>:<id2>:<id3>:…
- [api] BUGFIX: stderr and stdout were reversed.
- [api] BUGFIX: memory leak in the API when used with FastCGI.
- [api] Rewritten/commented apache config file.
- [kamelot] BUGFIX: fix hierarchies manipulation (remove toplevel resource).
- [accounting] Fixed a memory leak and a rare case of bad consumption count.
- [oar.conf] Replace the MAX_CONCURRENT_JOB_TERMINATIONS option by MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING
- [almighty] Rewrote the handling of starting and finishing jobs: limit bipbip processes to MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING to avoid overloading the server.
- [oarexec] Introduced BASH_ENV=~oar/.batch_job_bashrc for batch jobs
  Batch jobs with a bash shell had difficulty sourcing the right bash scripts when launching. BASH_ENV=~oar/.batch_job_bashrc is now set before launching the user bash process, so that OAR can control which script is sourced. By default, ~/.bashrc is sourced.
- [commands] Exit immediately on wrong arguments.
- [oarsh] Propagate OAR shell environment variables:
  Users have access to the same OAR environment variables when connecting to any of the job nodes with oarsh.
- [job_uid] Removed job uid feature (not used).
- [job_resource_manager] Use fstrim (for SSD) when cleaning files.
- [deploy] Do not check the nodes when ACTIVATE_PINGCHECKER_AT_JOB_END is on and the job is of the deploy type (bug #17128).
- [judas] Disabled sending log by email on errors as this could generate too many mails.
- [noop] Added the ‘noop’ job type. If specified, nothing is done on computing nodes. The job just reserves the resources for the specified walltime.
- [quotas] Added the possibility to make quotas on:
  - the number of used resources
  - the number of running jobs
  - the result of resources X hours of running jobs
- [runner] Added runner bipbip processes in the bipbip_laucher in Almighty.
- [database] Replaced field “maintenance” by “drain”.
  The administrator can disable resources without killing current jobs by:
  oarnodesetting -h n12 -p drain=YES
  or:
  oarnodesetting --drain -h n12
  WARNING: any admission rule using the “maintenance” keyword must be adapted to use the “drain” keyword.
- [oar_resources_init] Added support for SMT (hyperthreading)
- [cpuset] The cpuset resources field is now a varchar.
  It is now possible to specify several cpu ids in the cpuset field, as needed in some cases where SMT is enabled on nodes, e.g.:
  1+4+8
- [oarsub] Added a filter for notifications
  It is now possible to specify which TAGs must trigger notifications:
  oarsub --notify "[END,ERROR]mail:name@domain.com" -I
- [admission rules] Added priority to rules, which makes it easier to manage the rules execution order.
- [admission rules] Added an enable/disable flag to rules to allow activating or deactivating rules without having to comment the code.
- [oaradmin] The oaradmin rules command is now disabled since it does not handle priority and enable flags.
- [oaradmin] The oaradmin conf command is disabled.
- [oar_resources_add] Added the oar_resources_add command to help adding resources and replace the oaradmin resources command.
- [oaradmissionrules] oaradmissionrules is a new command to manage the admission rules.
- [oarnodesetting] Removed dependency on oarnodes.
- [drawgantt-svg] Various bugfixes and improvements
- [metasched] If a besteffort job has a checkpoint duration defined (oarsub --checkpoint) then OAR tries to checkpoint it before killing it. It is possible to define a limit of the checkpoint duration with an admission rule ($checkpoint variable).
- [drawgantt] Drawgantt is now deprecated (and not shipped with packages)
- [misc] OAR packaged components do not require Ruby anymore.
- [oaraccounting] Fix bug reported in Debian tracker #678976
- [sources] Clean-up some unused or irrelevant files/codes
- [scheduler] change default schedulers to quota
  The default scheduler of the queues default, admin and besteffort is now oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas. The configuration file /etc/oar/scheduler_quotas.conf contains no quota enforcement so the behaviour remains the same as before.
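The [cpuset] entry above gives 1+4+8 as an example value for the cpuset field. As a rough illustration of how such a value could be read by a site script, here is a minimal Python sketch. The function name parse_cpuset_field is hypothetical, and the "+" separator is taken only from the changelog example; how OAR itself interprets the field is not shown here.

```python
def parse_cpuset_field(value):
    """Split a cpuset field such as "1+4+8" into a list of CPU ids.

    Assumption: ids are non-negative integers joined by "+",
    as in the changelog example. Empty parts are ignored.
    """
    return [int(part) for part in value.split("+") if part.strip()]


# A single id is simply a one-element list.
cpu_ids = parse_cpuset_field("1+4+8")
```

With SMT enabled, each id in the list would typically correspond to one hardware thread of the same core.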
version 2.5.3:
Add the “Name” field on the main Monika page. This makes it easier for users to find their jobs.
Add MAX_CONCURRENT_JOB_TERMINATIONS into the oar.conf of the master. This limits the number of concurrent processes launched by the Almighty when the jobs finish.
Bug fix in ssh key feature in oarsub.
Added --compact, -c option to oarstat (compact view of array jobs).
Improvements of the API: media upload from html forms, listing of files, security fixes, addition of new configuration options, listing of the scheduled nodes into jobs, fixed bad reinitialization of the limit parameter, stress_factor, accounting… See OAR-DOCUMENTATION-API-USER for more information.
CGROUP: handle cgroup hierarchy already mounted by the OS like in Fedora 18 (by systemd in /sys/fs/cgroup) in job_resource_manager_cgroups.pl.
Bug fix oar-database: fix the reset function for mysql.
SVG version of drawgantt: all features are now implemented to replace the legacy drawgantt. Both can be installed.
Bug fix schedulers: rewrite schedulers with placeholders.
Rework default admission rules.
Add support to the oar_resource_init command to generate resources with a “thread” property (useful if HyperThreading is activated/used on nodes).
Fix stdout/stderr bug: check the allowed characters in the path given by the users.
Fix: the user shell (bash) didn’t source /etc/bash.bashrc in batch jobs.
Add quota which limits the number of used resources at a time depending on the job attributes: queue, project, types, user (available with the scheduler “oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas”).
Add comments in user job STDERR files to know if a job was killed or checkpointed.
Add the variable $jobproperties_applied_after_validation. It can be used in an admission rule to add a constraint after the validation of the job. Ex:
$jobproperties_applied_after_validation = "maintenance='off'";
So, even if all the resources have "maintenance='on'", new jobs will be accepted but not scheduled for now.
Add the oardel option --force-terminate-finishing-job: to use when a job is stuck in the Finishing state.
Bug #15911: Energy saving now waits SCHEDULER_NODE_MANAGER_IDLE_TIME for nodes that have been woken up, even if they didn’t run any job.
Simplify job dependencies: do not check the exit code of the jobs in dependencies.
Admission rules: add the “estimate_job_nb_resources” function that is useful to know the number of resources that will be used by a job.
oarstat: add another output format that can be selected with "--format 2" or by setting "OARSTAT_DEFAULT_OUTPUT_FORMAT=2" in oar.conf.
oarsub: Add the capability to use the tag %jobname% in the STDOUT (-O) and/or STDERR (-E) filenames (like %jobid%).
bug #14935: fix timesharing jobs within a container issue
add schedulers with the placeholder feature.
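The oarsub entry above mentions the %jobname% and %jobid% tags in STDOUT/STDERR filenames. The substitution idea can be sketched in a few lines of Python. This is only an illustration, not oarsub's code: the function name expand_output_name and the sample template are hypothetical.

```python
def expand_output_name(template, job_id, job_name):
    """Replace the %jobid% and %jobname% tags in a filename template.

    Hypothetical helper: it mimics the tag substitution described in the
    changelog, without claiming to match oarsub's exact rules.
    """
    return (template
            .replace("%jobid%", str(job_id))
            .replace("%jobname%", job_name))


# Example template, chosen for illustration only.
stdout_name = expand_output_name("%jobname%.%jobid%.stdout", 42, "sim")
```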
version 2.5.2:
- Bugfix: /var/lib/oar/.bash_oar was empty due to an error in the common setup script.
- Bugfix: the PINGCHECKER_COMMAND in oar.conf depends now on %%OARDIR%%.
- Bug #13939: job_resource_manager.pl and job_resource_manager_cgroups.pl now delete the user files in /tmp, /var/tmp and /dev/shm at the end of the jobs.
- Bugfix: in oardodo.c, the preprocessed variables were not defined correctly.
- Finaud: fix race condition when there was a PINGCHECKER error just before another problem. The node became Alive again when the PINGCHECKER said OK, although there was another error to resolve.
- Bugfix: The feature CHECK_NODES_WITH_RUNNING_JOB=yes never worked before.
- Speedup monika (X5).
- Monika: Add the conf max_cores_per_line to have several lines if the number of cores is too big.
- Minor changes to the API: added cmd_output into POST /jobs.
- API: Added GET /select_all?query=<query> (read only mode).
- Add the field “array_index” into the jobs table, so that a job resubmitted from an array has the right array_index environment variable.
- oarstat: order the output by job_id.
- Speedup oarnodes.
- Fix a spelling error in the oaradmin manpage.
- Bugfix #14122 : the oar-node init.d script wasn’t executing start_oar_node/stop_oar_node during the ‘restart’ action.
- Allow the dash character into the --notify "exec:…" oarsub option.
- Remove some old stuff from the tarball:
  - visualization_interfaces/{tgoar,accounting,poar};
  - scheduler/moldable;
  - pbs-oar-lib.
- Fix some licence issues.
version 2.5.1:
Sources directories reorganized
New “Phoenix” tool to try to reboot automatically broken nodes (to setup into /etc/oar/oar_phoenix.pl)
New (experimental!) scheduler written in Ocaml
Cpusets are activated by default
Bugfix #11065: oar_resource_init fix (add a space)
Bug 10999: memory leak into Hulot when used with postgresql. The leak has been minimized, but it is still there (DBD::Pg bug)
Almighty cleans ipcs used by oar on exit
Bugfix #10641 and #10999 : Hulot is automatically and periodically restarted
Feature request #10565: add the possibility to check the aliveness of the nodes of a job at the end of this one (pingchecker)
REST API heavily updated: new data structures with paginated results, desktop computing functions, rspec tests, oaradmin resources management, admission rules edition, relative/absolutes uris fixed
New ruby desktop computing agent using REST API (experimental)
Experimental testsuite
Poar: web portal using the REST API (experimental)
Oaradmin YAML export support for resources creation (for the REST API)
Bugfix #10567: enabling to bypass window mechanism of hulot.
Bugfix #10568: Wake up timeout changing with the number of nodes
Add in oar.conf the tag “RUNNER_SLIDING_WINDOW_SIZE”: it allows the runner to use a sliding window to launch the bipbip processes if “DETACH_JOB_FROM_SERVER=1”. This feature avoids the overload of the server if plenty of jobs have to be launched at the same time.
Fix problem when deleting a job in the Suspended state (oarexec was stopped by a SIGSTOP so it was not able to handle the delete operation)
Make the USER_SIGNAL feature of oardel multi job independent and remove the temporary file at the end of the job
Monika: display if the job is of timesharing type or not; add the initial_request in the job listing (is there a reason to not display it?)
IoLib: update scheduler_priority resources property for timesharing jobs, so that the scheduler can avoid launching all timesharing jobs on the same resources (they can be dispatched)
OAREXEC: unmask SIGHUP and SIGPIPE for user script
node_change_state: do not Suspect the first node of a job which was EXTERMINATED by Leon if the cpuset feature is configured (let do the job by the cpuset)
OAREXEC: ESRF detected that oarexec sometimes thinks it has notified the Almighty of its exit code although nothing was seen on the server. oarexec now tries to resend the exit code until it is killed.
oar_Tools: add in notify_almighty a check on the print and on the close of the socket connected to Almighty.
oaraccounting: --sql is now possible in an "oarstat --accounting" query
Add more logs to the command “oarnodes -e host” when a node turns into Suspected
Execute user commands with /proc/self/oom_adj set to 15, so that the user processes are the first to be killed when no more memory is available. Hence the system remains up and running and the user job is ended. Drawback: this file can be changed manually by the user, so a method that achieves the same thing but is managed only by root would be welcome.
Bugfix API: quotes were badly escaped into job submission
Add the possibility to automatically resubmit idempotent jobs which end with an exit code of 99: oarsub -t idempotent "sleep 5; exit 99"
Bugfix API: Some information was missing in jobs/details, especially the scheduled resources.
API: added support of “param_file” value for array job submissions. This value is a string representing the content of a parameters file. Sample submission:
{"resource":"/cpu=1", "command":"sleep", "param_file":"60\n90\n30"}
This submits 3 sleep jobs with different sleep values.
Remove any reference to gridlibs and gridapi as these components are obsolete
Add stdout and stderr files of each job in oarstat output.
API now supports fastcgi (big performance raise!)
Add “-f” option to oarnodesetting to read hostnames from a file.
API can get/upload files (GET or POST /media/<file_path>)
Make “X11 forwarding” working even if the user XAUTHORITY environment variable does not contain ~/.Xauthority (GDM issue).
Add job_resource_manager_cgroups which handles cpuset + other cgroup features like network packet tagging, IO disk shares, …
Bugfix #13351: now oar_psql_db_init is executed with root privileges
Bugfix #13434: reservations were not handled correctly with the energy saving feature
Add cgroups FREEZER feature to the suspend/resume script (better than kill SIGSTOP/SIGCONT). This is doable thanks to the new job_resource_manager_cgroups.
Implement a new script ‘oar-database’ to manage the oar database. oar_mysql_init & oar_psql_init are dropped.
Huge code reorganisation to allow a better packaging and system integration
Drop the oarsub/oarstat 2.3 version that was kept for compatibility issues during the 2.4.x branch.
By default the oar scheduler is now ‘oar_sched_gantt_with_timesharing_and_fairsharing’ and the following values have been set in oar.conf: SCHEDULER_TIMEOUT to 30, SCHEDULER_NB_PROCESSES to 4 and SCHEDULER_FAIRSHARING_MAX_JOB_PER_USER to 30
Add a limitation on the number of concurrent bipbip processes on the server (for detached jobs).
Add IPC cleaning to the job_resource_manager* when there is no other job of the same user on the nodes.
make better scheduling behaviour for dependency jobs
API: added missing stop_time into /jobs/details
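The “param_file” entry above describes a string holding one parameter line per array job. A minimal Python sketch of building such a submission body follows. The function name build_array_submission is hypothetical, and the sketch only produces the JSON text; the endpoint, authentication and HTTP call are deliberately left out because they depend on the installation.

```python
import json


def build_array_submission(resource, command, params):
    """Build the JSON body of an array job submission using "param_file".

    Hypothetical helper: each element of `params` becomes one line of the
    parameters file, hence one job of the array.
    """
    body = {
        "resource": resource,
        "command": command,
        "param_file": "\n".join(str(p) for p in params),
    }
    return json.dumps(body)


# Same values as the sample submission in the changelog entry.
payload = build_array_submission("/cpu=1", "sleep", [60, 90, 30])
```

Joining the parameters with a newline reproduces the "60\n90\n30" string of the sample, so three parameters yield three jobs.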
version 2.4.4:
- oar_resource_init: bad awk delimiter. There’s a space and if the property is the first one then there is not a ‘,’.
- job suspend: oardo does not exist anymore (long long time ago). Replace it with oardodo.
- oarsub: when an admission rule died micheline returns an integer and not an array ref. Now oarsub ends nicely.
- Monika: add a link on each jobid on the node display area.
- sshd_config: on nodes with a lot of cores, 10 parallel connections could be too few
version 2.4.3:
- Hulot module now has customizable keepalive feature
- Added a hook to launch a healing command when nodes are suspected (activate the SUSPECTED_HEALING_EXEC_FILE variable)
- Bugfix #9995: the oaraccounting script no longer freezes when the db is unreachable.
- Bugfix #9990: prevent inserting jobs with an invalid username (like an empty username)
- Oarnodecheck improvements: node is not checked if a job is already running
- New oaradmin option: --auto-offset
- Feature request #10565: add the possibility to check the aliveness of the nodes of a job at the end of this one (pingchecker)
version 2.4.2:
- New “Hulot” module for intelligent and configurable energy saving
- Bug #9906: fix bad optimization in the gantt lib (so bad scheduling decisions).
version 2.4.1:
- Bug #9038: Security flaw in oarsub --notify option
- Bug #9601: Cosystem jobs are no longer killed when a resource is set to Absent
- Fixed some packaging bugs
- API bug fixes in job submission parsing
- Added standby info into oarnodes -s and available_upto info into /resources uri of the API
- Bug Grid'5000 #2687: fix possible crashes of the scheduler.
- Bug fix: with MySQL DB Finaud suspected resources which are not of the “default” type.
- Signed debian packages (install oar-keyring package)
version 2.4.0:
Bug #8791: added CHECK_NODES_WITH_RUNNING_JOB=no to prevent from checking occupied nodes
Fix bug in oarnodesetting command generated by oar_resources_init (detect_resources)
Added a --state option to oarstat to only get the status of specified jobs (optimized query, to allow scripting)
Added a REST API for OAR and OARGRID
Added JSON support into oarnodes, oarstat and oarsub
New Makefile adapted to build packages as non-root user
add the command “oar_resources_init” to easily detect and initialize the whole resources of a cluster.
“oaradmin version” : now retrieve the most recent database schema number
Fix rights on the “schema” table in postgresql.
Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
Ctrl-C was not working anymore in oarsub. It seems that the signal handler does not handle the previous syntax ($SIG = ‘qdel’)
Fix bug in oarsh with the “-l” option
Bug #7487: bad initialisation of the gantt for the container jobs.
Scheduler: move the “delete_unnecessary_subtrees” directly into “find_first_hole”. It is thus possible to submit a job like:
oarsub -I -l nodes=1/core=1+nodes=4/core=2 (no hard separation between each group)
For the same behaviour as before, you can submit:
oarsub -I -l {prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2
Bug #7634: test if the resource property value is effectively defined otherwise print a ‘’
Optional script to take into account cpu/core topology of the nodes at boot time (to activate inside oarnodesetting_ssh)
Bug #7174: Cleaned default PATH from “./” into oardodo
Bug #7674: remove the computation of the scheduler_priority field for besteffort jobs from the asynchronous OAR part. Now the value is set when the jobs are turned into toLaunch state and in Error/Terminated.
Bug #7691: add --array and --array-param-file options parsing into the submitted script. Fix also some parsing errors.
Bug #7962: enable resource property “cm_availability” to be manipulated by the oarnodesetting command
Added the (standby) information to a node state in oarnodes when its state is Absent and cm_availability != 0
Changed the name of cm_availability to available_upto which is more relevant
add a --maintenance option to oarnodesetting that sets the state of a resource to Absent and its available_upto to 0 if maintenance is on and resets previous values if maintenance is off.
added a --signal option to oardel that allows a user to send a signal to one of their jobs
added a name field in the schema table that will refer to the OAR version name
added a table containing scheduler name, script and description
Bug #8559: Almighty: Moved OAREXEC_XXXX management code out of the queue for immediate action, to prevent potential problems in case of scheduler timeouts.
oarnodes, oarstat and the REST API no longer retry connections to the database in case of failure, but exit with an error instead. The retry behavior is left for daemons.
improved packaging (try to install files in more standard places)
improved init script for Almighty (into deb and rpm packages)
fixed performance issue on oarstat (array_id index missing)
fixed performance issue (job_id index missing in event_log table)
fixed a performance issue at job submission (optimized a query and added an index on challenges table)
version 2.3.5:
- Bug #8139: Drawgantt nil error (Add condition to test the presence of nil value in resources table.)
- Bug #8416: When the automatic halt/wakeup feature was enabled, there was a problem determining idle nodes.
- Debug a mis-initialization of the Gantt with running jobs in the metascheduler (concurrency access to PG database)
version 2.3.4:
- add the command “oar_resources_init” to easily detect and initialize the whole resources of a cluster.
- “oaradmin version” : now retrieve the most recent database schema number
- Fix rights on the “schema” table in postgresql.
- Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
- Ctrl-C was not working anymore in oarsub. It seems that the signal handler does not handle the previous syntax ($SIG = ‘qdel’)
- Bug #7487: bad initialisation of the gantt for the container jobs.
- Fix bug in oarsh with the “-l” option
- Bug #7634: test if the resource property value is effectively defined otherwise print a ‘’
- Bug #7674: remove the computation of the scheduler_priority field for besteffort jobs from the asynchronous OAR part. Now the value is set when the jobs are turned into toLaunch state and in Error/Terminated.
- Bug #7691: add --array and --array-param-file options parsing into the submitted script. Fix also some parsing errors.
- Bug #7962: enable resource property “cm_availability” to be manipulated by the oarnodesetting command
version 2.3.3:
Fix default admission rules: case-insensitive check for properties used in oarsub
Add new oaradmin subcommand : oaradmin conf. Useful to edit conf files and keep changes in a Subversion repository.
Kill correctly each taktuk command children in case of a timeout.
New feature: array jobs (option --array) (on oarsub, oarstat, oardel, oarhold and oarresume) and file-based parametric array jobs (oarsub --array-param-file)
/!\ in this version the DB scheme has changed. If you want to upgrade your installation from a previous 2.3 release then you have to execute in your database one of these SQL scripts (stop OAR before):
mysql: DB/mysql_structure_upgrade_2.3.1-2.3.3.sql
postgres: DB/pg_structure_upgrade_2.3.1-2.3.3.sql
version 2.3.2:
- Change scheduler timeout implementation to schedule the maximum of jobs.
- Bug #5879: do not show initial_request in oarstat when it is not a job of the user who launched the oarstat command (oar or root).
- Add a --event option to oarnodes and oarstat to display events recorded for a job or node
- Display reserved resources for a validated waiting reservation, with a hint in their state
- Fix oarproperty: property names are lowercase
- Fix OAR_JOB_PROPERTIES_FILE: do not display system properties
- Add a new user command: oarprint, which allows pretty-printing the resource properties of a job
- Debug temporary job UID feature
- Add ‘kill -9’ on subprocesses that reached a timeout (avoid Perl to wait something)
- desktop computing feature is now available again. (ex: oarsub -t desktop_computing date)
- Add versioning feature for admission rules with Subversion
version 2.3.1:
- Add new oarmonitor command. This will permit to monitor OAR jobs on compute nodes.
- Remove sudo dependency and replace it by the commands “oardo” and “oardodo”.
- Add the possibility to create a temporary user for each job on compute nodes. So you can enforce very strong restrictions for each job (ex: bandwidth restrictions with iptable, memory management, … everything that can be handled with a user id)
- Debian packaging: Run OAR specific sshd with root privileges (under heavy load, kernel may be more responsive for root processes…)
- Remove ALLOWED_NETWORKS tag in oar.conf (it added more complexity than it solved problems)
- /!\ change database scheme for the field exit_code in the table jobs. Now oarstat exit_code line reflects the right exit code of the user passive job (before, even when the user script was not launched the exit_code was 0 which was BAD)
- /!\ add DB field initial_request in the table jobs that stores the oarsub line of the user
- Feature Request #4868: Add a parameter to specify what the “nodes” resource is a synonym for. Network_address must be seen as an internal data and not used.
- Scheduler: add timeout for each job == 1/4 of the remaining scheduler timeout.
- Bug #4866: now the whole node is Suspected instead of just the part where there is no job. So it is possible to have a job on Suspected nodes.
- Add job walltime (in seconds) in parameter of prologue and epilogue on compute nodes.
- oarnodes does not show system properties anymore.
- New feature: the container job type now allows submitting inner jobs that are scheduled within the container job
- Monika refactoring and now in the oar packaging.
- Added a table schema in the db with the field version, representing the version of the db schema.
- Added a field DB_PORT in the oar config file.
- Bug #5518: properly initialize the job user name.
- Add new oaradmin command. It makes it easier to create resources and manage admission rules.
- Bug #5692: update the source code to valid Perl 5.10 syntax.
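The per-job scheduler timeout listed above (each job gets 1/4 of the remaining scheduler timeout) can be illustrated with a little arithmetic. This is only a sketch and not OAR's actual code: the function name `per_job_timeouts` is hypothetical, and it assumes the worst case where every job uses up its whole budget.

```python
def per_job_timeouts(scheduler_timeout, job_count):
    """Return the time budget granted to each job, in scheduling order.

    Assumption (worst case): every job consumes its entire budget
    before the next job is considered.
    """
    remaining = float(scheduler_timeout)
    budgets = []
    for _ in range(job_count):
        budget = remaining / 4  # a job gets 1/4 of what is left
        budgets.append(budget)
        remaining -= budget
    return budgets


print(per_job_timeouts(scheduler_timeout=64, job_count=3))  # [16.0, 12.0, 9.0]
```

With this rule the budgets shrink geometrically, so a single slow job can never use up the whole scheduler timeout.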
version 2.2.12:¶
- Bug #5239: fix the bug occurring when there are spaces in the job name or project
- Fix the bug in Iolib if DEAD_SWITCH_TIME >0
- Fix a bug in bipbip when calling the cpuset_manager to clean jobs in error
- Bug #5469: fix the bug with reservations and Dead resources
- Bug #5535: the check for reservations made at the same time was wrong.
- New feature: local checks on nodes can be plugged in the oarnodecheck mechanism. Results can be asynchronously checked from the server (taktuk ping checker)
- Add 2 new tables to keep track of the scheduling decisions (gantt_jobs_predictions_log and gantt_jobs_resources_log). This will help debugging scheduling troubles (see SCHEDULER_LOG_DECISIONS in oar.conf)
- Reservations are now scheduled only once (at submission time). Resources allocated to a reservation are definitively set once the validation is done and won’t change in subsequent scheduler passes.
- Fix DrawGantt to not display besteffort jobs in the future which is meaningless.
version 2.2.11:¶
- Fix Debian package dependency on a CGI web server.
- Fix minor bug: remove notification (scheduled start time) for Interactive reservation.
- Fix bug in reservation: take SCHEDULER_JOB_SECURITY_TIME into account when checking reservations.
- Fix bug: add a lock around the section which creates and feeds the OAR cpuset.
- Taktuk command line API has changed (we need taktuk >= 3.6).
- Fix extra ‘ in the name of output files when using a job name.
- Bug #4740: open the file in oarsub with user privileges (-S option)
- Bug #4787: check if the remote socket is defined (problem of timing with nmap)
- Feature Request #4874: check system names when renaming properties
- DrawGantt can export charts to be reused to build a global multi-OAR view (e.g. DrawGridGantt).
- Bug #4990: DrawGantt now uses the database localtime as its time reference.
version 2.2.10:¶
- Job dependencies: if the required jobs are not in the state Terminated with an exit code == 0, then the schedulers refuse to schedule this job.
- Add the possibility to disable the halt command on nodes with cm_availability value.
- Enhance oarsub “-S” option (more #OAR parsed).
- Add the possibility to use oarsh without configuring the CPUSETs (can be useful for users that don’t want to configure their ssh keys)
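One possible reading of the job dependency rule in this version is sketched below: a job is only eligible when every job it depends on is in the state Terminated with exit code 0. The `Job` class and the `dependencies_satisfied` function are hypothetical names used for illustration; they do not come from the OAR code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    state: str                # e.g. "Waiting", "Running", "Terminated", "Error"
    exit_code: Optional[int]  # None when no exit code has been recorded


def dependencies_satisfied(required_jobs):
    """True only if every required job is Terminated with exit code 0."""
    return all(
        job.state == "Terminated" and job.exit_code == 0
        for job in required_jobs
    )


ok = [Job(1, "Terminated", 0), Job(2, "Terminated", 0)]
failed = [Job(1, "Terminated", 0), Job(3, "Terminated", 1)]
print(dependencies_satisfied(ok))      # True
print(dependencies_satisfied(failed))  # False
```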
version 2.2.9:¶
- Bug 4225: Dump only 1 data structure when using -X or -Y or -D.
- Bug fix in Finishing sequence (Suspect right nodes).
version 2.2.8:¶
- Bug 4159: remove unneeded Dump print from oarstat.
- Bug 4158: replace XML::Simple module by XML::Dumper one.
- Bug fix for reservation (recalculate the right walltime).
- Print job dependencies in oarstat.
version 2.2.7:¶
- Bug 4106: fix oarsh and oarcp issue with some options (erroneous leading space).
- Bug 4125: remove exit_code data when it is not relevant.
- Fix potential bug when changing asynchronously the state of the jobs into “Terminated” or “Error”.
version 2.2.6:¶
- Bug fix: job types were no longer sent to the cpuset manager script (side effect of the bug 4069 resolution)
version 2.2.5:¶
- Bug fix: remove user command when OAR executes the epilogue script on the nodes.
- Clean debug and mail messages format.
- Remove bad oarsub syntax from oarsub doc.
- Debug xauth path.
- bug 3995: set project correctly when resubmitting a job
- debug ‘bash -c’ on Fedora
- bug 4069: reservations with CPUSET_ERROR (remove bad hosts and continue with a right integrity in the database)
- bug 4044: fix free resources query for reservation (get the nearest hole from the beginning of the reservation)
- bug 4013: now Dead, Suspected and Absent resources have different colors in drawgantt with a popup on them.
version 2.2.4:¶
- Redirect third party commands into oar.log (easier to debug).
- Add user info into drawgantt interface.
- Some bug fixes.
version 2.2.3:¶
- Debug prologue and epilogue when oarexec receives a signal.
version 2.2.2:¶
- Switch nice value of the user processes to 0 in oarsh_shell (in case sshd was launched with a different priority).
- debug taktuk zombies in pingchecker and oar_Tools
version 2.2.1:¶
- install the “allow_classic_ssh” feature by default
- debug DB installer
version 2.2:¶
- oar_server_proepilogue.pl: can be used as server prologue and epilogue to authorize users to access nodes that are completely allocated by OAR. If the whole node is assigned (that is, all its cpus), then it kills all jobs from the user.
- the same thing can be done with cpuset_manager_PAM.pl as the script used to configure the cpuset. More efficient if cpusets are configured.
- debug cm_availability feature to switch on and off nodes automatically depending on waiting jobs.
- reservations now take care of cm_availability field
version 2.1.0:¶
- add “oarcp” command to help the users to copy files using oarsh.
- add sudo configuration to deal with bash. Now oarsub and oarsh have the same behaviour as ssh (the bash configuration files are loaded correctly)
- bug fix in drawgantt (jobs were lost after the submission of a moldable one)
- add SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE into oar.conf. Thus the admin can add some resources to every job (like a frontend node)
- add possibility to use taktuk to check the aliveness of the nodes
- %jobid% is now replaced in stdout and stderr file names by the effective job id
- change interface to shut down or wake up nodes automatically (now the node list is read on STDIN)
- add OARSUB_FORCE_JOB_KEY in oar.conf. It says to create a job ssh key by default for each job.
- %jobid% is now replaced in the ssh job key name (oarsub -k …).
- add NODE_FILE_DB_FIELD_DISTINCT_VALUES in oar.conf that enables the admin to configure the generated content of the OAR_NODE_FILE
- change ssh job key oarsub options behaviour
- add options “--reinitialize” and “--delete-before” to the oaraccounting command
- cpuset are now stored in /dev/cpuset/oar
- debian packaging: configure and launch a specific sshd for the user oar
- use a file descriptor to send the node list --> able to handle a very large number of nodes
- all config files are now in /etc/oar/
- oardel can add a besteffort type to jobs and vice versa
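The %jobid% substitution mentioned in this version (in stdout and stderr file names and in the ssh job key name) amounts to a plain placeholder replacement. The sketch below only illustrates the idea: the function name `expand_job_id` and the sample file names are assumptions, not taken from OAR.

```python
def expand_job_id(template, job_id):
    """Replace every %jobid% placeholder with the effective job id."""
    return template.replace("%jobid%", str(job_id))


# Hypothetical file name templates, for illustration only.
print(expand_job_id("myjob.%jobid%.stdout", 42))  # myjob.42.stdout
print(expand_job_id("myjob.%jobid%.stderr", 42))  # myjob.42.stderr
```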
version 2.0.2:¶
- add warnings and exit code to oarnodesetting when there is a bad node name or resource number
- change package version
- change default behaviour for the cpuset_manager.pl (more portable)
- enable a user to use the same ssh key for several jobs (at his own risk!)
- add node hostnames in oarstat -f
- add --accounting and -u options in oarstat
- bug fix on index fields in the database (synchro): bug 2020
- bug fix about server pro/epilogue: bug 2022
- change the default output of oarstat. Now it is usable: bug 1875
- remove keys in authorized_keys of oar (on the nodes) that do not correspond to an active cpuset (clean after a reboot)
- reread oar.conf after each database connection attempt
- add support for X11 forwarding in oarsub -I and -C
- debug mysql initialization script in debian package
- add a variable in oarsh for the default options of ssh to use (more useful to change if the ssh version installed does not handle one of these options)
- read oar.conf in oarsh (so admin can more easily change options in this script)
- add support for X11 forwarding via oarsh
- change variable for oarsh: OARSH_JOB_ID --> OAR_JOB_ID
version 2.0.0:¶
- Now that any type of resource can be declared (licences, VLANs, IP ranges, …), computing resources must have the type default and a non-null network_address.
- Possibility to declare associated resources like licences, IP ranges, … and to reserve them like others.
- Now you can connect to your jobs (not only for reservations).
- Add “cosystem” job type (execute and do nothing for these jobs).
- New scheduler : “oar_sched_gantt_with_timesharing”. You can submit jobs with the type “timesharing”, which indicates that this scheduler can launch more than 1 job on a resource at a time. It is possible to restrict this feature with the words “user” and “name”. For example, ‘-t timesharing=user,name’ indicates that only a job from the same user with the same name can be launched at the same time as this job.
- Add PostgreSQL support. So there is a choice to make between MySQL and PostgreSQL.
- New approach for the scheduling : administrators have to insert into the database descriptions of resources and not nodes. Resources have a network address (physical node) and properties. For example, if you have a dual-processor node, then you can create 2 different resources with the same network address but with 2 different processor names.
- The scheduler can now handle resource properties in a hierarchical manner. Thus, for example, you can do “oarsub -l /switch=1/cpu=5” which submits a job on 5 processors on the same switch.
- Add a signal handler in oarexec and propagate this signal to the user process.
- Support ‘#OAR -p …’ options in user script.
- Add in oar.conf:
  - DB_BASE_PASSWD_RO : for security reasons, it is possible to execute requests containing parts specified by users with a read-only account (like the “-p” option).
  - OARSUB_DEFAULT_RESOURCES : when nothing is specified with the oarsub command then OAR takes this default resource description.
  - OAREXEC_DEBUG_MODE : turn on or off debug mode in oarexec (create /tmp/oar/oar.log on nodes).
  - FINAUD_FREQUENCY : indicates the frequency at which OAR launches Finaud (search for dead nodes).
  - SCHEDULER_TIMEOUT : indicates to the scheduler the amount of time after which it must end itself.
  - SCHEDULER_JOB_SECURITY_TIME : time between each job.
  - DEAD_SWITCH_TIME : after this time Absent and Suspected resources are turned to the Dead state.
  - PROLOGUE_EPILOGUE_TIMEOUT : the possibility to specify a different timeout for prologue and epilogue.
  - PROLOGUE_EXEC_FILE : you can specify the path of the prologue script executed on nodes.
  - EPILOGUE_EXEC_FILE : you can specify the path of the epilogue script executed on nodes.
  - GENERIC_COMMAND : a specific script may be used instead of ping to check aliveness of nodes. The script must return bad nodes on STDERR (one line per bad node, with exactly the same name that OAR gave as argument of the command).
  - JOBDEL_SOFTWALLTIME : time after a normal frag that the system waits before retrying to frag the job.
  - JOBDEL_WALLTIME : time after a normal frag that the system waits before deleting the job arbitrarily and suspecting the nodes.
  - LOG_FILE : specify the path of OAR log file (default : /var/log/oar.log).
- Add wait() in pingchecker to avoid zombies.
- Better code modularization.
- Remove node install part to launch jobs. So it is easier to upgrade from one version to another (oarnodesetting must already be installed on each node if we want to use it).
- Users can specify a method to be notified (mail or script).
- Add cpuset support
- Add prologue and epilogue script to be executed on the OAR server before and after launching a job.
- Add dependency support between jobs (“-a” option in oarsub).
- In oarsub you can specify the launching directory (“-d” option).
- In oarsub you can specify a job name (“-n” option).
- In oarsub you can specify stdout and stderr file names.
- User can resubmit a job (option “--resubmit” in oarsub).
- It is possible to specify a read only database account and it will be used to evaluate SQL properties given by the user with the oarsub command (more secure).
- Add the possibility for the scheduler to order assigned resources by their properties. So you can favor some resources over others (SCHEDULER_RESOURCE_ORDER tag in the oar.conf file)
- a command can be specified to switch off idle nodes (SCHEDULER_NODE_MANAGER_SLEEP_CMD, SCHEDULER_NODE_MANAGER_IDLE_TIME, SCHEDULER_NODE_MANAGER_SLEEP_TIME in oar.conf)
- a command can be specified to switch on nodes in the Absent state according to the resource property cm_availability in the table resources (SCHEDULER_NODE_MANAGER_WAKE_UP_CMD in oar.conf).
- if a job goes into the Error state and this is not its fault, then OAR will resubmit it.
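The hierarchical resource request introduced in this version, such as “/switch=1/cpu=5”, reads as a path of property=count levels, from the outermost level to the innermost. The sketch below splits such a string into its levels; it is an illustration under simplifying assumptions (only the plain /name=count form is handled, and `parse_resource_hierarchy` is a hypothetical name), not the parser OAR actually uses.

```python
def parse_resource_hierarchy(request):
    """Split a request like "/switch=1/cpu=5" into (property, count) pairs.

    Only the simple "/name=count" form is supported in this sketch.
    """
    levels = []
    for part in request.strip("/").split("/"):
        name, sep, count = part.partition("=")
        if not name or not sep:
            raise ValueError(f"malformed level: {part!r}")
        levels.append((name, int(count)))  # int() rejects a non-numeric count
    return levels


print(parse_resource_hierarchy("/switch=1/cpu=5"))  # [('switch', 1), ('cpu', 5)]
```

Here the result means: 5 cpu resources, all taken under 1 switch.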