This batch system is based on a database (MySQL or PostgreSQL), a scripting language (Perl) and an optional scalable administration tool (a component of the Taktuk framework). It is composed of modules which interact only with the database and are executed as independent programs. So, formally, there is no API: the system is completely defined by the database schema. This approach eases the development of specific modules. Indeed, each module (such as a scheduler) may be developed in any language having a database access library.
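The database-centric design above can be illustrated with a minimal sketch. Here sqlite3 stands in for MySQL/PostgreSQL, and the `jobs` table with its `job_id`/`state` columns is a hypothetical toy schema, not OAR's actual one; the point is only that a "module" is an ordinary program whose entire interface is SQL against the shared database.

```python
# Sketch of an OAR-style module that interacts only with the database.
# sqlite3 is a stand-in for MySQL/PostgreSQL; the table and column
# names are hypothetical, not OAR's real schema.
import sqlite3

def schedule_waiting_jobs(conn):
    """Toy 'scheduler' module: pick waiting jobs and mark them running."""
    cur = conn.cursor()
    cur.execute("SELECT job_id FROM jobs WHERE state = 'Waiting' ORDER BY job_id")
    scheduled = [row[0] for row in cur.fetchall()]
    cur.executemany("UPDATE jobs SET state = 'Running' WHERE job_id = ?",
                    [(j,) for j in scheduled])
    conn.commit()
    return scheduled

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?)",
                 [(1, 'Waiting'), (2, 'Running'), (3, 'Waiting')])
print(schedule_waiting_jobs(conn))  # → [1, 3]
```

Because the module only reads and writes rows, it could just as well be written in any language with a database driver, which is exactly the flexibility described above.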
Main features:
- Batch and Interactive jobs
- Admission rules
- Matching of resources (job/node properties)
- Hold and resume jobs
- Multi-scheduler support (simple FIFO and FIFO with matching)
- Multi-queues with priority
- Best-effort queues (for exploiting idle resources)
- Check compute nodes before launching
- Epilogue/Prologue scripts
- Activity visualization tools (Monika)
- No Daemon on compute nodes
- rsh and ssh as remote execution protocols (managed by Taktuk)
- Dynamic insertion/deletion of compute nodes
- First-fit scheduler with resource matching
- Advance Reservation
- Environment on demand support (Ka-tools integration)
- Grid integration with Cigri system
- Simple Desktop Computing Mode
We present below some points that explain the benefits of the new version of OAR.
Using the new Linux kernel feature called cpuset, OAR 2 allows a more reliable management of resources:
- No stray processes remain from previous jobs.
- Access to the resources is now restricted to the owner of the resources.
Besides, features like job dependencies and checkpointing are now available, allowing better use of resources.
A cpuset is attached to every process, and allows:
- specifying which processors and memory a process can use, e.g. the resources allocated to the job in the OAR 2 context;
- grouping and identifying processes that share the same cpuset, e.g. the processes of a job in the OAR 2 context, so that actions like clean-up can be performed efficiently (here, cpusets provide a replacement for the process group/session concept, which is not reliable in Linux).
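The clean-up use case can be sketched as follows. This is purely a simulation: on a real system the kernel exposes each process's cpuset (e.g. via /proc/<pid>/cpuset), and the actual clean-up would signal or kill the processes found; here a plain dict with hypothetical PIDs and cpuset paths stands in for that.

```python
# Simulation of cpuset-based job clean-up. In a real system the mapping
# below would be read from the kernel (e.g. /proc/<pid>/cpuset);
# here it is a plain dict with hypothetical values.
def pids_in_cpuset(proc_cpusets, cpuset_path):
    """Return every PID attached to the given cpuset, i.e. a job's processes."""
    return sorted(pid for pid, cs in proc_cpusets.items() if cs == cpuset_path)

# Hypothetical snapshot: two jobs on one node, plus a system process.
proc_cpusets = {
    101: "/oar/job_42",
    102: "/oar/job_42",   # child forked by job 42, still tracked
    201: "/oar/job_43",
    1:   "/",
}

# Cleaning up job 42 targets exactly its processes, nothing else --
# even children that double-forked away from their process group.
print(pids_in_cpuset(proc_cpusets, "/oar/job_42"))  # → [101, 102]
```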
OAR 2 can manage complex hierarchies of resources. For example:
1. clusters
2. switches
3. nodes
4. CPUs
5. cores
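A request against such a hierarchy reads like "1 switch, then 2 nodes under it, then 1 core per node". The sketch below shows a first-fit walk down the hierarchy; the data model (one dict per leaf resource) and the algorithm are illustrative assumptions, not OAR's actual implementation.

```python
# Illustrative first-fit matching over a resource hierarchy
# (toy data model, not OAR's real scheduler).
from itertools import groupby

def first_fit(resources, request):
    """`resources`: list of dicts, one per leaf (core).
    `request`: list of (level, count) pairs, outermost level first.
    Returns the matched leaves, or None if the request cannot be satisfied."""
    if not request:
        return resources
    level, count = request[0]
    keyf = lambda r: r[level]
    picked = []
    for _, group in groupby(sorted(resources, key=keyf), key=keyf):
        sub = first_fit(list(group), request[1:])
        if sub is not None:          # this branch satisfies the inner request
            picked.append(sub)
            if len(picked) == count:
                return [leaf for s in picked for leaf in s]
    return None

resources = [
    {"switch": "sw1", "node": "n1", "core": 1},
    {"switch": "sw1", "node": "n1", "core": 2},
    {"switch": "sw1", "node": "n2", "core": 1},
    {"switch": "sw2", "node": "n3", "core": 1},
]

# "1 switch / 2 nodes / 1 core per node"
got = first_fit(resources, [("switch", 1), ("node", 2), ("core", 1)])
print([(r["node"], r["core"]) for r in got])  # → [('n1', 1), ('n2', 1)]
```

Note how an unsatisfiable request (e.g. 3 nodes under a single switch in this toy data) simply returns None, so the scheduler can move on to other candidates.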
By providing a mechanism to isolate jobs at the core level, OAR 2 is one of the most modern cluster management systems. Users developing cluster or grid algorithms and programs will thus work in an up-to-date environment similar to the ones they will find with other recent cluster management systems on production platforms, for instance.
Nowadays, machines with more than 4 cores are becoming common, so it is very important to be able to handle cores efficiently. By providing resource selection and process isolation at the core level, OAR 2 allows users running experiments that do not require exclusive access to a node (at least during a preparation phase) to use a single core on each of many nodes, leaving the remaining cores free for other users. This can optimize the number of available resources.
Besides, OAR 2 also provides a time-sharing feature which allows several users to share the same set of resources.
Using the OAR 2 OARSH connector to access job resources, basic usage no longer requires users to configure their SSH environment, as everything is handled internally (known host key management, etc.). Besides, users who would rather not use OARSH can still use SSH at the cost of setting a few options (one of the features of the OARSH wrapper is precisely to hide these options).
As access to the resources of a cluster is restricted to the job they are attached to, one may wonder whether connections from job to job, from cluster to cluster, or from site to site are still possible. OAR 2 provides a mechanism called a job key that allows inter-job communication, even across several sites managed by several OAR 2 servers (this mechanism is indeed used by OARGrid 2, for instance).
OAR 2 features a mechanism to manage resources like software licenses or other non-material resources the same way it manages classical resources.
At present, OAR is used in several countries (France, Slovakia, Brazil) by several types of users. These users are not only programmers and computer specialists but also scientists who are novices at programming. The range of user profiles is therefore wide.
They are mainly:
- biologists that work on medical imaging, radioactivity studies...
- chemical engineers
- computer science engineers and researchers that work on many subjects such as cryptography, data mining, HPC...
- astronomers that work on subjects like trajectory computation and the analysis of data from probes
OAR is an open-source batch scheduler which provides simple and flexible exploitation of a cluster.
It manages the resources of clusters as a traditional batch scheduler (such as PBS, Torque, LSF or SGE). In other words, it doesn't execute your job on the resources but manages them (reservation, access granting) in order to allow you to connect to these resources and use them.
Its design is based on high-level tools:
- the MySQL or PostgreSQL relational database engines,
- the Perl scripting language,
- the cpuset confinement mechanism,
- the scalable administration tool Taktuk.
It is flexible enough to be suitable for production clusters and research experiments. It currently manages more than 5,000 nodes and has executed more than 5 million jobs.
- No specific daemon on nodes.
- No dependence on specific computing libraries like MPI. All sorts of parallel user applications are supported.
- Upgrades are made on the servers, nothing to do on computing nodes.
- CPUSET (Linux 2.6 kernel) integration, which restricts jobs to their assigned resources (also useful to completely clean up a job, even a parallel one).
- All administration tasks are performed with the taktuk command (a tool for large-scale remote execution): http://taktuk.gforge.inria.fr/.
- Hierarchical resource requests (to handle heterogeneous clusters).
- Gantt scheduling (so you can visualize the internal scheduler decisions).
- Full or partial time-sharing.
- License server management support.
- Best-effort jobs: if another job requests the same resources, the best-effort job is deleted automatically (useful to execute programs like SETI@home).
- Environment deployment support (Kadeploy): http://kadeploy.imag.fr/.
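The best-effort semantics mentioned above can be sketched with a small simulation. Everything here (the 4-core node, the job records, the oldest-first preemption order) is a hypothetical illustration of the idea that a best-effort job only holds otherwise idle resources, not OAR's actual policy code.

```python
# Toy illustration of best-effort semantics (not OAR's implementation):
# a best-effort job keeps idle resources only until a regular job claims them.
def submit(running, new_job):
    """running: list of jobs, each {'name', 'cores', 'besteffort'}.
    Preempt best-effort jobs if needed to free cores for a regular job."""
    TOTAL_CORES = 4  # hypothetical node size
    free = TOTAL_CORES - sum(j["cores"] for j in running)
    if not new_job["besteffort"]:
        # Delete best-effort jobs (oldest first) until the new job fits.
        for j in [j for j in running if j["besteffort"]]:
            if free >= new_job["cores"]:
                break
            running.remove(j)      # in OAR the job would be deleted
            free += j["cores"]
    if free >= new_job["cores"]:
        running.append(new_job)
        return True
    return False                   # cannot run now

# A SETI@home-style job soaks up the whole node while it is idle...
running = [{"name": "seti", "cores": 4, "besteffort": True}]
# ...then a regular job arrives and reclaims the resources.
print(submit(running, {"name": "mpi_app", "cores": 2, "besteffort": False}))
print([j["name"] for j in running])  # → True, then ['mpi_app']
```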
Other more common features:
- Batch and Interactive jobs.
- Admission rules.
- Multi-schedulers support.
- Multi-queues with priority.
- First-Fit Scheduler.
- Support of moldable tasks.
- Check compute nodes.
- Epilogue/Prologue scripts.
- Support of dynamic nodes.
- Suspend/resume jobs.