OAR is a versatile resource and task manager (also called a batch scheduler) for HPC clusters and other computing infrastructures, such as experimental testbeds for distributed computing, where versatility is key.
The OAR architecture is based on a database (PostgreSQL, preferred, or MySQL), a scripting language (Perl), and an optional scalable administrative tool (e.g. Taktuk). It is composed of modules that interact mainly via the database and are executed as independent programs. Formally, therefore, there is no API: system interaction is completely defined by the database schema. This approach eases the development of specific modules, since each module (such as a scheduler) may be developed in any language that has a database access library.
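The database-mediated design described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL or MySQL so that it runs anywhere, and the `jobs` table, its columns, the state names, and both functions are hypothetical; they are not OAR's actual schema or modules.

```python
import sqlite3


def submit_job(conn, command):
    """Hypothetical 'submission' module: records a job in state 'Waiting'."""
    cur = conn.execute(
        "INSERT INTO jobs (command, state) VALUES (?, 'Waiting')", (command,)
    )
    conn.commit()
    return cur.lastrowid


def schedule_waiting_jobs(conn):
    """Hypothetical 'scheduler' module: it never calls submit_job and only
    reads and updates the shared table."""
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE state = 'Waiting' ORDER BY job_id"
    ).fetchall()
    for (job_id,) in rows:
        conn.execute(
            "UPDATE jobs SET state = 'Scheduled' WHERE job_id = ?", (job_id,)
        )
    conn.commit()
    return [job_id for (job_id,) in rows]


# The schema is the only contract shared by the two modules.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, command TEXT, state TEXT)"
)

first = submit_job(conn, "./simulate --steps 100")
second = submit_job(conn, "./postprocess")
scheduled = schedule_waiting_jobs(conn)
states = [s for (s,) in conn.execute("SELECT state FROM jobs ORDER BY job_id")]
```

Because the two functions share nothing but the table definition, either one could be replaced by a program written in another language, as long as it reads and writes the same rows.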
Using a Linux kernel feature called cpuset, OAR 2 allows more reliable management of resources:
Besides, features such as job dependencies and checkpointing are now available, allowing better use of resources.
A cpuset is attached to every process, and allows:
OAR 2 can manage complex hierarchies of resources. For example:
1. clusters
2. switches
3. nodes
4. CPUs
5. cores
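One way to picture such a hierarchy is a flat list of cores, each tagged with its ancestors, together with a request that names a count at successive levels. The sketch below is only an illustration under that assumption: the record layout, the `make_resources` and `select` helpers, and the request format are hypothetical and do not reproduce OAR's own data model or request syntax.

```python
from collections import defaultdict


def make_resources():
    """Build a toy platform: 1 cluster, 2 switches, 2 nodes per switch,
    2 CPUs per node, 2 cores per CPU (16 cores in total)."""
    resources = []
    core_id = 0
    for switch in ("sw1", "sw2"):
        for n in range(2):
            node = f"{switch}-node{n}"
            for c in range(2):
                cpu = f"{node}-cpu{c}"
                for _ in range(2):
                    resources.append({
                        "cluster": "c1", "switch": switch,
                        "node": node, "cpu": cpu, "core": core_id,
                    })
                    core_id += 1
    return resources


def select(resources, request):
    """Pick resources level by level.

    `request` is a list of (level, count) pairs ordered from the outermost
    level to the innermost one. Returns the selected core records, or None
    if the request cannot be satisfied. Levels left out at the end of the
    request are taken whole (asking for a node yields all of its cores).
    """
    if not request:
        return list(resources)
    level, count = request[0]
    groups = defaultdict(list)
    for record in resources:
        groups[record[level]].append(record)
    chosen, picked = [], 0
    for key in sorted(groups):
        sub = select(groups[key], request[1:])
        if sub is not None:
            chosen.extend(sub)
            picked += 1
            if picked == count:
                return chosen
    return None


resources = make_resources()
# Two nodes behind the same switch, one core on each node.
same_switch = select(resources, [("switch", 1), ("node", 2), ("core", 1)])
# No switch in this toy platform has three nodes, so this cannot be served.
too_many = select(resources, [("switch", 1), ("node", 3)])
```

Here `same_switch` contains two core records that belong to different nodes of one switch, and `too_many` is `None`.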
By providing a mechanism to isolate jobs at the core level, OAR 2 is one of the most modern cluster management systems. Users developing cluster or grid algorithms and programs will therefore work in an up-to-date environment similar to those they will encounter with other recent cluster management systems, for instance on production platforms.
Nowadays, machines with more than 4 cores are becoming common, so it is very important to be able to handle cores efficiently. By providing resource selection and process isolation at the core level, OAR 2 allows users running experiments that do not require exclusive use of a node (at least during a preparation phase) to access many nodes while using only one core on each, leaving the remaining cores free for other users. This can optimize the number of available resources.
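The effect of core-level selection on sharing can be shown with a small bookkeeping sketch. The `allocate` function, the `free_cores` structure, and the user names are hypothetical and far simpler than a real scheduler; the point is only that a job taking one core per node leaves the other cores usable.

```python
def allocate(free_cores, job, nodes_wanted, cores_per_node, assignments):
    """Give `job` `cores_per_node` cores on each of `nodes_wanted` nodes.

    Returns True and updates `free_cores` and `assignments` on success,
    or returns False and changes nothing if too few nodes qualify.
    """
    candidates = [
        node for node in sorted(free_cores)
        if len(free_cores[node]) >= cores_per_node
    ]
    if len(candidates) < nodes_wanted:
        return False
    for node in candidates[:nodes_wanted]:
        for _ in range(cores_per_node):
            core = free_cores[node].pop(0)
            assignments[(node, core)] = job
    return True


# Four nodes with four cores each, all initially free.
free_cores = {f"node{i}": [0, 1, 2, 3] for i in range(4)}
assignments = {}

# alice needs every node, but only one core on each.
ok_alice = allocate(free_cores, "alice", 4, 1, assignments)
# bob can still use the three remaining cores of every node.
ok_bob = allocate(free_cores, "bob", 4, 3, assignments)
# Nothing is left for carol until one of the jobs ends.
ok_carol = allocate(free_cores, "carol", 1, 1, assignments)
```

With node-exclusive allocation, alice's job alone would have occupied all four nodes; at core granularity, alice and bob run side by side and only carol has to wait.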
Besides, OAR 2 also provides a time-sharing feature that allows the same set of resources to be shared among users.
With the OAR 2 OARSH connector used to access job resources, basic usage no longer requires users to configure their SSH environment, as everything is handled internally (management of known host keys, etc.). Users who would prefer not to use OARSH can still use SSH, at the cost of setting a few options (one of the features of the OARSH wrapper is precisely to hide these options).
As access to a cluster's resources is restricted to the job attached to them, one may wonder whether connections from job to job, from cluster to cluster, or from site to site are still possible. OAR 2 provides a mechanism called job-key that allows inter-job communication, even across several sites managed by several OAR 2 servers (this mechanism is used by OARGrid2, for instance).
OAR 2 features a mechanism to manage resources such as software licenses or other non-material resources in the same way it manages classical resources.
OAR is an open-source batch scheduler that provides simple and flexible use of a cluster.
It manages cluster resources like other traditional batch schedulers (such as PBS, Torque, LSF, and SGE). In other words, it does not execute your job on the resources but manages them (reservation, access granting) so that you can connect to these resources and use them.
Its design is based on high-level tools:
It is flexible enough to manage large HPC clusters as well as other computing infrastructures, such as research testbeds like Grid'5000.
Features: