The goal of this project is to develop, on top of OAR, a mechanism for task confinement/isolation and task binding on the resources of a node, based on cgroups (control groups), a recently introduced Linux kernel feature [1,2,3].
The emergence of multicore architectures, which introduced new levels of hierarchy inside the node, created the need for methods that confine jobs to the specific resources allocated to them within a single node. Indeed, multiple users can find themselves working on the same computing node, which requires placing tasks on specific cores in order to avoid collisions between different tasks. The problem with collisions is that cores or socket resources can easily be oversubscribed, resulting in degraded performance, while other sockets or cores of the same node stay idle. Hence, the secure placement of tasks on cores, and more generally on specific parts of the resources (memory, disk I/O, bandwidth), has become an important challenge for new resource managers. OS techniques such as cpusets and sched_setaffinity are supported by Resource and Job Management Systems (RJMS) to provide fine-grained management of the CPUs (cores) of the nodes.
Recent versions of the Linux kernel make it possible to tell the scheduler explicitly which processes may run on which cores. In Linux, this core affinity can be set in two different ways: through a CPU affinity mask (sched_setaffinity) or through a cpuset.
The mask or set is inherited by all child threads and processes, which means that all of them will use the same set of cores. Using either of these techniques gives the RJMS full control over the processes of its jobs, and permits the dedicated binding of jobs to specific cores along with the efficient cleanup of leftover processes after the end of the job.
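The inheritance of the affinity mask can be observed with a minimal sketch like the following. It assumes a Linux host and Python 3.7 or later; the helper name pin_and_spawn is hypothetical and is not part of OAR. It only illustrates the mask-based technique, not the way an RJMS actually launches jobs.

```python
import os
import subprocess
import sys


def pin_and_spawn(cores):
    """Hypothetical helper: pin this process to `cores`, start a child,
    and return the affinity set that the child reports.

    The original affinity of the calling process is restored afterwards.
    """
    original = os.sched_getaffinity(0)      # 0 means "the calling process"
    try:
        os.sched_setaffinity(0, cores)      # restrict the parent
        child = subprocess.run(
            [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"],
            capture_output=True, text=True, check=True,
        )
        return child.stdout.strip()         # the child inherited the mask
    finally:
        os.sched_setaffinity(0, original)   # undo the restriction


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Pin to the first core this process is allowed to use.
    print("parent allowed cores:", allowed)
    print("child sees cores:", pin_and_spawn({allowed[0]}))
```

The child never calls sched_setaffinity itself; it simply reports the mask it inherited from the pinned parent.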
An extension of cpusets that has recently been added to the Linux kernel (after 2.6.32) is cgroups [1,2,3]. Each control group is a set of tasks on a system that have been grouped together to better manage their interaction with system hardware. Cgroups provide a mechanism for aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour. This behaviour is defined by the available subsystems, such as: cpuset, which assigns tasks to CPUs; memory, which sets limits on memory use; freezer, which suspends or resumes tasks; devices, which allows or denies access to specific devices for tasks; and blkio (supported after kernel version 2.6.35), which allocates a specific amount of I/O, or even network bandwidth, to tasks. The intention is that different kinds of subsystems hook into the generic cgroup support to provide new attributes for cgroups, and therefore for the user's tasks.
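As an illustration of how the subsystems are driven, the sketch below creates one group per subsystem and writes its parameters through the cgroup filesystem. It assumes the cgroup v1 layout, where each subsystem is mounted in its own directory under a common root (often /sys/fs/cgroup, but the mount point is an assumption) and exposes the control files cpuset.cpus, cpuset.mems, memory.limit_in_bytes and tasks. The function confine_job and the group name are hypothetical; this is not OAR's implementation. On a real hierarchy it needs root privileges, so the demonstration writes into a temporary directory instead.

```python
import os
import tempfile


def confine_job(cgroup_root, job_name, cpus, mems, mem_limit_bytes, pid):
    """Hypothetical helper: create a group for `job_name` in the cpuset and
    memory subsystems under `cgroup_root`, set its limits, attach `pid`.

    Returns the list of control files that were written.
    """
    settings = {
        "cpuset": {"cpuset.cpus": cpus, "cpuset.mems": mems},
        "memory": {"memory.limit_in_bytes": str(mem_limit_bytes)},
    }
    written = []
    for subsystem, params in settings.items():
        group_dir = os.path.join(cgroup_root, subsystem, job_name)
        # On a real cgroup mount, creating the directory creates the group.
        os.makedirs(group_dir, exist_ok=True)
        for filename, value in params.items():
            path = os.path.join(group_dir, filename)
            with open(path, "w") as control_file:
                control_file.write(value)
            written.append(path)
        # Attach the task last: a cpuset needs cpus and mems set first.
        tasks_path = os.path.join(group_dir, "tasks")
        with open(tasks_path, "w") as tasks_file:
            tasks_file.write(str(pid))
        written.append(tasks_path)
    return written


if __name__ == "__main__":
    # Dry run against a scratch directory rather than a real cgroup mount.
    with tempfile.TemporaryDirectory() as fake_root:
        for path in confine_job(fake_root, "job_42", "0-1", "0",
                                512 * 1024 * 1024, os.getpid()):
            print(os.path.relpath(path, fake_root))
```

Because every task forked by the attached process stays in the same group, a single write to the tasks file is enough to confine a whole job.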
OAR currently provides task confinement and binding based on the cpusets core affinity mechanism.