The goal of this project is to add to OAR a mechanism for task confinement/isolation and for task binding onto the resources of a node, based on **cgroups** (control groups), a recent Linux kernel feature [1,2,3].

===== Background =====

The emergence of multicore architectures, which introduced new levels of hierarchy inside the node, created the need for methods to confine jobs to the specific resources allocated to them within a single node. Multiple users can find themselves working on the same computing node, so tasks must be placed on specific cores to avoid collisions between them. The problem with collisions is that cores or socket resources can easily be oversubscribed, resulting in degraded performance, while other sockets or cores of the same node stay idle. Hence, secure placement on cores, and more generally on specific parts of the resources (memory, disk I/O, bandwidth), has become an important challenge for new resource managers.

OS techniques such as Cpusets and sched_affinity are supported by Resource and Job Management Systems (RJMS) to provide fine-grained management of the CPUs (cores) of the nodes. Recent Linux kernels make it possible to tell the scheduler explicitly which process can run on which core. In Linux, this core affinity can be set in two different ways:

  * **sched_affinity** [4], a system call that takes a bitmask as a parameter, where each bit represents one core: the scheduler may run the process on cores whose bit is 1 and never migrates it to cores whose bit is 0.
  * **Cpusets** [5], processor sets defined through a hierarchical pseudo-filesystem, on which only the processes bound to that set can be executed.

The mask or set is inherited by all child threads and processes, which means that all of them will use the same set of cores.
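
As a minimal sketch of the bitmask semantics and of the inheritance described above, the following Python code uses the standard library wrappers ''os.sched_getaffinity'' and ''os.sched_setaffinity'', which exist on Linux only. It is an illustration and not part of OAR; the helper names ''cores_to_mask'', ''mask_to_cores'' and ''child_affinity_when_pinned'' are hypothetical.

```python
import os
import subprocess
import sys


def cores_to_mask(cores):
    """Encode core numbers as a bitmask: bit i is 1 if core i is allowed."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def mask_to_cores(mask):
    """Decode an affinity bitmask back into the set of allowed core numbers."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}


def child_affinity_when_pinned(core):
    """Pin the current process to one core and return the affinity a child reports."""
    original = os.sched_getaffinity(0)  # pid 0 means the calling process
    try:
        os.sched_setaffinity(0, {core})
        # The child is started after pinning, so it inherits the restricted mask.
        result = subprocess.run(
            [sys.executable, "-c",
             "import os; print(*sorted(os.sched_getaffinity(0)))"],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.sched_setaffinity(0, original)  # restore the previous affinity
    return {int(token) for token in result.stdout.split()}


# Small demonstration, skipped on platforms without the affinity calls.
if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)
    print("allowed cores:", sorted(allowed), "mask:", bin(cores_to_mask(allowed)))
    first = min(allowed)
    print("child started while pinned to core", first,
          "sees:", sorted(child_affinity_when_pinned(first)))
```

The child process is queried for its own affinity rather than trusting the parent's view, which is what makes the inheritance visible. A resource manager would typically apply the restriction before launching the user's job, so that every process of the job starts inside it.
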
Using either of these techniques gives the RJMS full control over the processes of its jobs: it can bind jobs to dedicated cores and efficiently clean up any remaining processes after the end of the job.

**cgroups** [1,2,3] are an extension of cpusets that has recently been added to the Linux kernel (after 2.6.32). Each control group is a set of tasks on a system that have been grouped together to better manage their interaction with system hardware. Cgroups provide a mechanism for aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour. This behaviour is defined by the available subsystems, such as: //cpuset//, which assigns tasks to CPUs; //memory//, which sets limits on memory use; //freezer//, which suspends or resumes tasks; //devices//, which allows or denies access by tasks to specific devices; and //blkio// (supported after kernel version 2.6.35), which assigns a specific amount of I/O or even network bandwidth to tasks. The intention is that different kinds of subsystems hook into the generic cgroup support to provide new attributes for cgroups, and therefore for the user's tasks.

**OAR currently provides task confinement and binding based on the cpusets core affinity mechanism.**

===== Goals of this GSoC project =====

  - Design and implement the basic framework for cgroups support in OAR. This framework will allow any of the cgroup subsystems to be ported easily, depending on the needs of each system.
  - Provide the implementation of the //cpuset//, //memory// and //devices// subsystems for task confinement/isolation on particular resources.
  - Develop mechanisms for binding tasks to cores using either //cpuset// or //hwloc// [6], allowing applications to change the proposed task binding internally on their own.
  - Experiment with benchmarks or real-life applications.
  - Optionally, provide the implementation of the //freezer// and //blkio// subsystems as well.

===== References =====

  - http://en.wikipedia.org/wiki/Cgroups
  - http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
  - http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html
  - http://manpages.unixforum.co.uk/man-pages/linux/suse-linux-10.1/2/sched_getaffinity-man-page.html
  - http://www.bullopensource.org/cpuset/
  - http://www.open-mpi.org/projects/hwloc/