GSOC 2011 Scheduling

From WikiOAR


The goal of this project is to extend the current scheduling capabilities of the OAR Resource and Job Management System (RJMS). In particular, we want to implement two new scheduling policies: Preemption and Gang Scheduling.

Background

The scheduler is the part of the RJMS that holds most of the system's intelligence. Its main role is to assign jobs to the available computational resources that match their demands, according to the users' needs and to predefined rules and policies. A typical scheduler works in cooperation with queues, which are elements defined to organize jobs according to similar characteristics (for example, priorities).

Different kinds of scheduling policies have been defined and implemented in the various production RJMSs.

These are the most common ones:

  • FIFO: jobs are treated in the order they arrive.
  • Multifactor: job priorities are computed from parametrized factors and weights (such as job age, size, fair share, etc.).
  • Backfill: fills the empty holes in the scheduling table without modifying the order or the execution of previously submitted jobs.
  • Gang Scheduling: multiple jobs may be allocated to the same resources and are alternately suspended/resumed, letting only one of them at a time have dedicated use of those resources for a predefined duration.
  • Time-sharing: multiple jobs may be allocated to the same resources, allowing the sharing of computational resources. The sharing is managed by the scheduler of the operating system.
  • Fair-sharing: takes into account the past executed jobs of each user and gives priority to users who have used the cluster less.
  • Preemption: suspends one or more “low-priority” jobs to let a “high-priority” job run uninterrupted until it completes.

The above scheduling policies may co-exist (e.g. multifactor with preemption and backfill, or backfill with fair-sharing).
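
As an illustration of the multifactor policy listed above, a job priority can be sketched as a weighted sum of normalized factors. The factor names, the weights and the [0, 1] normalization below are assumptions chosen for the example; they are not OAR's configuration.

```python
def multifactor_priority(factors, weights):
    """Weighted sum of factors, each assumed to be normalized to [0, 1]."""
    return sum(weights[name] * factors[name] for name in weights)


# Hypothetical weights: fair share counts most, job size least.
weights = {"age": 1000, "size": 500, "fair_share": 2000}

# Hypothetical queued jobs with their normalized factors.
jobs = {
    "j1": {"age": 0.9, "size": 0.1, "fair_share": 0.2},
    "j2": {"age": 0.1, "size": 0.5, "fair_share": 0.8},
}

# Jobs are considered in decreasing priority order.
order = sorted(
    jobs,
    key=lambda name: multifactor_priority(jobs[name], weights),
    reverse=True,
)
print(order)
```

Here j2 is ranked before j1 even though it is younger, because its fair-share factor carries the largest weight.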

Backfill has several variants, such as conservative and aggressive. In the conservative variant, which is the most commonly used, a smaller job is moved forward in the queue as long as it does not delay any previously queued job. In the aggressive variant, also known as EASY backfilling, a small job is allowed to leap forward as long as it does not delay the first job in the queue.
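
The aggressive (EASY) rule can be sketched as follows. This is a minimal illustration under simplifying assumptions: a single pool of identical nodes, exact walltimes, and a decision taken at time 0. The names Job and easy_backfill are hypothetical and do not come from OAR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int      # number of nodes requested
    walltime: int   # requested run time, in arbitrary time units


def easy_backfill(total_nodes, running, queue):
    """Return the names of the queued jobs that may start now (time 0).

    running: list of (nodes, remaining_time) for jobs already executing.
    queue:   list of Job in arrival order; queue[0] is the head job.
    """
    free = total_nodes - sum(nodes for nodes, _ in running)
    started = []
    pending = list(queue)

    # FIFO step: start jobs from the head of the queue while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running = running + [(job.nodes, job.walltime)]
        started.append(job.name)
    if not pending:
        return started

    # Reservation for the blocked head job: find the "shadow time" at which
    # enough nodes will have been released, and how many nodes are spare then.
    head = pending[0]
    avail = free
    shadow_time = None
    for nodes, remaining in sorted(running, key=lambda r: r[1]):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = remaining
            break
    if shadow_time is None:
        return started  # the head job can never fit; nothing is backfilled
    extra = avail - head.nodes

    # Backfill step: a later job may start if it fits in the free nodes and
    # either ends by the shadow time or only uses nodes the head job will
    # not need at that time.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if job.walltime <= shadow_time:
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started


queue = [Job("A", 8, 5), Job("B", 4, 20), Job("C", 2, 20), Job("D", 2, 3)]
print(easy_backfill(10, [(6, 10)], queue))
```

In this example the head job A must wait until time 10. Job B is refused because it would still hold nodes that A needs at that time, while C and D are backfilled. A conservative variant would apply the same kind of check against the reservation of every queued job, not only the first one.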

OAR currently supports: FIFO, backfill (conservative version), Time-sharing and Fair-sharing.

In this project we are interested in providing support for the Gang Scheduling and Preemption policies in OAR.

The Gang Scheduling policy is a stricter variant of time-sharing. Where time-sharing allows the actual concurrent execution of jobs on the same resources, Gang Scheduling temporarily preempts and then reschedules jobs at specific time intervals. It provides an environment similar to a dedicated machine, in which all of a job’s processes are executed together, while the resources are time-shared among different jobs.
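
The alternation between jobs can be sketched as a round-robin over time slices. This is only an illustration: it assumes that all jobs share exactly the same set of resources, that suspend and resume are instantaneous, and that the remaining work of each job is known and positive. The function name gang_schedule is hypothetical.

```python
from collections import deque


def gang_schedule(jobs, timeslice):
    """Build the execution timeline of jobs sharing the same resources.

    jobs:      list of (name, work) pairs, work being the run time needed.
    timeslice: duration for which a job keeps the resources to itself.
    Returns a list of (start, end, name) intervals.
    """
    ready = deque(jobs)
    timeline = []
    clock = 0
    while ready:
        # Resume the next job; all its processes run together on the resources.
        name, remaining = ready.popleft()
        run = min(timeslice, remaining)
        timeline.append((clock, clock + run, name))
        clock += run
        if remaining > run:
            # Slice expired: suspend the job and put it back at the end.
            ready.append((name, remaining - run))
    return timeline


for start, end, name in gang_schedule([("A", 5), ("B", 2)], timeslice=2):
    print(f"{start:>2} - {end:>2}: {name}")
```

During each interval only one job is running, so it sees the resources as if they were dedicated to it.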

The preemption policy consists of stopping, and later restarting, lower-priority jobs in order to allow higher-priority jobs to perform urgent computations. Preemption can be implemented with the stop/reschedule, suspend/resume or checkpoint/restart models.
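
The choice of which jobs to preempt can be sketched as below. This is one possible selection rule, not the design that the project will adopt: it assumes that a larger number means a higher priority, that nodes are interchangeable, and that the lowest-priority jobs are preempted first. The function name select_victims is hypothetical, and what is then done to the victims (stop/reschedule, suspend/resume or checkpoint/restart) is left out.

```python
def select_victims(running, free_nodes, incoming_nodes, incoming_priority):
    """Choose the running jobs to preempt so that an incoming job can start.

    running: list of (name, priority, nodes) for jobs currently executing.
    Returns the list of victim names (empty if no preemption is needed),
    or None if preempting every lower-priority job would still not free
    enough nodes.
    """
    needed = incoming_nodes - free_nodes
    if needed <= 0:
        return []

    # Only jobs with a strictly lower priority may be preempted;
    # the lowest-priority jobs are considered first.
    candidates = sorted(
        (job for job in running if job[1] < incoming_priority),
        key=lambda job: job[1],
    )

    victims = []
    for name, _priority, nodes in candidates:
        victims.append(name)
        needed -= nodes
        if needed <= 0:
            return victims
    return None


running = [("low1", 1, 4), ("mid", 5, 4), ("low2", 2, 2)]
print(select_victims(running, free_nodes=0, incoming_nodes=6, incoming_priority=10))
```

With these values the two lowest-priority jobs are selected and the job named mid keeps running.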


Goals of this GSOC project

  1. Design and implementation of the Preemption scheduling policy for OAR
  2. Design and implementation of the Gang Scheduling policy for OAR
  3. Simulation and/or real experimentation to evaluate the newly defined policies or optimizations

References

  1. https://computing.llnl.gov/linux/slurm/preempt.html
  2. https://computing.llnl.gov/linux/slurm/gang_scheduling.html