wiki:old:gsoc_proposal_cigri_scheduler [2013/07/10 22:55] (current)
Line 1: Line 1:
 +====== Advanced scheduler for the CiGri grid ======
 +
 +===== Introduction =====
 +
 +CiGri is free software for building a computing grid that exploits the idle processor cycles of a group of supercomputers. It is a lightweight grid that manages only "bag-of-tasks" applications, which generally consist of a large number (>100000) of independent and idempotent sub-tasks. Running such applications at the grid level optimizes the load of the supercomputers in a medium-sized HPC center. Such a grid has been running on the HPC hosts of Joseph Fourier University (Grenoble, France) since 2002, as part of the "CIMENT" project. Today it manages 2000 CPU cores and has executed about 5 million tasks. The name CiGri originally comes from this CIMENT project (CIment-GRId). CiGri is also an experimental platform for research on distributed computing scheduling at the Grenoble computer science laboratory (LIG).
 +
 +===== CiGri structure =====
 +
 +CiGri currently consists of a MySQL database, some Perl modules and some Ruby modules. Communication and authentication rely on simple, standard tools such as SSH, sudo and LDAP. A PHP/Smarty web interface lets users check the status of their job campaigns and intervene if necessary. Everything has been developed and tested on Linux. The modules are executed cyclically: updating node and task states, scheduling the tasks to submit, launching tasks, cleaning, error control, and updating statistics. A separate asynchronous module is dedicated to collecting results.
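
The cyclic execution described above can be pictured as a fixed sequence of stages run in order on every pass. The Ruby sketch below is illustrative only: the stage names paraphrase the list above, and `STAGES`, `run_cycle` and the handler lambdas are hypothetical names, not CiGri's actual modules.

```ruby
# Hypothetical stage names, paraphrasing the module list in the text.
STAGES = %i[
  update_states
  schedule
  launch
  clean
  check_errors
  update_statistics
].freeze

# Run every stage once, in order, and return the names of the stages executed.
# `handlers` maps a stage name to a callable; stages without a handler are skipped.
def run_cycle(handlers)
  STAGES.each_with_object([]) do |stage, done|
    handler = handlers[stage]
    next unless handler

    handler.call
    done << stage
  end
end

# Example pass with only two stages wired in.
log = []
handlers = {
  launch:   -> { log << "launching tasks" },
  schedule: -> { log << "scheduling tasks to submit" }
}
executed = run_cycle(handlers)
```

Because the order comes from `STAGES` rather than from the handler hash, scheduling always runs before launching here, whatever order the handlers were registered in.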
 +
 +===== Environment =====
 +
 +Development takes place within the "MESCAL" team of the "LIG" laboratory, and its direction is set by the team's research projects and by the needs of the "CIMENT" project. It is a motivating environment, because you can talk with engineers and researchers working on complex problems in distributed computing. The "MESCAL" team is also in charge of developing the OAR resource manager and is a major contributor to the development and administration of the French national experimental grid, "Grid5000".
 +
 +===== The need for a new scheduler =====
 +
 +The current scheduler module of CiGri is a basic FIFO and does not handle parallel tasks. When the grid is heavily loaded, a user can be blocked for days even if they need very few resources, for example when they only need to test a new campaign. Furthermore, the Grenoble grid is growing and extending to other cities (Lyon, for the RaGrid project). This brings new needs, such as user priorities on some clusters depending on the user's community.
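
To make the blocking problem concrete, here is a minimal Ruby sketch of a strict FIFO dispatcher. It is an illustration under stated assumptions, not CiGri's scheduler code: `Campaign`, `fifo_dispatch` and the user names are hypothetical, and the 2000 free slots simply echo the core count quoted in the introduction.

```ruby
# A campaign is modelled only by its owner and its number of waiting tasks.
Campaign = Struct.new(:user, :tasks)

# Strict FIFO: campaigns are served in submission order, and a later campaign
# receives slots only if earlier ones left some free.
def fifo_dispatch(queue, free_slots)
  dispatched = Hash.new(0)
  queue.each do |campaign|
    break if free_slots.zero?

    n = [campaign.tasks, free_slots].min
    dispatched[campaign.user] += n
    free_slots -= n
  end
  dispatched
end

queue = [
  Campaign.new("alice", 100_000), # large production campaign, submitted first
  Campaign.new("bob", 5)          # small test campaign, submitted second
]
result = fifo_dispatch(queue, 2000)
```

In this example every slot goes to the first campaign, so the five-task test campaign gets nothing until the large one has drained.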
 +
 +We want to develop a new scheduler that can manage fair sharing between users, handle several queues with complex priorities, and take into account parallel tasks running on several nodes of a cluster. This new scheduler should also manage interactions with other new modules that are currently under development, such as a checkpointing module. We want to use a modern high-level language, like Ruby, to obtain very maintainable and scalable code. In addition to this development task, the trainee may, if interested, explore more "system"-oriented aspects and take part in the implementation of other tools such as checkpointing or distributed filesystems. Also note that this project is closely related to the OAR project and to Grid5000 administration.
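
One simple way to picture fair sharing between users is to hand out each free slot to the user whose past usage is lowest relative to their share. The Ruby sketch below assumes that policy purely for illustration; the proposal does not specify an algorithm, and `fair_share_dispatch`, its parameters and the user names are hypothetical. Queues with priorities and multi-node parallel tasks are not modelled.

```ruby
# Hand out free slots one at a time to the user with the lowest
# usage-to-share ratio (ties broken by user name).
#   pending: { user => number of waiting tasks }
#   usage:   { user => slots already consumed }
#   shares:  { user => relative weight, defaults to 1 }
def fair_share_dispatch(pending, usage, shares, free_slots)
  pending = pending.dup # work on copies so the caller's hashes are untouched
  usage = usage.dup
  dispatched = Hash.new(0)

  free_slots.times do
    candidates = pending.select { |_, n| n > 0 }.keys
    break if candidates.empty?

    user = candidates.min_by do |u|
      [usage.fetch(u, 0).to_f / shares.fetch(u, 1), u]
    end
    pending[user] -= 1
    usage[user] = usage.fetch(user, 0) + 1
    dispatched[user] += 1
  end
  dispatched
end

pending = { "alice" => 100_000, "bob" => 5 }
usage   = { "alice" => 0, "bob" => 0 }
shares  = { "alice" => 1, "bob" => 1 }
result = fair_share_dispatch(pending, usage, shares, 20)
```

With equal shares and 20 free slots, the small campaign gets all 5 of its tasks dispatched and the large one receives the remaining 15, instead of the small campaign waiting behind the large one.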
 +
 +===== Links =====
 +
 +  *  CIMENT project: http://ciment.ujf-grenoble.fr
 +  *  CiGri: http://cigri.imag.fr
 +  *  MESCAL Team: http://mescal.imag.fr/
 +  *  RaGrid: https://ciment.ujf-grenoble.fr/ragrid
 +  *  OAR: http://oar.imag.fr
 +  *  Grid5000: http://www.grid5000.fr
 +
 +===== Contact =====
 +
 +Bruno Bzeznik <Bruno.Bzeznik@imag.fr>
 +
 +===== Mentors =====
 +  *  Bruno Bzeznik
 +  *  Yiannis Georgiou
  
wiki/old/gsoc_proposal_cigri_scheduler.txt · Last modified: 2013/07/10 22:55 (external edit)