Monitoring and reporting for greener computing

From WikiOAR

Revision as of 09:32, 8 March 2010 by Rcavagna (Talk | contribs)

See original proposal here

Student, please read this page carefully.


Student: Erick Meneses

Mentor: Romain Cavagna

Co-Mentors: Yiannis Georgiou, Joseph Emeras



Student: Things to do before starting

Don't hesitate to contact me if you need help with this.

How to start the project

  • The first step is to set up and configure the energy-saving feature that already exists in OAR. It needs to be tested on the version currently used on Grid'5000 (2.2) and on the trunk version (2.4). This feature should be activated on Grid'5000 as soon as possible.
  • Look at SLURM (version 2.0.0) to see what it does for energy saving, and experiment with it on Grid'5000 using the image lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR (available on genepi, nancy and rennes).
  • Experiment with the watt-meters installed in the Genepi cluster:
nc -u -l -p 1234 alpes
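
The nc command above listens for UDP datagrams on port 1234. For scripted collection, the same idea can be sketched in Python. This is only a minimal sketch: the payload format sent by the watt-meters is not described on this page, so each datagram is kept as raw text, and the helper names open_listener and collect_datagrams are hypothetical.

```python
import socket
import time


def open_listener(port=1234, bind_addr=""):
    """Open a UDP socket bound to `port` (1234 matches the nc example)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def collect_datagrams(sock, count, timeout=5.0):
    """Read up to `count` datagrams from an already-bound UDP socket.

    Returns a list of (receive_time, sender_host, text) tuples. The payload
    is not parsed, because its format is an unknown here.
    """
    sock.settimeout(timeout)
    readings = []
    for _ in range(count):
        try:
            payload, (host, _port) = sock.recvfrom(4096)
        except socket.timeout:
            break  # nothing arrived within the timeout; stop early
        text = payload.decode("utf-8", "replace").strip()
        readings.append((time.time(), host, text))
    return readings


if __name__ == "__main__":
    listener = open_listener(1234)
    try:
        for stamp, host, text in collect_datagrams(listener, count=10):
            print(f"{stamp:.3f} {host} {text}")
    finally:
        listener.close()
```

Keeping the receive timestamp next to each raw reading makes it possible to correlate the measurements with job start and stop times later on.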


--Ygeorgiou 10:34, 23 May 2009 (UTC)

Project's specifications

The main part of the project centers on monitoring and reporting for the cluster, to serve the needs of administrators and users. The project will have to go beyond monitoring and reporting of energy consumption: we also want to cover cases such as monitoring memory usage or network communication performance for specific jobs or users over specific time periods. From there we may want to provide specific accounting, and even invoicing, per user or per project. This means we will need to exploit and evolve the accounting functionality of OAR as well. The notion of "karma", introduced in OAR for fair-sharing scheduling, could perhaps be used in or integrated into this context.
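
As an illustration of per-user energy accounting, the following minimal sketch integrates power samples over each job's time window and sums the result per user. It does not use OAR's accounting tables or any OAR API; the record layout (user, node, start, stop), the sample format (timestamp in seconds, power in watts) and the function names are all assumptions made for the example. It also assumes a node's whole consumption is attributed to the job running on it.

```python
from collections import defaultdict


def energy_joules(samples, start, stop):
    """Integrate (timestamp_s, watts) samples over [start, stop].

    Uses the trapezoidal rule, with linear interpolation where the job
    window cuts through the interval between two samples.
    """
    total = 0.0
    pts = sorted(samples)
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        lo, hi = max(t0, start), min(t1, stop)
        if hi <= lo:
            continue  # interval lies outside the job window (or is empty)
        slope = (p1 - p0) / (t1 - t0)
        p_lo = p0 + slope * (lo - t0)
        p_hi = p0 + slope * (hi - t0)
        total += 0.5 * (p_lo + p_hi) * (hi - lo)
    return total


def energy_per_user(jobs, samples_by_node):
    """Sum the energy of every job per user.

    `jobs` is an iterable of dicts with the hypothetical keys
    'user', 'node', 'start' and 'stop' (times in seconds).
    """
    usage = defaultdict(float)
    for job in jobs:
        samples = samples_by_node.get(job["node"], [])
        usage[job["user"]] += energy_joules(samples, job["start"], job["stop"])
    return dict(usage)


if __name__ == "__main__":
    samples = {"node1": [(0, 100.0), (10, 100.0), (20, 200.0)]}
    jobs = [
        {"user": "alice", "node": "node1", "start": 0, "stop": 10},
        {"user": "bob", "node": "node1", "start": 10, "stop": 20},
    ]
    print(energy_per_user(jobs, samples))
```

The same aggregation could be keyed by project instead of user, and a per-joule rate applied to the totals would give a first approximation of invoicing.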

What is needed to start:

  • A survey of the state of the art in accounting, monitoring and reporting in other systems, especially SLURM, PBSpro, kaspied (used in Grid5000) and perhaps Condor and Torque.
  • Understand the existing mechanisms for accounting (the accounting tables in the database, which use karma) and for monitoring (the oarmonitor command). We also need to examine kaspied (an accounting tool written in Ruby, used in Grid5000); integrating kaspied into OAR may be worthwhile. We need to decide whether to evolve the existing functionality or to define a new framework.
  • Create all interesting use cases and try to define different levels of complexity. Our approach has to be general enough that complex use cases can be integrated in the future, but we will deal with only the simpler cases at the beginning.

Roadmap (and Timeline)

Official GSoC dates: 23rd May to 17th August.

TODO list

Mentor

  • Construct or adapt an existing Grid5000 image with the OAR version currently used in Grid5000 (2.2.15).
  • Create, if it does not already exist, an image with the trunk version of OAR (2.4). This image will be used for the GSoC internship. http://oar-wiki.imag.fr/index.php/Deploying_OAR_cluster_upon_Grid5000 --Ygeorgiou 13:52, 17 May 2009 (UTC)
  • Check whether the energy-saving feature already included in OAR works with OAR 2.2 and 2.4 on Grid5000.
  • Construct a Grid5000 image with SLURM 2.0 (including energy saving). The same image may be used to include both SLURM and OAR: image lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR in nancy, rennes and grenoble --Ygeorgiou 13:52, 17 May 2009 (UTC)

  • Integrate the energy-saving feature created by Kamal into the current trunk version of OAR and test it.
  • Provide an official energy-saving mechanism for production use on Grid5000.

Research Interests

  • Benchmarks and tests for measuring the efficiency of the energy-saving features. Test the Green500 benchmark and see whether it is worth using, or even adapting, in our context: http://www.green500.org/resources.php#run_rules

Links to look at
