Monitoring and reporting for greener computing
Student, please read this page carefully.
Student: Erick Meneses
Mentor: Romain Cavagna
Co-Mentors: Yiannis Georgiou, Joseph Emeras
Student: Things to do before starting
- Get an account on Grid5000: https://www.grid5000.fr/mediawiki/index.php/Grid5000:Get_an_account or, if you already have one, ask your mentor to extend its validity.
- Get an SVN account on the INRIA GForge: https://gforge.inria.fr/account/register.php (Mescal team).
- Connect to the g5k Jabber and add your mentor as a contact.
Don't hesitate to contact me if you need help with this.
How to start the project
- The first thing to do is to set up and configure the energy saving feature that already exists in OAR. It needs to be tested on the version currently used on Grid'5000 (2.2) and on the trunk version (2.4). This feature should be activated on Grid'5000 as soon as possible.
- Look at SLURM (version 2.0.0) to see what it does for energy saving, and experiment with it on Grid'5000. Image: lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR (on genepi, nancy, rennes)
- Play with the watt-meters installed in the Genepi cluster
- Watt-meter results on the web: https://helpdesk.grid5000.fr/wattm/grenoble/index.html
- Command-line watt-meter results: use the following command on the chartreuse cluster:
nc -u -l -p 1234 alpes
--Ygeorgiou 10:34, 23 May 2009 (UTC)
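For scripted collection, the `nc` command above can be approximated in Python. This is a minimal sketch, assuming the watt-meters send their readings as UDP datagrams to port 1234, as the `nc -u -l -p 1234` invocation suggests. The payload format is not documented on this page, so each datagram is only decoded as text. The names `open_listener` and `read_datagrams` are hypothetical helpers, not part of any existing tool.

```python
import socket


def open_listener(port=1234, host="0.0.0.0"):
    """Return a UDP socket bound to host:port (1234 is the port used with nc)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_datagrams(sock, count, timeout=None):
    """Yield (sender_ip, text) for the next `count` datagrams received on sock.

    Assumption: the payload is text; undecodable bytes are replaced, not dropped.
    """
    sock.settimeout(timeout)
    for _ in range(count):
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        yield sender_ip, data.decode("utf-8", errors="replace").rstrip()
```

A caller would open the listener once, iterate over `read_datagrams(listener, n)` and store or print each reading. Filtering on `sender_ip` is one way to keep only the datagrams coming from a given source host.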
Project's specifications
The main part of the project centers on monitoring and reporting of the cluster for the needs of administrators and users. In fact, the project will have to go beyond monitoring and reporting of energy consumption. We want to handle cases such as monitoring memory usage or network communication performance for specific jobs or users over specific time periods. We may then want to provide specific accounting, and even invoicing, per user or per project. We will clearly need to exploit and extend the accounting functionalities of OAR as well. Perhaps the notion of "karma", introduced in OAR for fair-sharing scheduling, can be used and/or integrated into the overall context.
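To make the per-user or per-project accounting idea concrete, here is a minimal sketch of how job records could be aggregated over a time period. It does not use the real OAR accounting tables: the `JobRecord` fields and the `usage_report` function are hypothetical, and the energy of a job is assumed to be spread uniformly over its run time so that it can be prorated to the reporting window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical job record; field names are illustrative, not the OAR schema."""
    user: str
    project: str
    cores: int
    start: float          # seconds since epoch
    stop: float           # seconds since epoch
    energy_joules: float  # energy attributed to the whole job


def usage_report(jobs, window_start, window_stop, key="user"):
    """Sum core-hours and kWh per user (or per project) inside a time window."""
    totals = defaultdict(lambda: {"core_hours": 0.0, "energy_kwh": 0.0})
    for job in jobs:
        # Part of the job's run time that falls inside the reporting window.
        overlap = min(job.stop, window_stop) - max(job.start, window_start)
        duration = job.stop - job.start
        if overlap <= 0 or duration <= 0:
            continue
        entry = totals[getattr(job, key)]
        entry["core_hours"] += job.cores * overlap / 3600.0
        # Assumption: energy is consumed uniformly over the job's duration.
        entry["energy_kwh"] += job.energy_joules * (overlap / duration) / 3.6e6
    return dict(totals)
```

Calling `usage_report(jobs, t0, t1, key="project")` groups the same records by project instead of by user. An invoice could then be derived by multiplying these totals by a rate.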
What is needed to start:
- The state of the art of other systems concerning accounting/monitoring/reporting, especially SLURM, PBSpro, kaspied (used in Grid5000) and perhaps Condor and Torque
- Understand the existing mechanisms for accounting (the accounting tables in the database that use karma) and monitoring (the oarmonitor command). We also need to check out kaspied (an accounting tool written in Ruby, used in Grid5000); integrating kaspied into OAR may be interesting. We need to decide whether to extend the existing functionalities or to define a new framework.
- Create all interesting use cases and try to define different levels of complexity. Our approach has to be general enough that we can integrate complex use cases in the future, but we will deal with only the simpler cases in the beginning.
Roadmap (and Timeline)
Official GSoC dates: 23rd May to 17th August.
TODO list
Mentor
- Construct or adapt an existing Grid5000 image with the OAR version currently used in Grid5000 (2.2.15).
- Create, if it does not exist, an image with the trunk version of OAR (2.4). This image will be used for the GSoC internship. http://oar-wiki.imag.fr/index.php/Deploying_OAR_cluster_upon_Grid5000 --Ygeorgiou 13:52, 17 May 2009 (UTC)
- Check if the energy saving feature already included in OAR works with OAR 2.2 and 2.4 upon Grid5000.
- Construct a Grid5000 image with SLURM 2.0 (including energy saving). Maybe use the same image to include both SLURM and OAR. Image lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR in nancy, rennes and grenoble --Ygeorgiou 13:52, 17 May 2009 (UTC)
- Integrate the energy saving feature created by Kamal on the current trunk OAR version and test it.
- Provide an official energy saving mechanism for production usage upon Grid5000.
Research Interests
- Benchmarks and tests to evaluate the efficiency of the energy saving features. Test the Green500 benchmark and check whether it would be worthwhile to use it, or even adapt it, in our context: http://www.green500.org/resources.php#run_rules
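The Green500 list ranks systems by performance per watt. As a rough illustration of how watt-meter readings could feed such a metric, the sketch below computes the mean power over a benchmark run by trapezoidal integration of timestamped samples, then divides the measured performance by it. The function names and the sample format are assumptions for illustration. The official measurement methodology is the one in the run rules linked above and is not reproduced here.

```python
def mean_power_watts(samples):
    """Mean power over a run from (timestamp_seconds, watts) samples sorted by time."""
    if len(samples) < 2:
        raise ValueError("need at least two power samples")
    energy_joules = 0.0
    # Trapezoidal rule: average of two consecutive readings times the interval.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_joules += (p0 + p1) / 2.0 * (t1 - t0)
    return energy_joules / (samples[-1][0] - samples[0][0])


def mflops_per_watt(gflops, samples):
    """Performance per watt in MFLOPS/W, given benchmark GFLOPS and power samples."""
    return gflops * 1000.0 / mean_power_watts(samples)
```

Comparing this figure for the same benchmark with and without the energy saving feature enabled is one possible way to quantify its effect.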
Links to look at
- "OAR Energy Saving:" http://oar.imag.fr/works/gsoc/2008/gsoc_energy_saving.html
- "Grid5000 Experimental Platform:" https://www.grid5000.fr
- "Moab Green Computing:" http://www.clusterresources.com/solutions/green-computing.php
- "Green-Net Project:" http://www.ens-lyon.fr/LIP/RESO/Projects/GREEN-NET/
- "Electrical Consumption on Grid'5000 Grenoble site:" https://helpdesk.grid5000.fr/wattm/grenoble/index.html
- "PowerTop:" http://www.lesswatts.org/projects/powertop/
- "oarmonitor command:" http://oar.imag.fr/admins/admin_documentation.html#oarmonitor
- "Green500:" http://www.green500.org/
- "Network Weather Service:" monitoring and forecasting tool, already integrated with Condor, which could be interesting for the monitoring part: http://nws.cs.ucsb.edu/ewiki/nws.php?id=Introduction