Profiling
From WikiOAR
Procedure to profile OAR Perl code using the Perl profiling tool Devel::DProf. To profile a piece of code: 1) add the -d:DProf argument to your Perl command line and let it run; 2) a tmon.out file is produced in the directory where the Perl command was executed; 3) once the run is finished, inspect the results with dprofpp.
For more details, see: http://www.perl.com/pub/a/2004/06/25/profiling.html
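The three steps above look like this in a shell session (the script name here is just a placeholder; Devel::DProf must be installed):

```shell
perl -d:DProf ./yourscript.pl   # 1) run the code under the profiler
ls tmon.out                     # 2) the profile data, written to the current directory
dprofpp                         # 3) report; dprofpp reads ./tmon.out by default
```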
Below is a description of the procedure for profiling OAR code on Grid5000.
Modifications for profiling
1) Deploy one node with OAR installed and configured, but do not start it yet.
2) Change the first line (the shebang) of the command you want to profile from this:
#!/usr/bin/perl
to this:
#!/usr/bin/perl -d:DProf
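Editing the shebang is not the only way to inject the profiler: Perl also honours the PERL5OPT environment variable (which accepts the -d switch, per perlrun), so the same effect can be had without touching the script. This is standard Perl behaviour, not part of the original procedure; note that for commands spawned by Almighty the variable must be present in Almighty's environment.

```shell
PERL5OPT=-d:DProf ./oar_meta_sched
```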
In my case I profile two commands at the same time, "Nodechangestate" and "oar_meta_sched".
3) Write a wrapper script that will be called in place of the profiled command. It is responsible for calling the actual profiled command, copying the profiling output (the tmon.out file) somewhere else, and finally deleting tmon.out so that it is fresh the next time the command is executed.
for example:

cat profiled_scheduler
#!/bin/sh
DATE=`date +%s`
cd /usr/lib/oar/
date +%s > /tmp/profiled_scheduler/tmondate_$DATE
./oar_meta_sched &
wait
date +%s >> /tmp/profiled_scheduler/tmondate_$DATE
cp tmon.out /tmp/profiled_scheduler/tmon_$DATE
rm tmon.out
exit 1
You also have to change the corresponding line in Almighty from this:
my $scheduler_command = $binpath."oar_meta_sched";
to this:
my $scheduler_command = $binpath."profiled_scheduler";
so that the wrapper is executed instead of the scheduler directly.
4) We have to make the /usr/lib/oar/ directory writable so that tmon.out can be created there:
sudo chmod a+w /usr/lib/oar
We also have to create the directory where the various tmon.out files will be stored, writable as well:
sudo mkdir /tmp/profiled_scheduler
sudo chmod a+w /tmp/profiled_scheduler
5) We start the OAR server like this:
cd /usr/lib/oar/
sudo Almighty
6) All results are collected in /tmp/profiled_scheduler/. To review a run, copy the tmon_$DATE file that interests you to tmon.out and then execute dprofpp, which reads tmon.out and produces a report like this:
Total Elapsed Time = 13.30464 Seconds
  User+System Time = 0.184640 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 32.5   0.060  0.129     10   0.0060 0.0129 main::BEGIN
 15.1   0.028  0.047   1156   0.0000 0.0000 DBI::st::fetchrow_hashref
 10.8   0.020  0.020      6   0.0033 0.0033 IO::Socket::BEGIN
 10.2   0.019  0.067      2   0.0097 0.0333 iolib::list_resources
 5.42   0.010  0.010      3   0.0033 0.0033 AutoLoader::import
 5.42   0.010  0.010     11   0.0009 0.0009 DynaLoader::dl_load_file
 5.42   0.010  0.010     11   0.0009 0.0009 Exporter::export
 5.42   0.010  0.010     13   0.0008 0.0008 DBI::BEGIN
 5.42   0.010  0.060     10   0.0010 0.0060 iolib::BEGIN
 5.42   0.010  0.010     56   0.0002 0.0002 Exporter::import
 5.42   0.010  0.010    305   0.0000 0.0000 DBI::st::fetchrow_array
 4.87   0.009  0.009   1156   0.0000 0.0000 DBI::st::fetch
 4.87   0.009  0.009   1160   0.0000 0.0000 DBI::common::FETCH
 0.00   0.000  0.000      1   0.0000 0.0000 Exporter::Heavy::heavy_export_ok_tags
 0.00   0.000  0.000      1   0.0000 0.0000 AutoLoader::AUTOLOAD
where you can see in which functions the command spends most of its time.
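dprofpp has a few options (documented with the standard Devel::DProf toolchain) that make the report easier to dig into, for example:

```shell
dprofpp -u      # report user times rather than user+system times
dprofpp -l      # sort by number of subroutine calls instead of time
dprofpp -T      # display the whole subroutine call tree
dprofpp -O 30   # show 30 subroutines instead of the default 15
```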
Modifications for simulating a specific database trace of a cluster using only one node
Method 1: Using cosystem jobs
- In /etc/oar/oar.conf we have to set
FINAUD_FREQUENCY="0"
which is the frequency for checking Alive and Suspected resources (0 means never), and comment out the pingchecker lines, like this one:
#PINGCHECKER_TAKTUK_ARG_COMMAND="-t 3 broadcast exec [ true ]"
- Then we have to adjust a few things in the database trace so that the simulation can work on only one node.
sudo /etc/init.d/oar-server stop
Load the database with the trace:
sudo mysql oar < oar_sophia.sql
1) Using the job_submitter
- To fix the difference between database versions:
ALTER TABLE jobs ADD initial_request TEXT AFTER job_id;
ALTER TABLE jobs ADD scheduler_info VARCHAR(255) AFTER message;
- use the oar2swf.rb tool to extract from the database trace the information about the jobs to be submitted into an .swf file (this tool is still a work in progress)
- delete the jobs from the database so that only the resource configuration is kept; the jobs can then be injected from the exact point we want in the next step:
delete from jobs;
- use the job_submitter.rb tool to inject the .swf file, which performs the necessary oarsub of cosystem jobs on the OAR server.
2) Using only the database trace (still has bugs)
- delete from job_types where job_types.types_index="LOG";
- delete from jobs where state="Error";
- delete from jobs where state="Terminated";
- update jobs set info_type="pastel-9.toulouse.grid5000.fr:6666"; (use the name of your own machine)
- update job_types set job_types.type="COSYSTEM";
- insert into job_types (job_id,type,types_index) VALUES(302835,"COSYSTEM","CURRENT"); (repeat this insert for every job_id that exists)
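Typing one insert per job is tedious; assuming the jobs table still contains every job at this point, the rows can be generated in a single statement. This is a sketch, not part of the original procedure: the NOT IN clause skips jobs that already have a row in job_types from the update above.

```shell
sudo mysql oar <<'EOF'
INSERT INTO job_types (job_id, type, types_index)
SELECT job_id, 'COSYSTEM', 'CURRENT'
FROM jobs
WHERE job_id NOT IN (SELECT job_id FROM job_types);
EOF
```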
- update jobs set job_type="INTERACTIVE";
- update jobs set command="";
- update jobs set start_time=0 where reservation="Scheduled" and state="Waiting";
- update jobs set reservation="None" where reservation="Scheduled" and state="Waiting";
- update jobs set job_user="g5k";
- update jobs set launching_directory="/home/g5k/";
- update jobs set submission_time=1208168214; (use the current time, minus one hour for example)
- update jobs set start_time=1208169414 where job_id=302835;
WARNING: not all jobs should have the same submission time; this is a possible source of error. Also change the start_time of jobs that are Running so that it is consistent with the changed submission time.
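The warning above can be addressed by shifting every timestamp by one common delta instead of overwriting them, so jobs keep their distinct submission times and their relative order. A sketch, assuming TRACE_START is the earliest submission_time found in the dump (the value below is just the example time used earlier):

```shell
TRACE_START=1208168214                    # earliest submission_time in the trace (assumption)
NOW=`date +%s`
DELTA=`expr $NOW - $TRACE_START - 3600`   # land the trace one hour in the past
# Generate the statements; pipe them into the database with:
#   sh shift_times.sh | sudo mysql oar
echo "UPDATE jobs SET submission_time = submission_time + $DELTA;"
echo "UPDATE jobs SET start_time = start_time + $DELTA WHERE start_time > 0;"
```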