Profiling

From WikiOAR

Revision as of 10:44, 4 December 2011 by Auguste (Talk | contribs)


Procedure for profiling OAR Perl code with the Perl profiling tool Devel::DProf. To profile a program:
1) add the -d:DProf argument to your Perl command line and let it run;
2) the tmon.out file is produced in the directory where the Perl command is executed;
3) once the run is finished, examine the results with dprofpp.

For more details, see: http://www.perl.com/pub/a/2004/06/25/profiling.html

The rest of this page describes the procedure for profiling OAR code on Grid5000.


Modifications for profiling

1) Deploy one node with OAR installed and configured, but do not start it yet.

2) Change the first line of the command you want to profile from this:

#!/usr/bin/perl
to this:
#!/usr/bin/perl -d:DProf

In my case I profiled two commands at the same time: "Nodechangestate" and "oar_meta_sched".
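The shebang change in step 2 can be scripted. The sketch below is one possible way to do it, assuming the first line of the command is exactly #!/usr/bin/perl. The function name enable_dprof and the .orig backup suffix are made up for this example and are not part of OAR.

```shell
#!/bin/sh
# Sketch: switch a Perl script's shebang to the profiling variant.
# enable_dprof and the ".orig" suffix are hypothetical names.
enable_dprof() {
    script=$1
    first=$(sed -n 1p "$script")
    if [ "$first" != '#!/usr/bin/perl' ]; then
        echo "unexpected first line in $script: $first" >&2
        return 1
    fi
    # Keep the unprofiled version next to the script.
    cp -p "$script" "$script.orig"
    # Append the flag to line 1; writing into the existing file
    # keeps its permissions.
    sed '1s|$| -d:DProf|' "$script.orig" > "$script"
}
```

It would be called as enable_dprof /usr/lib/oar/oar_meta_sched by a user allowed to write that file. Copying the .orig file back restores the unprofiled command.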

3) Write a wrapper script that is called in place of the profiled command. It runs the profiled command, copies the profiling output (the "tmon.out" file) to another place, and finally deletes "tmon.out" so that a fresh one is written the next time the command is executed.

For example:

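cat profiled_scheduler
#!/bin/sh
# Timestamp that identifies this run in the output file names
DATE=`date +%s`
cd /usr/lib/oar/
# Record the start time of the run
date +%s > /tmp/profiled_scheduler/tmondate_$DATE
# Run the profiled scheduler and wait for it to finish
./oar_meta_sched &
wait
# Record the end time, then save and remove the profile
date +%s >> /tmp/profiled_scheduler/tmondate_$DATE
cp tmon.out /tmp/profiled_scheduler/tmon_$DATE
rm tmon.out
exit 1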

You also have to change this line in Almighty from:

my $scheduler_command = $binpath."oar_meta_sched";
to this:
my $scheduler_command = $binpath."profiled_scheduler";

so that the wrapper, and with it the profiler, is executed.
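Besides the profile itself, the wrapper leaves a tmondate_$DATE file for each run, holding the start time on its first line and the end time on its second. The following sketch shows one way to turn those files into wall-clock durations. The function name run_durations is hypothetical, and the directory is passed as an argument so that it matches whatever output directory the wrapper uses.

```shell
#!/bin/sh
# Sketch: print "<run timestamp> <elapsed seconds>" for every
# tmondate_* file in a directory. run_durations is a hypothetical name.
run_durations() {
    dir=$1
    for f in "$dir"/tmondate_*; do
        [ -f "$f" ] || continue              # no runs recorded yet
        start=$(sed -n 1p "$f")              # written before the command
        end=$(sed -n 2p "$f")                # written after the command
        # Skip runs whose end time has not been written yet.
        [ -n "$start" ] && [ -n "$end" ] || continue
        echo "${f##*tmondate_} $((end - start))"
    done
}
```

The result can be compared with the "Total Elapsed Time" that dprofpp reports for the same run.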

4) The /usr/lib/oar/ directory must be writable so that tmon.out can be created there. This will do the job:

sudo chmod a+w /usr/lib/oar

We also have to create the directory where the various "tmon.out" files will be stored, again with write permission:

sudo mkdir /tmp/profiled_scheduler
sudo chmod a+w /tmp/profiled_scheduler

5) Start the OAR server like this:

cd /usr/lib/oar/
sudo Almighty

6) All results are collected in /tmp/profiled_scheduler/. To review one, copy the specific tmon_$DATE file that interests you to tmon.out and then run dprofpp, which reads tmon.out and shows a result like this:

Total Elapsed Time = 13.30464 Seconds
 User+System Time = 0.184640 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
32.5   0.060  0.129     10   0.0060 0.0129  main::BEGIN
15.1   0.028  0.047   1156   0.0000 0.0000  DBI::st::fetchrow_hashref
10.8   0.020  0.020      6   0.0033 0.0033  IO::Socket::BEGIN
10.2   0.019  0.067      2   0.0097 0.0333  iolib::list_resources
5.42   0.010  0.010      3   0.0033 0.0033  AutoLoader::import
5.42   0.010  0.010     11   0.0009 0.0009  DynaLoader::dl_load_file
5.42   0.010  0.010     11   0.0009 0.0009  Exporter::export
5.42   0.010  0.010     13   0.0008 0.0008  DBI::BEGIN
5.42   0.010  0.060     10   0.0010 0.0060  iolib::BEGIN
5.42   0.010  0.010     56   0.0002 0.0002  Exporter::import
5.42   0.010  0.010    305   0.0000 0.0000  DBI::st::fetchrow_array
4.87   0.009  0.009   1156   0.0000 0.0000  DBI::st::fetch
4.87   0.009  0.009   1160   0.0000 0.0000  DBI::common::FETCH
0.00   0.000  0.000      1   0.0000 0.0000  Exporter::Heavy::heavy_export_ok_tags
0.00   0.000  0.000      1   0.0000 0.0000  AutoLoader::AUTOLOAD

Here you can see in which functions the command spends most of its time.
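Picking a saved run and putting it where dprofpp expects it, as described in step 6, can be wrapped in two small helpers. This is only a sketch: list_profiles and stage_profile are hypothetical names, and the tmon_<timestamp> file naming is taken from the wrapper script in step 3.

```shell
#!/bin/sh
# Sketch with hypothetical helper names; the tmon_<timestamp> naming
# follows the wrapper script from step 3.

# Print the timestamps of all saved profiles in a directory, oldest first.
list_profiles() {
    ls "$1" | sed -n 's/^tmon_\([0-9][0-9]*\)$/\1/p' | sort -n
}

# Copy one saved profile to ./tmon.out, the file dprofpp reads by default.
stage_profile() {
    src="$1/tmon_$2"
    if [ ! -f "$src" ]; then
        echo "no such profile: $src" >&2
        return 1
    fi
    cp "$src" ./tmon.out
}
```

A typical session would run list_profiles /tmp/profiled_scheduler, choose a timestamp, call stage_profile /tmp/profiled_scheduler with that timestamp, and then run dprofpp in the same directory. dprofpp should also accept the profile file name as an argument, which avoids the copy.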

Modifications for replaying a specific cluster database trace in simulation on a single node

Method 1: Using cosystem jobs

  • In /etc/oar/oar.conf, set
FINAUD_FREQUENCY="0"

This is the frequency at which Alive and Suspected resources are checked; 0 means never. Also comment out the pingchecker lines, like this one:

#PINGCHECKER_TAKTUK_ARG_COMMAND="-t 3 broadcast exec [ true ]"

  • Then we have to adjust a few things in the database trace so that the simulation can work on only one node. Stop the OAR server:
sudo /etc/init.d/oar-server stop

and load the trace into the database:

sudo mysql oar < oar_sophia.sql
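The two oar.conf changes described above (FINAUD_FREQUENCY set to 0, pingchecker lines commented out) can be applied with sed. The sketch below writes the patched configuration to standard output instead of editing the file in place. The function name patch_oar_conf is hypothetical, and it assumes that the pingchecker settings are exactly the lines starting with PINGCHECKER_.

```shell
#!/bin/sh
# Sketch: print a patched copy of an oar.conf file on stdout.
# patch_oar_conf is a hypothetical name.
patch_oar_conf() {
    conf=$1
    # Force FINAUD_FREQUENCY to 0 (also when the line is commented out)
    # and comment out every active PINGCHECKER_* line.
    sed -e 's/^#*[[:space:]]*FINAUD_FREQUENCY=.*/FINAUD_FREQUENCY="0"/' \
        -e 's/^PINGCHECKER_/#&/' "$conf"
    # Append the setting if the file did not contain it at all.
    grep -q '^#*[[:space:]]*FINAUD_FREQUENCY=' "$conf" \
        || echo 'FINAUD_FREQUENCY="0"'
}
```

For example, patch_oar_conf /etc/oar/oar.conf > /tmp/oar.conf.new produces a candidate file that can be compared with the original using diff before it is copied into place.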

1) Using the job_submitter

  • Fix the differences between database versions:
ALTER TABLE jobs  ADD initial_request TEXT  AFTER job_id;
ALTER TABLE jobs ADD scheduler_info VARCHAR( 255 ) AFTER message;
  • Use the oar2swf.rb tool to extract the information about the jobs to be submitted from the database trace into an .swf file (this tool is still being worked on).
  • Delete the jobs from the database, so that only the resource configuration is kept and the jobs can be injected from the exact point we want in the next step:
delete from jobs;
  • Use the job_submitter.rb tool to inject the .swf file; it performs the necessary oarsub calls of cosystem jobs on the OAR server.

2) Using only the database trace (still buggy)

  • delete from job_types where job_types.types_index="LOG";
  • delete from jobs where state="Error";
  • delete from jobs where state="Terminated";
  • update jobs set info_type="pastel-9.toulouse.grid5000.fr:6666";
    (use the name of your own machine)
  • update job_types set job_types.type="COSYSTEM";
  • insert into job_types (job_id,type,types_index) VALUES(302835,"COSYSTEM","CURRENT");
    (repeat this insert for every job_id that exists)
  • update jobs set job_type="INTERACTIVE";
  • update jobs set command="";
  • update jobs set start_time=0 where reservation="Scheduled" and state="Waiting";
  • update jobs set reservation="None" where reservation="Scheduled" and state="Waiting";
  • update jobs set job_user="g5k";
  • update jobs set launching_directory="/home/g5k/";
  • update jobs set submission_time=1208168214;
    (use the current time, minus one hour for example)
  • update jobs set start_time=1208169414 where job_id=302835;

WARNING: the jobs should not all have the same submission time; this is a possible source of errors.

Change the start_time of the jobs that are running so that it is consistent with the changed submission time.
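In the list above, the insert into job_types statement has to be repeated for every job_id. Rather than typing the statements by hand, they can be generated from a list of job ids. The sketch below is one way to do that. The function name gen_cosystem_inserts is hypothetical, and the statement text is copied from the example in the list.

```shell
#!/bin/sh
# Sketch: read job ids (one per line) on stdin and print one
# INSERT statement per id. gen_cosystem_inserts is a hypothetical name.
#
# Possible use, assuming the mysql client can reach the "oar" database:
#   mysql -N -B -e 'select job_id from jobs' oar \
#       | gen_cosystem_inserts | mysql oar
gen_cosystem_inserts() {
    while read -r job_id; do
        case $job_id in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        printf 'insert into job_types (job_id,type,types_index) VALUES(%s,"COSYSTEM","CURRENT");\n' "$job_id"
    done
}
```

Writing the generated statements to a file first makes it possible to review them before they are fed to the database.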

Method 2: Disabling the execution on the Runner
