Construction of the Grid5000 image 'lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR'
From WikiOAR
lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR is derived from Lenny-x64-base-0.9 to provide a minimal Debian environment with more recent software.
Identification sheet
Name: lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR
Kernel: version 2.6.26 from kernel.org for AMD-64
Build
Here are explanations on how the system was installed and tuned starting from the content of the Lenny-x64-base-0.9 environment.
Kernel improvements
Sources
We need to modify the kernel, so we download the 2.6.26 release from kernel.org:
wget http://www.fr.kernel.org/pub/linux/kernel/v2.6/linux-2.6.26.tar.bz2
The archive needs to be extracted in the right place:
cd /usr/src
tar jxf /root/linux-2.6.26.tar.bz2
ln -sf /usr/src/linux-2.6.26 /usr/src/linux
Some initial configuration is done by reusing the configuration of the previous kernel:
cd /usr/src/linux
cp /boot/config-2.6.24.3 .config
We then make our choices for the modules we want to use, especially for power management: cpufreq and ACPI.
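Before building, it is worth checking that the power-management options actually ended up in the configuration. The following is a hedged sketch of such a check, demonstrated on a temporary sample fragment; on the build host the file to inspect would be /usr/src/linux/.config:

```shell
# Sketch: verify that the power-management options survived configuration.
# Demonstrated on a sample .config fragment in a temporary file.
cfg=$(mktemp)
printf 'CONFIG_CPU_FREQ=y\nCONFIG_ACPI=y\nCONFIG_SMP=y\n' > "$cfg"
# List the options we care about; both should be printed.
grep -E '^CONFIG_(CPU_FREQ|ACPI)=' "$cfg"
```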
make
make modules_install
mkinitramfs -o /boot/initrd.img-2.6.26 2.6.26
Environment
Creating the image archive
As for Lenny-x64-base-0.9, creating and retrieving the system archive is done with TGZ-G5K:
tgz-g5k ygeorgiou@chartreuse:~/Images/lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR.tgz
Recording the environment
The environment is recorded from a description file, so we create lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR.dsc:
name = lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR
description = based on https://www.grid5000.fr/index.php/Lenny-x64-base-0.9
author = yiannis.georgiou@imag.fr
filebase = file:///home/grenoble/ygeorgiou/lenny-x64-base-0.9.tgz
filesite = file:///grid5000/postinstalls/lenny-x64-base-0.9-post.tgz
size = 1000
initrdpath = /boot/initrd.img-2.6.26
kernelpath = /boot/vmlinuz-2.6.26
fdisktype = 83
filesystem = ext2
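The fields above are plain key = value pairs, so a description file can be sanity-checked before recording it. The following is a sketch with an illustrative, non-exhaustive key list, demonstrated on a temporary sample file:

```shell
# Sketch: check that a description file defines some expected keys
# (illustrative subset only). Demonstrated on a temporary sample file.
dsc=$(mktemp)
cat > "$dsc" <<'EOF'
name = lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR
kernelpath = /boot/vmlinuz-2.6.26
initrdpath = /boot/initrd.img-2.6.26
EOF
missing=0
for key in name kernelpath initrdpath; do
  grep -q "^$key = " "$dsc" || { echo "missing: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "description file looks complete"
```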
With karecordenv, the new environment is made known to Kadeploy:
karecordenv -fe lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR.dsc
This is the procedure followed to construct the image used for the experiments comparing OAR with other Resource Management Systems.
MPI libraries
OPENMPI
Checkpoint/Restart libraries
BLCR
DMTCP
Resource Management Systems
OAR2.4.0
Installation
- Installation of the Taktuk package (Debian stable), used by OAR for efficient communication with the nodes.
apt-get install taktuk
- Installation of mysql-server
- Installation of perl, perl-base, perl-dbi, libmysql, perl-suid, perl-mysql, openssh,
even though most of them are installed automatically as dependencies of the OAR Debian packages
- Installation of latest OAR unstable packages: oar-common, oar-doc, oar-user, oar-node, oar-server
downloaded from: http://oar.imag.fr/debian/unstable/2.4/
- Add:
environment="OAR_KEY=1"
at the beginning of the public key in the ~oar/.ssh/authorized_keys file
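This edit can be scripted with sed. Below is a hedged sketch, demonstrated on a temporary copy rather than the real ~oar/.ssh/authorized_keys, and the sample key string is hypothetical:

```shell
# Sketch: prepend the OAR_KEY option to every key line so that sshd
# exports OAR_KEY=1 for connections made with this key. Demonstrated
# on a temporary file; on the image the target would be
# ~oar/.ssh/authorized_keys.
keys=$(mktemp)
echo "ssh-rsa AAAAB3... oar@server" > "$keys"
sed -i 's/^/environment="OAR_KEY=1" /' "$keys"
cat "$keys"
# prints: environment="OAR_KEY=1" ssh-rsa AAAAB3... oar@server
```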
Configuration
oar.conf file
Since we will be using Taktuk for node communication, we have to uncomment the following lines in the oar.conf file:
TAKTUK_CMD="/usr/bin/taktuk -t 30 -s"
PINGCHECKER_TAKTUK_ARG_COMMAND="broadcast exec timeout 5 kill 9 [ true ]"
CPUSETs
The cpuset feature allows OAR to restrict a job to one CPU or a group of CPUs. Each computing resource has a cpuset field that refers to the CPU id that will run the job for this resource.
In order to use this feature, we have to uncomment the following lines in the oar.conf file:
JOB_RESOURCE_MANAGER_PROPERTY_DB_FIELD="cpuset"
JOB_RESOURCE_MANAGER_FILE="/etc/oar/job_resource_manager.pl"
CPUSET_PATH="/oar"
Automatic Cluster Configuration
- In order to configure your OAR cluster on Grid5000, you need to execute the launch_OAR.sh script on the server with the names of the computing nodes as arguments.
This can be done with the following command directly from the initial site's frontend:
sort -u $OAR_NODEFILE | tail -5 | ssh -l g5k $(sort -u $OAR_NODEFILE | head -1) "xargs ~g5k/launch_OAR.sh"
assuming we have 1 server and 5 computing nodes. The same result can be achieved by running:
./launch_OAR.sh node1 node2 node3 node4 node5
on the node that will act as the OAR server of our cluster.
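To make the pipeline above concrete, here is a sketch of the node-selection logic on a sample node file (hypothetical host names node1..node6): the first unique entry plays the server, and the last five become the script's arguments:

```shell
# Sketch of the node-selection logic from the command above,
# on a sample node file with hypothetical host names.
nodefile=$(mktemp)
printf 'node1\nnode1\nnode2\nnode3\nnode4\nnode5\nnode6\n' > "$nodefile"
server=$(sort -u "$nodefile" | head -1)           # node that deploys as OAR server
compute=$(sort -u "$nodefile" | tail -5 | xargs)  # arguments for launch_OAR.sh
echo "server=$server compute=$compute"
# prints: server=node1 compute=node2 node3 node4 node5 node6
```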
In the case of the 'lenny2.6.26-OAR2.4.0-SLURM2.0.0-BLCR' image, the launch_OAR.sh script contains the following:
#!/bin/sh
# start nfs, mysql and oar servers
sudo /etc/init.d/nfs-kernel-server start
sudo /etc/init.d/mysql start
sudo /etc/init.d/oar-server restart

# prepare the arguments to use as parameters in the taktuk command for
# faster execution of configuration commands upon the nodes
taktuk_parameters=""
for node in "$@"
do
  taktuk_parameters="$taktuk_parameters -m $node"
  echo $node >> /tmp/hostfile
done
HOSTNAME=$(hostname)

# stop oar-server on all the computing nodes of the cluster
taktuk $taktuk_parameters broadcast exec [ 'sudo /etc/init.d/oar-server stop' ]

# declare the oar-server, nfs-server for all the computing nodes
taktuk $taktuk_parameters broadcast exec [ "sudo mount $HOSTNAME:/home/g5k /home/g5k" ]

# prepare oar-node for the energy saving feature upon each computing node;
# comment the next 7 lines if the energy saving feature is not used
sudo sh -c "sed s/localhost/$HOSTNAME/ /home/g5k/conf_files/oar-node > /tmp/oar-node"
taktuk $taktuk_parameters broadcast exec [ "sudo chown g5k /etc/default" ]
taktuk $taktuk_parameters broadcast exec [ "sudo chown g5k /etc/default/oar-node" ]
taktuk $taktuk_parameters broadcast put [ /tmp/oar-node ] [ /etc/default/oar-node ]
taktuk $taktuk_parameters broadcast exec [ "sudo chown root /etc/default" ]
taktuk $taktuk_parameters broadcast exec [ "sudo chown root /etc/default/oar-node" ]
taktuk $taktuk_parameters broadcast exec [ "sudo chgrp root /etc/default/oar-node" ]

# the following line checks the cluster resources and declares them in the
# OAR database using the oarnodesetting command
sudo /usr/sbin/oar_resources_init /tmp/hostfile

# ecological cluster: set cm_availability=2147483646;
# comment if the energy saving feature is not used
sudo echo "UPDATE resources set cm_availability=2147483646" | mysql -uroot oar
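The argument handling at the top of the script can be illustrated in isolation. This is a sketch only (taktuk itself is not invoked, and the host names are hypothetical):

```shell
# Sketch: how the script turns its node-name arguments into the
# "-m host" machine list expected by taktuk (no taktuk call here).
build_taktuk_parameters() {
  params=""
  for node in "$@"; do
    params="$params -m $node"
  done
  echo "$params"
}
build_taktuk_parameters node1 node2 node3
# prints: " -m node1 -m node2 -m node3" (note the leading space)
```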
SLURM2.0.0
- Install Munge (straightforward using the .deb packages)
- adduser slurm
- cd slurm_dir