Many tools are available to visualize the cluster state.
oarnodes: this command shows the state of the resources. Warning: in this context, a resource is not necessarily a machine. It is generally a CPU, a core or a host, but it can be much more, such as licence tokens or memory banks. The oarnodes command gives information about the network address where each resource is located, its type, its state and much other useful information.
oarstat: this command shows information about running or planned jobs (the -f option shows full information).
Monika: this web page shows the current state of resources and jobs. From this page you can get more details about a particular resource or job.
DrawGantt: this web page shows the Gantt diagram of the scheduling, representing past, current and future jobs.
To submit an interactive job, we use the "oarsub" command with the "-I" option:
submachine:~$> oarsub -I
OAR then returns a unique job ID that identifies your job in the system.
Once the job is scheduled, that is, when the requested resources are available, OAR connects you to the first allocated node. OAR sets environment variables that describe your submission properties:
node:~$> env | grep OAR
In particular, the list of allocated nodes is stored in the file whose path is given by $OAR_NODEFILE:
node:~$> cat $OAR_NODEFILE
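On typical setups, $OAR_NODEFILE lists one host name per allocated resource (for example per core), so a host appears once for each resource reserved on it. Assuming that layout, the sketch below counts the allocated resources and lists the distinct hosts. The function names are hypothetical helpers, not OAR commands.

```shell
# Sketch: summarize the node file referenced by $OAR_NODEFILE.
# Assumption: one host name per allocated resource, hosts may repeat.
# Each function takes an optional file path and defaults to $OAR_NODEFILE.

# Number of allocated resources (non-empty lines).
oar_nb_resources() {
    grep -c . "${1:-$OAR_NODEFILE}"
}

# Distinct host names, sorted, one per line.
oar_hosts() {
    grep . "${1:-$OAR_NODEFILE}" | sort -u
}

# Number of distinct hosts.
oar_nb_hosts() {
    oar_hosts "$1" | grep -c .
}
```

Inside a job, `oar_nb_resources` gives the number of processes you would typically start, while `oar_nb_hosts` tells you how many distinct nodes you got.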
You can get information about your job from the Monika or DrawGantt interfaces, or by running the following command:
submachine:~$> oarstat -fj OAR_JOB_ID
To terminate an interactive job, simply disconnect from the node (for example, type exit).
You can also kill the job with:
submachine:~$> oardel OAR_JOB_ID
In this case, the session is abruptly terminated.
The "-l" option lets you specify the requested resources. For example, to work in interactive mode on 2 CPUs for at most 30 minutes, we ask for:
submachine:~$> oarsub -I -l /cpu=2,walltime=00:30:00
The walltime is the job's maximum duration. If the job exceeds its walltime, the system kills it. You should therefore set the walltime according to how long your job will actually take: if it is too short, the job is killed before it finishes; if it is too long, the job may be scheduled later than necessary. Once the job is scheduled and started, OAR connects you to the first reserved node. You can still access the list of the other resources through the $OAR_NODEFILE environment variable.
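One way to follow this advice is to start from an estimate of the run time and add a safety margin before formatting it as hours:minutes:seconds. This is only an illustration: the function name and the 20% default margin are arbitrary assumptions, not OAR settings.

```shell
# Sketch: turn an estimated run time (in minutes) into a walltime string
# of the form H:MM:SS, adding a safety margin (percentage, default 20).
# The function name and the default margin are illustrative assumptions.
walltime_with_margin() {
    est_min=$1
    margin_pct=${2:-20}
    # Seconds including the margin, rounded up to the next whole second.
    secs=$(( (est_min * 60 * (100 + margin_pct) + 99) / 100 ))
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```

For a job estimated at 25 minutes, `oarsub -I -l /cpu=2,walltime=$(walltime_with_margin 25)` requests a walltime of 0:30:00.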
OAR can also execute scripts in "passive mode". In this mode, the user specifies a script at submission time, and OAR executes it on the first reserved node. It is up to this script to make use of the reserved resources in parallel. All the $OAR_* environment variables are available within the script.
The principle is the same as for interactive submission: just replace the "-I" option with the path of your script:
submachine:~$> oarsub -l /cpu=2,walltime=00:30:00 ./hello_mpi.sh
In passive mode, OAR creates two files: OAR..stdout for standard output and OAR..stderr for standard error.
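The content of hello_mpi.sh is not given here. As an illustration only, a passive-mode script could hand $OAR_NODEFILE to an MPI launcher, starting one process per line of the file. The sketch assumes an mpirun that accepts -machinefile and -np (as Open MPI's does) and an MPI program compiled as ./hello_mpi; both are assumptions, and depending on the cluster, mpirun may need extra options to start remote processes under OAR. Whatever the script prints ends up in the stdout file mentioned above.

```shell
#!/bin/sh
# hello_mpi.sh: illustrative passive-mode job script (hypothetical).
# Assumptions: an mpirun accepting -machinefile/-np (e.g. Open MPI)
# and an MPI program compiled as ./hello_mpi in the submission directory.

# One MPI process per line of the node file, i.e. per allocated resource.
nb_procs() {
    grep -c . "$OAR_NODEFILE"
}

run_job() {
    echo "Job ${OAR_JOB_ID} starting on $(hostname)"
    mpirun -machinefile "$OAR_NODEFILE" -np "$(nb_procs)" ./hello_mpi
}

# Only launch when started by OAR, which sets $OAR_JOB_ID.
if [ -n "${OAR_JOB_ID:-}" ]; then
    run_job
else
    echo "OAR_JOB_ID is not set: not running inside an OAR job" >&2
fi
```

Since the script is passed to oarsub as a command, make it executable first (chmod +x hello_mpi.sh).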
You can connect to a running job with the "-C" option of oarsub:
submachine:~$> oarsub -C <OAR_JOB_ID>
You are then connected to the first reserved node.
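oarsub -C, oarstat -fj and oardel all need the job ID. When submitting from a script, it can be convenient to capture that ID from oarsub's output. The sketch below assumes oarsub reports it on a line of the form OAR_JOB_ID=<number> (the format used by OAR 2.x); check your installation's output before relying on it. The function name is hypothetical.

```shell
# Sketch: extract the job ID from oarsub's output, read on stdin.
# Assumption: oarsub prints a line such as "OAR_JOB_ID=1234".
# Prints nothing if no such line is found.
oar_job_id() {
    sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Example (in a submission script):
#   job_id=$(oarsub -l /cpu=2,walltime=00:30:00 ./hello_mpi.sh | oar_job_id)
#   oarstat -fj "$job_id"
```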
So far we have only asked for an immediate start. However, it is also possible to plan a job in the future (an advance reservation) with the "-r" option, which takes the desired start date:
submachine:~$> oarsub -r '2008-03-07 16:45:00' -l nodes=2,walltime=0:10:00 ./hello_mpi.sh
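The -r argument above is a date string of the form 'YYYY-MM-DD HH:MM:SS'. To reserve a start a given number of minutes from now, the string can be generated rather than typed by hand. This sketch relies on GNU date (its -d option with an @seconds value), which is an assumption about the submission machine; the function name is hypothetical, and the date is printed in the machine's local time zone.

```shell
# Sketch: print a date usable with "oarsub -r", OFFSET_MIN minutes after
# BASE_EPOCH (seconds since 1970-01-01 UTC; defaults to now).
# Requires GNU date for "-d @SECONDS"; output uses the local time zone.
reservation_date() {
    offset_min=$1
    base=${2:-$(date +%s)}
    date -d "@$((base + offset_min * 60))" '+%Y-%m-%d %H:%M:%S'
}

# Example: reserve 2 nodes for 10 minutes, starting in one hour:
#   oarsub -r "$(reservation_date 60)" -l nodes=2,walltime=0:10:00 ./hello_mpi.sh
```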