Using OAR - Basic steps

Visualising the cluster state

Several tools are available to visualise the cluster state.

Shell commands:

  • oarstat: this command shows information about running or planned jobs (the -f option shows full information).
  • oarnodes: this command shows the state of the resources. Warning: in our context, a resource is not necessarily a machine. It is generally a CPU, a core or a host, but it can be much more, such as licence tokens or VLANs. The oarnodes command reports the network address where the resource is located, its type, its state and many other properties.

Graphical tools:

  • Monika: this web page shows the current resource states and job information. From this page you can get more details about a particular resource or job.
  • DrawGantt: this web page shows the Gantt diagram of the schedule. It represents past, current and future jobs.

Submitting a job in an interactive shell

Submission

To submit an interactive job we use the “oarsub” command with the “-I” option:

frontend:~$> oarsub -I

OAR then returns a unique job ID that identifies your job in the system:

OAR_JOB_ID=1234

Once the job is scheduled and the requested resources are available, OAR connects you to the first allocated node. OAR sets environment variables that describe your submission:

node:~$> env | grep OAR

In particular, the list of allocated nodes is stored in the file that $OAR_NODEFILE points to:

node:~$> cat $OAR_NODEFILE
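
Inside a job, the node file can be summarised with standard tools. The following is a minimal sketch, assuming the file lists one hostname per line and repeats a hostname once per allocated core; the function name summarize_nodefile is illustrative and not part of OAR.

```shell
# Print each distinct host in a node file followed by its number of entries.
# Assumption: one hostname per line, repeated once per allocated core.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Only run the summary when we are inside a job and the file is readable.
if [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]; then
    summarize_nodefile "$OAR_NODEFILE"
fi
```

Outside a job the guard keeps the snippet silent, so it can be sourced safely from a shell profile.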

Visualisation

You can get information about your job by looking at the Monika or DrawGantt interfaces, or by typing the following in a command-line console:

frontend:~$> oarstat -fj <OAR_JOB_ID>

Exiting the job

To terminate an interactive job you just have to disconnect from the resource:

node:~$> exit

You can also kill the job by typing:

frontend:~$> oardel <OAR_JOB_ID>

In this case, the session will be killed (“kill -9”).

Interactive submission on many resources

The “-l” option lets you specify the resources you want. For example, to work in interactive mode on 2 CPUs for a maximum duration of 30 minutes, we ask:

frontend:~$> oarsub -I -l /cpu=2,walltime=00:30:00

The walltime is the job’s maximum duration. If the job overruns its walltime, the system kills it. Set the walltime carefully according to how long your job will take: if it is too short, the job is killed before it finishes; if it is too long, the job may be scheduled later. Once the job is scheduled and started, OAR connects you to the first reserved node. You can still access the list of the other resources via the $OAR_NODEFILE environment variable.
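
When the expected duration is known in minutes, the walltime string can be built rather than typed by hand. This is a small sketch; walltime_from_minutes is a hypothetical helper, not an OAR command, and the snippet only prints the oarsub command line instead of running it.

```shell
# Convert a number of minutes into the HH:MM:SS form used for walltime.
# walltime_from_minutes is a hypothetical helper, not an OAR command.
walltime_from_minutes() {
    printf '%02d:%02d:00\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

# Build (but do not run) the submission command for 2 CPUs and 30 minutes.
echo "oarsub -I -l /cpu=2,walltime=$(walltime_from_minutes 30)"
```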

Batch submission

OAR can execute scripts in “passive mode”. In this mode, the user specifies a script at submission time. This script is executed on the first reserved node. It is within this script that the user defines how to use the parallel resources. All the $OAR_* environment variables are available within the script.

The script must be executable.
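
As an illustration, a batch script could start as follows. This is a hypothetical sketch and not the hello_mpi.sh used in the examples of this page; it only reports the job ID and the distinct allocated nodes, and assumes the node file holds one hostname per line. The function names list_nodes and main are illustrative.

```shell
#!/bin/sh
# Hypothetical batch script sketch (not the original hello_mpi.sh).
# OAR runs it on the first reserved node with the OAR_* variables set.

# Print the distinct hostnames found in the node file given as $1.
list_nodes() {
    sort -u "$1"
}

main() {
    echo "Job ${OAR_JOB_ID:-unknown} starting on $(uname -n)"
    if [ -r "${OAR_NODEFILE:-}" ]; then
        for node in $(list_nodes "$OAR_NODEFILE"); do
            echo "allocated node: $node"
        done
    else
        echo "OAR_NODEFILE is not readable; not inside an OAR job?" >&2
    fi
}

main
```

The script can be made executable with "chmod +x" before it is submitted. The actual parallel launch (for example an MPI start-up command) would replace the loop body and depends on the local installation.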

Submission

In this case, the principle is the same as for interactive submission; just replace the “-I” option with the path of your script:

frontend:~$> oarsub -l /cpu=2,walltime=00:30:00 ./hello_mpi.sh
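
When submissions are scripted, it is handy to keep the job ID for later calls to oarstat or oardel. The sketch below assumes that the output of oarsub contains a line of the form OAR_JOB_ID=<number>, as shown for interactive jobs earlier on this page; extract_job_id is a hypothetical helper. It is demonstrated on canned text so that it runs without a cluster.

```shell
# Read oarsub output on stdin and print the numeric job ID.
# Assumption: the output contains a line of the form OAR_JOB_ID=<number>.
extract_job_id() {
    sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Demonstration on canned output; on a real frontend one would pipe
# the output of oarsub into the function instead.
printf 'OAR_JOB_ID=1234\n' | extract_job_id
```

On a frontend, the output of the oarsub command above would be piped into extract_job_id and the result stored in a shell variable.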

Getting the results of the submission

In passive mode, OAR creates 2 files: OAR.<OAR_JOB_ID>.stdout for the standard output and OAR.<OAR_JOB_ID>.stderr for the standard error. The names of these 2 files can be changed (see “man oarsub”).
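
The default file names follow directly from the job ID, so they are easy to locate from a script. This is a minimal sketch, assuming the files are in the current directory and were not renamed at submission time; job_output_files and show_job_output are hypothetical helpers, not OAR commands.

```shell
# Print the default output file names for a given job ID.
# job_output_files is a hypothetical helper, not an OAR command.
job_output_files() {
    printf 'OAR.%s.stdout\nOAR.%s.stderr\n' "$1" "$1"
}

# Show whatever output exists so far for a job, assuming the files are in
# the current directory and have not been renamed at submission time.
show_job_output() {
    for f in $(job_output_files "$1"); do
        if [ -r "$f" ]; then
            echo "==> $f <=="
            cat "$f"
        fi
    done
}
```

For example, "show_job_output 1234" prints the contents of OAR.1234.stdout and OAR.1234.stderr if they exist, and nothing otherwise.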

Connecting to a running job

You can connect to a running job with the “-C” option of oarsub:

frontend:~$> oarsub -C <OAR_JOB_ID>

You will then be connected to the first reserved node.

Reservations

Until now we have only asked for our submissions to start immediately. However, it is also possible to plan a job in the future. This feature is available through the “-r <date>” option:

frontend:~$> oarsub -r '2008-03-07 16:45:00' -l nodes=2,walltime=0:10:00 ./hello_mpi.sh
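
The date string can also be computed relative to the current time. The following sketch assumes a date command that accepts "-d @<epoch>" and "+%s" (GNU coreutils does; other implementations may not); reservation_date is a hypothetical helper, and the snippet only prints the oarsub command line instead of running it.

```shell
# Print a date N seconds from now in the 'YYYY-MM-DD HH:MM:SS' form
# used with the -r option above.
# Assumption: date(1) accepts "-d @<epoch>" and "+%s" (GNU coreutils does).
reservation_date() {
    date -d "@$(( $(date +%s) + $1 ))" '+%Y-%m-%d %H:%M:%S'
}

# Build (but do not run) a reservation starting one hour from now.
echo "oarsub -r '$(reservation_date 3600)' -l nodes=2,walltime=0:10:00 ./hello_mpi.sh"
```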