Scheduling and managing jobs on Saguaro
Saguaro uses the Moab scheduler on top of the Torque resource manager to
handle the workload of its users. The cluster is currently configured with
three main queues for users.
Queues
Serial: for single or dual processor jobs that only need to run on a single
node.
Medium: for jobs that require between 3 and 128 processors.
Large: for jobs that require more than 128 processors.
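The processor limits above can be sketched as a small shell helper. This is
only an illustration: pick_queue is a hypothetical function name, and the
lowercase queue names "medium" and "large" are assumed by analogy with
"serial"; check the real names with qstat -q. The helper looks only at the
processor count and does not check the single-node requirement of the Serial
queue.

```shell
# Map a processor count to a queue name, following the limits listed above.
# Assumption: queues are named "serial", "medium", and "large".
pick_queue() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: pick_queue processor_count" >&2
            return 1
            ;;
    esac
    if [ "$1" -lt 1 ]; then
        echo "processor count must be at least 1" >&2
        return 1
    elif [ "$1" -le 2 ]; then
        echo serial      # single or dual processor jobs
    elif [ "$1" -le 128 ]; then
        echo medium      # 3 to 128 processors
    else
        echo large       # more than 128 processors
    fi
}
```

For example, pick_queue 64 prints the assumed name of the Medium queue.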
To see all available queues*, use the command:
qstat -q
*Please note that you may not have access to all queues.
Here is a simple guide to interfacing with the scheduler.
To submit jobs to the cluster for scheduling, use the command:
qsub (options) (job_script)
The options can be included in the job script or given on the command line.
A simple job script could look like:
#PBS -N Sleep_test
#PBS -q "serial"
#PBS -l nodes=1:ppn=1
#PBS -l advres=bmaxwell.525
#PBS -l walltime=00:10:00
sleep 600
In this simple example the job script asks for one node with one processor
per node (ppn), using an advance reservation named bmaxwell.525 (these are
set up by request), with a wall time of ten minutes. The -N option sets the
name of the job, and the -q option specifies the queue name.
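As a rough sketch of both ways of passing options, the snippet below writes a
variant of the job script to a file and shows, in comments, an equivalent
submission with the options given on the command line. Several details are
assumptions made for illustration: the helper name write_sleep_job and the
file name sleep_test.pbs are hypothetical, the advres line is left out
because reservations exist only by request, and the sleep is shortened to 300
seconds so that the job ends before its wall time limit.

```shell
# Write a small job script to the file named by the first argument.
# Assumption: no advance reservation is used, so the advres line is omitted.
write_sleep_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N Sleep_test
#PBS -q serial
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
sleep 300
EOF
}

write_sleep_job sleep_test.pbs

# Submit the script as written:
#   qsub sleep_test.pbs
# Or give the same options on the command line:
#   qsub -N Sleep_test -q serial -l nodes=1:ppn=1,walltime=00:10:00 sleep_test.pbs
```

The qsub lines are left as comments so that the snippet can be run on a
machine without the scheduler installed.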
Once a job has been submitted to the scheduler, there are a few ways to keep
track of what's going on.
To see jobs that are running, pending or blocked, use the command:
qstat
or
showq -u
Both will show the jobs that you have submitted to the cluster. If you want
to see all jobs that have been submitted, use the command:
showq
Used by itself, it shows the jobs of all users.
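When the job list gets long, a per-state summary can be easier to read. The
helper below is a minimal sketch, not part of Torque or Moab:
count_job_states is a hypothetical name, and it assumes the default qstat
layout of two header lines followed by one line per job, with the one-letter
job state (for example R for running or Q for queued) in the second-to-last
column.

```shell
# Count jobs per state from qstat output read on standard input.
# Assumption: two header lines, then one job per line, with the state
# letter in the second-to-last column.
count_job_states() {
    awk 'NR > 2 && NF >= 2 { count[$(NF - 1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

A typical call would be qstat | count_job_states, which prints one line per
state letter followed by the number of jobs in that state.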
If you see that your job is blocked or has been pending for a long time, use
the command:
checkjob -v (job_id)
or
qstat -f (job_id)
Either command gives very verbose output, including the reasons why the job
cannot currently start.
If your job is pending and you would like to see when it might start, or at
least when the scheduler estimates it will start, use the command:
showstart (job_id)
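The two checks for a waiting job can be combined in one small wrapper. This
is only a convenience sketch: why_waiting is a hypothetical function name,
and it assumes that checkjob and showstart are available on the PATH of the
login node.

```shell
# Show why a job is waiting, then the scheduler's estimated start time.
# why_waiting is a hypothetical helper; it only calls checkjob and showstart.
why_waiting() {
    if [ -z "$1" ]; then
        echo "usage: why_waiting job_id" >&2
        return 1
    fi
    checkjob -v "$1"
    showstart "$1"
}
```

Call it with the job ID as its only argument, in the same way as the
commands it wraps.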
If you decide for one reason or another that you want to stop your job
immediately, use the command:
mjobctl -c (job_id)
or
qdel (job_id)
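Because cancelling takes effect immediately, some users prefer a confirmation
step in front of qdel. The following is a minimal sketch of such a guard;
cancel_job is a hypothetical helper name and the prompt wording is arbitrary.

```shell
# Cancel a job with qdel only after an explicit "y" on standard input.
# cancel_job is a hypothetical helper, not a Torque or Moab command.
cancel_job() {
    if [ -z "$1" ]; then
        echo "usage: cancel_job job_id" >&2
        return 1
    fi
    printf 'Cancel job %s? [y/N] ' "$1" >&2
    read -r answer || answer=""
    if [ "$answer" = "y" ]; then
        qdel "$1"
    else
        echo "Job $1 left untouched." >&2
        return 1
    fi
}
```

Any answer other than a lowercase y leaves the job alone and returns a
non-zero status.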