Three methods for executing jobs:
    ### Job name (Or delete to use executable name)
    #PBS -N MyPBSJob

    ### Send mail with status (job 'b'egins, 'e'nds, 'a'borts)
    ### or delete this to only get email on job abort
    #PBS -m bae

    ### Number of nodes
    ### If you need both CPUs per job, use 'nodes=1:ppn=2'
    ### (ppn=processor per node)
    ### or delete to get one job per cpu, spread across nodes
    #PBS -l nodes=1

    /path/to/your_program_here -with -args

(See the qsub man page for other options)
    qsub -N MyPBSJob -m bae -l nodes=1 /path/to/your_program_here -with -args

is an equivalent call.

Advantages: Some measure of job control. Optional email
notification of errors and completion. Ability to migrate jobs
around nodes.
Disadvantages: Have to write a tiny wrapper script to
start the process.
Make sure your command is available in your PATH env variable or gexec will give you a 'Bad filename' error.
Advantages: Quick to use, some measure of control as to
where jobs are executed
Disadvantages: No job control or checkpointing. No
queueing behaviour provided.
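The PATH requirement noted above can be checked before launching anything. The following is a minimal sketch, not part of the original instructions: the helper name in_path and the program name your_program_here are placeholders, and the commented-out gexec line is an assumed invocation that should be checked against the gexec documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command resolves via PATH.
in_path() {
    command -v "$1" >/dev/null 2>&1
}

prog=your_program_here   # placeholder program name

if in_path "$prog"; then
    echo "$prog found at $(command -v "$prog")"
    # Assumed invocation, verify against the gexec docs before use:
    # gexec -n 2 "$prog" -with -args
else
    echo "$prog is not in PATH; gexec would likely report 'Bad filename'" >&2
fi
```

If the check fails, either add the program's directory to PATH or pass the full path to the program.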
Vendor docs: http://ganglia.sourceforge.net/docs/
MPICH home page: http://www-unix.mcs.anl.gov/mpi/mpich/
Manuals from the MPICH website:
http://www-unix.mcs.anl.gov/mpi/mpich/docs.html
(We are using the ch_p4 model)
To run your MPI-aware test:
NOTE that this does not give you the benefits of OpenPBS. To
integrate the two, simply call mpirun as the program submitted to
qsub. You will most likely want to use 'nodes=#:ppn=#' to allocate
the proper number of CPUs.
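As a rough sketch of that integration, the snippet below writes a job script that calls mpirun from inside a PBS job. The file name mpi_job.sh, the job name, the node counts and the program path are placeholders. It assumes PBS sets $PBS_NODEFILE to a file with one line per allocated CPU and that the installed mpirun accepts -np and -machinefile, as the MPICH ch_p4 mpirun does.

```shell
#!/bin/sh
# Write a hypothetical PBS job script; nothing is submitted by this snippet.
cat > mpi_job.sh <<'EOF'
#!/bin/sh
### Job name and CPU request (2 nodes, 2 CPUs each) are placeholders
#PBS -N MyMPIJob
#PBS -l nodes=2:ppn=2

# Count the CPUs PBS allocated (one line per CPU in the node file)
NP=$(wc -l < "$PBS_NODEFILE")

# Start one MPI process per allocated CPU on the hosts PBS chose
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" /path/to/your_mpi_program
EOF

# To submit (requires the PBS client tools):
# qsub mpi_job.sh
```

Deriving the process count from the node file keeps the mpirun call in step with whatever 'nodes=#:ppn=#' request is made, so only the #PBS -l line needs editing.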
screen is available on unsat. This provides the text-mode
equivalent of terminal services. That is, it allows you to connect,
start some process that is tied to the display, and disconnect
without killing your processes. You can connect at a later time and
be right back where you were.

Last updated May 23, 2005.

Common Problems
Use 'qsig -s 0 jobid' to get the manager to notice the processes
are gone
Quick Start guide: