Our system has C, C++, and Fortran compilers that support both serial and parallel jobs. Parallel jobs require calls to parallel library functions within the source code. Libraries of parallel functions are commonly available for two well-established protocols: MPI and PVM. We support two flavors of MPI (LAM and MPICH) and can also accommodate PVM, although MPI is the preferred protocol on our systems. If you intend to run only serial jobs, you can continue with the instructions below. Likewise, if you are already familiar with parallel programming and are ready to submit a parallel job, you can start with the instructions below. If, however, you are not familiar with parallel programming, you should first have a look at our parallel programming tutorial page.
Job Preparation:
For Serial Jobs:
Use fcc <C_code.c>, FCC <C++_code.c> or frt <F77_code.f> to compile your code.
You can add -Kfast to optimise your Fortran code. Some Fortran 90/95 codes may also need -X9.
Create a DQS script file like the following:
#!/bin/csh
#$ -l qty.eq.1
#$ -N Your_Job_Name
#$ -A Your_Account_Name
#$ -cwd
./a.out
echo "End of Job"
This script runs the executable a.out on a single node.
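If you submit many serial jobs, it can be convenient to generate the script instead of editing it by hand each time. The bash function below is a minimal sketch of that idea: make_serial_script is a hypothetical helper (not a site-provided command) that writes the same csh script as above with your job name, account and executable filled in.

```bash
#!/bin/bash
# make_serial_script is a hypothetical helper, not a site-provided command.
# Usage: make_serial_script OUTPUT_FILE JOB_NAME ACCOUNT [EXECUTABLE]
# Writes a single-CPU DQS script in the same form as the example above.
make_serial_script() {
    local out=$1 name=$2 account=$3 exe=${4:-./a.out}
    # The here-document is unquoted so $name, $account and $exe expand;
    # the backslash in \$ keeps the literal "#$" prefix of the DQS option lines.
    cat > "$out" <<EOF
#!/bin/csh
#\$ -l qty.eq.1
#\$ -N $name
#\$ -A $account
#\$ -cwd
$exe
echo "End of Job"
EOF
}
```

For example, make_serial_script serial.csh myjob myaccount ./a.out writes serial.csh, which you can then submit with qsub32 as described under Job Submission.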
For MPI parallel jobs:
Our cluster has two MPI implementations, MPICH and LAM. HPF users must use "gmdhpf" to convert their HPF code into a LAM-MPI executable. The compilation and submission of each kind of program is outlined below:
For LAM users:
Use hcc <lam_c_code.c> or hf77 <lam_f77_code.f> to compile your LAM MPI code.
Create a DQS script file like the following:
#!/bin/bash
#$ -cwd
#$ -l qty.eq.4
#$ -N Your_Job_Name
#$ -A Your_Account_Name
lamboot $HOSTS_FILE
lmpirun -c $NUM_HOSTS -c2c ./a.out
wipe $HOSTS_FILE
echo "End of Job"
This script will run a.out on 4 CPUs.
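The LAM script differs from the serial one only in the CPU request and the three LAM commands, so it can be generated in the same way. make_lam_script below is a hypothetical helper, a minimal sketch rather than a site tool. It writes the script above for any CPU count, and its optional fifth argument lets HPF users substitute the executable produced by gmdhpf for ./a.out.

```bash
#!/bin/bash
# make_lam_script is a hypothetical helper, not a site-provided command.
# Usage: make_lam_script OUTPUT_FILE NUM_CPUS JOB_NAME ACCOUNT [EXECUTABLE]
make_lam_script() {
    local out=$1 ncpu=$2 name=$3 account=$4 exe=${5:-./a.out}
    # Reject CPU counts that are not whole numbers before writing anything.
    case $ncpu in
        ''|*[!0-9]*)
            echo "make_lam_script: CPU count must be a whole number" >&2
            return 1 ;;
    esac
    # \$HOSTS_FILE and \$NUM_HOSTS are written literally: the queuing system
    # fills them in when the job runs, not when the script is generated.
    cat > "$out" <<EOF
#!/bin/bash
#\$ -cwd
#\$ -l qty.eq.$ncpu
#\$ -N $name
#\$ -A $account
lamboot \$HOSTS_FILE
lmpirun -c \$NUM_HOSTS -c2c $exe
wipe \$HOSTS_FILE
echo "End of Job"
EOF
}
```

For example, make_lam_script lam.sh 8 myjob myaccount ./program writes a script that requests 8 CPUs and runs ./program under LAM.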
For MPICH users:
Use mpicc <mpi_c_code.c> or mpif77 <mpi_f77_code.f> to compile your MPI code.
Create a DQS script file like the following:
#!/bin/bash
#$ -cwd
#$ -l qty.eq.4
#$ -N Your_Job_Name
#$ -A Your_Account_Name
mpirun -np $NUM_HOSTS -machinefile $HOSTS_FILE ./a.out
echo "End of Job"
This script will run a.out on 4 CPUs.
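Both parallel scripts rely on the queuing system setting $NUM_HOSTS and $HOSTS_FILE. If either is empty, the mpirun (or lamboot/lmpirun) line runs with missing arguments, so a short check at the top of the script makes such a failure easier to spot. check_mpi_env is a hypothetical sketch, not part of DQS, LAM or MPICH, and it assumes $HOSTS_FILE names a readable file, as its use with -machinefile and lamboot suggests.

```bash
#!/bin/bash
# check_mpi_env is a hypothetical helper, not part of DQS, LAM or MPICH.
# Returns non-zero (with a message on stderr) when NUM_HOSTS or HOSTS_FILE
# is unset or empty, the hosts file is unreadable, or NUM_HOSTS is not a number.
check_mpi_env() {
    if [ -z "${NUM_HOSTS:-}" ] || [ -z "${HOSTS_FILE:-}" ]; then
        echo "check_mpi_env: NUM_HOSTS or HOSTS_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$HOSTS_FILE" ]; then
        echo "check_mpi_env: cannot read hosts file $HOSTS_FILE" >&2
        return 1
    fi
    case $NUM_HOSTS in
        *[!0-9]*)
            echo "check_mpi_env: NUM_HOSTS is not a number: $NUM_HOSTS" >&2
            return 1 ;;
    esac
}
```

To use it, paste the function into your bash job script after the #$ lines and put check_mpi_env || exit 1 on the line before mpirun (or before lamboot in the LAM script).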
For HPF users:
Use gmdhpf -v -o <program> <program.hpf> to compile program.hpf.
This command generates a LAM executable file named "program".
After this conversion, run the program with the LAM script above, replacing ./a.out with ./program.
Job Submission:
After a submission script has been created, use the "qsub32" command to submit it to the queuing system, e.g.:
qsub32 script_name
This submits your job to the queuing system for validation and scheduling.
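To submit several scripts at once, you can loop over them. submit_all below is a hypothetical convenience function, a sketch rather than a site command. It calls qsub32 once per existing script and reports missing ones. Setting QSUB=echo makes it print the script names instead of submitting them, which is handy for a dry run.

```bash
#!/bin/bash
# submit_all is a hypothetical helper, not a site-provided command.
# Usage: submit_all SCRIPT...
# QSUB defaults to qsub32; set QSUB=echo to preview without submitting.
submit_all() {
    local qsub=${QSUB:-qsub32} script status=0
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "submit_all: skipping missing script $script" >&2
            status=1
            continue
        fi
        # One submission per script, as in the qsub32 example above.
        "$qsub" "$script" || status=1
    done
    return "$status"
}
```

For example, QSUB=echo submit_all run_*.sh previews the list, and submit_all run_*.sh submits every matching script.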
Job Monitoring:
Use the "qstat32" command to check the status of your jobs, e.g.:
qstat32
This command will produce output similar to:
username scr 9 0 : 1 r RUNNING 08/22/96 08:00:36
The fields in this line are:
username           owner name
scr                job name
9                  job number
0 : 1              job scheduling parameters
r RUNNING          job status indicators
08/22/96 08:00:36  submission time
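Because the fields are separated by whitespace, a script can pick them apart. describe_qstat_line is a hypothetical sketch that labels each field of one qstat32 line. It assumes the layout of the sample line above, in particular that the scheduling parameters always take three tokens (such as 0 : 1), which may not hold for every DQS configuration.

```bash
#!/bin/bash
# describe_qstat_line is a hypothetical helper, not a site-provided command.
# ASSUMPTION: ten whitespace-separated tokens, laid out like the sample line.
describe_qstat_line() {
    local owner name number s1 s2 s3 flag state sdate stime
    read -r owner name number s1 s2 s3 flag state sdate stime <<< "$1"
    printf '%-13s%s\n' 'owner:'      "$owner"
    printf '%-13s%s\n' 'job name:'   "$name"
    printf '%-13s%s\n' 'job number:' "$number"
    printf '%-13s%s\n' 'scheduling:' "$s1 $s2 $s3"
    printf '%-13s%s\n' 'status:'     "$flag $state"
    printf '%-13s%s\n' 'submitted:'  "$sdate $stime"
}
```

For example, describe_qstat_line "username scr 9 0 : 1 r RUNNING 08/22/96 08:00:36" prints one labelled line per field.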
Possible values for job status:
r RUNNING
s SUSPENDED
q QUEUED
w WAITING
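The status words can be used to summarise your jobs, for example to see how many are still queued. count_by_status is a hypothetical sketch that reads qstat32 output on standard input and counts lines by the four status words listed above. It searches each line for one of those words instead of relying on a fixed column.

```bash
#!/bin/bash
# count_by_status is a hypothetical helper, not a site-provided command.
# Reads qstat32-style lines on stdin and prints "STATUS COUNT" lines, sorted.
count_by_status() {
    awk '{
        # Take the first field that is one of the documented status words.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(RUNNING|SUSPENDED|QUEUED|WAITING)$/) { n[$i]++; break }
    }
    END { for (s in n) print s, n[s] }' | sort
}
```

For example, qstat32 | count_by_status prints one line per status that currently appears, together with the number of jobs in it.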
Job Removal:
Users can use the "qdel32" command to delete a specific job from the queuing system, e.g.:
qdel32 123
This command deletes job number 123 from the queuing system.
Refer to the qstat32 output to find the job number of your job.
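To delete all of your jobs with a given name, you can pick their numbers out of the qstat32 output and pass them to qdel32 one at a time. job_numbers and delete_jobs_named are hypothetical sketches. They assume, as in the sample output above, that the owner, job name and job number are the first three fields of each line. QSTAT and QDEL default to qstat32 and qdel32 and can be overridden (for example QDEL=echo) to preview what would be deleted.

```bash
#!/bin/bash
# Hypothetical helpers, not site-provided commands.
# ASSUMPTION: fields 1-3 of each qstat32 line are owner, job name, job number.

# Print the job numbers of OWNER's jobs named NAME, read from stdin.
job_numbers() {
    awk -v owner="$1" -v name="$2" '$1 == owner && $2 == name { print $3 }'
}

# Delete every job of OWNER named NAME, one qdel32 call per job number.
delete_jobs_named() {
    local owner=$1 name=$2 job
    for job in $("${QSTAT:-qstat32}" | job_numbers "$owner" "$name"); do
        "${QDEL:-qdel32}" "$job"
    done
}
```

For example, QDEL=echo delete_jobs_named "$USER" myjob lists the numbers of your jobs named myjob, and dropping QDEL=echo deletes them.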
Please refer to the DQS, MPICH, and LAM user's guides for more information.