LONI Grid Computing Notes
To facilitate the submission and execution of compute jobs in this heterogeneous compute environment, Sun Grid Engine (SGE) is used to virtualize the compute resources into a single compute service. A grid layer sits atop the compute resources and dispatches jobs to available resources according to user-defined criteria such as CPU type and processor count.
Cranium Cluster
LONI's Cranium cluster consists of two types of compute nodes: Sun Microsystems V20z servers and Sun Microsystems X2200 M1 servers.
Basic Grid Engine Commands and Info
In this section, you can find additional information about the basic commands used on the grid, as well as useful tips about job status and reasons for job failures.
How to monitor the Cranium cluster?
To monitor the health of the Cranium cluster, you must first connect to the LONI private network via VPN. Once connected, visit the cluster monitoring site.
Cranium Grid Technical Details
The Cranium cluster has a total of 1152 available processors and a total of 4048 defined slots to run neuroimaging jobs. The cluster consists of two types of compute nodes: 296 Sun Microsystems V20z servers, each with dual 64-bit 2.4 GHz AMD Opteron processors and 8 GB of memory, and 80 Sun Microsystems X2200 M1 servers, each with dual 64-bit quad-core 2.2 GHz AMD Opteron processors and 16 GB of memory.
The cluster is based on a Rocks Cluster 4.5 configuration and uses Sun Grid Engine 6.1u4 to schedule, distribute, and administer submitted jobs. All compute nodes and servers run CentOS release 4.5 as their operating system. The cluster is structured as follows:
- 2 SGE Master nodes in shadowed configuration (not accessible to the general public),
- 2 SGE Submit nodes,
  - cerebro-lsh1.loni.ucla.edu
  - cerebro-lsh2.loni.ucla.edu
- 1 Compiler node,
  - cerebro-dev.loni.ucla.edu
- SGE Execution nodes,
  - cerebro-1/8-101 through cerebro-1/8-137 (V20z servers)
  - cerebro-23/24-101 through cerebro-23/24-137 (X2200 M1 servers)
- Mounted NFS filesystems
  - All /ifs directories (e.g. /ifs/tmp, /ifs/four_d, etc.)
  - /nethome
Submitting to Cranium Grid Engine
Summary: Compile code for GNU/Linux, kernel 2.6.9-55.ELsmp. Put executable and data files in /ifs directory. Run qsub on script containing job arguments.
The Sun Grid Engine (SGE) is a job management service intended to accept, schedule, dispatch, and manage remote execution of a large number of standalone, parallel or interactive user jobs. The current version for the Cerebro cluster is 6.1u4. Documentation can be found at http://gridengine.sunsource.net.
- Place all code and data files you wish to use into the appropriate /ifs NFS directory. By default, you can use /ifs/tmp. (NOTE: files older than 28 days are deleted off /ifs/tmp nightly)
- Compile code on cerebro-dev.loni.ucla.edu using the GCC, PGI, or Intel compilers
- GCC can be found in /usr/local/bin/gcc or /usr/bin/gcc4,
- Compiler tools for GCC v4.2 can be found in /usr/local/gcc-4.2.0_64bit
- PGI compilers can be found in /usr/local/pgi
- Intel compilers can be found in /usr/local/intel*
- Log in to one of the submit nodes above via SSH. Note that an external user must either
a) log in to an SSH proxy first (autarch.loni.ucla.edu or inire.loni.ucla.edu) and then open an SSH connection to a submit node, or
b) connect to the LONI network via VPN. Instructions for setting up a VPN connection can be found at http://www.loni.ucla.edu/twiki/bin/view/Infrastructure/GettingOnline
- Source the configuration script
- 'source /usr/sge/loni/common/settings.sh' if your shell is sh/bash
- 'source /usr/sge/loni/common/settings.csh' if your shell is tcsh/csh
- Wrap the executable (line 4) into a shell script containing Grid interpreter primitives (lines 2 & 3)
- Example shell script:
1) #!/bin/sh
2) #$ -S /bin/sh
3) #$ -j y -o /tmp
4) /ifs/tmp/myexecutable /ifs/tmp/inputfile /ifs/tmp/outputfile
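The wrapping step above can be sketched as a small helper that writes the four-line wrapper script for a given executable. This is only an illustration: `make_job_script` is a hypothetical function, not part of Grid Engine, and the file name `myjob.sh` and the /ifs/tmp paths are the same placeholders used in the example script.

```shell
#!/bin/sh
# Sketch only: write an SGE wrapper script for one executable.
# make_job_script is a hypothetical helper, not an SGE command.
# Usage: make_job_script JOBFILE EXECUTABLE [ARGS...]
# Arguments are joined with single spaces, so paths containing spaces
# are not handled.
make_job_script() {
    jobfile=$1
    shift
    {
        printf '%s\n' '#!/bin/sh'
        printf '%s\n' '#$ -S /bin/sh'     # run the job under /bin/sh
        printf '%s\n' '#$ -j y -o /tmp'   # merge stderr into stdout, written under /tmp
        printf '%s\n' "$*"                # the executable and its arguments
    } > "$jobfile"
}

# Typical use on a submit node, after sourcing the SGE settings script:
#   make_job_script myjob.sh /ifs/tmp/myexecutable /ifs/tmp/inputfile /ifs/tmp/outputfile
#   qsub myjob.sh
```

Generating the wrapper this way keeps the directive lines identical across jobs, so only the executable and its arguments change from one submission to the next.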
Queues: containers for different categories of jobs. Queues provide the corresponding resources for concurrent execution of multiple jobs that belong to the same category. A single queue can group a large set of hosts and a particular host can belong to different queues. The following queues are defined for the Cranium cluster:
1) short.q: This queue uses 100% of the resources of the Cranium cluster. It uses two slots on each V20z server and eight slots on each X2200 server, for a total of 1152 slots. Submitted jobs have a hard limit of 2 CPU hours, and there is a resource quota of 75% per user; that is, a single user can use at most 75% of the resources in this queue. This queue has its priority level set to 5.
Submitting jobs:
prompt> qsub -q short.q {name of the script}
Subordinate Queues:
- long.q with 75% of the available slots (172 out of 230),
- medium.q with 75% of the available slots (260 out of 346).
2) medium.q: 346 slots are available (~30% of the resources of the cluster). The medium.q queue uses only one slot per V20z server and four slots per X2200 server. There is a hard limit of 12 CPU hours for submitted jobs running in this queue and a resource quota of 75% per user for all resources in the queue. This queue has a priority level set to 10.
Submitting jobs:
prompt> qsub -q medium.q {name of the script}
Subordinate Queues:
- None
3) long.q: 230 slots are available (~20% of the resources of the cluster). The long.q queue uses only one slot per V20z server and four slots per X2200 server. There is no hard time limit for this queue, so jobs can use unlimited CPU time. There is a resource quota of 60% per user. This queue has the lowest priority level in the Cranium cluster, set to 15.
Submitting jobs:
prompt> qsub -q long.q {name of the script}
Subordinate Queues:
- None
4) pipeline.q: This queue uses 90% of the resources of the cluster. It uses five slots on each V20z compute node and twelve slots on each X2200 compute node, for a total of 2320 slots. There is no hard time limit set for this queue. Jobs can only be submitted to this queue through the Pipeline application. This queue has the highest priority level in the Cranium cluster, set to 0.
Submitting jobs:
Using Pipeline application
Subordinate Queues:
- long.q with 75% of the available slots (172 out of 230),
- medium.q with 75% of the available slots (260 out of 346).
- short.q with 90% of the available slots (1040 out of 1156).
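As a rough illustration of how the CPU time limits above map to a queue choice, the sketch below picks a queue from an estimated CPU time. `pick_queue` is a hypothetical helper and `myjob.sh` is a placeholder script name; the thresholds are the 2-hour and 12-hour hard limits listed for short.q and medium.q. pipeline.q is left out because it only accepts jobs from the Pipeline application.

```shell
#!/bin/sh
# pick_queue is a hypothetical helper, not an SGE command.
# Usage: pick_queue CPU_HOURS   (whole hours, rounded up)
# Limits taken from the queue descriptions: short.q up to 2 CPU hours,
# medium.q up to 12 CPU hours, long.q unlimited.
pick_queue() {
    hours=$1
    if [ "$hours" -le 2 ]; then
        echo short.q
    elif [ "$hours" -le 12 ]; then
        echo medium.q
    else
        echo long.q
    fi
}

# Example: print the qsub command for a job expected to need about 6 CPU hours.
echo "qsub -q $(pick_queue 6) myjob.sh"
```

When an estimate is uncertain, rounding it up is the safer choice, since a job that exceeds a queue's hard limit is killed.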
Subordinate Queues: Subordination ensures that the resources of other queues are used, when available, by letting jobs in a given queue 'overflow' into its subordinate queues. In our case, the subordinate queues are defined in terms of percentages of their available resources. For example, the medium and long queues are both subordinate to the short queue: if the short queue is 100% utilized and the two subordinate queues are available, 75% of the resources of the medium queue and of the long queue will be suspended and used by the short queue.
The Cranium cluster uses a utility that dynamically prioritizes the jobs running on a compute node based on the amount of CPU time each process has consumed. As a result, long-running jobs take a little longer, but short jobs get through in a reasonable amount of time. If no other jobs are running, long-running jobs receive normal priority again and continue processing at full speed.
Useful Grid Engine commands:
- 'qsub' - Allows grid users to submit jobs to the cluster.
Options:
  - -cwd: Run the job from the current working directory,
  - -o {output file}: Where to send the standard output,
  - -e {error file}: Where to send the standard error,
  - -N {job name}: Job name for reporting,
  - -j y: Merge the standard error into the standard output,
  - -p {value}: Priority level of the submitted job(s).
- 'qstat' - Allows grid users to see the status of their submitted jobs.
Options:
  - -f: Display full information, including the queues and the jobs running in them,
  - -u {username}: Show only username's jobs,
  - -j {job-ID} -explain E: Explain why a job is in the Eqw state.
- 'qdel' - Allows grid users to delete jobs from the queue.
Options:
  - -u {username}: Allows grid users to delete all their jobs from the queue,
  - -f {job-ID}: Forces deletion of the job with the given SGE job-ID (plain 'qdel {job-ID}' deletes it normally).
- 'qacct -o {username}' - Allows grid users to extract more specific runtime information about jobs, including error messages,
- 'qalter' - Allows grid users to modify a pending batch job,
- 'qhold' - Allows grid users to hold back a pending submitted job,
- 'qhost' - Allows grid users to get information about execution hosts,
- 'qconf' - Allows grid users to get information about the cluster and the queues.
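The qsub options listed above can also be embedded in the job script as `#$` lines, in the same way as the example wrapper script. The sketch below is one possible layout, not a required one: the job name `myjob` and the output file names are placeholders, and `describe_job` is a hypothetical helper. `JOB_ID` and `JOB_NAME` are environment variables that Grid Engine sets for a running job; outside the grid they are unset, so the helper falls back to default text.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -N myjob
#$ -o myjob.out
#$ -e myjob.err

# describe_job is a hypothetical helper that records which job produced
# the output file. It prints fallback text when run outside Grid Engine.
describe_job() {
    echo "job ${JOB_ID:-unknown} (${JOB_NAME:-unnamed}) started in $(pwd)"
}

describe_job
# The real work would follow here, e.g. the executable and its arguments.
```

Submitting this file with `qsub` and no further options is then equivalent to passing `-cwd -N myjob -o myjob.out -e myjob.err` on the command line.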
Jobs Status:
- 'qw' - Queued and waiting,
- 'w' - Job waiting,
- 's' - Job suspended,
- 't' - Job transferring and about to start,
- 'r' - Job running,
- 'h' - Job hold,
- 'R' - Job restarted,
- 'd' - Job has been marked for deletion,
- 'Eqw' - An error occurred with the job.
The state E(rror) appears for pending jobs that couldn't be started due to job properties. To find out why a job is in the 'Eqw' state, use the commands:
qstat -j job_ID -explain E
or
qacct -j job_ID
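When several jobs are in the 'Eqw' state, the first command can be run for each of them in a loop. The sketch below assumes the default `qstat` output layout (two header lines, job ID in the first column, state in the fifth); `eqw_job_ids` is a hypothetical helper name, and the loop only runs where `qstat` is installed.

```shell
#!/bin/sh
# eqw_job_ids is a hypothetical helper. It reads `qstat` output on stdin
# and prints the IDs of jobs whose state is Eqw. Assumption: default qstat
# layout with two header lines, job ID in column 1 and state in column 5.
eqw_job_ids() {
    awk 'NR > 2 && $5 == "Eqw" { print $1 }'
}

# On a submit node, explain every Eqw job owned by the current user.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$USER" | eqw_job_ids | while read -r id; do
        qstat -j "$id" -explain E
    done
fi
```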
Check My Job Status:
- qstat -s prs -u $USER --- check jobs that are pending, running, or suspended
- qstat -t -u $USER --- display the nodes where the job is running
- qstat -ext -s p -u $USER --- display extended information for my pending jobs
- qstat -s h --- jobs in hold status
- qstat -s r --- jobs in running status
- qstat -s r -u $USER --- jobs that are mine and running
- qstat -s s -u $USER --- jobs that are mine and suspended
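For a quick overview instead of separate queries per state, the `qstat` output can be summarized with a count per state. This is a sketch under the same assumption as before about the default `qstat` layout (two header lines, state in the fifth column); `count_job_states` is a hypothetical helper name.

```shell
#!/bin/sh
# count_job_states is a hypothetical helper. It reads `qstat` output on
# stdin and prints one "STATE COUNT" line per job state, sorted by state.
# Assumption: default qstat layout, two header lines, state in column 5.
count_job_states() {
    awk 'NR > 2 { n[$5]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# On a submit node, summarize the current user's jobs.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$USER" | count_job_states
fi
```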
Host State:
- 'au' - Host is in alarm and unreachable,
- 'u' - Host is unreachable. Usually SGE or the machine itself is down, and the host should be checked,
- 'a' - Host is in alarm. This is normal when the node is full, that is, when it is using most of its resources,
- 'aS' - Host is in alarm and suspended. When the node is using most of its resources, SGE stops it from taking any other jobs until resources become available,
- 'd' - Host is disabled,
- 'E' - Error. The error state must be cleared with the command `qmod -c`.
Common reasons for a job to fail
- Submission script not formatted properly,
- SGE cannot find the binary file specified in the job script,
- Required input files are missing from the startup directory,
- Environment variables are not set or are set incorrectly,
- The program runs out of time,
- The program never starts (it's easy to make typing errors when naming your program),
- Hardware failure.
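Two of the failure reasons above, a missing binary and missing input files, can be caught before submission with a simple check. The sketch below is one way to do it; `preflight` is a hypothetical helper, not an SGE command, and the paths in the usage comment are the placeholders from the example script.

```shell
#!/bin/sh
# preflight is a hypothetical helper, not an SGE command.
# Usage: preflight EXECUTABLE [INPUT_FILE...]
# Returns 0 if the executable exists and is executable and every input
# file is readable; otherwise reports the problems on stderr and returns 1.
preflight() {
    exe=$1
    shift
    rc=0
    if [ ! -x "$exe" ]; then
        echo "not found or not executable: $exe" >&2
        rc=1
    fi
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing input file: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Typical use before submitting:
#   preflight /ifs/tmp/myexecutable /ifs/tmp/inputfile && qsub myjob.sh
```

The check runs on the submit node, so it is only meaningful for files on the shared /ifs or /nethome filesystems, which the execution nodes also mount.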
To be considered when using the Cranium cluster
- It is not permitted to run jobs locally on the submit nodes, cerebro-lsn1 and cerebro-lsn2, or on the master nodes. Doing so negatively impacts the entire cluster. Jobs that inadvertently run on these machines are subject to immediate termination to ensure proper functioning of the Cranium cluster,
- Use the appropriate queue when submitting jobs. Remember, short.q is only for jobs of 2 CPU hours or less. If you submit a job that runs longer than 2 CPU hours to this queue, the system automatically kills it,
- If your job hasn't started for days because someone has dozens of jobs ahead of yours, it's worth mailing the SysAdmin team,
- Do not run jobs on the cluster except through SGE,
- Do not hog large-memory machines.
Feel free to contact the SysAdmin team by email at clusteradmins@loni.ucla.edu if you have any questions or concerns.