LONI Grid Computing Notes
To facilitate the submission and execution of compute jobs in this heterogeneous compute environment, Sun Grid Engine (SGE) is used to virtualize the cluster's resources into a compute service. A grid layer sits atop the compute resources and dispatches jobs to available resources according to user-defined criteria such as CPU type and processor count.
Cranium Cluster
LONI's Cranium cluster consists of two types of compute nodes: Sun Microsystems V20z servers and Sun Microsystems X2200 M1 servers.
Basic Grid Engine Commands and Info
This section gives additional information on the basic commands used on the grid, along with useful tips about job status and reasons for job failures.
How to monitor the Cranium cluster?
To monitor the health status of the Cranium cluster, you have to be VPN-ed into the LONI private network. Once logged in, please visit this site.
Cranium Grid Technical Details
The Cranium cluster has a total of 1152 available processors, with a total of 1152 defined slots to run compute jobs. Note, however, that this number may fluctuate due to hardware maintenance or brief periods of stress-testing either nodes or new Pipeline releases (during which time the slots will seemingly "disappear" into a private development cluster). The cluster consists of two types of compute nodes: 296 Sun Microsystems V20z servers with dual 64-bit 2.4GHz AMD Opteron processors and 8GB of memory, and 80 Sun Microsystems X2200 M1 servers with dual 64-bit quad-core 2.2GHz AMD Opteron processors and 16GB of memory.
The cluster is configured using Rocks Cluster 4.5 and Sun Grid Engine 6.1u4 to schedule, distribute and administer the submitted jobs. All compute nodes and servers are running CentOS release 4.5 as the operating system. The cluster is structured as follows:
- 2 SGE Master nodes in fault-tolerant shadowed configuration (not accessible to the general public)
- 1 SGE Submit node
- qsub.loni.ucla.edu (gui.loni.ucla.edu)
- 1 Pipeline Server node
- 1 Compiler node
- cerebro-dev.loni.ucla.edu
- cerebro-rcc.loni.ucla.edu is our legacy development node, kept for transitioning any incompatible code to the new grid. Do not use this machine for any current development or compilation: it is considered end of life, and its /usr/local mount is a static snapshot of its state when cerebro-dev was implemented.
- SGE Execution nodes
- cerebro-1/8-101 through cerebro-1/8-137 (V20z servers)
- cerebro-23/24-101 through cerebro-23/24-137 (X2200 M2 servers)
- Mounted NFS filesystems
- All /ifs directories (e.g. /ifs/tmp, /ifs/four_d, etc.)
- /ifshome for user home directories
- /usr/local for all officially supported binaries and scripts
Submitting to Cranium Grid Engine
Summary: Compile code for GNU/Linux, kernel 2.6.9-55.ELsmp. Put executable and data files in /ifs directory. Run qsub on script containing job arguments.
The Sun Grid Engine (SGE) is a job management service intended to accept, schedule, dispatch, and manage remote execution of a large number of standalone, parallel or interactive user jobs. The current version for the Cranium cluster is 6.1u4. Documentation can be found at http://gridengine.sunsource.net.
- Place all code and data files you wish to use into the appropriate /ifs NFS directory. By default, you can use /ifs/tmp. (NOTE: files older than 28 days are deleted off /ifs/tmp nightly)
- Compile code on the compiler node, cerebro-dev.loni.ucla.edu, using the GCC, PGI or Intel compiler
- GCC can be found in /usr/local/bin/gcc or /usr/bin/gcc4
- Compiler tools for GCC v4.2 can be found in /usr/local/gcc-4.2.0_64bit
- PGI compilers can be found in /usr/local/pgi
- Intel compilers can be found in /usr/local/intel*
- Log in to qsub.loni.ucla.edu via SSH. Note that an external user must first either
a) Log into an SSH proxy such as ssh.loni.ucla.edu before establishing an SSH connection to a submit node, or
b) Connect to the LONI network via VPN. Instructions on how to establish a VPN connection can be found at http://www.loni.ucla.edu/twiki/bin/view/Infrastructure/GettingOnline
- Source the configuration script
- 'source /usr/sge/loni/common/settings.sh' if your shell is sh/bash
- 'source /usr/sge/loni/common/settings.csh' if your shell is tcsh/csh
- Wrap the executable (the last line of the example) in a shell script containing Grid Engine directives (the lines beginning with "#$")
- Example shell script:
    #!/bin/sh
    #$ -S /bin/sh
    #$ -j y -o /tmp
    #$ -l h_vmem=3G
    /ifs/tmp/myexecutable /ifs/tmp/inputfile /ifs/tmp/outputfile
- If your job requires more than 2G of memory (up to 8G), specify the amount using "-l h_vmem=[memory]", with units of K(ilobytes), M(egabytes) or G(igabytes), as in the "#$ -l h_vmem=3G" line of the above script. If your job uses 2G or less, you need not specify the h_vmem resource. Jobs exceeding their requested (or default, if unspecified) memory usage will be terminated immediately. This lets us track the memory usage of every job carefully, reducing the likelihood that one process takes down a compute node running multiple jobs from multiple users.
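The wrapping step above can be sketched as a small helper that writes a job script with the same directives as the example. This is only an illustration, not an official LONI tool: the function name write_job_script and the file name myjob.sh are hypothetical, and the choice of short.q in the closing comment is arbitrary.

```shell
#!/bin/sh
# Sketch: generate a Grid Engine job script like the example above.
# write_job_script is a hypothetical helper, not part of SGE.
# Usage: write_job_script SCRIPT H_VMEM COMMAND [ARGS...]
write_job_script() {
    script=$1
    h_vmem=$2
    shift 2
    {
        echo '#!/bin/sh'
        echo '#$ -S /bin/sh'          # run the job under /bin/sh
        echo '#$ -j y -o /tmp'        # merge stderr into stdout, written under /tmp
        echo "#\$ -l h_vmem=$h_vmem"  # memory limit for the job
        echo "$*"                     # the executable and its arguments
    } > "$script"
}

write_job_script myjob.sh 3G /ifs/tmp/myexecutable /ifs/tmp/inputfile /ifs/tmp/outputfile
cat myjob.sh

# On the submit node, after sourcing the settings script, the job would be
# submitted with, for example:
#   qsub -q short.q myjob.sh
```

Generating the script this way keeps the memory request in one place, so it can be raised without editing the file by hand.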
Queues are containers for different categories of jobs. Queues provide the corresponding resources for concurrent execution of multiple jobs that belong to the same category. A single queue can group a large set of hosts, and a particular host can belong to different queues. The following queues are defined for the Cranium cluster:
1) short.q: There is a 2-CPU-hour hard limit set for this queue. If the job exceeds this time limit, it will be automatically terminated. Any one user can use at most 75% of all the slots (CPUs) in this queue. There are approximately 160 slots allotted to short.q.
Submitting jobs:
prompt> qsub -q short.q {name of the script}
2) medium.q: There is a 12-CPU-hour hard limit set for medium.q. As with short.q above, if the job exceeds the time limit, it will be automatically terminated. Any one user can use at most 75% of its slots (CPUs). There are approximately 104 slots allotted to the queue.
Submitting jobs:
prompt> qsub -q medium.q {name of the script}
3) long.q: There is NO CPU time limit set for long.q. Consequently, it has the fewest slots, with 64 CPUs available. Any one user can use at most 75% of all the CPUs in this queue.
Submitting jobs:
prompt> qsub -q long.q {name of the script}
4) pipeline.q: This queue is used by the Pipeline server ONLY. There is NO CPU time limit set for this queue.
Submitting jobs:
Using the Pipeline Execution Environment application found here.
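The queue limits above can be turned into a simple selection rule. The sketch below assumes the expected CPU time is known in whole hours; pick_queue is a hypothetical helper name, and pipeline.q is left out because it is reserved for the Pipeline server.

```shell
#!/bin/sh
# Sketch: choose a queue from the job's expected CPU hours (whole hours).
# pick_queue is a hypothetical helper; the limits are the ones listed above.
pick_queue() {
    hours=$1
    if [ "$hours" -le 2 ]; then
        echo short.q      # 2-CPU-hour hard limit
    elif [ "$hours" -le 12 ]; then
        echo medium.q     # 12-CPU-hour hard limit
    else
        echo long.q       # no CPU time limit
    fi
}

pick_queue 1     # prints short.q
pick_queue 8     # prints medium.q
pick_queue 40    # prints long.q
```

On the submit node the result could be passed straight to qsub, for example: qsub -q "$(pick_queue 8)" {name of the script}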
The Cranium cluster does not oversubscribe its CPUs; there is a 1-to-1 relationship between a compute job and a CPU. Experience showed that CPU oversubscription, where a single CPU is shared by multiple jobs, caused unpredictability and instability. By default, each CPU (thus, each process) gets 2GB of memory.
Useful Grid Engine commands:
- 'qsub' - Allows grid users to submit jobs to the cluster.
Options:
- -cwd: Run from the current working directory,
- -o {output file}: Where to send the standard output,
- -e {error file}: Where to send the standard error,
- -N {job name}: Job name for reporting,
- -j y: Merge the error messages into the standard output file,
- -p {value}: Priority level of the submitted job(s),
- -l h_vmem={value}: Memory limit of the job (if unspecified, {value}=2G by default); units of K(ilobytes), M(egabytes) or G(igabytes)
- 'qstat' - Allows grid users to see the status of their submitted job.
Options:
- -f: Display full information of the job,
- -u {username}: Show only username's jobs.
- -j {job-ID} -explain E: Why is my job in Eqw state?
- 'qdel' - Allows grid users to delete jobs from the queue.
Options:
- -u {username}: Allows grid users to delete all their jobs from the queue.
- {job-ID}: Deletes a job from the queue by its SGE job-ID. Add -f to force deletion of a job that does not respond to a normal delete request.
- 'qacct -o {username}' - Allows grid users to extract more specific, runtime status of jobs including error messages
- 'qalter' - Allows grid users to modify a pending batch job,
- 'qhold' - Allows grid users to hold back a pending submitted job,
- 'qhost' - Allows grid users to get information about execution hosts,
- 'qconf' - Allows grid users to get information about the cluster and the queues.
Jobs Status:
- 'qw' - Queued and waiting,
- 'w' - Job waiting,
- 's' - Job suspended,
- 't' - Job transferring and about to start,
- 'r' - Job running,
- 'h' - Job hold,
- 'R' - Job restarted,
- 'd' - Job has been marked for deletion,
- 'Eqw' - An error occurred with the job.
The state E(rror) appears for pending jobs that could not be started due to job properties. To find out why a job is in the 'Eqw' state, use the commands:
qstat -j job_ID -explain E
or
qacct -j job_ID
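When several jobs are affected, the job IDs can be pulled out of the qstat listing first. The sketch below assumes the default qstat layout, where the first column is the job ID and the fifth is the state; eqw_jobs is a hypothetical helper name, and the sample listing is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the IDs of jobs in the Eqw state from `qstat` output on stdin.
# Assumes the default qstat layout, where column 1 is the job ID and
# column 5 is the state. eqw_jobs is a hypothetical helper.
eqw_jobs() {
    awk '$5 == "Eqw" { print $1 }'
}

# Made-up sample in the shape of qstat output, for illustration only.
sample='job-ID  prior   name      user  state submit/start at     queue  slots
---------------------------------------------------------------------
    101 0.55500 myjob.sh  jdoe  r     01/01/2009 10:00:00 short.q@node-a 1
    102 0.55500 myjob.sh  jdoe  Eqw   01/01/2009 10:05:00                1'

printf '%s\n' "$sample" | eqw_jobs    # prints 102

# On the submit node this could drive the diagnosis commands above:
#   for id in $(qstat -u "$USER" | eqw_jobs); do qstat -j "$id" -explain E; done
```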
Check My Job Status:
- qstat -s prs -u $USER --- check jobs that are pending, running, or suspended
- qstat -t -u $USER --- display the nodes where the job is running
- qstat -ext -s p -u $USER --- display extended information for my pending jobs
- qstat -s h --- jobs in hold status
- qstat -s r --- jobs in running status
- qstat -s r -u $USER --- jobs that are mine and running
- qstat -s s -u $USER --- jobs that are mine and suspended
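For a quick overview, the per-job listing can be condensed into a count of jobs per state. This is a sketch under the same assumption about the default qstat layout (two header lines, state in the fifth column); count_states is a hypothetical helper name and the sample listing is made up.

```shell
#!/bin/sh
# Sketch: count jobs per state from `qstat` output on stdin.
# Skips the two header lines and tallies column 5 (the state).
# count_states is a hypothetical helper.
count_states() {
    awk 'NR > 2 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample in the shape of qstat output, for illustration only.
listing='job-ID  prior   name      user  state submit/start at     queue  slots
---------------------------------------------------------------------
    201 0.55500 myjob.sh  jdoe  r     01/01/2009 10:00:00 short.q@node-a 1
    202 0.55500 myjob.sh  jdoe  r     01/01/2009 10:01:00 short.q@node-b 1
    203 0.55500 myjob.sh  jdoe  qw    01/01/2009 10:02:00                1'

printf '%s\n' "$listing" | count_states

# On the submit node: qstat -u "$USER" | count_states
```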
Host State:
- 'au' - Host is in alarm and unreachable,
- 'u' - Host is unreachable. Usually SGE or the machine itself is down, and the host should be checked,
- 'a' - Host is in alarm. This is normal when the node is full, that is, when it is using most of its resources,
- 'aS' - Host is in alarm and suspended. When the node is using most of its resources, SGE suspends it so that it takes no further jobs until resources become available,
- 'd' - Host is disabled,
- 'E' - Error. This requires the command `qmod -c` to clear the error state.
Common reasons for a job to fail
- Submission script not formatted properly,
- SGE cannot find the binary file specified in the job script,
- Required input or output files are missing from the startup directory,
- Environment variables are not set or are set incorrectly,
- Your memory request was not large enough,
- An executable that uses threads did not make an h_stack request. Such executables MUST make one (e.g. Matlab applications using the boost library). "-l h_stack=64M" is typically safe, but 128M may be necessary; try 64M first,
- The program is running out of time (for example, exceeding the CPU-time limit of its queue),
- The program is never starting; it is easy to make typing errors when naming your program,
- Hardware failure.
To be considered when using the Cranium cluster
- It is not permitted to run jobs locally on the submit nodes, cerebro-lsh4 (qsub/gui) and cerebro-lsn2. Doing so will negatively impact the entire cluster. Jobs that inadvertently run on these machines will be subject to immediate termination without warning to ensure proper functioning of the Cranium cluster.
- Use the appropriate queue when submitting jobs. Remember, the short.q is only for jobs of 2 CPU hours (or less) in duration. If you submit a job longer than 2 CPU hours into this queue, the system automatically terminates your job.
- If your job has not started for an unreasonable amount of time because someone has dozens of jobs ahead of yours (note that when the cluster is at 100% capacity, a wait time of a couple of hours is not always unreasonable), please e-mail the SysAdmin team via support@loni.ucla.edu.
- Do not run jobs on the cluster except through SGE.
- Do not hog large-memory machines by blindly requesting h_vmem=8G for all jobs.
- Conversely, selecting the smallest h_vmem that will allow your job to run will likely get your job out of the queue faster. To determine this number, run an example job, note the SGE job ID, and once finished, run qacct -j $SGE_job_id | grep maxvmem. Add a fudge factor to that value, and all future submissions of analogous jobs should run without any issue of termination.
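The maxvmem-plus-fudge-factor step can be sketched as a small filter over the qacct output. The sketch assumes the maxvmem value is printed with a trailing unit, as in "1.500G" or "512.000M"; the 25% margin is an arbitrary illustrative choice, not a LONI recommendation, and suggest_h_vmem is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn the maxvmem line of `qacct -j JOB_ID` into an h_vmem request.
# Assumes the value looks like "1.500G" or "512.000M"; the 25% margin is an
# arbitrary illustrative fudge factor. suggest_h_vmem is a hypothetical helper.
suggest_h_vmem() {
    awk '$1 == "maxvmem" {
        unit = substr($2, length($2))        # trailing K, M, or G
        padded = ($2 + 0) * 1.25             # numeric prefix plus 25% margin
        whole = int(padded)
        if (padded > whole) whole++          # round up to a whole unit
        print whole unit
    }'
}

echo 'maxvmem      2.000G' | suggest_h_vmem     # prints 3G
echo 'maxvmem      512.000M' | suggest_h_vmem   # prints 640M

# On the submit node: qacct -j $SGE_job_id | suggest_h_vmem
```

The printed value can then be used in the "-l h_vmem={value}" request for later submissions of analogous jobs.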
Feel free to contact the SysAdmin team at support@loni.ucla.edu if you have any questions or concerns.