LSF mini reference

Below is an overview of the most important LSF commands and their frequently used options.

bsub

Submit a job to the batch system.

Option Description
-W HH:MM Wall-clock time required by the job. Can also be expressed in minutes.
-n N Number of cores required by the job.
-R "rusage[mem=X]" Amount of memory (in MB per core) required by the job.
-o outfile Append the job's output (stdout) to outfile. The keyword "%J" is interpreted as the job's numerical ID.
-e errfile Append the job's error (stderr) to errfile. By default, stderr is merged with stdout.
-oo outfile Write the job's output (stdout) to outfile, overwriting it if it already exists.
-eo errfile Write the job's error (stderr) to errfile, overwriting it if it already exists.
-I / -Ip / -Is Run the job interactively. Input/output are redirected from/to your terminal. Use -Ip to create a pseudo-terminal, and -Is to enable shell support.
-J jobname Assign a (not necessarily unique) name to the job. Used to define job chains. To avoid confusion with numerical job IDs, jobname should contain at least one letter.
-w "depcond" Wait (do not start the job) until the specified dependency condition is satisfied. For example: "done(jobID)", "ended(jobname)". Quotes are recommended.
-B / -N Send an e-mail to the job's owner (username@ethz.ch) when the job begins / ends.
-u user Send e-mail to user instead of the job's owner. The recipient's address must be inside the ETH domain, because the firewall blocks e-mail sent to other addresses. (Note: This switch alone does not imply -B or -N.)
-r Indicate that the job is re-runnable. If the compute node where your job is running crashes, LSF will automatically re-run it from the beginning on a different node.
-G share_name Run the job using the share of the shareholder share_name.
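
As a minimal sketch of how the options above combine, the two shell functions below wrap bsub for a batch submission with explicit resource requests and for a follow-up job that waits for the first one. The function names (submit_batch, submit_after), the output file naming and the example program names are illustrative assumptions, not part of LSF.

```shell
# Hypothetical helper: submit a command with explicit resource requests.
# Usage: submit_batch JOBNAME HH:MM NCORES MEM_MB_PER_CORE COMMAND [ARGS...]
submit_batch() {
    name=$1
    walltime=$2
    cores=$3
    mem=$4
    shift 4
    # -oo overwrites the output file; %J is replaced by the numerical job ID.
    bsub -J "$name" -W "$walltime" -n "$cores" \
         -R "rusage[mem=$mem]" -oo "$name.%J.out" "$@"
}

# Hypothetical helper: submit a command that starts only after the job
# named (or numbered) PREV has completed successfully.
# Usage: submit_after PREV JOBNAME COMMAND [ARGS...]
submit_after() {
    prev=$1
    name=$2
    shift 2
    bsub -J "$name" -w "done($prev)" -oo "$name.%J.out" "$@"
}
```

For example, `submit_batch sim1 08:00 4 2048 ./my_program` followed by `submit_after sim1 post1 ./postprocess` would queue a two-step job chain, assuming both programs exist in the current directory.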

bjobs

Monitor one or more batch jobs.

Option Description
(no option) List all your jobs — running, pending or suspended.
-l / -w Long / wide format (mutually exclusive).
-r Show only running jobs.
-p Show only pending jobs, and the reason why they are pending.
-d Show only jobs that finished recently (status DONE or EXIT).
-x Show jobs that have triggered an exception (e.g. "idle").
-q queue Show jobs in the specified batch queue.
-u user Show jobs submitted by another user (or "all").
-J jobname Show information about the specified job(s).
jobID(s) Show information about the specified job(s). This must be the last argument.
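
The bjobs output can also be post-processed with standard tools. The sketch below counts jobs per status; it assumes the default column layout of bjobs in wide format, where the first line is a header and the third column (STAT) holds the job status. The function name job_summary is hypothetical.

```shell
# Hypothetical helper: print the number of jobs in each state (RUN, PEND, ...).
# Any bjobs options (e.g. -q queue, -u user) can be passed through.
job_summary() {
    # -w keeps each job on a single line; the message printed when there
    # are no jobs is discarded together with other errors.
    bjobs -w "$@" 2>/dev/null |
        awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' |
        sort
}
```

Calling `job_summary -u all` would summarize the jobs of all users, provided the column layout matches the assumption above.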

bkill

Kill (or signal) one or more jobs.

Option Description
jobID(s) Kill the specified job(s).
0 Kill all jobs submitted by you (that's a zero, not the letter "O").
-J jobname Kill the last job submitted under that name.
-J jobname 0 Kill all jobs submitted under that name. Used to kill a series of jobs at once.
-s signal Send a signal (e.g. URG, USR2) to the job. Your job must be designed to handle that signal, for example to save data. Sending the wrong signal may be fatal.
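
Because bkill acts on several jobs at once when given 0, a small guard can help avoid accidents. The functions below are a sketch with hypothetical names (kill_series, signal_job); the default signal USR2 is taken from the example in the table above, and the job must be written to handle it.

```shell
# Hypothetical helper: kill every job submitted under the given name.
# Refuses to run without a name so that an empty argument is never
# passed on to bkill together with the job ID 0.
kill_series() {
    if [ -z "$1" ]; then
        echo "usage: kill_series JOBNAME" >&2
        return 2
    fi
    bkill -J "$1" 0
}

# Hypothetical helper: send a signal (USR2 unless specified) to one job.
# Usage: signal_job JOBID [SIGNAL]
signal_job() {
    bkill -s "${2:-USR2}" "$1"
}
```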

bmod

Modify a job's parameters.

Option Description
-w "depcond" Modify a job's dependency condition. Use -wn to remove it.
jobID ID of the job to be modified.
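
The two uses of bmod described above can be sketched as follows. The function names are hypothetical, and the job IDs in the usage note are placeholders.

```shell
# Hypothetical helper: replace the dependency condition of a job.
# Usage: set_dependency JOBID "depcond"
set_dependency() {
    bmod -w "$2" "$1"
}

# Hypothetical helper: remove the dependency condition of a job.
# Usage: clear_dependency JOBID
clear_dependency() {
    bmod -wn "$1"
}
```

For instance, `set_dependency 456 "done(123)"` would make job 456 wait for job 123, and `clear_dependency 456` would lift that condition again.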

bqueues

Show information about one or more batch queues.

Option Description
(no option) List all queues (name, priority, status, limits, number of pending/running jobs).
-w The same, with slightly more details.
-l queue Show a long description of one or more queues (all by default).

lsload

Show load information (CPU load, memory, swap, etc.) for compute hosts.

Option Description
host(s) Show load information for the specified hosts (all by default). Can be used in conjunction with bjobs to see how much memory a job is using, for example.
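
One possible way to combine bjobs and lsload is sketched below: the execution hosts of a running job are extracted from the bjobs output and passed to lsload. This assumes the default wide layout of bjobs, with the status in the third column and the execution hosts in the sixth, written as host or N*host and separated by colons for multi-host jobs. The function names job_hosts and job_load are hypothetical.

```shell
# Hypothetical helper: list the hosts on which a running job executes.
job_hosts() {
    bjobs -w "$1" 2>/dev/null |
        awk 'NR > 1 && $3 == "RUN" { print $6 }' |
        tr ':' '\n' |
        sed 's/^[0-9]*\*//' |   # strip the "N*" core-count prefix
        sort -u
}

# Hypothetical helper: show load information for the hosts of a job.
job_load() {
    hosts=$(job_hosts "$1")
    if [ -z "$hosts" ]; then
        echo "job $1 is not running on any host" >&2
        return 1
    fi
    # Intentionally unquoted: one lsload argument per host name.
    lsload $hosts
}
```

Note that lsload reports figures for the whole host, so on a node shared with other jobs the values do not reflect a single job alone.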