Difference between revisions of "Job monitoring"

Revision as of 07:29, 22 January 2021

bjobs

After you submit a job, it waits in a queue until it can run on a compute node. While it waits, it has the PENDING status (shown as PEND in the STAT column).

$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
161182423  jarunan PEND  normal.4h  eu-login-43             *cho hello Jan 22 06:01

Once the job is running on a compute node, it has the RUNNING status (shown as RUN in the STAT column), and EXEC_HOST names the node it runs on.

$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
161182423  jarunan RUN   normal.4h  eu-login-43 eu-ms-005-0 *cho hello Jan 22 06:01
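
The transition from PENDING to RUNNING to finished can also be watched from a script. The following is a minimal sketch, not an official tool: the function names `job_status` and `wait_for_job` are hypothetical, the 30-second default interval is arbitrary, and the parsing assumes the default bjobs layout shown above, with the status in the third column.

```shell
# Print the STAT field of one job, parsed from the default bjobs layout.
# Assumption: line 1 is the header and STAT is the third column.
job_status() {
    bjobs "$1" 2>/dev/null | awk 'NR == 2 { print $3 }'
}

# Poll a job until it is finished.
# DONE and EXIT are the LSF statuses of finished jobs; an empty status
# is treated as "no longer listed by bjobs" and also ends the loop.
wait_for_job() {
    job_id=$1
    interval=${2:-30}   # seconds between polls (arbitrary default)
    while :; do
        status=$(job_status "$job_id")
        case "$status" in
            DONE|EXIT|"") break ;;
            *) sleep "$interval" ;;
        esac
    done
    echo "job $job_id finished with status: ${status:-not listed}"
}
```

On the cluster this could be called as `wait_for_job 161182423 60`. Keep the interval generous so that polling does not put unnecessary load on the batch system.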

bjobs options   Description
(no option)     list all your jobs in all queues
-p              list only pending (waiting) jobs and indicate why they are pending
-r              list only running jobs
-d              list only done jobs (finished within the last hour)
-l              display status in long format
-w              display status in wide format
-o "format"     use custom output format (see LSF documentation for details)
-J jobname      show only job(s) called jobname
-q queue        show only jobs in a specific queue
job-ID(s)       list of job-IDs (this must be the last option)
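
The output of bjobs can be piped into standard text tools for a quick overview. The sketch below counts jobs per status. The function name `count_jobs_by_status` is hypothetical, and the code assumes the default column layout, in which the first line is a header and STAT is the third column.

```shell
# Summarize bjobs output read from stdin: one "STAT count" line per status.
# Assumption: default layout, header on line 1, STAT in column 3.
count_jobs_by_status() {
    awk 'NR > 1 && NF >= 3 { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Possible use on the cluster:
#   bjobs | count_jobs_by_status
#   bjobs -q normal.4h | count_jobs_by_status
```

Reading from stdin keeps the function independent of which bjobs options were used to select the jobs.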

bbjobs

bbjobs displays job information in a more human-friendly format than bjobs. The examples below show its output for a job in PENDING and in RUNNING status.

PENDING status

$ bbjobs
Job information
  Job ID                    : 161182479
  Status                    : PENDING
  User                      : jarunanp
  Queue                     : normal.4h
  Command                   : sleep 10; echo hello
  Working directory         : $HOME/-
Requested resources
  Requested cores           : 1
  Requested runtime         : 4 h 0 min
  Requested memory          : 1024 MB per core
  Requested scratch         : not specified
  Dependency                : -
Job history
  Submitted at              : 06:03 2021-01-22
  Queue wait time           : 18 sec

RUNNING status

$ bbjobs
Job information
  Job ID                        : 161182479
  Status                        : RUNNING
  Running on node               : eu-ms-025-27 
  User                          : jarunanp
  Queue                         : normal.4h
  Command                       : sleep 10; echo hello
  Working directory             : $HOME/-
Requested resources
  Requested cores               : 1
  Requested runtime             : 4 h 0 min
  Requested memory              : 1024 MB per core
  Requested scratch             : not specified
  Dependency                    : -
Job history
  Submitted at                  : 06:03 2021-01-22
  Started at                    : 06:03 2021-01-22
  Queue wait time               : 20 sec
Resource usage
  Updated at                    : 06:04 2021-01-22
  Wall-clock                    : 4 sec
  Tasks                         : 4
  Total CPU time                : 0 sec
  CPU utilization               : 0.0 %
  Sys/Kernel time               : 0.0 %
  Total resident Memory         : 2 MB
  Resident memory utilization   : 0.2 % 
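
Because bbjobs prints one "Key : value" pair per line, a single field can be extracted with a small filter. This is a sketch under the assumption that the output keeps the layout shown above. The function name `bbjobs_field` is hypothetical, and if several jobs are listed only the first match is printed.

```shell
# Print the value of one field from bbjobs-style "Key : value" lines on stdin.
# Assumption: key and value are separated by the first " : " on the line.
bbjobs_field() {
    awk -v key="$1" '
        {
            pos = index($0, " : ")
            if (pos == 0) next                 # section titles have no " : "
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim padding around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim padding around the value
            if (k == key) { print v; exit }
        }'
}

# Possible use on the cluster:
#   bbjobs | bbjobs_field "Status"
#   bbjobs | bbjobs_field "Resident memory utilization"
```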

bkill

Use bkill to terminate a submitted job, whether it is still pending or already running:

$ bkill 161182774
Job <161182774> is being terminated

bkill options   Description
job-ID          kill job-ID
0               kill all jobs (yours only)
-J jobname      kill most recent job called jobname
-J jobname 0    kill all jobs called jobname
-q queue        kill most recent job in queue
-q queue 0      kill all jobs in queue
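
Since `-J jobname 0` kills every job with that name, it can be worth previewing the command first. The wrapper below is a hypothetical helper, not part of LSF. The function name `kill_jobs_named` and the `DRY_RUN` variable are assumptions made for this sketch.

```shell
# Kill all of your jobs called NAME, using the "-J jobname 0" form from the table.
# With DRY_RUN=1 the bkill command is only printed, not executed.
kill_jobs_named() {
    name=$1
    if [ -z "$name" ]; then
        echo "usage: kill_jobs_named JOBNAME" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "bkill -J $name 0"
    else
        bkill -J "$name" 0
    fi
}

# Possible use on the cluster:
#   DRY_RUN=1 kill_jobs_named myjob   # show what would be run
#   kill_jobs_named myjob             # actually kill the jobs
```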

Job control commands

Job control commands   Description
busers                 user limits, number of pending and running jobs
bqueues                queue status (open/closed; active/inactive)
bjobs                  more or less detailed information about pending and running jobs, and recently finished jobs
bbjobs                 better bjobs (more human-friendly output)
bhist                  information about jobs finished in the last hours/days
bpeek                  display the standard output of a given job
lsf_load               show the CPU load of all nodes used by a job
bjob_connect           log in to a node where your job is running
bkill                  kill a job