Job monitoring

From ScientificComputing
Revision as of 05:26, 22 January 2021

bjobs

After submission, a job waits in a queue until it can run on a compute node. While it waits, it has the PENDING status (PEND in the bjobs output).

$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
161182423  jarunan PEND  normal.4h  eu-login-43             *cho hello Jan 22 06:01

Once the job starts on a compute node, its status changes to RUNNING (RUN), and the EXEC_HOST column shows the node it runs on.

$ bjobs
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
161182423  jarunan RUN   normal.4h  eu-login-43 eu-ms-005-0 *cho hello Jan 22 06:01
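When you have many jobs, it can be easier to count them by state than to read the whole listing. The sketch below is a minimal example, assuming the default bjobs column layout shown above; job_summary is a hypothetical helper name, not an LSF command. It reads only the STAT column, which is safe because STAT comes before EXEC_HOST, the column that is empty for pending jobs.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): count jobs per state in the
# default `bjobs` listing read from stdin.
# Assumes the layout shown above: JOBID USER STAT QUEUE FROM_HOST EXEC_HOST ...
# Only the third field (STAT) is used; later fields shift for pending jobs
# because EXEC_HOST is blank.
job_summary() {
    awk 'NR > 1 { n[$3]++ }
         END { printf "PEND=%d RUN=%d\n", n["PEND"], n["RUN"] }'
}
```

On the cluster you would run `bjobs | job_summary`.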
bjobs option    Description
(no option)     list all your jobs in all queues
-p              list only pending (waiting) jobs and indicate why they are pending
-r              list only running jobs
-d              list only finished jobs (finished within the last hour)
-l              display status in long format
-w              display status in wide format
-o "format"     use a custom output format (see the LSF documentation for details)
-J jobname      show only jobs called jobname
-q queue        show only jobs in a specific queue
job-ID(s)       show only the listed job IDs (these must come last on the command line)
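Because a job ID can be passed to bjobs, a script can wait for a particular job to leave the queue. The sketch below is hypothetical (wait_for_start is not an LSF command) and assumes the default output format, with STAT in the third column of the second line. It polls at a fixed interval, 60 seconds by default, so as not to query the scheduler more often than needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): poll `bjobs JOBID` until the job
# is no longer pending, then print its new status (RUN, DONE, EXIT, ...).
# Assumes the default bjobs output: header line, then STAT in column 3.
wait_for_start() {
    local jobid="$1" interval="${2:-60}" stat
    while :; do
        stat=$(bjobs "$jobid" 2>/dev/null | awk 'NR == 2 { print $3 }')
        case $stat in
            PEND|PSUSP) sleep "$interval" ;;                  # still queued
            "") echo "job $jobid not found" >&2; return 1 ;;  # no output at all
            *) echo "$stat"; return 0 ;;                      # job has left the queue
        esac
    done
}
```

For example, `wait_for_start 161182423 30` checks every 30 seconds and returns once the job is running or has already finished.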

bbjobs

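bbjobs presents job information in a more human-friendly form than bjobs, including the requested resources and, for running jobs, the resource usage. Here are examples for a job in the PENDING and in the RUNNING state.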

PENDING status

$ bbjobs
Job information
  Job ID                       : 161182479
  Status                       : PENDING
  User                         : jarunanp
  Queue                        : normal.4h
  Command                      : sleep 10; echo hello
  Working directory            : $HOME/-
Requested resources
  Requested cores              : 1
  Requested runtime            : 4 h 0 min
  Requested memory             : 1024 MB per core
  Requested scratch            : not specified
  Dependency                   : -
Job history
  Submitted at                 : 06:03 2021-01-22
  Queue wait time              : 18 sec

RUNNING status

$ bbjobs
Job information
  Job ID                        : 161182479
  Status                        : RUNNING
  Running on node               : eu-ms-025-27 
  User                          : jarunanp
  Queue                         : normal.4h
  Command                       : sleep 10; echo hello
  Working directory             : $HOME/-
Requested resources
  Requested cores               : 1
  Requested runtime             : 4 h 0 min
  Requested memory              : 1024 MB per core
  Requested scratch             : not specified
  Dependency                    : -
Job history
  Submitted at                  : 06:03 2021-01-22
  Started at                    : 06:03 2021-01-22
  Queue wait time               : 20 sec
Resource usage
  Updated at                    : 06:04 2021-01-22
  Wall-clock                    : 4 sec
  Tasks                         : 4
  Total CPU time                : 0 sec
  CPU utilization               : 0.0 %
  Sys/Kernel time               : 0.0 %
  Total resident Memory         : 2 MB
  Resident memory utilization   : 0.2 % 
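Because bbjobs prints one "Label : value" pair per line, a single field is easy to extract in a script. This is a minimal sketch, assuming the layout shown in the examples above; bbjobs_field is a hypothetical helper name. If the output lists several jobs, only the first matching field is printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print the value of one field from
# `bbjobs` output read from stdin, e.g. bbjobs_field "Queue wait time".
# Assumes lines of the form "  Label    : value" as in the examples above.
bbjobs_field() {
    awk -v key="$1" '
        {
            i = index($0, " : ")        # first " : " separates label and value
            if (i == 0) next            # section headings have no separator
            label = substr($0, 1, i - 1)
            sub(/^ +/, "", label); sub(/ +$/, "", label)
            if (label == key) {
                value = substr($0, i + 3)
                sub(/ +$/, "", value)   # drop trailing padding
                print value
                exit                    # first match only
            }
        }'
}
```

For instance, `bbjobs | bbjobs_field Status` would print PENDING for the first example above. Splitting on " : " rather than on every colon keeps times such as 06:03 intact.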

bkill

Use bkill to terminate a submitted job, whether it is still pending or already running:

$ bkill 161182774
Job <161182774> is being terminated
bkill option    Description
job-ID          kill the job job-ID
0               kill all of your jobs
-J jobname      kill the most recent job called jobname
-J jobname 0    kill all jobs called jobname
-q queue        kill the most recent job in queue
-q queue 0      kill all jobs in queue
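The bkill options above select jobs by ID, name or queue. To kill only the jobs that are still waiting, you can combine bjobs and bkill, as in this minimal sketch. kill_pending is a hypothetical helper name, and the sketch assumes the default bjobs output format shown earlier (JOBID in the first column, STAT in the third).

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): kill your pending jobs only,
# leaving running jobs untouched.
# Assumes the default bjobs layout: JOBID in column 1, STAT in column 3.
kill_pending() {
    bjobs 2>/dev/null |
        awk 'NR > 1 && $3 == "PEND" { print $1 }' |
        while read -r id; do
            bkill "$id"        # one bkill call per pending job
        done
}
```

Running jobs are skipped because their STAT is RUN; each pending job produces its own "Job <...> is being terminated" message.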

Job control commands

Command         Description
busers          user limits, number of pending and running jobs
bqueues         queue status (open/closed; active/inactive)
bjobs           brief or detailed information about pending, running and recently finished jobs
bbjobs          a more readable variant of bjobs
bhist           information about jobs that finished in the last hours or days
bpeek           display the standard output of a given job
lsf_load        show the CPU load of all nodes used by a job
bjob_connect    log in to a node where your job is running
bkill           kill a job