Difference between revisions of "Job monitoring"
From ScientificComputing
Revision as of 05:26, 22 January 2021
bjobs
After submission, a job waits in a queue until it can run on a compute node. While it waits, it has the PENDING status (shown as PEND in the STAT column).

    $ bjobs
    JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST    JOB_NAME    SUBMIT_TIME
    161182423  jarunan  PEND  normal.4h  eu-login-43               *cho hello  Jan 22 06:01
When the job is running on a compute node, it has the RUNNING status.
    $ bjobs
    JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST    JOB_NAME    SUBMIT_TIME
    161182423  jarunan  RUN   normal.4h  eu-login-43  eu-ms-005-0  *cho hello  Jan 22 06:01
bjobs options | Description |
---|---|
(no option) | list all your jobs in all queues |
-p | list only pending (waiting) jobs and indicate why they are pending |
-r | list only running jobs |
-d | list only done jobs (finished within the last hour) |
-l | display status in long format |
-w | display status in wide format |
-o "format" | use custom output format (see LSF documentation for details) |
-J jobname | show only job(s) called jobname |
-q queue | show only jobs in a specific queue |
job-ID(s) | list of job-IDs (this must be the last option) |
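
When many jobs are queued, a per-status summary is often easier to read than the full listing. The sketch below counts jobs per STAT value from bjobs output read on standard input. The helper name `count_by_status` is hypothetical, and the sketch assumes the default column layout shown above, where STAT is the third column. The demo input is the two sample listings from this page, not live output.

```shell
# Hypothetical helper: count jobs per status from bjobs output on stdin.
# Assumes the default bjobs layout, where STAT is the third column.
count_by_status() {
    # Skip the header line (NR == 1) and any empty lines.
    awk 'NR > 1 && NF > 0 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on the sample listings from this page (not live output).
count_by_status <<'EOF'
JOBID      USER     STAT  QUEUE      FROM_HOST    EXEC_HOST    JOB_NAME    SUBMIT_TIME
161182423  jarunan  PEND  normal.4h  eu-login-43               *cho hello  Jan 22 06:01
161182423  jarunan  RUN   normal.4h  eu-login-43  eu-ms-005-0  *cho hello  Jan 22 06:01
EOF
```

On a login node, the same helper could be fed live data with `bjobs | count_by_status`.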
bbjobs
bbjobs displays job information in a more human-readable layout than bjobs. The examples below show the same job in the PENDING and in the RUNNING status.
PENDING status

    $ bbjobs
    Job information
      Job ID             : 161182479
      Status             : PENDING
      User               : jarunanp
      Queue              : normal.4h
      Command            : sleep 10; echo hello
      Working directory  : $HOME/-
    Requested resources
      Requested cores    : 1
      Requested runtime  : 4 h 0 min
      Requested memory   : 1024 MB per core
      Requested scratch  : not specified
      Dependency         : -
    Job history
      Submitted at       : 06:03 2021-01-22
      Queue wait time    : 18 sec
RUNNING status

    $ bbjobs
    Job information
      Job ID                       : 161182479
      Status                       : RUNNING
      Running on node              : eu-ms-025-27
      User                         : jarunanp
      Queue                        : normal.4h
      Command                      : sleep 10; echo hello
      Working directory            : $HOME/-
    Requested resources
      Requested cores              : 1
      Requested runtime            : 4 h 0 min
      Requested memory             : 1024 MB per core
      Requested scratch            : not specified
      Dependency                   : -
    Job history
      Submitted at                 : 06:03 2021-01-22
      Started at                   : 06:03 2021-01-22
      Queue wait time              : 20 sec
    Resource usage
      Updated at                   : 06:04 2021-01-22
      Wall-clock                   : 4 sec
      Tasks                        : 4
      Total CPU time               : 0 sec
      CPU utilization              : 0.0 %
      Sys/Kernel time              : 0.0 %
      Total resident Memory        : 2 MB
      Resident memory utilization  : 0.2 %
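
Because bbjobs prints one `label : value` pair per line, a single field can be pulled out with a small filter, for example in a script that checks the status of a job. The sketch below is only an illustration: the helper name `bbjobs_field` is hypothetical, and it assumes the `label : value` layout of the examples above, which may differ between bbjobs versions. The demo runs on an excerpt of the sample output, not on live data.

```shell
# Hypothetical helper: print the value of one field from bbjobs output on stdin.
# $1 is the field label exactly as printed, e.g. "Status" or "Submitted at".
bbjobs_field() {
    awk -v key="$1" '
        {
            # Split at the first " : " so values such as 06:03 stay intact.
            pos = index($0, " : ")
            if (pos == 0) next          # section headings have no separator
            label = substr($0, 1, pos - 1)
            value = substr($0, pos + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (label == key) { print value; exit }
        }'
}

# Excerpt of the RUNNING example from this page (not live output).
sample='Job information
  Job ID           : 161182479
  Status           : RUNNING
  Running on node  : eu-ms-025-27
Job history
  Submitted at     : 06:03 2021-01-22'

printf '%s\n' "$sample" | bbjobs_field "Status"
printf '%s\n' "$sample" | bbjobs_field "Submitted at"
```

On a login node the filter could be applied to live output with `bbjobs | bbjobs_field "Status"`. A label that does not occur produces no output.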
bkill
Use bkill to terminate a submitted job:
    $ bkill 161182774
    Job <161182774> is being terminated
bkill options | Description |
---|---|
job-ID | kill job-ID |
0 | kill all jobs (yours only) |
-J jobname | kill most recent job called jobname |
-J jobname 0 | kill all jobs called jobname |
-q queue | kill most recent job in queue |
-q queue 0 | kill all jobs in queue |
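
Since `-J jobname 0` kills every job with that name, it can be useful to preview the command before running it. The sketch below wraps that form of bkill in a helper with a dry-run switch. The helper name `bkill_by_name` and the `DRY_RUN` variable are hypothetical conventions for this example, not part of LSF.

```shell
# Hypothetical helper: kill all of your jobs called JOBNAME (bkill -J JOBNAME 0).
# With DRY_RUN=1 the command is only printed, not executed.
bkill_by_name() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: bkill_by_name JOBNAME" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bkill -J $name 0"
    else
        bkill -J "$name" 0
    fi
}

# Preview only: nothing is killed while DRY_RUN is 1.
DRY_RUN=1
bkill_by_name "myjob"
```

Setting `DRY_RUN=0` (or unsetting it) makes the helper call bkill for real.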
Job control commands
Job control commands | Description |
---|---|
busers | user limits, number of pending and running jobs |
bqueues | queues status (open/closed; active/inactive) |
bjobs | more or less detailed information about pending and running jobs, and recently finished jobs |
bbjobs | more human-friendly version of bjobs |
bhist | info about jobs finished in the last hours/days |
bpeek | display the standard output of a given job |
lsf_load | show the CPU load of all nodes used by a job |
bjob_connect | log in to a node where your job is running |
bkill | kill a job |