Difference between revisions of "Job submission"

Latest revision as of 13:06, 8 November 2021

< Job management with LSF

Home

Submit a parallel job >


There are three ways to access computing resources:

  1. Submit a job from the command line
  2. Submit a job through a bash job script using #BSUB pragmas
  3. Get an interactive session on a compute node

Basic job submission

A basic bsub job submission command consists of three parts:

bsub [LSF options] job

where

bsub          is the LSF submit command.
LSF options   request resources and define job-related options.
job           is the computing job to be submitted.


Here is an example:

bsub -n 1 -W 4:00 -R "rusage[mem=4096]" "python myscript.py"

When the job is submitted, LSF prints information about it:

$ bsub -n 1 -W 4:00 -R "rusage[mem=4096]" "python myscript.py"
Generic job.
Job <8146539> is submitted to queue <normal.4h>

The output includes

  1. Job type, e.g., Generic Job, MPI Job or Abaqus Job
  2. Unique JobID, e.g., 8146539
  3. The queue, e.g., normal.4h, normal.24h, or normal.120h

Note: The JobID is needed for monitoring the job and for reporting issues. It is also part of the name of the LSF output file. Please do not delete the LSF output file unless you are sure that the job ran correctly.
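
Because the JobID is needed later, it can be convenient to capture it at submission time. The following is a minimal sketch that is not part of the original page: it extracts the JobID and the queue from the confirmation line shown above using sed. The sample text is copied from the example output. On the cluster, the variable would be filled from a real bsub call instead, as noted in the comment.

```shell
# Sample confirmation text copied from the example above. On the cluster you
# would capture it instead, e.g.:
#   submit_output=$(bsub -n 1 -W 4:00 "python myscript.py")
submit_output='Generic job.
Job <8146539> is submitted to queue <normal.4h>'

# Pattern for the confirmation line; group 1 is the JobID, group 2 the queue.
pattern='^Job <\([0-9][0-9]*\)> is submitted to queue <\([^>]*\)>.*$'

# sed -n ... p prints only the line that matches, reduced to the captured group.
job_id=$(printf '%s\n' "$submit_output" | sed -n "s/$pattern/\1/p")
queue=$(printf '%s\n' "$submit_output" | sed -n "s/$pattern/\2/p")

echo "JobID: $job_id"
echo "Queue: $queue"
```

The captured value can then be passed to monitoring commands, for example bjobs "$job_id".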

Job

A job can be one of the following:

Job                                       Command              Example of job submission command
a single Linux command                    cmd
a program with its path                   /path/to/myprogram   bsub ./bin/hello
a command or program with its arguments   cmd arg1 arg2        bsub echo hello
multiple commands                         "cmd1 ; cmd2"        bsub "date; pwd; ls -l"
piped command                             "cmd1 | cmd2"
a command with I/O redirection, quoted    "cmd<in >out"        bsub "du -sk /scratch > du.out"
a here document, passed via "<<"          << EOF ... EOF
a shell script, passed via "<"            < script             bsub < hello.sh
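
The quotes in the table matter because the login shell interprets characters such as ; | < > before bsub ever sees them. The sketch below illustrates this with a hypothetical stand-in function, show_args, which is not an LSF command: it only reports the arguments it receives, in the way bsub would receive them.

```shell
# Hypothetical stand-in for bsub: prints the number of arguments it received,
# followed by each argument in brackets.
show_args() {
    printf '%d:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

# Quoted: the whole job string arrives as one argument, so all of it would
# run on the compute node.
quoted=$(show_args "date; pwd; ls -l")

# Unquoted: the login shell splits the line at the pipe. Only "ls -l" is
# passed on; tr runs locally on whatever the first command prints.
unquoted=$(show_args ls -l | tr a-z A-Z)

echo "$quoted"
echo "$unquoted"
```

The first line of output shows a single argument containing all three commands. The second shows that only two words reached the stand-in and that its output was then post-processed on the login node.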


LSF options

Requesting resources

Resources              Format                       Default values
Maximum run time       -W HH:MM                     04:00 (4h), max. 360h
Number of processors   -n nprocs                    1 processor
Memory                 -R "rusage[mem=2048]"        1024 MB per core
Local scratch space    -R "rusage[scratch=10000]"   0 MB

It is possible to combine memory and scratch requirements:

-R "rusage[mem=2048, scratch=10000]"

Note: Unlike memory, the batch system does not reserve any disk space for local scratch by default. If your job is expected to write large amounts of temporary data (say, more than 250 MB) into $TMPDIR, or anywhere else in the local /scratch file system, you must request enough scratch space.
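
Since memory is requested per core, the total for a multi-core job is the per-core value multiplied by the number of cores. The following sketch works through that arithmetic and assembles the combined resource string in the format shown above. The values are illustrative, and the script only prints the submission line rather than running it.

```shell
cores=24            # value passed to -n
mem_per_core=4000   # MB, value passed to rusage[mem=...]
scratch=10000       # MB of local scratch, value passed to rusage[scratch=...]

# Memory is requested per core, so the job's total memory is the product.
total_mem=$((cores * mem_per_core))

# Combined memory and scratch requirement.
rusage="rusage[mem=${mem_per_core}, scratch=${scratch}]"

echo "Total memory requested: ${total_mem} MB"

# Dry run: print the submission line instead of executing it.
echo "bsub -n ${cores} -R \"${rusage}\" \"python myscript.py\""
```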

Other LSF options

-o outfile         append job’s standard output to outfile
-e errfile         append job’s error messages to errfile
-R "rusage[...]"   advanced resource requirement (memory, ...)
-J jobname         assign a jobname to the job
-w "depcond"       wait until dependency condition is satisfied
-Is                submit an interactive job with pseudo-terminal
-B / -N            send an email when the job begins/ends
-u user@domain     use this address instead of username@ethz.ch

The LSF Submission Line Advisor can help you find the LSF options you need.
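
Because -o and -e append to existing files, repeated runs with fixed file names end up in the same files. One way around this is sketched below, using the %J placeholder, which LSF replaces with the JobID in output file names. The placeholder is not mentioned on this page and the job name is only an example. The script prints the submission line rather than running it.

```shell
job_name="analysis1"

# %J is replaced by LSF with the JobID, so each run writes to its own files
# instead of appending to the files of earlier runs.
opts="-J ${job_name} -o ${job_name}.%J.out -e ${job_name}.%J.err -N"

# Dry run: print the submission line instead of executing it.
cmd="bsub ${opts} \"python myscript.py\""
echo "$cmd"
```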

Job script and #BSUB pragmas

Create a job script called job_script.bsub

#!/bin/bash
#BSUB -n 24                     # 24 cores
#BSUB -W 8:00                   # 8-hour run-time
#BSUB -R "rusage[mem=4000]"     # 4000 MB per core
#BSUB -J analysis1
#BSUB -o analysis1.out
#BSUB -e analysis1.err
#BSUB -N

source /cluster/apps/local/env2lmod.sh  # Switch to the new software stack
module load gcc/6.3.0 openmpi/4.0.2     # Load modules
cd /path/to/execution/folder            # Change directory
mpirun myprogram arg1                   # Execute the program

Submit the job:

$ bsub < job_script.bsub

Options specified on the command line take precedence over the options in the job script. For example, the following command requests 36 cores instead of the 24 set in the script:

$ bsub -n 36 < job_script.bsub
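
Since command-line options override the script, it helps to know what the script itself requests before adding overrides. The sketch below is an illustration rather than an LSF feature: it writes a shortened copy of the job script to a temporary file and lists the options found on its #BSUB lines.

```shell
# Write a shortened copy of the job script to a temporary file (illustration only).
script=$(mktemp)
cat > "$script" << 'EOF'
#!/bin/bash
#BSUB -n 24                     # 24 cores
#BSUB -W 8:00                   # 8-hour run-time
#BSUB -R "rusage[mem=4000]"     # 4000 MB per core

mpirun myprogram arg1
EOF

# Keep only the #BSUB lines, strip the prefix, then strip the trailing comments.
pragmas=$(sed -n 's/^#BSUB[[:space:]]*//p' "$script" | sed 's/[[:space:]]*#.*$//')
echo "$pragmas"

rm -f "$script"
```

Each printed line corresponds to one option that a matching option on the bsub command line would override.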

Interactive session on a compute node

To run a quick test or a benchmark, you can request an interactive session on a compute node with one of the bsub options -I, -Ip or -Is. For example:

[jarunanp@eu-login-38 ~]$ bsub -n 4 -W 01:00 -Is bash
Generic job.
Job <161197292> is submitted to queue <normal.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-ms-001-15>>
[jarunanp@eu-ms-001-15 ~]$
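
In the example, the prompt changes from the login node to the compute node once the session starts. A script can make the same distinction by checking the LSB_JOBID environment variable, which LSF sets inside every job. The variable is standard LSF but is not mentioned on this page, and the helper function name is hypothetical. A minimal sketch:

```shell
# Describe the current shell based on a JobID value (empty outside of a job).
describe_session() {
    if [ -n "$1" ]; then
        echo "inside LSF job $1"
    else
        echo "not inside an LSF job"
    fi
}

# LSF sets LSB_JOBID in the environment of every job, including interactive ones.
describe_session "${LSB_JOBID:-}"
```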

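Further reading

  * User guide: Using the batch system
  * Job arrays
  * Job chaining
  * X11 forwarding batch interactive jobs
  * Multiple shareholder groups
  * LSF mini reference

Helper

  * LSF Submission Line Advisor (https://scicomp.ethz.ch/lsf_submission_line_advisor/)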

