Using the batch system

From ScientificComputing

Revision as of 06:27, 18 August 2016


Our HPC cluster uses the IBM LSF (Load Sharing Facility) batch system. A basic knowledge of LSF is required to work on the cluster. This article shows how to use LSF to run simple batch jobs and gives an overview of some advanced features that can dramatically increase your productivity.

Using a batch system has numerous advantages:

  • single system image — all computing resources in the cluster can be accessed from a single point
  • load balancing — the workload is automatically distributed across all available processors
  • exclusive use — many computations can be executed at the same time without affecting each other
  • prioritization — computing resources can be dedicated to specific applications or people
  • fair share — a fair allocation of those resources among all users is guaranteed

In fact, our HPC cluster contains so many processors (30,000) and is used by so many people (more than 2,000) that it would be impossible to use it efficiently without a batch system.

All computations on our HPC cluster must be submitted to the batch system. Please do not run any job interactively on the login nodes, except for testing or debugging purposes.
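As a minimal sketch of what submitting a computation to the batch system can look like, the script below builds an LSF `bsub` command and runs it only if `bsub` is available. The program name `./my_program`, the one-hour run-time limit and the output file name `output.txt` are assumptions chosen for illustration, not site defaults.

```shell
#!/bin/sh
# Hypothetical executable; replace with your own program.
PROGRAM="./my_program"

# bsub is the LSF job submission command.
#   -W 01:00       run-time limit of one hour (hh:mm), an assumed value
#   -o output.txt  file that receives the job's output, an assumed name
SUBMIT_CMD="bsub -W 01:00 -o output.txt $PROGRAM"

if command -v bsub >/dev/null 2>&1; then
    # LSF is installed: hand the job to the batch system.
    $SUBMIT_CMD
else
    # No LSF here (e.g. a workstation): only show what would be submitted.
    echo "would run: $SUBMIT_CMD"
fi
```

On a machine without LSF the script only prints the command, so it can be tried safely before logging in to the cluster.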

Basic job submission

Resource requirements

Parallel job submission

Job control/monitoring