Difference between revisions of "Too much space is used by your output files"
Revision as of 10:09, 23 February 2017
Introduction
On our clusters, data written to stdout/stderr are buffered in a shadow file system with a small quota of 2 GB per user. If this quota were reached, all jobs would crash. We have therefore recently modified the batch system to detect this condition and preemptively reject new jobs until the data stored by these jobs in the shadow file system have been removed.
Error message
Users then receive the following error message:
Too much space is used by your output files in the LSF batch system's temporary directory.
You cannot clean up your files in the shadow file system yourself. If you receive this error message, then please contact cluster support.
How to solve this problem
Writing so much data to stdout or stderr not only fills up the shadow file system; it also slows down your jobs. You should therefore:
- Kill all jobs to prevent further problems
- Modify the program to NOT write all the output to stdout
- Resubmit all jobs
Modifying the program to NOT write all of this output to stdout might not be possible in all cases. In such cases, you can redirect the program's stdout/stderr to a file using a command like:
bsub [LSF options] "program [arguments] > program.error"
(Note that the quotes above are necessary; otherwise the redirection operator would apply to bsub instead of the program.)
| Redirection operator | Description |
|---|---|
| > | redirect stdout |
| 2> | redirect stderr |
| &> | redirect stdout and stderr |
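The effect of these operators can be tried outside the batch system. The following is only a sketch: `noisy` is a hypothetical stand-in for your program, and the log file names are made up for illustration. It uses `> file 2>&1`, which is the portable spelling of bash's `&>`.

```shell
# Hypothetical stand-in for a program: one line on stdout, one on stderr.
noisy() {
    echo "result"
    echo "warning" >&2
}

# Scratch directory so the example does not touch your real files.
workdir=$(mktemp -d)

# ">" captures stdout only (stderr is discarded here to keep the terminal quiet).
noisy > "$workdir/out_only.log" 2> /dev/null

# "2>" captures stderr only.
noisy 2> "$workdir/err_only.log" > /dev/null

# Both streams into one file; equivalent to "&>" in bash.
noisy > "$workdir/both.log" 2>&1
```

After running it, `out_only.log` holds only the stdout line, `err_only.log` only the stderr line, and `both.log` holds both.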
In case of a job array, all individual jobs will write their output into the same file, which may not be desirable. This can be avoided using the run-time $LSB_JOBINDEX variable, e.g.:
bsub [LSF options] "program [arguments] 2> program_\$LSB_JOBINDEX.error"
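The backslash in front of `$LSB_JOBINDEX` matters: it keeps the submitting shell from expanding the variable, so that it is only expanded when the job runs. The sketch below imitates this without LSF, under stated assumptions: `sh -c` plays the role of the batch system running the command string, the loop sets `LSB_JOBINDEX` by hand the way LSF would for each array element, and the inner `echo` is a hypothetical placeholder for your program.

```shell
# Command string as it would be passed to bsub; "\$" leaves the variable
# unexpanded at submission time.
job_cmd="sh -c 'echo failure in element \$LSB_JOBINDEX >&2' 2> program_\$LSB_JOBINDEX.error"

# The string still contains the literal variable name at this point.
echo "$job_cmd"

# Scratch directory for the simulated job output.
array_dir=$(mktemp -d)

# Simulate three array elements: each run sees its own LSB_JOBINDEX,
# so each one writes its stderr to a separate file.
for i in 1 2 3; do
    (cd "$array_dir" && LSB_JOBINDEX=$i sh -c "$job_cmd")
done

ls "$array_dir"
```

Each simulated element ends up with its own file (`program_1.error`, `program_2.error`, `program_3.error`) instead of all of them writing to the same one.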
Be careful though: writing a lot of data to stdout or stderr is always a BAD IDEA because it slows down the program and overloads the cluster's file system. The "shadow" file system and its 2 GB quota are a protection against misbehaving jobs; you are bypassing them at your own risk. The BEST solution is to modify the program to reduce or eliminate this unnecessary I/O.