Using Gaussian on Euler
Introduction
Gaussian is a computer program for computational chemistry initially released in 1970 by John Pople.
License agreement
The use of Gaussian is subject to the following conditions (this is just a summary; the license agreement is several pages long):
Warning -- This program may not be used in any manner that competes with the business of Gaussian, Inc. or will provide assistance to any competitor of Gaussian, Inc. The licensee of this program is prohibited from giving any competitor of Gaussian, Inc. access to this program. By using this program, the user acknowledges that Gaussian, Inc. is engaged in the business of creating and licensing software in the field of computational chemistry and represents and warrants to the licensee that it is not a competitor of Gaussian, Inc. and that it will not use this program in any manner prohibited above.
These conditions have caused some controversy, not least because the company Gaussian, Inc. has taken the unusual step of banning individuals and organizations that did not respect them. You will find the arguments of banned users here and the response of Gaussian, Inc. here.
Since we have no intention of getting drawn into a legal dispute, access to Gaussian is restricted to people who have explicitly accepted the conditions above. If you want to use Gaussian on Euler, simply run the command get-access and accept the license.
Setting up your environment
In order to use Gaussian on Euler, you must configure your environment by loading the gaussian module:
module load gaussian/09d1
In order to load the default Gaussian 09 version, you can omit the version number and type only:
module load gaussian
This module works only if you are authorized to use Gaussian on Euler. If not, you will get an error:
ERROR: You are not authorized to use Gaussian on Euler. Please contact 'cluster-support@id.ethz.ch' for assistance.
Simple test case
You may run small computations interactively for testing and debugging purposes. Do not, under any circumstances, run large and/or long computations this way. All production jobs must be submitted to the cluster's batch system.
Let's take one of the standard Gaussian test cases, test000.com, which you can find on Euler here:
/cluster/apps/gaussian/g09/tests/com/test000.com
The contents of this file look like this:
# SP, RHF/STO-3G punch=archive trakio scf=conventional

Gaussian Test Job 00
Water with archiving

0 1
O
H 1 0.96
H 1 0.96 2 109.471221
You will find more information about Gaussian's input files in the user manual.
Normally, you would run this test case using the command:
g09 test000
To run the same computation in batch, load the module and wrap that command in an sbatch submission:
module load gaussian/g09d1
sbatch --wrap="g09 test000"
In this case the computation will not start immediately but will be sent to a batch queue, and will run once a processor core becomes available on the cluster. (You can use the squeue command to check the status of this job.)
In both cases the results of the simulation will be stored in a file called test000.log.
Real computations
This test case was trivial. More complex Gaussian computations may run for hours or days, use multiple processor cores, and may also need lots of memory and scratch space. The procedure to run such computations on Euler is therefore slightly more complicated.
Run time
All batch jobs on Euler are subject to run-time limits: 4 hours (default), 24 hours or 5 days. If your computation takes more than 4 hours, you must say so when you submit it, using the option --time (wall-clock time):
sbatch --time=HH:MM:SS --wrap="..."
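As a minimal sketch of how the run-time limits quoted above translate into a --time value, the helper below turns an estimated run time in whole hours into the option string. The function name time_option is hypothetical, and the command is only printed, not submitted.

```shell
#!/bin/sh
# Hypothetical helper: turn an estimated run time (whole hours) into an
# sbatch --time option, using the limits quoted above
# (4 h default, 24 h, 5 days = 120 h).
time_option() {
    hours=$1
    if [ "$hours" -gt 120 ]; then
        echo "error: $hours h exceeds the 5-day limit" >&2
        return 1
    elif [ "$hours" -le 4 ]; then
        # The default limit is sufficient, so no option is needed.
        echo ""
    else
        printf '%s%02d:00:00\n' '--time=' "$hours"
    fi
}

# Example: print (do not run) the command for a job of about 20 hours.
echo "sbatch $(time_option 20) --wrap=\"g09 test000\""
```

Rounding the estimate up generously is usually safer than cutting it close, since a job that hits its limit is terminated.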
Number of CPUs
On Euler, Gaussian jobs can be executed in parallel using the shared-memory model only. Distributed memory is not supported because Linda, the tool used by Gaussian to distribute a computation over multiple nodes, is not available.
Two things are necessary to run a Gaussian computation in parallel on N CPUs (N ≤ 16). First, your Gaussian input file must contain a command like:
%NProcShared=N
Second, you must indicate the number of CPUs needed by Gaussian when you submit your job, using the option:
sbatch -n N --wrap="..."
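Because the two values of N must agree, one way to avoid a mismatch is to read N from the input file when building the submission command. The sketch below assumes the input file has a .com extension and that a file without a %NProcShared line runs on one CPU; the function names nprocshared and submit_command are hypothetical, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the N found on the first %NProcShared line of a
# Gaussian input file (matched case-insensitively). Prints 1 if no such line
# exists, on the assumption that the job is then serial.
nprocshared() {
    n=$(tr 'A-Z' 'a-z' < "$1" \
        | sed -n 's/^%nprocshared=\([0-9][0-9]*\).*/\1/p' \
        | sed -n '1p')
    echo "${n:-1}"
}

# Print (do not run) the sbatch command matching the input file.
submit_command() {
    input=$1
    echo "sbatch -n $(nprocshared "$input") --wrap=\"g09 ${input%.com}\""
}
```

For an input file test000.com containing %NProcShared=4, submit_command test000.com prints a command with -n 4, which can be reviewed before it is run.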
Memory
By default, Gaussian uses 256 MB of RAM. If it needs more memory, Gaussian writes some temporary data (scratch) to disk, which is extremely slow compared to RAM. You should therefore tell Gaussian that it can use more than 256 MB. To do that, your Gaussian input file must contain a command like:
%Mem=1024MB ← do not put a space between size (1024) and unit (MB)
This size (1024 MB) corresponds to the memory that Euler allocates by default to single-core jobs. You can of course indicate a different size. However, if you need more memory, say 2048 MB, you must request this memory when you submit your job, using the option:
sbatch --mem-per-cpu=2048 --wrap="..." ← do not indicate a unit here: size is always in MB
You may want to request a bit more memory (say 25% more) just in case Gaussian uses more RAM than it's supposed to.
Memory for parallel computations
Please note that the memory size indicated in Gaussian's input file is for the whole computation, whereas the size indicated in the sbatch command is per CPU. This difference does not matter if you are using only one CPU. However, if you are doing a parallel computation, you must adjust the size accordingly.
Let's assume that your computation needs 4 CPUs and 8 GB (8192 MB) of memory:
%NProcShared=4
%Mem=8192MB
You should therefore request 8192 MB / 4 CPUs + 25% safety margin = 2560 MB / CPU, hence:
sbatch -n 4 --mem-per-cpu=2560 --wrap="..."
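The same arithmetic can be scripted for other job sizes. This is a minimal sketch: the function name mem_per_cpu is hypothetical, the 25% margin is the one suggested above, and the result is rounded up to a whole number of MB.

```shell
#!/bin/sh
# Hypothetical helper: per-CPU memory request in MB for sbatch --mem-per-cpu,
# given Gaussian's total %Mem (in MB) and the number of CPUs.
# Applies a 25% safety margin and rounds up to the next whole MB.
mem_per_cpu() {
    total_mb=$1
    ncpus=$2
    echo $(( (total_mb * 125 + 100 * ncpus - 1) / (100 * ncpus) ))
}

# The example above: 8192 MB shared by 4 CPUs gives 2560 MB per CPU.
echo "sbatch -n 4 --mem-per-cpu=$(mem_per_cpu 8192 4) --wrap=\"...\""
```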
Scratch space
As mentioned earlier, Gaussian stores temporary data on disk. These include:
- a checkpoint file: Gau-pid.chk
- a read-write file: Gau-pid.rwf
- a two-electron integral file: Gau-pid.int (empty by default)
- a two-electron integral derivative file: Gau-pid.d2e (empty by default)
where pid is the process ID of the Gaussian program. Normally these files are stored in the current directory, or in a scratch directory specified by the GAUSS_SCRDIR environment variable.
Since these files can become very large, storing them in the current directory (typically your home directory) is a very bad idea. For this reason, if GAUSS_SCRDIR is undefined, Euler automatically sets it to:
GAUSS_SCRDIR=$TMPDIR
where TMPDIR is defined by the batch system and contains the path of a temporary directory located on the compute node where your job is running. This directory is created automatically when your job starts, and is deleted when the job ends.
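The fallback described above amounts to the following logic. Euler applies it automatically, so this sketch is only an illustration of the behaviour; the function name set_gauss_scrdir is hypothetical, and an empty GAUSS_SCRDIR is treated here the same as an undefined one.

```shell
#!/bin/sh
# Illustration only: keep GAUSS_SCRDIR if the user has set it, otherwise
# point it at the job's temporary directory ($TMPDIR).
set_gauss_scrdir() {
    if [ -z "${GAUSS_SCRDIR:-}" ]; then
        if [ -z "${TMPDIR:-}" ]; then
            echo "error: TMPDIR is not set (not inside a batch job?)" >&2
            return 1
        fi
        GAUSS_SCRDIR=$TMPDIR
    fi
    export GAUSS_SCRDIR
}
```

Setting GAUSS_SCRDIR yourself before starting Gaussian therefore overrides the local scratch directory.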
Using a local scratch directory offers some significant advantages:
- no risk of filling up your home directory
- very high I/O performance
but it has a few drawbacks:
- limited disk size (between 400 GB and 800 GB, depending on the type of compute node)
- all data — including the checkpoint file — are deleted when the job ends (or crashes!)
Disk size is the most critical of these. If your computation needs more than 100 MB of scratch, you must request it when you submit your job:
sbatch --tmp=XXX --wrap="..."
where XXX is the amount of scratch needed by your job, in megabytes (MB). Unlike memory, which is requested per CPU, this size is per compute node, so it does not need to be divided by the number of CPUs when you run Gaussian in parallel.
Checkpoint file
Since the local scratch directory is deleted when your job ends, you may want to store Gaussian's checkpoint file — Gau-pid.chk — in a different location, to be able to restart your computation in case of failure. This can be done using one of the following commands in your Gaussian input file:
%Chk=/path/to/checkpoint_directory/checkpoint_file
%Chk=/path/to/checkpoint_directory/
Please refer to the Gaussian user manual for details.
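The Link 0 commands introduced so far (%Chk, %NProcShared and %Mem) normally sit together at the top of the input file. As a rough sketch, a small helper can generate that header so the values stay consistent with the submission options; the function name link0_header and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Link 0 header for a job, given the
# checkpoint file path, the number of CPUs and the total memory in MB.
link0_header() {
    chk=$1
    nprocs=$2
    mem_mb=$3
    printf '%%Chk=%s\n%%NProcShared=%s\n%%Mem=%sMB\n' "$chk" "$nprocs" "$mem_mb"
}

# Example: show the header for a 4-CPU, 8192 MB job.
link0_header /path/to/checkpoint_directory/water.chk 4 8192

# To build a complete input file, the header could be prepended to an
# existing input that has no Link 0 lines of its own (file names assumed):
#   { link0_header /path/to/checkpoint_directory/water.chk 4 8192; cat water.com; } > job.com
```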
Creating cube file from Gaussian checkpoint file
In order to visualize your results, GaussView requires that the results are converted to the Gaussian cube file format. In the first step, the binary checkpoint file (.chk) is converted to a formatted checkpoint file (.fchk) employing the formchk utility:
formchk result.chk result.fchk
The formatted checkpoint file can then be used as input for Gaussian's cubegen command:
cubegen memory kind fchkfile cubefile npts format
More detailed information on the cubegen command, such as the meaning of each parameter, can be found in the Gaussian user manual.
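As an illustration of the two steps, the sketch below prints (but does not run) the commands for turning result.chk into a cube file of the SCF electron density. The argument values are assumptions based on the usual cubegen conventions (memory 0, kind density=scf, npts 0 for the default grid, format h for a cube file with header) and should be checked against the manual for your Gaussian version; the function name cube_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the formchk and cubegen commands for a given
# checkpoint file. Nothing is executed, so the commands can be reviewed first.
cube_commands() {
    base=${1%.chk}
    echo "formchk $base.chk $base.fchk"
    # Assumed values: memory=0, kind=density=scf, npts=0, format=h
    echo "cubegen 0 density=scf $base.fchk $base.cube 0 h"
}

cube_commands result.chk
```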
Using GaussView from Student computer rooms
The student computer rooms run Fedora 22. Users have experienced some problems when starting GaussView on the cluster via X11 forwarding. If you would like to start GaussView, please add the option -mesagl:
gview -mesagl
This should solve the problem.