MATLAB PCT

MATLAB's Parallel Computing Toolbox (PCT) lets you run suitably written programs in parallel or as a set of independent jobs. Several cores calculate different parts of a problem, possibly at the same time, to reduce the total time-to-solution.

A trivial program that uses a parpool (a pool of workers) is shown below. It calculates the squares of the first ten integers in parallel and stores them in an array:

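squares = zeros(10,1);      % preallocate the result array
pool = parpool(4);          % start a pool of 4 workers
parfor i = 1:10             % iterations are distributed among the workers
    squares(i) = i^2;
end
disp(squares)
pool.delete()               % shut down the pool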

You can use the Parallel Computing Toolbox (PCT) on Euler in two ways; which one is better depends on the properties of your program. The first is to submit a job that requests multiple cores from the batch system and use the local parpool. The parallel part of your program (for example, the parfor loop above) then runs within your job. The second is to submit a single-core master job and use the LSF parpool. MATLAB itself then submits a separate parallel job that computes just the parallel part of your program.

Use a local parpool

Illustration of a typical parallel job using the local pool. The job has three computational parts: A, B, and C, where part B can run in parallel. Gray rectangles show busy cores. White rectangles show idle cores (wasted time).

When you use the local parpool, you submit a multi-core job to LSF. MATLAB will run additional worker processes within your multi-core job to process the parallel part of your program. A diagram of this is shown to the right.

A trivial parallel program (simulation.m) is shown below:

squares = zeros(10,1);
local_job = parcluster('local');
pool = parpool(local_job, 4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()

To submit this program, pass the number of cores to bsub's -n argument. This should match the size of the pool requested in your MATLAB script (e.g., 4).

bsub -n 4 -W "1:00" -R "rusage[mem=2048]" matlab -nodisplay -singleCompThread -r simulation

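You must not use the -nojvm MATLAB argument, because parallel pools require the Java Virtual Machine, but you should include the -singleCompThread argument so that each MATLAB process uses a single computational thread. MATLAB is quite memory-hungry, so request at least 2 GB of memory per core as shown above.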
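
The relationship between the core count and the bsub options can be sketched as a small shell helper. This is only an illustration: the function name local_pool_bsub_cmd is hypothetical, it prints the command instead of submitting it, and the time limit and memory request are simply copied from the example above. The pool size in your MATLAB script still has to be set to the same number by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of Euler, LSF or MATLAB): print the bsub
# command for a local-parpool job so that -n is chosen in exactly one place.
local_pool_bsub_cmd() {
    ncores="$1"                 # must equal the pool size in the MATLAB script
    script="${2:-simulation}"   # name of the .m file, without the extension

    # Reject anything that is not a positive integer.
    case "$ncores" in
        ''|*[!0-9]*) echo "core count must be a positive integer" >&2; return 2 ;;
    esac
    [ "$ncores" -ge 1 ] || { echo "core count must be at least 1" >&2; return 2; }

    # Time limit and memory are the values from the example above;
    # the memory request applies per core.
    printf 'bsub -n %s -W "1:00" -R "rusage[mem=2048]" matlab -nodisplay -singleCompThread -r %s\n' \
        "$ncores" "$script"
}
```

Calling local_pool_bsub_cmd 4 prints the submission line for a four-core job, which you can review and then paste into your shell.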

The local parpool is limited to 12 cores in releases up to R2016a (8.7/9.0). From release R2016b (9.1) on, you can use all the cores of an Euler node (effectively 24).
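
As a worked example of this limit, the sketch below maps a MATLAB release name to the largest usable local pool size. The function name max_local_pool_size is hypothetical, and the value 24 assumes a 24-core Euler node as stated above.

```shell
#!/bin/sh
# Hypothetical helper: largest local parpool size for a given MATLAB release.
# Rule taken from the text: 12 cores up to R2016a, a full node (24) from R2016b.
max_local_pool_size() {
    release="$1"    # release name such as R2016a

    # Accept only names of the form R<4 digits><a|b>.
    case "$release" in
        R[0-9][0-9][0-9][0-9][ab]) ;;
        *) echo "unrecognized release name: $release" >&2; return 2 ;;
    esac

    year="${release#R}"         # "2016a"
    year="${year%[ab]}"         # "2016"
    half="${release#R????}"     # "a" or "b"

    if [ "$year" -lt 2016 ] || { [ "$year" -eq 2016 ] && [ "$half" = "a" ]; }; then
        echo 12
    else
        echo 24
    fi
}
```

For instance, max_local_pool_size R2016a prints 12 and max_local_pool_size R2016b prints 24.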

MATLAB releases before R2013b used matlabpool instead of parpool.

LSF parpool

Set up MATLAB to use the LSF parpool

One-time preparation: Before using the LSF job pool for the first time, you need to import a cluster profile. We provide a settings template, /cluster/apps/matlab/support/EulerLSF8h.settings, that you can import into MATLAB. You can follow the instructions provided by MATLAB for importing a cluster profile, or you can import it from the MATLAB prompt:

parallel.importProfile('/cluster/apps/matlab/support/EulerLSF8h.settings')

Do this only once (or every once in a while, to pick up possible updates). Do not import the profile in every job you run. These settings allow you to use up to 144 cores, but the default pool size is 24 cores because guest users on Euler can use no more than 48 cores.

Use an LSF parpool

Illustration of a typical parallel job using the LSF pool. The job has three computational parts: A, B, and C, where part B can run in parallel. Gray rectangles show busy cores. White rectangles show idle cores (wasted time).

When you use the LSF parpool, you submit a single-core job to LSF. MATLAB will submit an additional parallel job to run the MATLAB workers to process the parallel part of your program. A diagram of this is shown to the right.

A trivial parallel program (simulation.m) is shown below:

squares = zeros(10,1);
batch_job = parcluster('EulerLSF8h');
pool = parpool(batch_job, 4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()

To run this program, submit it (the master job) as a serial (single-core) job:

bsub -R light -n 1 -W "120:00" -R "rusage[mem=2048]" matlab -nodisplay -singleCompThread -r simulation

The master job is assumed to not need much CPU power; however, it may need to run for a long time since it needs to wait for the parallel pool job to start and run. This is why you should submit the master job with the “-R light” option and request ample runtime.

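You must not use the -nojvm MATLAB argument, because parallel pools require the Java Virtual Machine, but you should include the -singleCompThread argument so that the master process uses a single computational thread. MATLAB is quite memory-hungry, so request at least 2 GB of memory as shown above.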

MATLAB releases before R2013b used matlabpool instead of parpool.

Change the settings of an LSF parpool

You can change the settings of the LSF jobs that the LSF parpool submits, for example to request more time or memory. To do this, edit the LSF parameters that MATLAB uses when submitting the parallel job. For example, to request 24 hours and 4 GB of RAM per worker in the above example, add the SubmitArguments line shown below:

squares = zeros(10,1);
batch_job = parcluster('EulerLSF8h');
batch_job.SubmitArguments = '-W 24:00 -R "rusage[mem=4000]"';
pool = parpool(batch_job, 4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()

Troubleshoot parallel jobs

Using parallel pools often results in hard-to-diagnose errors. Many of these errors are related to running several pools at the same time, which is not what MATLAB expects. If you encounter persistent problems starting pools, try the steps below, from least to most drastic. Before trying them, make sure that you do not have any MATLAB processes running.

  1. Remove the matlab_metadata.mat file in your current working directory.
  2. Remove the $HOME/.matlab/local_cluster_jobs directory.
  3. Remove the entire $HOME/.matlab directory. Warning: Your MATLAB settings on Euler will be lost and you will need to re-import the pool profile.
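
The three steps above can be sketched as a shell function that performs one step per call. This is a minimal illustration rather than a supported tool. The function names are hypothetical, the metadata file is assumed to be named matlab_metadata.mat, and the process check assumes that MATLAB shows up under the process name MATLAB. The check only sees processes on the host where you run it, so MATLAB jobs still running on compute nodes are not detected.

```shell
#!/bin/sh
# Hypothetical helpers for cleaning up MATLAB parallel pool state.

# Succeeds if the current user has a process named MATLAB on this host.
# Assumption: the MATLAB binary appears under the process name "MATLAB".
matlab_is_running() {
    pgrep -u "$(id -u)" -x MATLAB >/dev/null 2>&1
}

# Usage: cleanup_matlab_pool_state STEP [WORKDIR]
#   STEP 1: remove matlab_metadata.mat from WORKDIR (default: current directory)
#   STEP 2: remove $HOME/.matlab/local_cluster_jobs
#   STEP 3: remove $HOME/.matlab entirely (all MATLAB settings are lost)
cleanup_matlab_pool_state() {
    step="$1"
    workdir="${2:-$PWD}"

    # Refuse to delete anything while MATLAB is still running here.
    if matlab_is_running; then
        echo "MATLAB is still running; stop it first" >&2
        return 1
    fi

    case "$step" in
        1) rm -f  "$workdir/matlab_metadata.mat" ;;
        2) rm -rf "$HOME/.matlab/local_cluster_jobs" ;;
        3) rm -rf "$HOME/.matlab" ;;
        *) echo "usage: cleanup_matlab_pool_state 1|2|3 [workdir]" >&2
           return 2 ;;
    esac
}
```

Start with cleanup_matlab_pool_state 1 and test whether the pool starts again before moving on to step 2 or 3. After step 3 you need to re-import the pool profile.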