# MATLAB PCT

MATLAB's Parallel Computing Toolbox (PCT) lets you run suitably-written programs in parallel or as a set of independent jobs. Several cores calculate different parts of a problem, possibly at the same time, to reduce the total time-to-solution.

A trivial program that uses a *parpool* (a pool of workers) is shown below. It calculates the squares of the first ten integers in parallel and stores them in an array:

```matlab
squares = zeros(10,1);
pool = parpool(4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()
```

You can use the Parallel Computing Toolbox (PCT) on Euler in two ways; which one is best depends on the properties of your program. One is to submit a job that requests multiple cores from the batch system and use the **local** parpool. The parallel part of your program (for example, the parfor loop above) then runs within your job. The other is to submit a single-core master job and use the **SLURM** parpool. MATLAB will itself submit a parallel job to compute *just* the parallel part of your program.

## Local parpool

### Use a local parpool

When you use the local parpool, you submit a multi-core job to SLURM. MATLAB will run additional worker processes within your multi-core job to process the parallel part of your program. A diagram of this is shown to the right.

A trivial parallel program (`simulation.m`) is shown below:

```matlab
squares = zeros(10,1);
local_job = parcluster('local');
pool = parpool(local_job, 4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()
```

To submit this program, pass the number of cores to the sbatch `--cpus-per-task` option. This should be greater than or equal to the size of the pool requested in your MATLAB script (e.g., 4).

```shell
sbatch --ntasks=1 --cpus-per-task=4 --time=1:00:00 --mem-per-cpu=2g --wrap="matlab -nodisplay -singleCompThread -r simulation"
```

You must *not* use the `-nojvm` MATLAB argument but you *should* include the `-singleCompThread` MATLAB argument. MATLAB is quite memory-hungry, so request at least 2 GB of memory per core as shown above.

The **local** parpool is limited to 12 workers in releases up to R2016a (9.0). From release R2016b (9.1) on, you can use all the cores of an Euler node (effectively up to 128).
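Rather than hard-coding the pool size, you can derive it from the cores SLURM actually allocated. This sketch assumes the job was submitted with `--cpus-per-task`, which makes SLURM set the `SLURM_CPUS_PER_TASK` environment variable inside the job:

```matlab
% Size the local pool from the SLURM allocation (sketch)
ncores = str2double(getenv('SLURM_CPUS_PER_TASK'));
if isnan(ncores)
    ncores = 1;   % fall back to one worker when not running under SLURM
end
pool = parpool('local', ncores);
```

This way the pool size always matches the `--cpus-per-task` value of the submitted job, so the two cannot drift apart.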

Older versions of MATLAB used `matlabpool` instead of `parpool`.

## SLURM parpool

### Set up MATLAB to use a SLURM parpool

**One-time** preparation: before using the SLURM parpool for the first time, you need to import a cluster profile. For that, start MATLAB and call `configCluster`. For each cluster, `configCluster` only needs to be called once per version of MATLAB. Please be aware that running this command more than once per version will reset your cluster profile back to the default settings and erase any saved modifications to the profile.
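For example, from an interactive MATLAB session:

```matlab
>> % One-time per MATLAB release: import the cluster profile
>> configCluster
```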

### Use a SLURM parpool

When you use the SLURM parpool, you submit a single-core job to SLURM. MATLAB then submits an additional parallel job that runs the MATLAB workers processing the parallel part of your program. A diagram of this is shown to the right.

A trivial parallel program (`simulation.m`) is shown below:

```matlab
squares = zeros(10,1);
batch_job = parcluster;
pool = parpool(batch_job, 4);
parfor i = 1:10
    squares(i) = i^2;
end
disp(squares)
pool.delete()
```

To submit this program, just submit your MATLAB program (the master job) as a serial (single-core) job:

```shell
sbatch -n 1 --time=120:00:00 --mem-per-cpu=2g --wrap="matlab -nodisplay -singleCompThread -r simulation"
```

The master job is assumed not to need much CPU power; however, it may need a long run time, since it has to wait for the parallel pool job to start and run.

You must *not* use the `-nojvm` MATLAB argument but you *should* include the `-singleCompThread` MATLAB argument. MATLAB is quite memory-hungry, so request at least 2 GB of memory as shown above.

Older versions of MATLAB used a `matlabpool` instead of a `parpool`.

### Change the settings of a SLURM parpool

You can change the settings of SLURM jobs that the SLURM parpool will submit, such as requesting more time or memory. To do this, you must edit the SLURM parameters in MATLAB. Here are a few examples:

```matlab
>> % First, get a handle to the cluster
>> c = parcluster;
>> % Specify the account to use
>> c.AdditionalProperties.AccountName = 'account-name';
>> % Request email notification of job status
>> c.AdditionalProperties.EmailAddress = 'user-id@id.ethz.ch';
>> % Specify GPU options
>> c.AdditionalProperties.GpusPerNode = 1;
>> c.AdditionalProperties.GpuMem = '10g';
>> % Specify memory to use, per core (default: 4gb)
>> c.AdditionalProperties.MemUsage = '6gb';
>> % Specify the wall time (e.g., 5 hours)
>> c.AdditionalProperties.WallTime = '05:00:00';
```

Save changes after modifying `AdditionalProperties` so that they persist between MATLAB sessions:

```matlab
>> c.saveProfile
```

To see the values of the current configuration options, display `AdditionalProperties`:

```matlab
>> % To view current properties
>> c.AdditionalProperties
```

## Submit an independent batch job

Use the `batch` command to submit asynchronous jobs to the cluster. The `batch` command returns a job object that is used to access the output of the submitted job. See the MATLAB documentation for more help on `batch`.

```matlab
>> % First, get a handle to the cluster
>> c = parcluster;
>> % Then submit a job that queries where MATLAB is running on the cluster
>> job = c.batch(@pwd, 1, {}, 'CurrentFolder', '.');
>> % Query job for state
>> job.State
>> % If state is finished, fetch the results
>> job.fetchOutputs{:}
>> % Delete the job after results are no longer needed
>> job.delete
```

To retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run:

```matlab
>> c = parcluster;
>> jobs = c.Jobs;
```
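If you know a job's ID but not its index in the `Jobs` array, MATLAB's documented `findJob` helper can look it up (the ID 2 below is just an example):

```matlab
>> % Find the job whose ID is 2
>> job2 = findJob(c, 'ID', 2);
```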

To view the results of a previously completed job, first get a handle to it:

```matlab
>> % Get a handle to the job with ID 2
>> job2 = c.Jobs(2);
```

Then, to fetch the results for the job with ID 2:

```matlab
>> job2.fetchOutputs{:}
```

To see how to submit parallel workflows with the `batch` command, let's use the following example, which is saved as `parallel_example.m`.

```matlab
function [t, A] = parallel_example(iter)
    if nargin == 0
        iter = 8;
    end
    disp('Start sim')
    t0 = tic;
    parfor idx = 1:iter
        A(idx) = idx;
        pause(2)
        idx
    end
    t = toc(t0);
    disp('Sim completed')
    save RESULTS A
end
```

This time, to run a parallel job with the `batch` command, we also specify a MATLAB pool.

```matlab
>> % Get a handle to the cluster
>> c = parcluster;
>> % Submit a batch pool job using 4 workers for 16 simulations
>> job = c.batch(@parallel_example, 1, {16}, 'Pool', 4, 'CurrentFolder', '.');
>> % View current job status
>> job.State
>> % Fetch the results after a finished state is retrieved
>> job.fetchOutputs{:}

ans =

    8.8872
```

The job ran in 8.89 seconds using four workers. Note that these jobs will always request N+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will consume nine CPU cores.
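You can see the extra core in the batch system's view of the job. As a sketch (the output format string is just one possible choice), listing your jobs with `squeue` shows the pool job's CPU count:

```shell
# List your jobs with their requested CPU counts;
# a 'Pool',8 batch job shows up with 9 CPUs (8 workers + 1 master).
squeue -u $USER -o "%.10i %.20j %.5C %.10M %.10T"
```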

## Troubleshoot parallel jobs

Using parallel pools often results in hard-to-diagnose errors. Many of these errors are related to running several pools at the same time, which is not what MATLAB expects. If you encounter persistent problems starting pools, try one of the following steps. Before doing so, make sure that you do not have any MATLAB processes running.

- Remove the `matlab_metadata.mat` file in your current working directory.
- Remove the `$HOME/.matlab/local_cluster_jobs` directory.
- Remove the entire `$HOME/.matlab` directory. **Warning**: your MATLAB settings on Euler will be lost.
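The first two cleanup steps can be performed from the shell, for example (a sketch; run it only after all your MATLAB processes have exited):

```shell
# Remove the per-directory parpool metadata file (if present)
rm -f matlab_metadata.mat
# Remove the cached local cluster job data
rm -rf "$HOME/.matlab/local_cluster_jobs"
```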

If a parallel job produces an error, call the `getDebugLog` method to view the error log file:

```matlab
>> c.getDebugLog(job)
```

When troubleshooting a job, the cluster administrators may request the scheduler ID of the job. You can obtain it by calling `schedID`:

```matlab
>> schedID(job)

ans =

    25539
```