Difference between revisions of "Distributed computing in R with Rmpi"

Revision as of 15:01, 6 October 2021


Load modules and install Rmpi

Change to the new software stack and load required modules. Here we need MPI and R libraries.

$ env2lmod
$ module load gcc/6.3.0 openmpi/2.1.1 r/4.0.2
$ R
> install.packages("Rmpi")

Run R in an interactive session

Rmpi assigns one processor to be the master and the other processors to be workers. Here we would like to use 5 worker processors for computation, so we request 6 processors in total, spread over 2 nodes (3 per node):

 $ bsub -n 6 -R "span[ptile=3]" -Is bash
 Generic job.
 Job <155200980> is submitted to queue <normal.4h>.
 <<Waiting for dispatch ...>>
 <<Starting on eu-c7-105-05>>
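
The master/worker arithmetic above can be checked from inside R before anything is spawned. The following is a minimal sketch in base R and is not part of Rmpi: worker_count() is a hypothetical helper that reads a slot count from an environment variable and subtracts one for the master. The default variable name MPI_UNIVERSE_SIZE is the one read in the Rmpi steps on this page; whether the scheduler or MPI launcher sets it for you is an assumption you should verify.

```r
# Hypothetical helper (not part of Rmpi): derive the number of workers
# from an environment variable holding the total number of MPI slots.
worker_count <- function(env_name = "MPI_UNIVERSE_SIZE") {
  raw <- Sys.getenv(env_name, unset = NA)
  if (is.na(raw) || !grepl("^[0-9]+$", raw)) {
    stop(env_name, " is not set to a whole number")
  }
  total <- as.integer(raw)
  if (total < 2L) {
    stop("need at least 2 slots: 1 master + 1 worker")
  }
  total - 1L  # one slot is reserved for the master
}

# Example: 6 requested slots leave 5 workers.
Sys.setenv(MPI_UNIVERSE_SIZE = "6")
n_workers <- worker_count()
print(n_workers)
```

Failing early with a clear message is easier to debug than passing NA on to the spawn call.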

Use Rmpi

1. Load Rmpi, which calls mpi.initialize()

 > library(Rmpi)

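2. Spawn the R workers (called slaves in Rmpi). The number of workers is the requested number of processors minus 1, because one processor is the master. The code reads MPI_UNIVERSE_SIZE from the environment; if it is not set there, export it before starting R (an earlier revision of this page used export MPI_UNIVERSE_SIZE=5 for a 5-processor request), otherwise usize is NA.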

 > usize <- as.numeric(Sys.getenv("MPI_UNIVERSE_SIZE"))
 > ns <- usize - 1
 > mpi.spawn.Rslaves(nslaves=ns)

3. Set up a vector of values on the root

 > var = c(11.0, 22.0, 33.0)

4. Root broadcasts the variable to the other ranks

 > mpi.bcast.data2slave(var, comm = 1, buffunit = 100)

5. Each worker stores its own rank number in id

 > mpi.bcast.cmd(id <- mpi.comm.rank())

6. Check whether each rank can use its own value (a rank whose id exceeds length(var) gets NA)

 > mpi.remote.exec(paste("The variable on rank ",id," is ", var[id]))

7. Root orders other ranks to calculate

 > mpi.bcast.cmd(output <- var[id]*2)

8. Root orders other ranks to gather the output

 > mpi.bcast.cmd(mpi.gather(output, 2, double(1)))

9. Root gathers the output from other ranks

 > mpi.gather(double(1), 2, double(usize))

10. Close down and quit

 > mpi.close.Rslaves(dellog = FALSE)
 > mpi.quit()
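
To know what to expect from step 9, the data flow of steps 3 to 9 can be emulated serially. This is a sketch in base R that does not call Rmpi, and emulate_gather() is a hypothetical name. It assumes the usual MPI gather ordering (results arrive in rank order, root first) and that the root contributes double(1), that is 0, as in step 9.

```r
# Serial emulation of steps 3-9 (no MPI involved; illustration only).
emulate_gather <- function(var, usize) {
  worker_ids <- seq_len(usize - 1)  # worker ranks 1 .. usize-1
  outputs <- var[worker_ids] * 2    # step 7: each rank doubles var[id]
  c(double(1), outputs)             # step 9: root's 0, then workers in rank order
}

var <- c(11.0, 22.0, 33.0)
result <- emulate_gather(var, usize = 6)
print(result)
```

With three values and five workers, the first entry is the root's 0, the next three are 22, 44 and 66, and the last two are NA because var[4] and var[5] do not exist. Giving var as many elements as there are workers avoids the NA entries.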

Exercises

  1. Try using mpi.scatter.Robj() instead of mpi.bcast.data2slave() in step 4
  2. Create an R script that uses Rmpi and submit it as a batch job from the bsub command line
  3. Create a bsub job script and submit it as a batch job

Further reading

https://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf
