The goal of this tutorial is to provide information on high performance computing using R. This tutorial was given during a We R: Penn State R Users’ Group meet-up by Rahim Charania who is an HPC Software Specialist and Graduate Research Assistant at Penn State.
R packages for parallel computing covered here: multicore, doMC, foreach, parallel, and doSNOW.
In Penn State’s Supercomputers:
Type | Total RAM Capacity | Maximum memory to request per node |
---|---|---|
Basic Memory | 128 GB | ~120 GB |
Standard Memory | 256 GB | ~245 GB |
High Memory | 1 TB | ~950 GB |
You will need:
Below is a sample PBS script, with each line explained in detail below. Please have a look at it, and feel free to use it as a template for your first few job submissions if you are unfamiliar with PBS scripts (it is a really basic one, though).
```bash
#!/bin/bash -l
#PBS -N JOBNAME
#PBS -l walltime=1:00:00
#PBS -l nodes=2:ppn=10
#PBS -l pmem=6gb
#PBS -j oe
module load r/version
cd $PBS_O_WORKDIR
R --file=storage/home/USERID/Code.R
```
Helpful link that explains all the code syntax of the cryptic PBS file:
#PBS -l walltime=1:00:00
This line requests 1 hour of processing time ("wall-time"), which is independent of how long your job waits in the queue. Once your job starts, it will be killed as soon as this limit is reached, regardless of whether your script has finished running. It is therefore wise to pad your wall-time request by some amount (how much depends on your code) so the job is not killed if it takes slightly longer for expected or unexpected reasons. For recursive code or code with many iterations, you could time a sample run of a few iterations, compute the average time per iteration, and scale that up to the full number of iterations to set your wall-time. However, please note that jobs on the open queue are limited to 24 hours. There is no such cap on your PI's priority queue (if he/she has one).
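That estimation approach can be sketched in a few lines of R. Here `run_iteration()` is a hypothetical stand-in for one iteration of your own code:

```r
# Time a few sample iterations and extrapolate to estimate walltime.
# run_iteration() is a placeholder for one iteration of your real code.
run_iteration <- function(i) { Sys.sleep(0.1); sqrt(i) }

n_sample <- 5
elapsed  <- system.time(for (i in seq_len(n_sample)) run_iteration(i))["elapsed"]
per_iter <- elapsed / n_sample

n_total  <- 10000                  # iterations in the real job
estimate <- per_iter * n_total
padded   <- estimate * 1.2         # pad by e.g. 20% before setting walltime
cat(sprintf("Estimated walltime: %.0f s (padded: %.0f s)\n", estimate, padded))
```

The padded figure is what you would round up and put on the `#PBS -l walltime=` line.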
#PBS -l nodes=2:ppn=10
This line specifies the number of nodes and processors desired for this job: here, 2 nodes with 10 processors each, for 20 processors in total. Please keep in mind the kind of node you will be using so you can fully utilize this option (see the table of node types above). Also note that inter-node communication may slow your code down. Hence, if you don't need more processors than a single node provides, and you are confident its RAM meets your needs as well, I would suggest using just 1 node if your program permits.
We provide Compute resources in following configurations: Basic Memory, Standard Memory, and High Memory. Please keep in mind these constraints when using the cluster for job submission since your job won’t run if you exceed these system constraints (given in the table above).
So, in conclusion, base your memory calculations on these figures and leave a cushion of at least 5% of RAM for system processes. Doing this should avoid memory-constraint issues.
#PBS -l pmem=6gb
This assigns 6 GB of memory to each process. With 10 processes per node, that makes 60 GB per node and 120 GB for the entire job. You can also use mem instead of pmem to assign one total value for the whole job and let the scheduler divide it up. However, in my opinion, pmem seems more efficient given that it explicitly assigns memory to each process.
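The memory arithmetic above can be sanity-checked in a couple of lines of R:

```r
pmem  <- 6                     # GB per process (#PBS -l pmem=6gb)
ppn   <- 10                    # processes per node
nodes <- 2

per_node <- pmem * ppn         # 60 GB per node -- within a Basic Memory node
total    <- per_node * nodes   # 120 GB across the whole job
c(per_node = per_node, total = total)
```

Comparing `per_node` against the table above tells you whether a Basic, Standard, or High Memory node is required.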
module load r/version
This loads the R version you requested. Please first make sure we have that specific version by running the command "module spider r" in the terminal.
cd $PBS_O_WORKDIR
PBS scripts execute in your home directory by default, not the directory from which they were submitted. The line above places you in the directory from which the job was submitted.
R --file=storage/home/USERID/Code.R
This line uses the R version loaded by the line above to run the R script located at storage/home/USERID/Code.R on the cluster.
You can do a lot more with PBS scripts, including (but not limited to) sending yourself emails when a job starts, finishes, or catches an exception (using commands like echo and flags like -M). I'll explain more on this in upcoming files, since this one covers just the basics of job submission.
To submit on the open queue:
$ qsub -A open mypbsscript.pbs
Or on your PI's priority allocation:
$ qsub -A PRIORITY_ALLOCATION_NAME mypbsscript.pbs
$ dos2unix mypbsscript.pbs
This converts Windows-style line endings (CRLF) to Unix line endings (LF) so the scheduler can read the file.
storage/home/USERID/filepath
\(Speedup = \frac{t_{s}}{t_{e}}\), where \(t_{s}\) is the sequential execution time and \(t_{e}\) is the execution time on multiple processors.
A simple model says intuitively that a computation runs p times faster when split over p processors. More realistically, a problem has a fraction f of its computation that can be parallelized. Thus, the remaining fraction 1 - f is inherently sequential.
Amdahl’s law:
\(\text{Maximum Speedup} = \frac{1}{f/p + (1-f)}\)
Problems with f = 1 are called embarrassingly parallel. Some problems are (or seem to be) embarrassingly parallel: computing column means, bootstrapping, etc.
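Amdahl's law is easy to explore numerically. A minimal R sketch:

```r
# Amdahl's law: maximum speedup for parallel fraction f on p processors
amdahl <- function(f, p) 1 / (f / p + (1 - f))

amdahl(1.0, 20)    # embarrassingly parallel: full speedup of 20
amdahl(0.9, 20)    # 90% parallelizable: only ~6.9x
amdahl(0.5, Inf)   # even infinite processors cap the speedup at 2x
```

The last line shows why the sequential fraction dominates: no amount of hardware helps the part that cannot be parallelized.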
Shared Memory:
Function | Description | Example |
---|---|---|
detectCores | detect the number of CPU cores | ncores <- detectCores() |
mclapply | parallelized version of lapply | mclapply(1:5, runif, mc.cores = ncores) |
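A minimal example putting these two functions together. Note that mclapply relies on forking the R process, so it parallelizes on Linux/macOS only; on Windows, mc.cores must be 1:

```r
library(parallel)

ncores <- detectCores()          # number of CPU cores available

# Square the integers 1..8 in parallel across the available cores
results <- mclapply(1:8, function(i) i^2, mc.cores = ncores)
unlist(results)                  # 1 4 9 16 25 36 49 64
```

Because mclapply shares memory via fork, the workers see your existing variables without any explicit export step.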
Distributed Memory:
Function | Description | Example |
---|---|---|
makeCluster | start the cluster | cl <- makeCluster(19, type = "MPI") |
clusterSetRNGStream | set seed on cluster | clusterSetRNGStream(cl, 321) |
clusterExport | exports variables to the workers (by name) | a <- 1:10; clusterExport(cl, "a") |
clusterEvalQ | evaluates expressions on workers | clusterEvalQ(cl, {x <- 1:3; myFun <- function(x) runif(x)}) |
clusterCall | calls a function on all workers | clusterCall(cl, function(y) 3 + y, 2) |
parLapply | parallelized version of lapply | parLapply(cl, 1:100, Sys.sleep) |
parLapplyLB | parLapply with load balancing | parLapplyLB(cl, 1:100, Sys.sleep) |
stopCluster | stop the cluster | stopCluster(cl) |
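A self-contained sketch of this distributed-memory workflow. It uses a local PSOCK cluster of 4 workers so it runs on a single machine without MPI; on the cluster you would use type = "MPI" as in the table:

```r
library(parallel)

cl <- makeCluster(4)                 # 4 local PSOCK workers (no MPI needed)
clusterSetRNGStream(cl, 321)         # reproducible RNG streams on all workers

a <- 1:10
clusterExport(cl, "a")               # ship the variable `a` to every worker

# Define a function in each worker's environment
clusterEvalQ(cl, { myFun <- function(x) x + sum(a) })

res <- parLapply(cl, 1:4, function(i) i * 2)
unlist(res)                          # 2 4 6 8

stopCluster(cl)                      # always free the workers when done
```

Unlike mclapply, the workers here start with empty environments, which is why data and functions must be explicitly exported before use.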
There are many packages, continuously updated, listed in the CRAN Task View on High-Performance and Parallel Computing; choose the one best suited to your research or the work to be done.
Running R in Parallel:
Penn State ACI Cluster:
Tutorials: