Overview

The goal of this tutorial is to provide an introduction to high performance computing using R. This tutorial was given during a We R: Penn State R Users’ Group meet-up by Rahim Charania, an HPC Software Specialist and Graduate Research Assistant at Penn State.

Why High Performance Computing?

  • Moore’s Law for processors
  • Exponentially growing datasets
  • Together, these lead to ever-increasing compute times
  • To cut compute time down: use HPC / supercomputing infrastructure
  • How to do this: divide your code into smaller chunks and parallelize, so multiple functions run simultaneously
  • The pieces then communicate their results and, bam, you have your output faster than before.

Basic Idea

  • Split the problem into pieces
  • Execute the pieces in parallel
  • Combine the results back together
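
As a toy illustration, here is a minimal sketch of that split/execute/combine pattern using the parallel package (my example, not from the talk; mclapply relies on forking, so it runs in parallel on Unix-like systems only):

library(parallel)

# Split: break the input into 4 pieces
x <- as.numeric(1:1e6)
pieces <- split(x, cut(seq_along(x), 4))

# Execute: sum each piece on its own core
partial_sums <- mclapply(pieces, sum, mc.cores = 4)

# Combine: reduce the partial sums into the final answer
total <- Reduce(`+`, partial_sums)
total == sum(x)  # TRUE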

How does R do this?

  • Several packages make it easier!
  • For a single node (no inter-node communication): multicore and doMC
  • For multiple nodes (with inter-node communication): foreach, parallel, doSNOW, and doMPI

Terminology

  • 1 Node = multiple processors
  • 1 Processor = multiple cores
  • Wall-time = the amount of time your code runs on the cluster
  • Memory = the amount of memory your job will require to run

In Penn State’s Supercomputers:

Type              Total RAM Capacity   Max memory to assign per node
Basic Memory      128 GB               ~120 GB
Standard Memory   256 GB               ~245 GB
High Memory       1 TB                 ~950 GB

Submitting R Jobs to the Cluster

You will need:

  • An account on ACI
  • A terminal emulator, such as PuTTY, to ssh into ACI-B
  • The R script (.R file) you want to run
  • A job script (.pbs file) that tells the scheduler how to run your R script
  • File(s) containing any data your R script needs to run

The PBS File

Below is a sample PBS script, with each line explained in detail further down. Please have a look at it, and feel free to use it as a template for your first few job submissions if you are unfamiliar with PBS scripts (it is a really basic one, though).

#!/bin/bash -l
#PBS -N JOBNAME
#PBS -l walltime=1:00:00
#PBS -l nodes=2:ppn=10
#PBS -l pmem=6gb
#PBS -j oe

module load r/version

cd $PBS_O_WORKDIR

R --file=/storage/home/USERID/Code.R

More In Depth Explanation of the Above Sample PBS Code

#PBS -l walltime=1:00:00

This line requests 1 hour of processing time (‘wall-time’). Wall-time is independent of how long your job waits in the queue; however, once your job starts, it will be killed when this limit is reached, whether or not your script has finished running. It is therefore wise to pad your wall-time request by some amount (how much depends on your code) so the job is not killed if it takes slightly longer for expected or unexpected reasons. For example, if you are running a recursive or highly iterative code, you could time a sample run with a couple of iterations, average the time per iteration, and scale that up to the number of iterations you will actually run (see the sketch below). Please note that jobs on the open queue are limited to 24 hours of wall-time; there is no such cap on your PI’s priority queue (if he or she has one).
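
A hedged sketch of that estimation approach (the function and the iteration counts are placeholders for your real workload):

pilot_iters <- 5        # iterations in the sample run
full_iters  <- 1000     # iterations in the real job

one_iteration <- function(i) {  # stand-in for your actual computation
  mean(rnorm(1e6))
}

# Time the pilot run and average over its iterations
elapsed  <- system.time(lapply(seq_len(pilot_iters), one_iteration))["elapsed"]
per_iter <- elapsed / pilot_iters

# Scale up and pad (say 25%) before writing it into #PBS -l walltime=...
per_iter * full_iters * 1.25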

#PBS -l nodes=2:ppn=10 

This line specifies the number of nodes and processors desired for this job. In this case, it requests that your job be run on 2 nodes using 10 processors on each node, i.e. 20 processors in total. Please keep in mind the kind of node you will be using in order to fully utilize this option (see the node types in the table above). Also be aware that inter-node communication may slow your code down: if you don’t need more processors than a single node provides, and a single node’s RAM also meets your needs, I would suggest using just 1 node if your program permits.

Compute resources are provided in the following configurations: Basic Memory, Standard Memory, and High Memory. Please keep these constraints in mind when submitting jobs, since your job won’t run if it exceeds the system limits given in the table above.

In conclusion, base your memory calculations on these limits and leave a cushion of at least roughly 5% of RAM for system processes. Doing this should avoid any memory-constraint issues.

#PBS -l pmem=6gb

This assigns 6 GB of memory to each process. With 10 processes per node, that is 60 GB per node, and 120 GB across both nodes for the entire program. You can also use mem instead of pmem to assign one value for the whole job and let the scheduler divide it up; in my opinion, however, pmem is more efficient because it explicitly assigns memory to each process.

module load r/version 

This loads the R version you requested. Please make sure we have that specific version first by running the command “module spider r” in the terminal.

cd $PBS_O_WORKDIR

PBS scripts execute in your home directory by default, not in the directory from which they were submitted. The above line places you back in the directory from which the job was submitted.

R --file=/storage/home/USERID/Code.R

This line uses the version of R loaded by the module line above to run the R script located at /storage/home/USERID/Code.R on the cluster.

You can do a lot more with PBS scripts, including but not limited to sending yourself emails when a job starts, finishes, or catches an exception (using commands like echo and flags like -M). I’ll explain more on this in upcoming files, since this one covers just the basics and job submission techniques.
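
For reference, a hedged example of those mail directives (standard PBS/Torque flags; the address is a placeholder): -M sets where the mail goes, and -m picks the events.

# Mail directives (example): -M sets the address, -m the events (a = abort, b = begin, e = end)
#PBS -M YOUR_EMAIL@psu.edu
#PBS -m abe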

Submitting a Job on the Cluster

  • You may have more than one option for submitting a job on the cluster.
  • You can submit to the Open Queue using the following syntax:
$ qsub -A open mypbsscript.pbs
  • Or, if your PI has given you access to his or her priority queue, you can use that allocation to submit the job. Note that this will count against the allocation’s balance.
  • You can do so using the following syntax:
$ qsub -A PRIORITY_ALLOCATION_NAME mypbsscript.pbs

Common Troubleshooting Steps in Submitting Jobs

  1. If you are importing a file written in Notepad (or something like that) containing your PBS script and you see an error saying the DOS file is unable to be read in Linux, just run the command dos2unix on your script and you should be good. For example:
$ dos2unix mypbsscript.pbs

This converts the script’s line endings to the Unix format so the scheduler can read it.

  2. If your .R file needs to load data, you will need to copy the data over to the cluster and have, say, a read.table() line in your .R file that specifies the data’s location on the cluster with a file path (see the sketch after this list), for example:
/storage/home/USERID/filepath
  3. It is generally good practice to set your working directory to scratch: it is a GPFS file system, so files load faster and it won’t hinder your code’s performance the way running the same job from the work directory can.
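
A minimal sketch for item 2, reading data from an absolute path on the cluster (the file name and format here are hypothetical placeholders):

# Hypothetical example file; replace with your actual data's path
dat <- read.table("/storage/home/USERID/mydata.csv",
                  header = TRUE, sep = ",")
str(dat)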

Benchmarking

\(\text{Speedup} = \frac{t_{s}}{t_{e}}\)

A simple model says, intuitively, that a computation runs p times faster when split over p processors, i.e. the speedup is the sequential run time \(t_{s}\) divided by the parallel execution time \(t_{e}\). More realistically, a problem has a fraction f of its computation that can be parallelized; the remaining fraction 1 - f is inherently sequential.

Amdahl’s law:

\(\text{Maximum Speedup} = \frac{1}{f/p + (1-f)}\)

Problems with f = 1 are called embarrassingly parallel. Some problems are (or seem to be) embarrassingly parallel: computing column means, bootstrapping, etc.
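
A quick numeric check of Amdahl’s law in R makes the point concrete: even with 95% of the work parallelizable, 20 processors yield only about a 10x speedup, while an embarrassingly parallel problem (f = 1) gets the full factor of p:

# Amdahl's law: maximum speedup given parallel fraction f and p processors
amdahl <- function(f, p) 1 / (f / p + (1 - f))
amdahl(f = 0.95, p = 20)  # ~10.26
amdahl(f = 1.00, p = 20)  # 20: embarrassingly parallel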

Shared Memory:

Function      Description                       Example
detectCores   detect the number of CPU cores    ncores <- detectCores()
mclapply      parallelized version of lapply    mclapply(1:5, runif, mc.cores = ncores)
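
To see the two functions above working together: a bootstrap (one of the embarrassingly parallel examples mentioned earlier) maps naturally onto mclapply, since every replicate is independent. A minimal sketch with made-up data (forking is Unix-only; on Windows mc.cores must be 1):

library(parallel)

ncores <- detectCores()
x <- rnorm(1000)                      # toy data

# Each bootstrap replicate resamples x and computes its mean
boot_means <- mclapply(1:2000, function(i) {
  mean(sample(x, replace = TRUE))
}, mc.cores = ncores)

sd(unlist(boot_means))                # bootstrap SE of the mean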

Distributed Memory:

Function             Description                          Example
makeCluster          start the cluster                    cl <- makeCluster(19, type = "MPI")
clusterSetRNGStream  set the RNG seed on the cluster      clusterSetRNGStream(cl, 321)
clusterExport        export variables to the workers      a <- 1:10; clusterExport(cl, "a")
clusterEvalQ         evaluate expressions on the workers  clusterEvalQ(cl, {x <- 1:3; myFun <- function(x) runif(x)})
clusterCall          call a function on all workers       clusterCall(cl, function(y) 3 + y, 2)
parLapply            parallelized version of lapply       parLapply(cl, 1:100, Sys.sleep)
parLapplyLB          parLapply with load balancing        parLapplyLB(cl, 1:100, Sys.sleep)
stopCluster          stop the cluster                     stopCluster(cl)
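
Here is a sketch chaining these calls in their usual order. It uses type = "PSOCK" so it runs without an MPI installation; on the cluster you would request type = "MPI", which requires the Rmpi package:

library(parallel)

cl <- makeCluster(4, type = "PSOCK")   # start 4 local workers
clusterSetRNGStream(cl, 321)           # reproducible parallel RNG

a <- 1:10
clusterExport(cl, "a")                 # copy the variable a to every worker
clusterEvalQ(cl, myFun <- function(n) runif(n))  # define myFun on the workers

res <- parLapply(cl, 1:8, function(n) sum(myFun(n)) + max(a))
stopCluster(cl)                        # always free the workers
str(res)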

What to use?

There are many packages, continuously updated, listed in the CRAN Task View on High-Performance Computing. You should choose the one best suited to your research or the work to be done.

Running R in Parallel:

  • GPU
    • gputools
  • Parallel
  • Rmpi
  • foreach
    • doMC
    • doSNOW
    • doMPI
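
As a taste of the foreach family, a minimal doMC sketch (my own example, not from the talk; doMC forks, so it is Unix-only, with doParallel as the usual Windows substitute):

library(foreach)
library(doMC)

registerDoMC(cores = 4)        # register the parallel backend

# .combine = c collapses the per-iteration results into one vector
res <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)
}
res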
