Note: I will be demonstrating one standard approach to brain graph analysis, borrowing some custom-built functions that my advisor and I have written to perform different steps in the analytic pipeline. You can view and download our code from our lab GitHub.

Further documentation on using igraph can be found here.
There is also a great igraph tutorial here.
Although I have used it sparingly, the R package brainGraph has some useful functions that lend themselves to brain graph analysis in particular. However, it’s mostly built on igraph, which I prefer :)

I will skirt around a number of theoretical and methodological considerations and will focus on practicality: how do you actually do this? A brief background first:

Some Background on Functional Connectivity

Brain networks and graph theory

The brain can be considered a complex network that transmits a large amount of information about an individual’s environment, internal state, and past experience. Although this is not a new idea, in recent years researchers have become interested in dissecting the underlying patterns of connectivity that allow humans to react and respond to their environment, as well as think, perceive, and feel. To do this in a principled way, the dominant analytic approach utilizes graph theory. Graph theory at its core is nothing more than the study of networks comprised of functionally distinct nodes and the connections between them, called edges. A graph is simply a collection of nodes and edges that we can subject to graph theoretical analysis.

While graph theory can be used to study networks at any level of abstraction (e.g. social networks, biochemical interaction networks in systems biology, or the internet), network neuroscientists have become interested in using graph theory to study the human connectome. The connectome is a recently coined term that refers to a comprehensive map of neural connections (both structural and functional) in the human brain. To study it, researchers construct a brain “graph” comprised of brain regions (nodes) and connections between these regions (edges). In functional connectivity, edges are defined as a statistical dependence (typically a full Pearson correlation) between the BOLD timeseries of two nodes. Structural connectivity typically utilizes fractional anisotropy from diffusion weighted imaging. Since most of my experience is in the functional connectivity realm, I’ll stick to functional connectivity. However, the principles of graph theoretic analyses are always the same; we may just apply them differently to answer different questions.
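To make the edge definition concrete, here is a toy illustration (simulated timeseries, not real BOLD data): the correlation between two nodes’ timeseries is the weight of the edge between them, and the full correlation matrix across all nodes is the adjacency matrix we work with below.

# toy illustration of functional connectivity edges (simulated data)
set.seed(1)
ts_mat <- matrix(rnorm(200 * 10), nrow = 200,
                 dimnames = list(NULL, paste0("V", 1:10)))  # 200 timepoints x 10 nodes
cor(ts_mat[, "V1"], ts_mat[, "V2"])   # a single edge weight: correlation between two nodes
adj <- cor(ts_mat)                    # full adjacency (connectivity) matrix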

Largely from fMRI studies of the brain in its resting state, we know that the brain has a modular structure and is composed of a number of functionally distinct subnetworks:

(van den Heuvel & Sporns, 2013)

Here, we can see that the modular structure of this simple network indicates that some groups of nodes share more connections amongst each other than with other nodes. These nodes are said to be in the same “module”, “sub-network”, or “community.” Network scientists have developed numerous community detection algorithms that attempt to partition graphs into densely connected communities (neuroscientists generally use the Louvain fast-unfolding algorithm or the Infomap algorithm). The above graphic plots the network in graph space. However, we could also plot our graphs in brain space, where each node corresponds to EEG or MEG sensors, voxels, or structural or functional nodes derived from a brain parcellation (sometimes called an atlas). Below is a 90-node parcellation based on the Automated Anatomical Labeling (AAL) atlas, shown in both graph and brain space:

(Wang, Zuo & He, 2010)

It is a bit tough to tell in this visualization, but we see five canonical brain networks emerge from this relatively simple 90-node parcellation: the (I) somatomotor network, (II) occipital/primary visual network, (III) fronto-parietal executive control network, (IV) default mode network, and (V) limbic/subcortical network.

It is important to note that community detection is far from an exact science and requires a bit of interpretation when looking at the networks returned by a community detection algorithm. While numerous canonical networks can be detected across many studies, there is no consensus on a single “true” functional community structure; this is an active area of research. In light of this, some researchers (especially in applied research settings) will opt to “assign” modular structure to nodes based on well-known community assignments. Two popular community assignment atlases come from Power et al (2011) and Yeo et al (2011).
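To show the mechanics, here is what running one of these algorithms looks like in igraph on a small benchmark graph (the Zachary karate club network that ships with igraph) rather than brain data:

library(igraph)

g <- make_graph("Zachary")      # small benchmark graph bundled with igraph
comm <- cluster_louvain(g)      # Louvain fast-unfolding community detection
membership(comm)                # community assignment for each node
modularity(comm)                # quality of the resulting partition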

Overall Analytic Pipeline

(Fallani et al., 2014)

As the authors above demonstrate, brain graph analysis can typically be lumped broadly into five steps:
1. Define nodes based on brain activity.
2. Define edges based on temporal correlation of the timeseries of nodes. This will give you an adjacency (connectivity) matrix with cells in the matrix corresponding to the strength of a connection between two nodes.
3. Construct a graph based on the adjacency matrix. Typically, researchers will threshold and binarize the adjacency matrix by retaining some proportion of the strongest edges in the graph and setting edges below the cutoff to 0 (a minimal sketch of this step follows the list).
4. Compute graph metrics (in igraph!)
5. Conduct statistical tests on graph metrics between groups (clinical group vs control, or rest vs any sort of cognitive task).
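Here is the minimal sketch of step 3 promised above, on toy data (an illustration only, not the study’s pipeline code):

library(igraph)

# toy data: correlation-based adjacency matrix for 10 "nodes" (illustration only)
set.seed(1)
adj <- cor(matrix(rnorm(200 * 10), nrow = 200,
                  dimnames = list(NULL, paste0("V", 1:10))))
diag(adj) <- 0

density_target <- 0.10                               # retain the strongest 10% of edges
cutoff <- quantile(adj[upper.tri(adj)], probs = 1 - density_target)
adj_bin <- (adj > cutoff) * 1                        # threshold and binarize

g <- graph_from_adjacency_matrix(adj_bin, mode = "undirected", diag = FALSE)
edge_density(g)                                      # roughly the requested density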

In the interest of keeping the tutorial accessible to those who have little prior experience in graph analysis, I’ll focus on steps 3-5. This means we’ll start with adjacency matrices derived from 5 minutes of resting-state fMRI data. I’ll use adjacency matrices derived from a custom-built parcellation of the cortex, thalamus, striatum, and subnuclei for an ongoing clinical case-control study of individuals with borderline personality disorder (BPD). The groups, as I mentioned above, could just as easily be rest vs task, and the number of contrasts you’d like to test can grow quickly if you compare even more conditions (e.g. healthy controls vs BPD vs depression).

A note about staying organized

I have learned the hard way how important it is to follow good programming practices and to wrap the steps of your pipeline in functions that hide much of the background machinery from the user, which makes it easy to test different decisions at different points along the analytic pipeline. If you are looking for some (decent) examples of how you might implement this in your own pipeline, I would check out the lab GitHub page. The script rs_initialize_pipeline.R compiles a few thousand lines of code into something much more compact and incremental, and thus more manageable, updatable, sharable, and replicable :)

Setting up the workspace

The function setup_globals.R does a number of things as soon as I source it, including sourcing custom functions, loading the packages I will use during the analysis, and specifying a few key inputs that tell the downstream functions whether anything about my setup requires running things slightly differently. Here are a few of the most important specifications:

basedir <- getwd()
source("functions/setup_globals.R") #this will setup details of the parcellation, conn_method, preproc_pipeline, and connection distance, loads helper functions, and required packages
## Initializing RS graph analysis with the following settings:
## Parcellation: schaefer422
## roiMat: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/data/schaefer422_roiMat.txt
## atlas: schaefer422_atlas.csv
## preprocessing pipeline: aroma
## adj mat calculation conn_method: ridge.net_partial
## adjmats directory: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/adjmats
## PCA method: a_priori

Among other things, this includes the following commands:

######################################################################
###SPECIFY PIPELINE INPUTS
######################################################################
nnodes <- 422  #varying this will require you to change the .txt coordinate file you read in for roiMat (should be 4 X nnodes txt file labelled:"x", "y", "z", "roi") and the masterlookup atlas
parcellation <- "schaefer422"
roiFile <- file.path(basedir,  "data", "schaefer422_roiMat.txt")
roiMat <- read.table(roiFile, header=FALSE, col.names=c("x", "y", "z", "roi"))
atlasFile <- paste0(parcellation, "_atlas.csv")
atlas <- read.csv(file.path(basedir, "data", atlasFile), header = TRUE, stringsAsFactors = FALSE) #eventually would be nice to have a lockstep name here and re-use parcellationName
use.infomap <- 0 #from earlier community detection efforts, has since been decommissioned
use.yeo <- 1
PCA.method <- "a_priori" #can be set to "all", "a_priori", or "metrics". This specifies how PCA is performed across densities and metrics
figure.dir <- file.path(basedir, "figures/")
node.file.dir <- file.path(basedir, "Viz_files")

preproc_pipeline <- "aroma" #method for data preprocessing. This corresponds to mni_5mm_aroma data. 

##UPDATE 8/8/17. no 5mm spatial smoothing re: Alakorkko et al 2017
conn_method <- "pearson" #Jun2017: aroma preprocessing, pearson correlations
#conn_method <- "cor.shrink" #Jun2017: aroma preprocessing, shrinkage estimator of correlation
#conn_method <- "pcor.shrink_partial" #Jun2017: aroma preprocessing, shrinkage estimator of *partial* correlation
#conn_method <- "ridge.net_partial" #Jun2017: aroma preprocessing, shrinkage estimator of *partial* correlation
#conn_method <- "dens.clime_partial" #Sept2017 aroma preprocessing, density-based approach for partial correlation estimation (uses Constrained L1-Minimization (CLIME)) from Wang et al (2016)
#conn_method <- "quic" #coming soon
#conn_method <- "pearson_fisherz" #uses older files (from wavelet 5mm on power 269)
#conn_method  <- "scotmi"  #decommissioned due to not enough time points in our data

roi.dist <- 20 #distance in mm to zero connections when setting up graphs (cf. Power 2011)

Getting subject information and motion scrubbing

subj_info <- get_subj_info(adjmats_base, parcellation, conn_method, preproc_pipeline, file_extension=".RData", fd.scrub = TRUE, allowCache = TRUE) 
## Loading subject info from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/subjinfo_schaefer422_aroma_pearson.RData
subj_info$file <- rep("/your/path_to_adjmats", length(subj_info$file))
subj_info$mr_dir <- rep("/your/path_to_raw_data/with_motion_info", length(subj_info$file))
NUM_ID SPECC_ID Luna_ID BPD DOB ScanDate AgeAtScan Female HasRest LunaMRI HasClock FMRI_Exclude file mr_dir dirfound fd.max.exclude fd.thresh.exclude
1 001RA 11131 0 1993-06-16 2013-12-07 20.5 0 1 0 1 0 /your/path_to_adjmats /your/path_to_raw_data/with_motion_info TRUE 0 0
2 002HS 10997 0 1997-12-20 2014-03-08 16.2 1 1 1 1 0 /your/path_to_adjmats /your/path_to_raw_data/with_motion_info TRUE 0 0
3 003BU 10895 0 1992-05-04 2013-12-04 21.6 1 1 1 1 0 /your/path_to_adjmats /your/path_to_raw_data/with_motion_info TRUE 0 0

Note: allowCache, when set to TRUE, lets you pick up where you left off by loading an .RData file from an earlier run, provided you have a “cache” folder in your base directory.
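If you want to roll your own version of this, the idiom underneath allowCache is roughly the following (a simplified, generic sketch, not the lab functions themselves):

# generic sketch of the allowCache idiom
allowCache <- TRUE
cache_file <- file.path("cache", "example_result.RData")

if (allowCache && file.exists(cache_file)) {
  message("Loading cached result from: ", cache_file)
  load(cache_file)                                   # restores `result` into the workspace
} else {
  result <- cor(matrix(rnorm(200 * 10), 200, 10))    # stand-in for an expensive computation
  dir.create("cache", showWarnings = FALSE)
  save(result, file = cache_file)                    # cache for the next run
}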

The get_subj_info function starts to use some of your specifications from above. Notice that fd.scrub refers to scrubbing the data for subjects who had excessive head movement during the scan. This relies on another call within the function that lets users specify the framewise displacement (FD) threshold that counts as a significant head movement, the proportion of significant head movements required to remove a subject entirely, and the cutoff for very large head movements, above which a subject is removed if their largest FD exceeds it:

SPECC_rest <- filter_movement(SPECC_rest, "/mnt/ics", 0.5, .20, 10) #0.5mm FD threshold, 20% of volumes at that threshold, or any 10mm+ movement. FYI SPECC is just the acronym for our study.
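Under the hood, the exclusion logic amounts to something along these lines (a hypothetical sketch operating on a vector of framewise displacement values per volume, not the actual filter_movement internals):

# hypothetical sketch of FD-based exclusion
fd <- c(0.1, 0.3, 0.8, 0.2, 12, 0.4)   # framewise displacement (mm) per volume, made up

fd_thresh     <- 0.5                   # FD above this counts as a significant movement
prop_thresh   <- 0.20                  # exclude if more than 20% of volumes exceed fd_thresh
max_fd_cutoff <- 10                    # exclude if any single movement exceeds 10mm

fd.thresh.exclude <- mean(fd > fd_thresh) > prop_thresh
fd.max.exclude    <- max(fd) > max_fd_cutoff
exclude_subject   <- fd.thresh.exclude || fd.max.exclude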

Import adjacency matrices

This function pulls the adjacency matrices (i.e. the correlation matrices the graphs are constructed from) to be used in the rest of the pipeline. These adjacency matrices are based on a full Pearson correlation, which is the standard in the field; whether to use partial correlations or measures of mutual information instead (Smith et al., 2011; Fallani et al., 2014) has become an area of contention. Notice the argument rmShort, which is specified at the beginning of the pipeline and designates that edges between nodes with a Euclidean distance at or below that value are removed automatically. This is because neighboring nodes often share some level of statistical dependence, even though we are usually interested in connections between nodes that aren’t literally touching. import_adj_mats will return a 3D array with dimensions n_subjects x n_nodes x n_nodes:

allmats <- import_adj_mats(subj_info, rmShort = rmShort, allowCache=TRUE)
## Loading raw adjacency matrices from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/adjmats_schaefer422_aroma_pearson.RData
dim(allmats)
## [1]  83 422 422
allmats[1:5,1:5,1:5]
## , , roi2 = V1
## 
##        roi1
## id      V1 V2    V3    V4    V5
##   001RA  0  0 0.833 0.807 0.773
##   002HS  0  0 0.886 0.774 0.692
##   003BU  0  0 0.788 0.734 0.591
##   005AI  0  0 0.795 0.642 0.611
##   008JH  0  0 0.923 0.843 0.887
## 
## , , roi2 = V2
## 
##        roi1
## id      V1 V2      V3     V4     V5
##   001RA  0  0  0.7866 0.8568  0.747
##   002HS  0  0  0.8075 0.7402  0.655
##   003BU  0  0 -0.0365 0.0327 -0.156
##   005AI  0  0  0.7721 0.8035  0.714
##   008JH  0  0  0.6959 0.7857  0.711
## 
## , , roi2 = V3
## 
##        roi1
## id         V1      V2 V3 V4 V5
##   001RA 0.833  0.7866  0  0  0
##   002HS 0.886  0.8075  0  0  0
##   003BU 0.788 -0.0365  0  0  0
##   005AI 0.795  0.7721  0  0  0
##   008JH 0.923  0.6959  0  0  0
## 
## , , roi2 = V4
## 
##        roi1
## id         V1     V2 V3 V4 V5
##   001RA 0.807 0.8568  0  0  0
##   002HS 0.774 0.7402  0  0  0
##   003BU 0.734 0.0327  0  0  0
##   005AI 0.642 0.8035  0  0  0
##   008JH 0.843 0.7857  0  0  0
## 
## , , roi2 = V5
## 
##        roi1
## id         V1     V2 V3 V4 V5
##   001RA 0.773  0.747  0  0  0
##   002HS 0.692  0.655  0  0  0
##   003BU 0.591 -0.156  0  0  0
##   005AI 0.611  0.714  0  0  0
##   008JH 0.887  0.711  0  0  0
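The blocks of zeros among nearby nodes in the printout above (e.g. V1-V2) are presumably edges removed by the rmShort distance criterion just described. A minimal sketch of that step on toy data (not the import_adj_mats internals):

# sketch of zeroing short-distance connections (toy data)
set.seed(1)
coords <- matrix(rnorm(10 * 3, sd = 30), ncol = 3,
                 dimnames = list(paste0("V", 1:10), c("x", "y", "z")))  # toy MNI-like coordinates
adjmat <- cor(matrix(rnorm(200 * 10), 200, 10))    # toy adjacency matrix for 10 nodes
diag(adjmat) <- 0

d_euclid <- as.matrix(dist(coords))                # pairwise Euclidean distances (mm)
adjmat[d_euclid <= 20] <- 0                        # zero connections at or below 20mm (cf. roi.dist)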

Convert to igraph objects

The setup_graphs() function takes a 3D array and outputs a list containing a) weighted graph objects, b) weighted graph objects with negative edges removed, and c) density-thresholded binary graphs. I like to pull them apart after running the function for simplicity’s sake. Usually, researchers use density-thresholded binary graphs, so I’ll stick with the allg_density object.
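If you are curious what the density thresholding amounts to, here is a rough sketch of binarizing one weighted graph at a single target density (an illustration, not the actual code inside setup_graphs):

# sketch: binarize a weighted igraph object at a target density
library(igraph)

density_threshold <- function(g_weighted, density) {
  w <- E(g_weighted)$weight
  cutoff <- quantile(w, probs = 1 - density)        # keep the strongest `density` proportion of edges
  g_bin <- delete_edges(g_weighted, which(w < cutoff))
  delete_edge_attr(g_bin, "weight")                 # drop weights: the result is a binary graph
}

# e.g., after running setup_graphs below, for one subject across densities .05-.25:
# lapply(seq(.05, .25, by = .01), function(d) density_threshold(allg_noneg[[1]], d))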

gobjs <- setup_graphs(allmats, allowCache=TRUE)
## Loading weighted graph objects from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/weightedgraphs_schaefer422_aroma_pearson.RData
## Loading weighted non-negative graph objects from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/weightedgraphsnoneg_schaefer422_aroma_pearson.RData
## Loading aggregated graph (DEFAULT MEAN + NO NEG) from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/agg.g_schaefer422_aroma_pearson.RData
## Loading density-thresholded graph objects from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/dthreshgraphs_schaefer422_aroma_pearson.RData
allg <- gobjs$allg; allg_noneg <- gobjs$allg_noneg; allg_density <- gobjs$allg_density; agg.g <- gobjs$agg.g

allg_density is a list with one element per subject, and each element in turn contains graphs at multiple densities (I’ve told the script above to take densities of 5% to 25%). So, it’s a list of lists, with each element of the inner list being an igraph object. These are the graph objects that igraph computes graph metrics on. We can take a look at two of the igraph objects for the first subject:

allg_density[[1]][[1]]
## IGRAPH aee9e72 UN-- 422 888 -- 
## + attr: id (g/c), density (g/n), name (v/c), x.mni (v/n), y.mni
## | (v/n), z.mni (v/n), anat_label (v/c), hemi (v/c)
## + edges from aee9e72 (vertex names):
##  [1] V1--V162 V1--V201 V1--V202 V2--V112 V2--V161 V2--V162 V2--V191
##  [8] V2--V201 V2--V202 V3--V9   V3--V14  V3--V24  V3--V29  V3--V201
## [15] V3--V205 V3--V206 V3--V208 V3--V223 V3--V226 V3--V230 V3--V283
## [22] V4--V26  V4--V29  V4--V111 V4--V112 V4--V191 V4--V205 V4--V206
## [29] V4--V208 V4--V214 V4--V224 V4--V229 V4--V230 V4--V283 V4--V315
## [36] V5--V24  V5--V29  V5--V205 V5--V206 V5--V208 V5--V211 V5--V223
## [43] V5--V226 V5--V229 V5--V230 V5--V267 V5--V283 V6--V206 V6--V208
## + ... omitted several edges
allg_density[[1]][[20]]
## IGRAPH 705e309 UN-- 422 17766 -- 
## + attr: id (g/c), density (g/n), name (v/c), x.mni (v/n), y.mni
## | (v/n), z.mni (v/n), anat_label (v/c), hemi (v/c)
## + edges from 705e309 (vertex names):
##  [1] V1--V3   V1--V4   V1--V5   V1--V6   V1--V8   V1--V9   V1--V13 
##  [8] V1--V14  V1--V17  V1--V26  V1--V27  V1--V28  V1--V29  V1--V31 
## [15] V1--V32  V1--V33  V1--V34  V1--V35  V1--V36  V1--V37  V1--V38 
## [22] V1--V39  V1--V40  V1--V41  V1--V42  V1--V49  V1--V51  V1--V56 
## [29] V1--V58  V1--V60  V1--V61  V1--V65  V1--V71  V1--V73  V1--V78 
## [36] V1--V80  V1--V81  V1--V83  V1--V86  V1--V89  V1--V92  V1--V93 
## [43] V1--V94  V1--V96  V1--V98  V1--V100 V1--V102 V1--V103 V1--V106
## + ... omitted several edges

A few orienting remarks (more detail can be found in the tutorials above): in the header, U indicates that these are undirected graphs (i.e. no arrows between nodes) and N indicates that vertices have a name attribute. The third and fourth positions (dashes here) would be W for weighted graphs and B for bipartite graphs. The numbers that follow, 422 and 888/17766, refer to the number of nodes and the number of edges. Attributes are assigned in this script; the listing shows whether each attribute is a graph-level, vertex (node), or edge attribute (g/v/e) and whether it holds numeric or character values (n/c).
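You can query these attributes directly if you want to double check what a given graph contains, e.g.:

g1 <- allg_density[[1]][[20]]
graph_attr_names(g1)      # graph-level attributes: id, density
vertex_attr_names(g1)     # vertex attributes: name, x.mni, y.mni, z.mni, anat_label, hemi
head(V(g1)$anat_label)    # anatomical labels for the first few nodes
is_weighted(g1)           # FALSE for these binarized graphs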

Community assignment

I mentioned above that this step does not have distinct guidelines and requires a trained eye to judge whether the community structure returned by a detection algorithm makes theoretical sense. This gets complicated for density-thresholded graphs, where you also have to decide which density is the right one at which to perform community detection. Some elect to perform community detection on an aggregated mean graph from the sample; luckily, we have this from setting up our graphs. We can also use a preassigned structure (for example from Yeo et al (2011)) based on an earlier specification:

if(use.yeo == 1){ 
  yeo7 <- yeo7_community(agg.g) #note: this is my function. Pay no mind to the super low modularity
  print(yeo7)
  ##now we assign these communities back into our graph objects. This will work for both weighted and density thresholded graphs
  allg_noneg <- assign_communities(allg_noneg, yeo7, "community")
  allg_density <- assign_communities(allg_density, yeo7, "community")
} else {
  louvain_agg <- cluster_louvain(agg.g) #perform louvain fast-unfolding algorithm to partition aggregated graph into modules. There are numerous other algorithms included in igraph
  
  allg_noneg <- assign_communities(allg_noneg, louvain_agg, "community")
  allg_density <- assign_communities(allg_density, louvain_agg, "community")
}
## IGRAPH clustering Yeo_etal_2011_7Networks, groups: 7, mod: -0.0085
## + groups:
##   $`1`
##    [1] "V1"   "V2"   "V3"   "V4"   "V5"   "V6"   "V7"   "V8"   "V9"  
##   [10] "V10"  "V11"  "V12"  "V13"  "V14"  "V15"  "V16"  "V17"  "V18" 
##   [19] "V19"  "V20"  "V21"  "V22"  "V23"  "V24"  "V25"  "V26"  "V27" 
##   [28] "V28"  "V29"  "V30"  "V31"  "V201" "V202" "V203" "V204" "V205"
##   [37] "V206" "V207" "V208" "V209" "V210" "V211" "V212" "V213" "V214"
##   [46] "V215" "V216" "V217" "V218" "V219" "V220" "V221" "V222" "V223"
##   [55] "V224" "V225" "V226" "V227" "V228" "V229" "V230"
##   
##   $`2`
##   + ... omitted several groups/vertices
allg_density[[1]][[20]]
## IGRAPH 705e309 UN-- 422 17766 -- 
## + attr: id (g/c), density (g/n), name (v/c), x.mni (v/n), y.mni
## | (v/n), z.mni (v/n), anat_label (v/c), hemi (v/c), community
## | (v/n)
## + edges from 705e309 (vertex names):
##  [1] V1--V3   V1--V4   V1--V5   V1--V6   V1--V8   V1--V9   V1--V13 
##  [8] V1--V14  V1--V17  V1--V26  V1--V27  V1--V28  V1--V29  V1--V31 
## [15] V1--V32  V1--V33  V1--V34  V1--V35  V1--V36  V1--V37  V1--V38 
## [22] V1--V39  V1--V40  V1--V41  V1--V42  V1--V49  V1--V51  V1--V56 
## [29] V1--V58  V1--V60  V1--V61  V1--V65  V1--V71  V1--V73  V1--V78 
## [36] V1--V80  V1--V81  V1--V83  V1--V86  V1--V89  V1--V92  V1--V93 
## + ... omitted several edges
print(V(allg_density[[1]][[20]])$community) 
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
##  [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3
##  [71] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [106] 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [141] 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
## [176] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [246] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
## [281] 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [316] 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [351] 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
## [386] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 5 5 5 6 2 3 7 6 2 3 7 2 4 5 6 7 2 4 5
## [421] 6 7

As you can see, when we assign communities, the nodes in the graph get a new numeric attribute. We can look at the community assignment by using the V() function, which asks igraph for vertex (node) properties (accessed with $). V(igraph_object) on its own will just give you all of the nodes, which I’ve labelled V1-V422.

V(allg_density[[1]][[20]])
## + 422/422 vertices, named, from 705e309:
##   [1] V1   V2   V3   V4   V5   V6   V7   V8   V9   V10  V11  V12  V13  V14 
##  [15] V15  V16  V17  V18  V19  V20  V21  V22  V23  V24  V25  V26  V27  V28 
##  [29] V29  V30  V31  V32  V33  V34  V35  V36  V37  V38  V39  V40  V41  V42 
##  [43] V43  V44  V45  V46  V47  V48  V49  V50  V51  V52  V53  V54  V55  V56 
##  [57] V57  V58  V59  V60  V61  V62  V63  V64  V65  V66  V67  V68  V69  V70 
##  [71] V71  V72  V73  V74  V75  V76  V77  V78  V79  V80  V81  V82  V83  V84 
##  [85] V85  V86  V87  V88  V89  V90  V91  V92  V93  V94  V95  V96  V97  V98 
##  [99] V99  V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110 V111 V112
## [113] V113 V114 V115 V116 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126
## [127] V127 V128 V129 V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140
## + ... omitted several vertices

Compute graph metrics

Now that we have our graphs with communities assigned, we can let igraph do what it does best (note: if the graph metrics you are interested in computing do not involve communities, you can skip the previous step)! We can calculate both global and nodal metrics using compute_global_metrics and compute_nodal_metrics:

globalmetrics_dthresh <- compute_global_metrics(allg_density, allowCache = TRUE, community_attr = "community")
## Loading density-thresholded global statistics from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/dthreshglobmetrics_schaefer422_aroma_pearson.RData
nodalmetrics_dthresh <- compute_nodal_metrics(allg_density, allowCache=TRUE, community_attr="community")
## Loading density thresholded nodal statistics from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/dthreshnodemetrics_schaefer422_aroma_pearson.RData

To show some of the inner workings of these functions, here are the graph metrics I’ve decided to compute. If there are others you’d like to use, the igraph and brainGraph teams try to keep things as up to date as possible. From compute_global_metrics:

 allmetrics.global <- foreach(subj=graphs, .packages = c("igraph", "brainGraph"), .export=c("calcGraph_global", "densities_desired")) %dopar% {
      #for (subj in graphs) { #put here for more fine-grained debugging
      dl <- lapply(subj, function(dgraph) {
        glist <- calcGraph_global(dgraph, community_attr=community_attr) #this will work for both weighted and unweighted graphs, right now modularity weighted components set to NULL but can change if desired. 
        glist$id <- dgraph$id #copy attributes for flattening to data.frame
        glist$density <- dgraph$density
        return(glist)
      })
      names(dl) <- paste0("d", densities_desired)
      dl
    }

calcGraph_global includes:

globmetrics[["characteristic_path_length"]] <- mean_distance(origgraph) #does not use edge weights
  globmetrics[["clustering_coefficient"]] <- transitivity(origgraph, type = "global") #uses edge weights if the graph has an edge weight attribute
  globmetrics[["small_worldness"]] <- (mean_distance(origgraph)/mean_distance(rewire(origgraph, with = keeping_degseq(loops = FALSE, niter = 1000))))/(transitivity(origgraph, type = "global")/transitivity(rewire(origgraph, with = keeping_degseq(loops = FALSE, niter = 1000)), type = "global")) 
  
  if (!is.null(vertex_attr(origgraph, community_attr))) {
    if(is.null(modular_weighted)){
    globmetrics[["modularity"]] <- modularity(origgraph, vertex_attr(origgraph, community_attr))
    } else {globmetrics[["modularity"]] <- modularity(origgraph, vertex_attr(origgraph, community_attr), weights = E(origgraph)$weight)}
  }

compute_nodal_metrics is similar, only it calls on calcGraph_nodal, which looks like this:

 ##centrality
  allmetrics$local.clustering <- transitivity(origgraph, type = "local")
  allmetrics$degree <- degree(origgraph, v = V(origgraph))
  evcent<- eigen_centrality(origgraph, directed = FALSE)
  allmetrics$eigen.cent <- evcent$vector
  allmetrics$closeness <- closeness(origgraph, mode = "all", normalized=TRUE)#, weights=1/E(origgraph)$weight)
  allmetrics$betweenness.node <- betweenness(origgraph)#, weights=1/E(origgraph)$weight)
  allmetrics$page.rank <- page.rank(origgraph, algo="prpack")$vector

  #only compute community-related statistics if the expected community attribute is available
  if (!is.null(vertex_attr(origgraph, community_attr))) {
    allmetrics$community.membership <- vertex_attr(origgraph, community_attr)
    
    ##nodal within vs between modules
    wibw <- wibw_module_degree(origgraph, community_attr=community_attr)
    allmetrics[["within.module.deg.zscore"]] <- wibw$z_within
    allmetrics[["between.module.deg.zscore"]] <- wibw$z_between
    allmetrics[["part.coeff"]] <- part_coeff(origgraph, allmetrics$community.membership)#, use.parallel = FALSE)
    allmetrics[["gateway.coeff.btw"]] <- gateway_coeff(origgraph, allmetrics$community.membership, centr = "btwn.cent")
    allmetrics[["gateway.coeff.degree"]] <- gateway_coeff(origgraph, allmetrics$community.membership, centr = "degree")    
    }

PCA across metrics and densities

In this case, I won’t worry about global metrics but will focus on node-level centrality estimates. In our example, we said earlier that we don’t want to have to choose a “correct” density to threshold at, so we density thresholded and binarized at multiple densities (.05-.25). We could perform statistical comparisons at each density, but we would then be left with some very unwieldy results that we need to decide how to interpret (e.g. why did node V415 have significantly higher degree in the task-positive condition at densities .10-.17 but stop being significant at higher or lower levels?). One solution is to use a dimensionality reduction technique such as principal components analysis (PCA) to identify latent “directions” that explain large amounts of covariation in the observed nodal metrics across densities:

  toanalyze <- PCA_all(nodalmetrics_dthresh$allmetrics.nodal.df, pcametrics, allowCache = TRUE, den = .05)
## Loading toanalyze from file: /Users/natehall/Box Sync/DEPENd/Projects/RS_BPD_graph/bpd_rest/cache/toanalyze.pca.schaefer422.aroma.pearson.a_priori.csv
head(toanalyze)
id node central integration within.mod closeness betweenness BPD Age
001RA V1 1.299 1.077 -0.515 -0.378 1.485 0 20.5
001RA V10 0.234 0.156 -0.606 -0.400 -0.648 0 20.5
001RA V100 0.669 0.606 0.828 -0.386 0.850 0 20.5
001RA V101 -0.986 -0.100 -0.508 -0.404 -0.678 0 20.5
001RA V102 0.018 0.268 0.586 -0.392 -0.504 0 20.5
001RA V103 0.915 0.380 1.211 -0.384 0.173 0 20.5

The output of this function gives principal component scores across densities and nodal metrics. Central denotes overall centrality (degree, eigenvector, betweenness, page rank), integration denotes the level of between-module connectivity (participation coefficient and gateway coefficient), and the final two components denote within-module connectivity and closeness centrality. To see what goes on behind the scenes, feel free to look at the code for the function; I’m happy to discuss what’s going on here.
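Conceptually, the approach boils down to arranging the observations as rows (subject x node) with one column per metric x density combination and running a PCA on that matrix. A toy sketch of that idea with made-up data (not the actual PCA_all code):

# toy sketch of PCA across metrics and densities (made-up data)
set.seed(1)
n_obs <- 83 * 422                                   # subject x node observations
toy <- data.frame(degree_d05 = rnorm(n_obs),        # hypothetical metric x density columns
                  degree_d25 = rnorm(n_obs),
                  evcent_d05 = rnorm(n_obs),
                  evcent_d25 = rnorm(n_obs))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)    # PCA on the metric x density matrix
summary(pca)                                        # variance explained by each component
head(pca$x)                                         # component scores for each observation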

Statistical tests between groups

Now that we have the data in a form that’s a bit more manageable, we can conduct simple t-tests between groups to test for differences in centrality, and we can also test for interactions with continuous variables. This can all be performed with the run_group_comparisons function, which outputs results from group-level t-tests and linear regressions that allow us to examine age x group effects:

sig_PCA_nodes <- run_group_comparisons_nodal(toanalyze, abst = 2.64, allowCache = TRUE)

#pull sig_PCA_nodes apart for simplicity
signod.lm <- sig_PCA_nodes$all.siglm.nodal.pca
signod.bpd <- sig_PCA_nodes$all.sigttest.nodal.pca

bpd.main.all <- plot_significant_groupeffects(signod.bpd) ##output from this function required for writing .node files for brainnet viewer

# plot_significant_ageeffects(signod.lm)
plot_significant_ixn(signod.lm)
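For a sense of what these comparisons boil down to at the level of a single node, here is a sketch using the toanalyze columns shown above (this is not the exact code inside run_group_comparisons_nodal, which loops over nodes and components):

# sketch: group comparison and age x group regression for one node's centrality score
v1 <- subset(toanalyze, node == "V1")

t.test(central ~ BPD, data = v1)                # BPD vs. control difference in overall centrality
summary(lm(central ~ BPD * Age, data = v1))     # age x group interaction on centrality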

The plots you will obtain for main effects of group will look like the first figure below, and plots for group x age interactions (or any continuous predictor, e.g. IQ scores or scores on a self-report symptom scale) will look like the second.