Overview

Although mediation is extremely difficult to implement in a way that supports the desired inferences, there is some potential for using experience sampling to examine within-person mediation. This approach may provide for more sound inferences than the usual cross-sectional (between-person) mediation. In this tutorial, we illustrate how the within-person (1-1-1) mediation model if fit in a multilevel modeling framework. Generally, we follow the example in Bolger and Laurenceau (2013) Chapter 9: Within-subject Mediation Analysis, but also make use of other resources.

Mediation is said to occur when the effect of one variable (x) on another (y) is transmitted through an intervening variable (m).

There are many resources on mediation, but not so many R-based examples and implementations of within-person mediation. We’ve done our best to compile these together into a workable example. A good set of general resources is here (note also their caution about one section of code at the beginning) https://stats.idre.ucla.edu/r/faq/how-can-i-perform-mediation-with-multilevel-data-method-2/, and also on the sites of researchers studying mediation, e.g., http://quantpsy.org/, http://afhayes.com/index.html, http://davidakenny.net/cm/mediate.htm, https://psychology.clas.asu.edu/research/labs/research-prevention-laboratory-mackinnon.

Outline

In this tutorial, we will cover…

  1. Introduction to the Within-Person Mediation Model
  2. An Example using the Multivariate Multilevel Model
  3. Conclusion

Preliminaries

Loading Libraries

Loading libraries used in this script.

library(boot) #for bootstrap functions
library(ggplot2) #for data viz
library(lme4) #for multilevel models
library(nlme) #for multilevel moels
library(psych) #for describing data
library(reshape2) #for manipulating data 

1. Introduction to the Within-Person Mediation Model

In basic form, mediation occurs when the effect of an independent variable (x) on a dependent variable (y) is transmitted via a mediator variable (m). This “mediation effect” is often referred to as the indirect effect of X on Y through M. Mediation models allow researchers to test simple hypotheses about “causal processes.” Mediation models often involve parsing the total effect (c) of X on Y into a direct effect (c’) and an indirect effect (a × b). These coefficients can be derived by fitting a set of simultaneous equations to sample data using linear regression or path analysis. Specifically, in the within-person world,

\[M_{it} = d_{Mi} + a_{i}X_{it} + e_{Mit}\] \[Y_{it} = d_{Yi} + b_{i}M_{it} + c'_{i}X_{it} + e_{Yit}\] where \(d_{Mi}\) and \(d_{Yi}\) are person-specific intercept terms.

2. An Example using a Multivariate Multilevel Model

We follow the example from Bolger & Laurenceau (2013) Chapter 9.

In Session O we examined the extent to which daily work stressors were associated with daily relationship dissatisfaction. Here, we examine the extent to which work dissatisfaction can explain that association. For parsimony of presentation, we use data from the female partners only.

The data are daily repeated measures of work stress, work dissatisfaction, and relationship satisfaction for females within a dyad.

Loading Data

Loading data file.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/B&Lmediation.csv"
#read in the .csv file using the url() function
data <- read.csv(file = url(filepath), header = TRUE)

# Examine first few rows of the data set
head(data, 10)
##     id time timec  freldis   fwkdis fwkstr fwkstrc    fwkdisc    freldisc
## 1  101    1   -10 3.034483 5.590119      3    0.03  0.3601189 -1.60551724
## 2  101    2    -9 4.620690 5.535224      3    0.03  0.3052242 -0.01931034
## 3  101    3    -8 2.850575 3.888381      3    0.03 -1.3416194 -1.78942529
## 4  101    4    -7 6.398467 5.352242      4    1.03  0.1222415  1.75846743
## 5  101    5    -6 2.544061 4.483074      1   -1.97 -0.7469259 -2.09593870
## 6  101    6    -5 5.164751 3.339433      2   -0.97 -1.8905672  0.52475096
## 7  101    7    -4 2.704981 4.135407      3    0.03 -1.0945929 -1.93501916
## 8  101    8    -3 5.003831 5.800549      4    1.03  0.5705489  0.36383142
## 9  101    9    -2 4.099617 5.434584      3    0.03  0.2045837 -0.54038314
## 10 101   10    -1 5.471264 4.830741      2   -0.97 -0.3992589  0.83126437
##      fwkstrcb   fwkdiscb  freldiscb   fwkstrcw   fwkdiscw  freldiscw
## 1  -0.3033333 -0.6227591 -0.1634446  0.3333333  0.9828781 -1.4420726
## 2  -0.3033333 -0.6227591 -0.1634446  0.3333333  0.9279833  0.1441343
## 3  -0.3033333 -0.6227591 -0.1634446  0.3333333 -0.7188603 -1.6259807
## 4  -0.3033333 -0.6227591 -0.1634446  1.3333333  0.7450007  1.9219121
## 5  -0.3033333 -0.6227591 -0.1634446 -1.6666667 -0.1241668 -1.9324941
## 6  -0.3033333 -0.6227591 -0.1634446 -0.6666667 -1.2678081  0.6881956
## 7  -0.3033333 -0.6227591 -0.1634446  0.3333333 -0.4718337 -1.7715745
## 8  -0.3033333 -0.6227591 -0.1634446  1.3333333  1.1933081  0.5272760
## 9  -0.3033333 -0.6227591 -0.1634446  0.3333333  0.8273428 -0.3769385
## 10 -0.3033333 -0.6227591 -0.1634446 -0.6666667  0.2235002  0.9947090
##             x          m          y
## 1   0.3333333  0.9828781 -1.4420726
## 2   0.3333333  0.9279833  0.1441343
## 3   0.3333333 -0.7188603 -1.6259807
## 4   1.3333333  0.7450007  1.9219121
## 5  -1.6666667 -0.1241668 -1.9324941
## 6  -0.6666667 -1.2678081  0.6881956
## 7   0.3333333 -0.4718337 -1.7715745
## 8   1.3333333  1.1933081  0.5272760
## 9   0.3333333  0.8273428 -0.3769385
## 10 -0.6666667  0.2235002  0.9947090

The variables of interest in this example are fwkstrcw (the predictor: female work stress centered within person),fwkdiscw (the mediator: female work dissatisfaction centered within person),freldiscw (the outcome: female relationship dissatisfaction centered within person). Note that these variables are state variables. They have already been person-mean centered. We are not currently interested in between-person differences, so they have been separated and set aside. In sum, these data are already person-centered. It may take some steps to get your data to this form.

Copies of the three variables of interest have been made and conveniently labeled as x,m, and y for easy conceptualization and operationalization of the mediation model.

Descriptives of within-person variables

Describe the data.

#variables of interest
vars <- c("fwkstrcw", "fwkdiscw", "freldiscw", "x", "m", "y")
#descriptives
describe(data[, vars])
##           vars    n mean   sd median trimmed  mad   min  max range skew
## fwkstrcw     1 2100    0 1.00   0.00       0 1.20 -2.90 2.95  5.86 0.05
## fwkdiscw     2 2100    0 1.13  -0.01       0 1.15 -3.81 3.81  7.63 0.02
## freldiscw    3 2100    0 0.99   0.00       0 0.98 -3.52 3.22  6.74 0.02
## x            4 2100    0 1.00   0.00       0 1.20 -2.90 2.95  5.86 0.05
## m            5 2100    0 1.13  -0.01       0 1.15 -3.81 3.81  7.63 0.02
## y            6 2100    0 0.99   0.00       0 0.98 -3.52 3.22  6.74 0.02
##           kurtosis   se
## fwkstrcw     -0.25 0.02
## fwkdiscw     -0.11 0.02
## freldiscw     0.07 0.02
## x            -0.25 0.02
## m            -0.11 0.02
## y             0.07 0.02

As expected, the variables all have mean = 0, as they have been person-centered. Standard deviations near one are not purposeful, just coincidence.

Plots of within-person associations

We plot each leg of the model for a few individuals.

x –> y: The c path.
fwkstrcw –> freldiscw

#within-person regressions, x --> y
ggplot(data = data[which(data$id <= 106), ], 
       aes(x = fwkstrcw, y = freldiscw, group = id)) +
  geom_point(color = "black") + 
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "black") + 
  xlab("x = Work Stress (within)") +
  ylab("y = Relationship Dissatisfaction (within)") +
  facet_wrap( ~ id)

x –> m: The a path. fwkstrcw –> fwkdiscw

#within-person regressions, x --> m
ggplot(data = data[which(data$id <= 106),], 
       aes(x = fwkstrcw, y = fwkdiscw, group = id)) +
  geom_point(color = "blue") + 
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "blue") + 
  xlab("x = Work Stress (within)") +
  ylab("m = Work Dissatisfaction (within)") +
  facet_wrap( ~ id)

m –> y: The b path.
fwkdiscw –> freldiscw

#within-person regressions, m --> y
ggplot(data = data[which(data$id <= 106),], 
       aes(x = fwkdiscw, y = freldiscw, group = id)) +
  geom_point(color = "red") + 
  geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "red") + 
  xlab("m = Work Dissatisfaction (within)") +
  ylab("y = Relationship Dissatisfaction (within)") +
  facet_wrap( ~ id)

Restructuring the Data Set into Double-entry data.

As noted in the introduction, the mediation model can be conceived as a multivariate multilevel model where both y and m are outcome variables.

Practically, fitting the bivariate outcome model is facilitated through some data-restructuring, specifically “double-entry” data.

Restructuring is done in just a few steps.

  1. Subset down to the variables of interest.
data1 <- data[ , c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw", "x", "m", "y")]

#look at data set
head(data1, 10)
##     id time timec   fwkstrcw   fwkdiscw  freldiscw          x          m
## 1  101    1   -10  0.3333333  0.9828781 -1.4420726  0.3333333  0.9828781
## 2  101    2    -9  0.3333333  0.9279833  0.1441343  0.3333333  0.9279833
## 3  101    3    -8  0.3333333 -0.7188603 -1.6259807  0.3333333 -0.7188603
## 4  101    4    -7  1.3333333  0.7450007  1.9219121  1.3333333  0.7450007
## 5  101    5    -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 -0.1241668
## 6  101    6    -5 -0.6666667 -1.2678081  0.6881956 -0.6666667 -1.2678081
## 7  101    7    -4  0.3333333 -0.4718337 -1.7715745  0.3333333 -0.4718337
## 8  101    8    -3  1.3333333  1.1933081  0.5272760  1.3333333  1.1933081
## 9  101    9    -2  0.3333333  0.8273428 -0.3769385  0.3333333  0.8273428
## 10 101   10    -1 -0.6666667  0.2235002  0.9947090 -0.6666667  0.2235002
##             y
## 1  -1.4420726
## 2   0.1441343
## 3  -1.6259807
## 4   1.9219121
## 5  -1.9324941
## 6   0.6881956
## 7  -1.7715745
## 8   0.5272760
## 9  -0.3769385
## 10  0.9947090
  1. Melt the two outcome variables (y and m) into a single z variable using the melt() function in the reshape2 library.
#trick to get variable names for easy cut and paste
dput(names(data1))
## c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw", 
## "x", "m", "y")
#melting data
datalong <- melt(data = data1,
               id.vars = c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw", "x"),  #("m","y") have been removed
               na.rm = FALSE, 
               variable.name = "dv",
               value.name = "z")

# Reorder rows for convenience 
datalong <- datalong[order(datalong$id, datalong$time, datalong$dv), ]
#look at updated data set
head(datalong, 10)
##       id time timec   fwkstrcw   fwkdiscw  freldiscw          x dv
## 1    101    1   -10  0.3333333  0.9828781 -1.4420726  0.3333333  m
## 2101 101    1   -10  0.3333333  0.9828781 -1.4420726  0.3333333  y
## 2    101    2    -9  0.3333333  0.9279833  0.1441343  0.3333333  m
## 2102 101    2    -9  0.3333333  0.9279833  0.1441343  0.3333333  y
## 3    101    3    -8  0.3333333 -0.7188603 -1.6259807  0.3333333  m
## 2103 101    3    -8  0.3333333 -0.7188603 -1.6259807  0.3333333  y
## 4    101    4    -7  1.3333333  0.7450007  1.9219121  1.3333333  m
## 2104 101    4    -7  1.3333333  0.7450007  1.9219121  1.3333333  y
## 5    101    5    -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667  m
## 2105 101    5    -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667  y
##               z
## 1     0.9828781
## 2101 -1.4420726
## 2     0.9279833
## 2102  0.1441343
## 3    -0.7188603
## 2103 -1.6259807
## 4     0.7450007
## 2104  1.9219121
## 5    -0.1241668
## 2105 -1.9324941
  1. Add dummy variables that indicate row entry.
#adding the double indicators 
datalong$dy <- ifelse(datalong$dv == "y", 1, 0)
datalong$dm <- ifelse(datalong$dv == "m", 1, 0)
datalong$dvnum <- ifelse(datalong$dv == "m", 1, 0)

#look at updated data set
head(datalong, 10)
##       id time timec   fwkstrcw   fwkdiscw  freldiscw          x dv
## 1    101    1   -10  0.3333333  0.9828781 -1.4420726  0.3333333  m
## 2101 101    1   -10  0.3333333  0.9828781 -1.4420726  0.3333333  y
## 2    101    2    -9  0.3333333  0.9279833  0.1441343  0.3333333  m
## 2102 101    2    -9  0.3333333  0.9279833  0.1441343  0.3333333  y
## 3    101    3    -8  0.3333333 -0.7188603 -1.6259807  0.3333333  m
## 2103 101    3    -8  0.3333333 -0.7188603 -1.6259807  0.3333333  y
## 4    101    4    -7  1.3333333  0.7450007  1.9219121  1.3333333  m
## 2104 101    4    -7  1.3333333  0.7450007  1.9219121  1.3333333  y
## 5    101    5    -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667  m
## 2105 101    5    -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667  y
##               z dy dm dvnum
## 1     0.9828781  0  1     1
## 2101 -1.4420726  1  0     0
## 2     0.9279833  0  1     1
## 2102  0.1441343  1  0     0
## 3    -0.7188603  0  1     1
## 2103 -1.6259807  1  0     0
## 4     0.7450007  0  1     1
## 2104  1.9219121  1  0     0
## 5    -0.1241668  0  1     1
## 2105 -1.9324941  1  0     0

The data set datalong is now ready for analysis.

Running the mediation model -

We are now ready to start running the mediation model as a multivariate multilevel model with two outcomes.

We’ll construct a model for the mediator variable as outcome with fwkstrcw (x) and timec as predictors; and a model for the y variable as outcome with fwkstrcw (x) and fwkdiscw (m) and timec as predictors. All the variables have been person centered so that there is no need for intercept terms.

We use the two dummy variables (dm and dy) to turn on and off the parameters for each of the outcomes. The parameters invoked with \(dm\) are for the mediator as outcome model to get the a path, and parameters invoked with \(dy\) are for the y as outcome model to get the b and c’ paths.

Setting up and fitting the 1-1-1 mediation model using lme().

#lme mediation model
model_lme <- lme(fixed = z ~ -1 + 
                             dm + dm:fwkstrcw + dm:timec + #m as outcome
                             dy + dy:fwkdiscw + dy:fwkstrcw + dy:timec, #y as outcome
                  random = ~ -1  +  dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw | id, 
                  weights = varIdent(form = ~ 1 | dvnum), #separate sigma^{2}_{e} for each outcome
                  data = datalong,
                  na.action = na.exclude,
                  control = lmeControl(opt = "optim", maxIter = 200, msMaxIter = 200, niterEM = 50, msMaxEval = 400))

summary(model_lme)
## Linear mixed-effects model fit by REML
##  Data: datalong 
##        AIC      BIC    logLik
##   12195.67 12290.79 -6082.837
## 
## Random effects:
##  Formula: ~-1 + dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr         
## dm:fwkstrcw 0.26103020 dm:fwk dy:fwk
## dy:fwkdiscw 0.21769909 0.542        
## fwkstrcw:dy 0.08798665 0.441  0.932 
## Residual    1.08982601              
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | dvnum 
##  Parameter estimates:
##         1         0 
## 1.0000000 0.8492324 
## Fixed effects: z ~ -1 + dm + dm:fwkstrcw + dm:timec + dy + dy:fwkdiscw + dy:fwkstrcw +      dy:timec 
##                   Value  Std.Error   DF   t-value p-value
## dm          -0.00000623 0.02378243 4094 -0.000262  0.9998
## dy          -0.00011824 0.02019650 4094 -0.005855  0.9953
## dm:fwkstrcw  0.19030912 0.03559543 4094  5.346448  0.0000
## dm:timec    -0.00583706 0.00397737 4094 -1.467570  0.1423
## dy:fwkdiscw  0.14780833 0.02866743 4094  5.155966  0.0000
## fwkstrcw:dy  0.10548277 0.02267412 4094  4.652121  0.0000
## timec:dy    -0.00247302 0.00338844 4094 -0.729839  0.4655
##  Correlation: 
##             dm     dy     dm:fwk dm:tmc dy:fwk fwkst:
## dy           0.000                                   
## dm:fwkstrcw -0.001  0.000                            
## dm:timec     0.000  0.000 -0.012                     
## dy:fwkdiscw  0.000  0.000  0.301  0.001              
## fwkstrcw:dy  0.000 -0.001  0.127  0.001  0.190       
## timec:dy     0.000  0.000  0.000  0.003  0.015 -0.014
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.547996904 -0.675619227 -0.007149205  0.670459292  3.349086939 
## 
## Number of Observations: 4200
## Number of Groups: 100

The results look good and closely match the results in the Bolger & Laurenceau (2013) book! Soon, we will fill more interpretataion in here of the lme() model output (and use this model to obtain confidence intervals).

The fixed effects -

#pulling out fixed effects info
FE <- fixef(model_lme)
FE
##            dm            dy   dm:fwkstrcw      dm:timec   dy:fwkdiscw 
## -6.234341e-06 -1.182449e-04  1.903091e-01 -5.837061e-03  1.478083e-01 
##   fwkstrcw:dy      timec:dy 
##  1.054828e-01 -2.473016e-03

Let’s interpret …

  • 0.0000 = intercept in the m as outcome model (\(d_{Mi}\): We expect this to be zero because we person-centered all the data)
  • 0.0001 = intercept in the y as outcome model (\(d_{Yi}\): We expect this to be zero because we person-centered all the data)
  • 0.1903 = effect of x –> m (a: work stressors predicting work dissatisfaction)
  • -0.0058 = effect of time –> m (time trend in work dissatisfaction)
  • 0.1478 = effect of m –> y (b: work dissatisfaction predicting relationship dissatisfaction)
  • 0.1054 = effect of x –> y (c’: work stressors predicting relationship dissatisfaction, after adjusting for work dissatisfaction)
  • -0.0024 = effect of time –> y (time trend in relationship dissatisfaction)

Let’s put the parameters into named objects, as these will be useful later when calculating confidence intervals.

#making parameter objects
a <- as.numeric(FE[3])
a
## [1] 0.1903091
b <- as.numeric(FE[5])
b
## [1] 0.1478083
cprime <- as.numeric(FE[6])
cprime
## [1] 0.1054828

The random effects -

#pulling out random effects info
VarCorr(model_lme)
## id = pdLogChol(-1 + dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw) 
##             Variance    StdDev     Corr         
## dm:fwkstrcw 0.068136768 0.26103020 dm:fwk dy:fwk
## dy:fwkdiscw 0.047392894 0.21769909 0.542        
## fwkstrcw:dy 0.007741651 0.08798665 0.441  0.932 
## Residual    1.187720737 1.08982601

They are a little easier to interpret and work with when placed in objects …

The variance of the a paths,

#The *variance* of the *a* paths
sig2_a <- as.numeric(VarCorr(model_lme)[1,1])
sig2_a
## [1] 0.06813677

\(\sigma^2_{a}\),= 0.0681368

The variance of the b paths,

sig2_b <- as.numeric(VarCorr(model_lme)[2,1])
sig2_b
## [1] 0.04739289

\(\sigma^2_{b}\),is 0.0473929

The variance of the c’ paths,

sig2_cprime <- as.numeric(VarCorr(model_lme)[3,1])
sig2_cprime
## [1] 0.007741651

\(\sigma^2_{c'}\),is 0.0077417

The residual variance of the mediator variable,

sig2_em <- (1.00*as.numeric(VarCorr(model_lme)[4,2]))^2
sig2_em
## [1] 1.187721

\(\sigma^2_{em}\), calculated as \((0.8492324 + 1.187721)^{2}\) = 1.1877207.

The residual variance of the outcome variable,

sig2_ey <- (0.8492324*as.numeric(VarCorr(model_lme)[4,2]))^2
sig2_ey
## [1] 0.856579

\(\sigma^2_{ey}\), is calculated as \((0.8492324 + 1.187721)^{2}\) = 0.856579.

These residual variances differ from the values in the book. I am not sure why.

The covariance between the \(a_{j}\) and \(b_{j}\) paths

covajbj <- as.numeric(VarCorr(model_lme)[2,2])*as.numeric(VarCorr(model_lme)[2,3])*as.numeric(VarCorr(model_lme)[1,2]) 
covajbj 
## [1] 0.03079971

\(\sigma_{ajbj}\) is calcuated as \(\sigma_{aj}*r_{ajbj}*\sigma_{bj}\) = 0.0307997

Yay! We can now use all these to calculate the indirect and total effects.

Calculating mediated effect (e.g., indirect effect)

IndirectEffects = ab + \(\sigma_{ajbj}\)
= (0.1903*0.1478) + 0.0308
= 0.0589

indirecteffect <- a*b + covajbj   
indirecteffect
## [1] 0.05892898

Calculating total effect c

c = c’ + ab + \(\sigma_{ajbj}\)
= 0.1054 + (0.1903*0.1478) + 0.0308
= 0.1644

totaleffect <- cprime + a*b + covajbj 
totaleffect
## [1] 0.1644117

Calculating percent of total effect accounted for by mediated effect

PercentMediated = (ab + \(\sigma_{ajbj}\)) / (c’ + ab + \(\sigma_{ajbj}\))
= IndirectEffect / c
= 0.0589 / 0.1644
= 0.3584

percentmediated <- 100*(indirecteffect/totaleffect)  
percentmediated
## [1] 35.84232

Calculating percent of mediated effect accounted for by ab covariance (\(\sigma_{ajbj}\))

PercentCovariance = \(\sigma_{ajbj}\) / (ab + \(\sigma_{ajbj}\))
= \(\sigma_{ajbj}\) / ie
= 0.0308 / 0.0589
= 0.5226

percentcovariance <- 100*(covajbj/indirecteffect) 
percentcovariance
## [1] 52.26581

All these calculations match the results in Bolger & Laurenceau (2013) Chapter 9 pretty well.

3. Conclusion

We have briefly reviewed within-person (1-1-1) mediation models. For further information and discussion of moderated mediation in multilevel models check out the paper here http://quantpsy.org/pubs/bauer_preacher_gil_2006.pdf. There are some interesting extensions that can be used to obtain bootstrapped confidence intervals for the mediated effects.

As always, please be careful with the mediational inferences.

Thanks for playing!