Although mediation is extremely difficult to implement in a way that supports the desired inferences, there is some potential for using experience sampling to examine within-person mediation. This approach may provide for more sound inferences than the usual cross-sectional (between-person) mediation. In this tutorial, we illustrate how the within-person (1-1-1) mediation model if fit in a multilevel modeling framework. Generally, we follow the example in Bolger and Laurenceau (2013) Chapter 9: Within-subject Mediation Analysis, but also make use of other resources.
Mediation is said to occur when the effect of one variable (x) on another (y) is transmitted through an intervening variable (m).
There are many resources on mediation, but not so many R-based examples and implementations of within-person mediation. We’ve done our best to compile these together into a workable example. A good set of general resources is here (note also their caution about one section of code at the beginning) https://stats.idre.ucla.edu/r/faq/how-can-i-perform-mediation-with-multilevel-data-method-2/, and also on the sites of researchers studying mediation, e.g., http://quantpsy.org/, http://afhayes.com/index.html, http://davidakenny.net/cm/mediate.htm, https://psychology.clas.asu.edu/research/labs/research-prevention-laboratory-mackinnon.
In this tutorial, we will cover…
Loading Libraries
Loading libraries used in this script.
library(boot) #for bootstrap functions
library(ggplot2) #for data viz
library(lme4) #for multilevel models
library(nlme) #for multilevel moels
library(psych) #for describing data
library(reshape2) #for manipulating data
In basic form, mediation occurs when the effect of an independent variable (x) on a dependent variable (y) is transmitted via a mediator variable (m). This “mediation effect” is often referred to as the indirect effect of X on Y through M. Mediation models allow researchers to test simple hypotheses about “causal processes.” Mediation models often involve parsing the total effect (c) of X on Y into a direct effect (c’) and an indirect effect (a × b). These coefficients can be derived by fitting a set of simultaneous equations to sample data using linear regression or path analysis. Specifically, in the within-person world,
\[M_{it} = d_{Mi} + a_{i}X_{it} + e_{Mit}\] \[Y_{it} = d_{Yi} + b_{i}M_{it} + c'_{i}X_{it} + e_{Yit}\] where \(d_{Mi}\) and \(d_{Yi}\) are person-specific intercept terms.
In Session O we examined the extent to which daily work stressors were associated with daily relationship dissatisfaction. Here, we examine the extent to which work dissatisfaction can explain that association. For parsimony of presentation, we use data from the female partners only.
The data are daily repeated measures of work stress, work dissatisfaction, and relationship satisfaction for females within a dyad.
Loading Data
Loading data file.
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/B&Lmediation.csv"
#read in the .csv file using the url() function
data <- read.csv(file = url(filepath), header = TRUE)
# Examine first few rows of the data set
head(data, 10)
## id time timec freldis fwkdis fwkstr fwkstrc fwkdisc freldisc
## 1 101 1 -10 3.034483 5.590119 3 0.03 0.3601189 -1.60551724
## 2 101 2 -9 4.620690 5.535224 3 0.03 0.3052242 -0.01931034
## 3 101 3 -8 2.850575 3.888381 3 0.03 -1.3416194 -1.78942529
## 4 101 4 -7 6.398467 5.352242 4 1.03 0.1222415 1.75846743
## 5 101 5 -6 2.544061 4.483074 1 -1.97 -0.7469259 -2.09593870
## 6 101 6 -5 5.164751 3.339433 2 -0.97 -1.8905672 0.52475096
## 7 101 7 -4 2.704981 4.135407 3 0.03 -1.0945929 -1.93501916
## 8 101 8 -3 5.003831 5.800549 4 1.03 0.5705489 0.36383142
## 9 101 9 -2 4.099617 5.434584 3 0.03 0.2045837 -0.54038314
## 10 101 10 -1 5.471264 4.830741 2 -0.97 -0.3992589 0.83126437
## fwkstrcb fwkdiscb freldiscb fwkstrcw fwkdiscw freldiscw
## 1 -0.3033333 -0.6227591 -0.1634446 0.3333333 0.9828781 -1.4420726
## 2 -0.3033333 -0.6227591 -0.1634446 0.3333333 0.9279833 0.1441343
## 3 -0.3033333 -0.6227591 -0.1634446 0.3333333 -0.7188603 -1.6259807
## 4 -0.3033333 -0.6227591 -0.1634446 1.3333333 0.7450007 1.9219121
## 5 -0.3033333 -0.6227591 -0.1634446 -1.6666667 -0.1241668 -1.9324941
## 6 -0.3033333 -0.6227591 -0.1634446 -0.6666667 -1.2678081 0.6881956
## 7 -0.3033333 -0.6227591 -0.1634446 0.3333333 -0.4718337 -1.7715745
## 8 -0.3033333 -0.6227591 -0.1634446 1.3333333 1.1933081 0.5272760
## 9 -0.3033333 -0.6227591 -0.1634446 0.3333333 0.8273428 -0.3769385
## 10 -0.3033333 -0.6227591 -0.1634446 -0.6666667 0.2235002 0.9947090
## x m y
## 1 0.3333333 0.9828781 -1.4420726
## 2 0.3333333 0.9279833 0.1441343
## 3 0.3333333 -0.7188603 -1.6259807
## 4 1.3333333 0.7450007 1.9219121
## 5 -1.6666667 -0.1241668 -1.9324941
## 6 -0.6666667 -1.2678081 0.6881956
## 7 0.3333333 -0.4718337 -1.7715745
## 8 1.3333333 1.1933081 0.5272760
## 9 0.3333333 0.8273428 -0.3769385
## 10 -0.6666667 0.2235002 0.9947090
The variables of interest in this example are fwkstrcw
(the predictor: female work stress centered within person),fwkdiscw
(the mediator: female work dissatisfaction centered within person),freldiscw
(the outcome: female relationship dissatisfaction centered within person). Note that these variables are state variables. They have already been person-mean centered. We are not currently interested in between-person differences, so they have been separated and set aside. In sum, these data are already person-centered. It may take some steps to get your data to this form.
Copies of the three variables of interest have been made and conveniently labeled as x
,m
, and y
for easy conceptualization and operationalization of the mediation model.
Describe the data.
#variables of interest
vars <- c("fwkstrcw", "fwkdiscw", "freldiscw", "x", "m", "y")
#descriptives
describe(data[, vars])
## vars n mean sd median trimmed mad min max range skew
## fwkstrcw 1 2100 0 1.00 0.00 0 1.20 -2.90 2.95 5.86 0.05
## fwkdiscw 2 2100 0 1.13 -0.01 0 1.15 -3.81 3.81 7.63 0.02
## freldiscw 3 2100 0 0.99 0.00 0 0.98 -3.52 3.22 6.74 0.02
## x 4 2100 0 1.00 0.00 0 1.20 -2.90 2.95 5.86 0.05
## m 5 2100 0 1.13 -0.01 0 1.15 -3.81 3.81 7.63 0.02
## y 6 2100 0 0.99 0.00 0 0.98 -3.52 3.22 6.74 0.02
## kurtosis se
## fwkstrcw -0.25 0.02
## fwkdiscw -0.11 0.02
## freldiscw 0.07 0.02
## x -0.25 0.02
## m -0.11 0.02
## y 0.07 0.02
As expected, the variables all have mean = 0, as they have been person-centered. Standard deviations near one are not purposeful, just coincidence.
We plot each leg of the model for a few individuals.
x –> y: The c path.
fwkstrcw –> freldiscw
#within-person regressions, x --> y
ggplot(data = data[which(data$id <= 106), ],
aes(x = fwkstrcw, y = freldiscw, group = id)) +
geom_point(color = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "black") +
xlab("x = Work Stress (within)") +
ylab("y = Relationship Dissatisfaction (within)") +
facet_wrap( ~ id)
x –> m: The a path. fwkstrcw –> fwkdiscw
#within-person regressions, x --> m
ggplot(data = data[which(data$id <= 106),],
aes(x = fwkstrcw, y = fwkdiscw, group = id)) +
geom_point(color = "blue") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "blue") +
xlab("x = Work Stress (within)") +
ylab("m = Work Dissatisfaction (within)") +
facet_wrap( ~ id)
m –> y: The b path.
fwkdiscw –> freldiscw
#within-person regressions, m --> y
ggplot(data = data[which(data$id <= 106),],
aes(x = fwkdiscw, y = freldiscw, group = id)) +
geom_point(color = "red") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE, color = "red") +
xlab("m = Work Dissatisfaction (within)") +
ylab("y = Relationship Dissatisfaction (within)") +
facet_wrap( ~ id)
As noted in the introduction, the mediation model can be conceived as a multivariate multilevel model where both y
and m
are outcome variables.
Practically, fitting the bivariate outcome model is facilitated through some data-restructuring, specifically “double-entry” data.
Restructuring is done in just a few steps.
data1 <- data[ , c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw", "x", "m", "y")]
#look at data set
head(data1, 10)
## id time timec fwkstrcw fwkdiscw freldiscw x m
## 1 101 1 -10 0.3333333 0.9828781 -1.4420726 0.3333333 0.9828781
## 2 101 2 -9 0.3333333 0.9279833 0.1441343 0.3333333 0.9279833
## 3 101 3 -8 0.3333333 -0.7188603 -1.6259807 0.3333333 -0.7188603
## 4 101 4 -7 1.3333333 0.7450007 1.9219121 1.3333333 0.7450007
## 5 101 5 -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 -0.1241668
## 6 101 6 -5 -0.6666667 -1.2678081 0.6881956 -0.6666667 -1.2678081
## 7 101 7 -4 0.3333333 -0.4718337 -1.7715745 0.3333333 -0.4718337
## 8 101 8 -3 1.3333333 1.1933081 0.5272760 1.3333333 1.1933081
## 9 101 9 -2 0.3333333 0.8273428 -0.3769385 0.3333333 0.8273428
## 10 101 10 -1 -0.6666667 0.2235002 0.9947090 -0.6666667 0.2235002
## y
## 1 -1.4420726
## 2 0.1441343
## 3 -1.6259807
## 4 1.9219121
## 5 -1.9324941
## 6 0.6881956
## 7 -1.7715745
## 8 0.5272760
## 9 -0.3769385
## 10 0.9947090
y
and m
) into a single z
variable using the melt()
function in the reshape2
library.#trick to get variable names for easy cut and paste
dput(names(data1))
## c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw",
## "x", "m", "y")
#melting data
datalong <- melt(data = data1,
id.vars = c("id", "time", "timec", "fwkstrcw", "fwkdiscw", "freldiscw", "x"), #("m","y") have been removed
na.rm = FALSE,
variable.name = "dv",
value.name = "z")
# Reorder rows for convenience
datalong <- datalong[order(datalong$id, datalong$time, datalong$dv), ]
#look at updated data set
head(datalong, 10)
## id time timec fwkstrcw fwkdiscw freldiscw x dv
## 1 101 1 -10 0.3333333 0.9828781 -1.4420726 0.3333333 m
## 2101 101 1 -10 0.3333333 0.9828781 -1.4420726 0.3333333 y
## 2 101 2 -9 0.3333333 0.9279833 0.1441343 0.3333333 m
## 2102 101 2 -9 0.3333333 0.9279833 0.1441343 0.3333333 y
## 3 101 3 -8 0.3333333 -0.7188603 -1.6259807 0.3333333 m
## 2103 101 3 -8 0.3333333 -0.7188603 -1.6259807 0.3333333 y
## 4 101 4 -7 1.3333333 0.7450007 1.9219121 1.3333333 m
## 2104 101 4 -7 1.3333333 0.7450007 1.9219121 1.3333333 y
## 5 101 5 -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 m
## 2105 101 5 -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 y
## z
## 1 0.9828781
## 2101 -1.4420726
## 2 0.9279833
## 2102 0.1441343
## 3 -0.7188603
## 2103 -1.6259807
## 4 0.7450007
## 2104 1.9219121
## 5 -0.1241668
## 2105 -1.9324941
#adding the double indicators
datalong$dy <- ifelse(datalong$dv == "y", 1, 0)
datalong$dm <- ifelse(datalong$dv == "m", 1, 0)
datalong$dvnum <- ifelse(datalong$dv == "m", 1, 0)
#look at updated data set
head(datalong, 10)
## id time timec fwkstrcw fwkdiscw freldiscw x dv
## 1 101 1 -10 0.3333333 0.9828781 -1.4420726 0.3333333 m
## 2101 101 1 -10 0.3333333 0.9828781 -1.4420726 0.3333333 y
## 2 101 2 -9 0.3333333 0.9279833 0.1441343 0.3333333 m
## 2102 101 2 -9 0.3333333 0.9279833 0.1441343 0.3333333 y
## 3 101 3 -8 0.3333333 -0.7188603 -1.6259807 0.3333333 m
## 2103 101 3 -8 0.3333333 -0.7188603 -1.6259807 0.3333333 y
## 4 101 4 -7 1.3333333 0.7450007 1.9219121 1.3333333 m
## 2104 101 4 -7 1.3333333 0.7450007 1.9219121 1.3333333 y
## 5 101 5 -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 m
## 2105 101 5 -6 -1.6666667 -0.1241668 -1.9324941 -1.6666667 y
## z dy dm dvnum
## 1 0.9828781 0 1 1
## 2101 -1.4420726 1 0 0
## 2 0.9279833 0 1 1
## 2102 0.1441343 1 0 0
## 3 -0.7188603 0 1 1
## 2103 -1.6259807 1 0 0
## 4 0.7450007 0 1 1
## 2104 1.9219121 1 0 0
## 5 -0.1241668 0 1 1
## 2105 -1.9324941 1 0 0
The data set datalong
is now ready for analysis.
We are now ready to start running the mediation model as a multivariate multilevel model with two outcomes.
We’ll construct a model for the mediator variable as outcome with fwkstrcw
(x) and timec
as predictors; and a model for the y variable as outcome with fwkstrcw
(x) and fwkdiscw
(m) and timec
as predictors. All the variables have been person centered so that there is no need for intercept terms.
We use the two dummy variables (dm
and dy
) to turn on and off the parameters for each of the outcomes. The parameters invoked with \(dm\) are for the mediator as outcome model to get the a path, and parameters invoked with \(dy\) are for the y as outcome model to get the b and c’ paths.
Setting up and fitting the 1-1-1 mediation model using lme()
.
#lme mediation model
model_lme <- lme(fixed = z ~ -1 +
dm + dm:fwkstrcw + dm:timec + #m as outcome
dy + dy:fwkdiscw + dy:fwkstrcw + dy:timec, #y as outcome
random = ~ -1 + dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw | id,
weights = varIdent(form = ~ 1 | dvnum), #separate sigma^{2}_{e} for each outcome
data = datalong,
na.action = na.exclude,
control = lmeControl(opt = "optim", maxIter = 200, msMaxIter = 200, niterEM = 50, msMaxEval = 400))
summary(model_lme)
## Linear mixed-effects model fit by REML
## Data: datalong
## AIC BIC logLik
## 12195.67 12290.79 -6082.837
##
## Random effects:
## Formula: ~-1 + dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw | id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## dm:fwkstrcw 0.26103020 dm:fwk dy:fwk
## dy:fwkdiscw 0.21769909 0.542
## fwkstrcw:dy 0.08798665 0.441 0.932
## Residual 1.08982601
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | dvnum
## Parameter estimates:
## 1 0
## 1.0000000 0.8492324
## Fixed effects: z ~ -1 + dm + dm:fwkstrcw + dm:timec + dy + dy:fwkdiscw + dy:fwkstrcw + dy:timec
## Value Std.Error DF t-value p-value
## dm -0.00000623 0.02378243 4094 -0.000262 0.9998
## dy -0.00011824 0.02019650 4094 -0.005855 0.9953
## dm:fwkstrcw 0.19030912 0.03559543 4094 5.346448 0.0000
## dm:timec -0.00583706 0.00397737 4094 -1.467570 0.1423
## dy:fwkdiscw 0.14780833 0.02866743 4094 5.155966 0.0000
## fwkstrcw:dy 0.10548277 0.02267412 4094 4.652121 0.0000
## timec:dy -0.00247302 0.00338844 4094 -0.729839 0.4655
## Correlation:
## dm dy dm:fwk dm:tmc dy:fwk fwkst:
## dy 0.000
## dm:fwkstrcw -0.001 0.000
## dm:timec 0.000 0.000 -0.012
## dy:fwkdiscw 0.000 0.000 0.301 0.001
## fwkstrcw:dy 0.000 -0.001 0.127 0.001 0.190
## timec:dy 0.000 0.000 0.000 0.003 0.015 -0.014
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.547996904 -0.675619227 -0.007149205 0.670459292 3.349086939
##
## Number of Observations: 4200
## Number of Groups: 100
The results look good and closely match the results in the Bolger & Laurenceau (2013) book! Soon, we will fill more interpretataion in here of the lme()
model output (and use this model to obtain confidence intervals).
#pulling out fixed effects info
FE <- fixef(model_lme)
FE
## dm dy dm:fwkstrcw dm:timec dy:fwkdiscw
## -6.234341e-06 -1.182449e-04 1.903091e-01 -5.837061e-03 1.478083e-01
## fwkstrcw:dy timec:dy
## 1.054828e-01 -2.473016e-03
Let’s interpret …
m
as outcome model (\(d_{Mi}\): We expect this to be zero because we person-centered all the data)y
as outcome model (\(d_{Yi}\): We expect this to be zero because we person-centered all the data)Let’s put the parameters into named objects, as these will be useful later when calculating confidence intervals.
#making parameter objects
a <- as.numeric(FE[3])
a
## [1] 0.1903091
b <- as.numeric(FE[5])
b
## [1] 0.1478083
cprime <- as.numeric(FE[6])
cprime
## [1] 0.1054828
#pulling out random effects info
VarCorr(model_lme)
## id = pdLogChol(-1 + dm:fwkstrcw + dy:fwkdiscw + dy:fwkstrcw)
## Variance StdDev Corr
## dm:fwkstrcw 0.068136768 0.26103020 dm:fwk dy:fwk
## dy:fwkdiscw 0.047392894 0.21769909 0.542
## fwkstrcw:dy 0.007741651 0.08798665 0.441 0.932
## Residual 1.187720737 1.08982601
They are a little easier to interpret and work with when placed in objects …
The variance of the a paths,
#The *variance* of the *a* paths
sig2_a <- as.numeric(VarCorr(model_lme)[1,1])
sig2_a
## [1] 0.06813677
\(\sigma^2_{a}\),= 0.0681368
The variance of the b paths,
sig2_b <- as.numeric(VarCorr(model_lme)[2,1])
sig2_b
## [1] 0.04739289
\(\sigma^2_{b}\),is 0.0473929
The variance of the c’ paths,
sig2_cprime <- as.numeric(VarCorr(model_lme)[3,1])
sig2_cprime
## [1] 0.007741651
\(\sigma^2_{c'}\),is 0.0077417
The residual variance of the mediator variable,
sig2_em <- (1.00*as.numeric(VarCorr(model_lme)[4,2]))^2
sig2_em
## [1] 1.187721
\(\sigma^2_{em}\), calculated as \((0.8492324 + 1.187721)^{2}\) = 1.1877207.
The residual variance of the outcome variable,
sig2_ey <- (0.8492324*as.numeric(VarCorr(model_lme)[4,2]))^2
sig2_ey
## [1] 0.856579
\(\sigma^2_{ey}\), is calculated as \((0.8492324 + 1.187721)^{2}\) = 0.856579.
These residual variances differ from the values in the book. I am not sure why.
The covariance between the \(a_{j}\) and \(b_{j}\) paths
covajbj <- as.numeric(VarCorr(model_lme)[2,2])*as.numeric(VarCorr(model_lme)[2,3])*as.numeric(VarCorr(model_lme)[1,2])
covajbj
## [1] 0.03079971
\(\sigma_{ajbj}\) is calcuated as \(\sigma_{aj}*r_{ajbj}*\sigma_{bj}\) = 0.0307997
Yay! We can now use all these to calculate the indirect and total effects.
IndirectEffects = ab + \(\sigma_{ajbj}\)
= (0.1903*0.1478) + 0.0308
= 0.0589
indirecteffect <- a*b + covajbj
indirecteffect
## [1] 0.05892898
c = c’ + ab + \(\sigma_{ajbj}\)
= 0.1054 + (0.1903*0.1478) + 0.0308
= 0.1644
totaleffect <- cprime + a*b + covajbj
totaleffect
## [1] 0.1644117
PercentMediated = (ab + \(\sigma_{ajbj}\)) / (c’ + ab + \(\sigma_{ajbj}\))
= IndirectEffect / c
= 0.0589 / 0.1644
= 0.3584
percentmediated <- 100*(indirecteffect/totaleffect)
percentmediated
## [1] 35.84232
PercentCovariance = \(\sigma_{ajbj}\) / (ab + \(\sigma_{ajbj}\))
= \(\sigma_{ajbj}\) / ie
= 0.0308 / 0.0589
= 0.5226
percentcovariance <- 100*(covajbj/indirecteffect)
percentcovariance
## [1] 52.26581
All these calculations match the results in Bolger & Laurenceau (2013) Chapter 9 pretty well.
We have briefly reviewed within-person (1-1-1) mediation models. For further information and discussion of moderated mediation in multilevel models check out the paper here http://quantpsy.org/pubs/bauer_preacher_gil_2006.pdf. There are some interesting extensions that can be used to obtain bootstrapped confidence intervals for the mediated effects.
As always, please be careful with the mediational inferences.
Thanks for playing!