The MultiLevel Model (MLM) was developed to accommodate the interdependence in nested data. It is particularly useful for examining individual differences in bivariate associations - thus providing for more robust statistical inference about within-person associations (intraindividual covariation) and the between-person differences therein.
A. Overview of model
B. Set up the basic multilevel model
C. Interpretation of results
D. Display (data visualization) of the result
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it}\] where \(\beta_{1i}\) is individual i’s level of
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where \[e_{it} \tilde{} N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
Loading Libraries
Loading libraries used in this script.
library(psych) # for describing the data
library(plyr) # for data manipulation
library(ggplot2) # for data visualization
library(lme4) # for multilevel models
library(lmerTest) # for p-values
Loading Data
Our example makes use of the person-level and day-level (EMA-type) AMIB data files. Specifically, we make use of three varaibles:
Outcome: daily negative affect (continuous), obtained through negaff
in the day-level file;
Predictor: daily stress, obtained through reverse coding of pss
in the day-level file;
Person-level predictor: trait neuroticism, bfi_n
in the person-level file
Loading person-level file and subsetting to variables of interest
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_persons_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath),header=TRUE)
#subsetting to variables of interest
AMIB_persons <- AMIB_persons[ ,c("id","bfi_n")]
Loading day-level file (T = 8) and subsetting to variables of interest.
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_daily_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath),header=TRUE)
#subsetting to variables of interest
AMIB_daily <- AMIB_daily[ ,c("id","day","negaff","pss")]
Data Preparation
Reverse code pss
so higher values indicate higher perceived stress.
#reverse coding the pss variable into a new stress variable
AMIB_daily$stress <- 4 - AMIB_daily$pss
#describing new variable
describe(AMIB_daily$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1445 1.39 0.68 1.25 1.36 0.74 0 4 4 0.35 0.13
## se
## X1 0.02
#histogram
ggplot(data=AMIB_daily, aes(x=stress)) +
geom_histogram(fill="white", color="black",bins=20) +
labs(x = "Stress (high = more stressed)")
We now split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress
is split into two varaibles: stress_trait
is the sample-mean centered between-person component, and stress_state
is the person-centered within-person component.
Making trait variables.
#calculating intraindividual means
#Alternative approach using dplyr
#AMIB_imeans <- AMIB_daily %>% group_by(id) %>% summarize(stress_trait=mean(stress, na.rm=TRUE))
AMIB_imeans <- ddply(AMIB_daily, "id", summarize,
stress_trait = mean(stress, na.rm=TRUE),
negaff_trait = mean(negaff, na.rm=TRUE))
describe(AMIB_imeans)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## stress_trait 2 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## negaff_trait 3 190 2.48 0.73 2.41 2.43 0.72 1.11 5.09
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## stress_trait 2.38 -0.04 -0.23 0.03
## negaff_trait 3.98 0.68 0.45 0.05
#merging into person-level file
AMIB_persons <- merge(AMIB_persons, AMIB_imeans, by="id")
#make centered versions of the person-level scores
AMIB_persons$bfi_n_c <- scale(AMIB_persons$bfi_n,center=TRUE,scale=FALSE)
AMIB_persons$stress_trait_c <- scale(AMIB_persons$stress_trait,center=TRUE,scale=FALSE)
#describe person-level data
describe(AMIB_persons)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## bfi_n 2 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00
## stress_trait 3 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## negaff_trait 4 190 2.48 0.73 2.41 2.43 0.72 1.11 5.09
## bfi_n_c 5 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02
## stress_trait_c 6 190 0.00 0.48 0.01 0.00 0.51 -1.21 1.17
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## bfi_n 4.00 -0.09 -0.82 0.07
## stress_trait 2.38 -0.04 -0.23 0.03
## negaff_trait 3.98 0.68 0.45 0.05
## bfi_n_c 4.00 -0.09 -0.82 0.07
## stress_trait_c 2.38 -0.04 -0.23 0.03
Making state variables in long data
#merging person-level data into daily data
daily_long <- merge(AMIB_daily,AMIB_persons,by="id")
#calculating state variables
daily_long$stress_state <- daily_long$stress - daily_long$stress_trait
#describing data
describe(daily_long)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## negaff 3 1441 2.45 1.04 2.20 2.34 1.04 1.00 6.90
## pss 4 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## stress 5 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## bfi_n 6 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## stress_trait 7 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## negaff_trait 8 1458 2.45 0.71 2.38 2.41 0.67 1.11 5.09
## bfi_n_c 9 1458 -0.01 0.96 0.02 0.00 1.48 -1.98 2.02
## stress_trait_c 10 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17
## stress_state 11 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## negaff 5.90 0.96 0.77 0.03
## pss 4.00 -0.35 0.13 0.02
## stress 4.00 0.35 0.13 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## stress_trait 2.38 -0.08 -0.21 0.01
## negaff_trait 3.98 0.66 0.52 0.02
## bfi_n_c 4.00 -0.08 -0.79 0.03
## stress_trait_c 2.38 -0.08 -0.21 0.01
## stress_state 3.88 0.36 0.79 0.01
Great! Now the data are all prepared as needed.
Outlining the substantive inquiry.
We define stress reactivty (a person-level dynamic characteristic) as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is the intraindividual covariation of daily negative affect and daily stress.
Formally,
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it}\] First check out the distribution of the outcome variable (negaff
):
#pdf("NegativeAffectHistogram.pdf",height=3, width=3.5)
ggplot(data=AMIB_daily, aes(x=negaff)) +
geom_histogram(aes(y=..density..),
binwidth=.25,
colour="black", fill="white") +
labs(x = "Negative Affect", y="Density") +
theme(axis.text=element_text(size=14),
axis.title=element_text(size=16))
## Warning: Removed 17 rows containing non-finite values (stat_bin).
#dev.off()
Now look at a subset of persons to see what stress reactivity looks like.
#faceted plot
ggplot(data=daily_long[which(daily_long$id <= 125),], aes(x=stress_state,y=negaff)) +
geom_point() +
stat_smooth(method="lm", fullrange=TRUE) +
xlab("Stress") + ylab("Negative Affect (Continuous)") +
facet_wrap( ~ id) +
theme(axis.title=element_text(size=16),
axis.text=element_text(size=14),
strip.text=element_text(size=14))
If you want to contain the regression line within the range of stress scores for each person (instead of extrapolating to the full range of stress scores), specify fullrange=FALSE
in the stat_smooth
layer above.
Running the models
We use the lmer()
function in the lme4
package.
To start, lets fit the usual unconditional means model …
#unconditional means model
model0_fit <- lmer(formula = negaff ~ 1 + (1|id),
data=daily_long,
na.action=na.exclude)
summary(model0_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: negaff ~ 1 + (1 | id)
## Data: daily_long
##
## REML criterion at convergence: 3833.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8739 -0.6123 -0.1608 0.4658 3.9394
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.4270 0.6535
## Residual 0.6627 0.8141
## Number of obs: 1441, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.46368 0.05229 185.80794 47.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Lets extract the random effects with the VarCorr() function:
VarCorr(model0_fit)
## Groups Name Std.Dev.
## id (Intercept) 0.65347
## Residual 0.81408
We can then compute the intra-class correlation (ICC) as the ratio of the random intercept variance (between-person) to the total variance (between + within).
Store the random effect variances, which will be the first column of the VarCorr object (see above).
RandomEffects <- as.data.frame(VarCorr(model0_fit))
RandomEffects
## grp var1 var2 vcov sdcor
## 1 id (Intercept) <NA> 0.4270294 0.6534748
## 2 Residual <NA> <NA> 0.6627260 0.8140798
Next, compute the ICC. It is the ratio of the random intercept variance (between-person var) over the total variance (between + within var):
ICC_between <- RandomEffects[1,4]/(RandomEffects[1,4]+RandomEffects[2,4])
ICC_between
## [1] 0.391858
From the unconditional means model, the ICC was calculated, which indicated that of the total variance in negative affect, 39.19% is attributable to between-person variation whereas 60.81% is attributatable to within-person variation
OK - lets add the predictors, stress_trait
and stress_state
to examine stress reactivity.
# fit model
model1_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c +
stress_state + stress_state:stress_trait_c +
(1 + stress_state|id),
data=daily_long,
na.action=na.exclude)
summary(model1_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## negaff ~ 1 + day + stress_trait_c + stress_state + stress_state:stress_trait_c +
## (1 + stress_state | id)
## Data: daily_long
##
## REML criterion at convergence: 3162.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5368 -0.6127 -0.0729 0.5093 4.4164
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.2135 0.4621
## stress_state 0.1257 0.3546 0.53
## Residual 0.4038 0.6355
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 2.695e+00 4.583e-02 3.922e+02 58.806
## day -6.580e-02 7.552e-03 1.250e+03 -8.713
## stress_trait_c 1.038e+00 7.946e-02 1.859e+02 13.067
## stress_state 7.647e-01 4.561e-02 1.664e+02 16.765
## stress_trait_c:stress_state 1.550e-01 9.780e-02 1.584e+02 1.585
## Pr(>|t|)
## (Intercept) <2e-16 ***
## day <2e-16 ***
## stress_trait_c <2e-16 ***
## stress_state <2e-16 ***
## stress_trait_c:stress_state 0.115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) day strs__ strss_
## day -0.569
## strss_trt_c 0.004 0.007
## stress_stat 0.216 0.012 0.002
## strss_tr_:_ 0.034 -0.057 0.268 -0.118
# save predicted scores
daily_long$pred_m1 <- predict(model1_fit)
Interpreting and Plotting the Results
The results indicate that there is an association between perceived stress and negative affect at both within- and between-person levels of analysis. In other words, there is within-person variation across days in the extent to which perceived stress is associated with negative affect and there are between-person differences in the extent to which perceived stress is associated with negative affect on average.
We can also get confidence intervals for the fixed and random effects. Depending on model complexity, the confint()
can sometimes take awhile to run
# Get confidence intervals for both fixed and random effects
confint(model1_fit)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 0.40389523 0.52247837
## .sig02 0.27635443 0.77129749
## .sig03 0.25230527 0.45069551
## .sigma 0.60962930 0.66244352
## (Intercept) 2.60530435 2.78468853
## day -0.08058845 -0.05099124
## stress_trait_c 0.88249219 1.19390742
## stress_state 0.67339998 0.85449747
## stress_trait_c:stress_state -0.03788643 0.34703329
The labels for the random effects are not entirely intuitive (e.g., sig01
, sig02
, sig03
, sigma
) - especially for models with several random effects.
In the above output:
In general, the output given from the confint()
will give you the confidence intervals (given in SD units - will need to square these values if you want the variances) for the first random effect (e.g., the random intercept). Then it will give you the correlations between that random effect and all other random effects it is correlated with. The next row of CI values will correspond to the second random effect (e.g., the random slope for stress_state), with the following rows pertaining to all random effects it is correlated with (but not include any correlations already given - because for example, the correlation for the random intercept and random slope for stress_state was already given). The last row pertaining to random effects is always called sigma
, and pertains to the residual error (given in SD units - will need to square these values if you want the variances).
Next, we can plot between-person associations.
ggplot(data=AMIB_imeans, aes(x=stress_trait, y=negaff_trait, group=factor(id)), legend=FALSE) +
geom_point(colour="gray40") +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
xlab("Trait Stress") + ylab("Trait Negative Affect") +
theme_classic() +
theme(axis.title=element_text(size=16),
axis.text=element_text(size=12),
plot.title=element_text(size=16, hjust=.5)) +
ggtitle("Between-Person Association Plot\nTrait Stress & Negative Affect")
Plotting predicted within-person associations.
ggplot(data=daily_long, aes(x=stress_state, y=pred_m1, group=factor(id), colour="gray"), legend=FALSE) +
geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1, size=.5, color="gray40") +
geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
xlab("Stress State") + ylab("Predicted State Negative Affect") +
theme_classic() +
theme(axis.title=element_text(size=18),
axis.text=element_text(size=14),
plot.title=element_text(size=18, hjust=.5)) +
ggtitle("Within-Person Association Plot\nPerceived Stress & Negative Affect")
## Warning: Removed 20 rows containing non-finite values (stat_smooth).
## Warning: Removed 20 rows containing non-finite values (stat_smooth).
Adding another Person-level Predictor
We now add trait neuroticism to the model, to see if and how between-person differences in neuroticism are related to stress reactivity.
Running the expanded model.
# fit model
model2_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
data=daily_long,
na.action=na.exclude)
summary(model2_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## REML criterion at convergence: 3161.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4271 -0.6011 -0.0749 0.5045 4.4732
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1955 0.4422
## stress_state 0.1238 0.3518 0.51
## Residual 0.4040 0.6356
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error df
## (Intercept) 2.690e+00 4.556e-02 3.944e+02
## day -6.545e-02 7.572e-03 1.247e+03
## stress_trait_c 9.695e-01 7.878e-02 1.835e+02
## bfi_n_c 1.543e-01 3.917e-02 1.809e+02
## stress_state 7.687e-01 4.682e-02 1.673e+02
## stress_trait_c:bfi_n_c 3.715e-02 7.832e-02 1.824e+02
## stress_trait_c:stress_state 1.254e-01 1.015e-01 1.692e+02
## bfi_n_c:stress_state 7.595e-02 4.845e-02 1.549e+02
## stress_trait_c:bfi_n_c:stress_state -3.167e-02 1.024e-01 1.722e+02
## t value Pr(>|t|)
## (Intercept) 59.055 < 2e-16 ***
## day -8.644 < 2e-16 ***
## stress_trait_c 12.307 < 2e-16 ***
## bfi_n_c 3.939 0.000117 ***
## stress_state 16.418 < 2e-16 ***
## stress_trait_c:bfi_n_c 0.474 0.635819
## stress_trait_c:stress_state 1.235 0.218477
## bfi_n_c:stress_state 1.568 0.119005
## stress_trait_c:bfi_n_c:stress_state -0.309 0.757561
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) day strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day -0.576
## strss_trt_c 0.003 0.005
## bfi_n_c -0.008 0.007 -0.224
## stress_stat 0.204 0.004 0.000 0.001
## strss_t_:__ -0.179 0.008 0.008 0.040 -0.054
## strss_tr_:_ 0.042 -0.073 0.248 -0.056 -0.087 0.003
## bf_n_c:str_ -0.033 0.058 -0.058 0.258 0.016 0.009 -0.244
## strs__:__:_ -0.061 0.032 0.004 0.008 -0.233 0.243 -0.103 -0.059
# save predicted scores
daily_long$pred_m2 <- predict(model2_fit)
This tutorial illustrated application of the basic multilevel model in cases where the outcome variable is assumed normally distributed.
Good luck! =:-]