Overview

The MultiLevel Model (MLM) was developed to accommodate the interdependence in nested data. It is particularly useful for examining individual differences in bivariate associations - thus providing for more robust statistical inference about within-person associations (intraindividual covariation) and the between-person differences therein.

Outline

A. Overview of model

B. Set up the basic multilevel model

C. Interpretation of results

D. Display (data visualization) of the result

A. Introduction to the Generalized Multilevel Model

The basic linear multilevel model is written as

\[y_{it} = \beta_{0i} + \beta_{1i}x_{it}\] where \(\beta_{1i}\) is individual i’s level of … -ity, and where

\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]

where \[e_{it} \tilde{} N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})\]

\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.

\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]

B. Set up the Basic Multilevel Model

Preliminaries

Loading Libraries

Loading libraries used in this script.

library(psych)      # for describing the data
library(plyr)       # for data manipulation
library(ggplot2)    # for data visualization
library(lme4)       # for multilevel models
library(lmerTest)   # for p-values

Loading Data

Our example makes use of the person-level and day-level (EMA-type) AMIB data files. Specifically, we make use of three varaibles:

Outcome: daily negative affect (continuous), obtained through negaff in the day-level file;

Predictor: daily stress, obtained through reverse coding of pss in the day-level file;
Person-level predictor: trait neuroticism, bfi_n in the person-level file

Loading person-level file and subsetting to variables of interest

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_persons_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath),header=TRUE)

#subsetting to variables of interest
AMIB_persons <- AMIB_persons[ ,c("id","bfi_n")]

Loading day-level file (T = 8) and subsetting to variables of interest.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_daily_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath),header=TRUE)

#subsetting to variables of interest
AMIB_daily <- AMIB_daily[ ,c("id","day","negaff","pss")]

Data Preparation

Reverse code pss so higher values indicate higher perceived stress.

#reverse coding the pss variable into a new stress variable
AMIB_daily$stress <- 4 - AMIB_daily$pss
#describing new variable
describe(AMIB_daily$stress)
##    vars    n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 1445 1.39 0.68   1.25    1.36 0.74   0   4     4 0.35     0.13
##      se
## X1 0.02
#histogram
ggplot(data=AMIB_daily, aes(x=stress)) +
  geom_histogram(fill="white", color="black",bins=20) +
  labs(x = "Stress (high = more stressed)")

We now split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress is split into two varaibles: stress_trait is the sample-mean centered between-person component, and stress_state is the person-centered within-person component.

Making trait variables.

#calculating intraindividual means
#Alternative approach using dplyr
    #AMIB_imeans <- AMIB_daily %>% group_by(id) %>% summarize(stress_trait=mean(stress, na.rm=TRUE))
AMIB_imeans <- ddply(AMIB_daily, "id", summarize,
                       stress_trait = mean(stress, na.rm=TRUE),
                       negaff_trait = mean(negaff, na.rm=TRUE))
describe(AMIB_imeans)
##              vars   n   mean     sd median trimmed    mad    min    max
## id              1 190 318.29 130.44 321.50  318.99 151.23 101.00 532.00
## stress_trait    2 190   1.40   0.48   1.41    1.39   0.51   0.19   2.56
## negaff_trait    3 190   2.48   0.73   2.41    2.43   0.72   1.11   5.09
##               range  skew kurtosis   se
## id           431.00 -0.04    -1.09 9.46
## stress_trait   2.38 -0.04    -0.23 0.03
## negaff_trait   3.98  0.68     0.45 0.05
#merging into person-level file
AMIB_persons <- merge(AMIB_persons, AMIB_imeans, by="id")                                              
#make centered versions of the person-level scores
AMIB_persons$bfi_n_c <- scale(AMIB_persons$bfi_n,center=TRUE,scale=FALSE)
AMIB_persons$stress_trait_c <- scale(AMIB_persons$stress_trait,center=TRUE,scale=FALSE)
#describe person-level data
describe(AMIB_persons)
##                vars   n   mean     sd median trimmed    mad    min    max
## id                1 190 318.29 130.44 321.50  318.99 151.23 101.00 532.00
## bfi_n             2 190   2.98   0.96   3.00    3.00   1.48   1.00   5.00
## stress_trait      3 190   1.40   0.48   1.41    1.39   0.51   0.19   2.56
## negaff_trait      4 190   2.48   0.73   2.41    2.43   0.72   1.11   5.09
## bfi_n_c           5 190   0.00   0.96   0.02    0.02   1.48  -1.98   2.02
## stress_trait_c    6 190   0.00   0.48   0.01    0.00   0.51  -1.21   1.17
##                 range  skew kurtosis   se
## id             431.00 -0.04    -1.09 9.46
## bfi_n            4.00 -0.09    -0.82 0.07
## stress_trait     2.38 -0.04    -0.23 0.03
## negaff_trait     3.98  0.68     0.45 0.05
## bfi_n_c          4.00 -0.09    -0.82 0.07
## stress_trait_c   2.38 -0.04    -0.23 0.03

Making state variables in long data

#merging person-level data into daily data
daily_long <- merge(AMIB_daily,AMIB_persons,by="id")

#calculating state variables
daily_long$stress_state <- daily_long$stress - daily_long$stress_trait
#describing data
describe(daily_long)
##                vars    n   mean     sd median trimmed    mad    min    max
## id                1 1458 322.53 129.08 324.00  324.20 151.23 101.00 532.00
## day               2 1458   3.48   2.30   3.00    3.47   2.97   0.00   7.00
## negaff            3 1441   2.45   1.04   2.20    2.34   1.04   1.00   6.90
## pss               4 1445   2.61   0.68   2.75    2.64   0.74   0.00   4.00
## stress            5 1445   1.39   0.68   1.25    1.36   0.74   0.00   4.00
## bfi_n             6 1458   2.97   0.96   3.00    2.98   1.48   1.00   5.00
## stress_trait      7 1458   1.39   0.47   1.41    1.39   0.51   0.19   2.56
## negaff_trait      8 1458   2.45   0.71   2.38    2.41   0.67   1.11   5.09
## bfi_n_c           9 1458  -0.01   0.96   0.02    0.00   1.48  -1.98   2.02
## stress_trait_c   10 1458  -0.01   0.47   0.01   -0.01   0.51  -1.21   1.17
## stress_state     11 1445   0.00   0.49  -0.03   -0.02   0.46  -1.75   2.12
##                 range  skew kurtosis   se
## id             431.00 -0.07    -1.06 3.38
## day              7.00  0.01    -1.24 0.06
## negaff           5.90  0.96     0.77 0.03
## pss              4.00 -0.35     0.13 0.02
## stress           4.00  0.35     0.13 0.02
## bfi_n            4.00 -0.08    -0.79 0.03
## stress_trait     2.38 -0.08    -0.21 0.01
## negaff_trait     3.98  0.66     0.52 0.02
## bfi_n_c          4.00 -0.08    -0.79 0.03
## stress_trait_c   2.38 -0.08    -0.21 0.01
## stress_state     3.88  0.36     0.79 0.01

Great! Now the data are all prepared as needed.

Set-up for basic multilevel model with continuous outcome

Outlining the substantive inquiry.

We define stress reactivty (a person-level dynamic characteristic) as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is the intraindividual covariation of daily negative affect and daily stress.

Formally,
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it}\] First check out the distribution of the outcome variable (negaff):

#pdf("NegativeAffectHistogram.pdf",height=3, width=3.5)
ggplot(data=AMIB_daily, aes(x=negaff)) +
  geom_histogram(aes(y=..density..),      
                   binwidth=.25,
                   colour="black", fill="white") +
  labs(x = "Negative Affect", y="Density") +
  theme(axis.text=element_text(size=14),
        axis.title=element_text(size=16))
## Warning: Removed 17 rows containing non-finite values (stat_bin).

#dev.off()

Now look at a subset of persons to see what stress reactivity looks like.

#faceted plot
ggplot(data=daily_long[which(daily_long$id <= 125),], aes(x=stress_state,y=negaff)) +
  geom_point() +
  stat_smooth(method="lm", fullrange=TRUE) +
  xlab("Stress") + ylab("Negative Affect (Continuous)") + 
  facet_wrap( ~ id) +
  theme(axis.title=element_text(size=16),
        axis.text=element_text(size=14),
        strip.text=element_text(size=14))

If you want to contain the regression line within the range of stress scores for each person (instead of extrapolating to the full range of stress scores), specify fullrange=FALSE in the stat_smooth layer above.

Running the models

We use the lmer() function in the lme4 package.

To start, lets fit the usual unconditional means model …

#unconditional means model
model0_fit <- lmer(formula = negaff ~ 1 + (1|id), 
              data=daily_long,
              na.action=na.exclude)
summary(model0_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: negaff ~ 1 + (1 | id)
##    Data: daily_long
## 
## REML criterion at convergence: 3833.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8739 -0.6123 -0.1608  0.4658  3.9394 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 0.4270   0.6535  
##  Residual             0.6627   0.8141  
## Number of obs: 1441, groups:  id, 190
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   2.46368    0.05229 185.80794   47.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Lets extract the random effects with the VarCorr() function:

VarCorr(model0_fit)
##  Groups   Name        Std.Dev.
##  id       (Intercept) 0.65347 
##  Residual             0.81408

We can then compute the intra-class correlation (ICC) as the ratio of the random intercept variance (between-person) to the total variance (between + within).

Store the random effect variances, which will be the first column of the VarCorr object (see above).

RandomEffects <- as.data.frame(VarCorr(model0_fit))
RandomEffects
##        grp        var1 var2      vcov     sdcor
## 1       id (Intercept) <NA> 0.4270294 0.6534748
## 2 Residual        <NA> <NA> 0.6627260 0.8140798

Next, compute the ICC. It is the ratio of the random intercept variance (between-person var) over the total variance (between + within var):

ICC_between <- RandomEffects[1,4]/(RandomEffects[1,4]+RandomEffects[2,4]) 
ICC_between
## [1] 0.391858

From the unconditional means model, the ICC was calculated, which indicated that of the total variance in negative affect, 39.19% is attributable to between-person variation whereas 60.81% is attributatable to within-person variation

OK - lets add the predictors, stress_trait and stress_state to examine stress reactivity.

# fit model
model1_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      stress_state + stress_state:stress_trait_c + 
                      (1 + stress_state|id), 
                    data=daily_long,
                    na.action=na.exclude)
summary(model1_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## negaff ~ 1 + day + stress_trait_c + stress_state + stress_state:stress_trait_c +  
##     (1 + stress_state | id)
##    Data: daily_long
## 
## REML criterion at convergence: 3162.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5368 -0.6127 -0.0729  0.5093  4.4164 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr
##  id       (Intercept)  0.2135   0.4621       
##           stress_state 0.1257   0.3546   0.53
##  Residual              0.4038   0.6355       
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                               Estimate Std. Error         df t value
## (Intercept)                  2.695e+00  4.583e-02  3.922e+02  58.806
## day                         -6.580e-02  7.552e-03  1.250e+03  -8.713
## stress_trait_c               1.038e+00  7.946e-02  1.859e+02  13.067
## stress_state                 7.647e-01  4.561e-02  1.664e+02  16.765
## stress_trait_c:stress_state  1.550e-01  9.780e-02  1.584e+02   1.585
##                             Pr(>|t|)    
## (Intercept)                   <2e-16 ***
## day                           <2e-16 ***
## stress_trait_c                <2e-16 ***
## stress_state                  <2e-16 ***
## stress_trait_c:stress_state    0.115    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) day    strs__ strss_
## day         -0.569                     
## strss_trt_c  0.004  0.007              
## stress_stat  0.216  0.012  0.002       
## strss_tr_:_  0.034 -0.057  0.268 -0.118
# save predicted scores
daily_long$pred_m1 <- predict(model1_fit)

Interpreting and Plotting the Results

The results indicate that there is an association between perceived stress and negative affect at both within- and between-person levels of analysis. In other words, there is within-person variation across days in the extent to which perceived stress is associated with negative affect and there are between-person differences in the extent to which perceived stress is associated with negative affect on average.

  • Fixed Effects:
    • (Intercept): The average value for negative affect is 2.7
    • day: negative affect decreased over the course of the study day - for every unit increase in day, negative affect decreased by -0.07 (p = 0)
    • stress_trait_c: On average, individuals with higher perceived stress tended experience higher negative affect, 1.04, which is statistically sifnificantly different from zero (p = 0).
    • stress_state : On days when individuals’ perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.76, which was statistically significantly different from zero (p = 0).
    • stress_trait_c:stress_state: trait stress does not moderate the association between state stress and negative affect (0.16, p = 0.11)
  • Random Effects:
    • sd((Intercept)): There was a substantial extent of between-person differences in the average value of negative affect (0.21)
    • sd(stress_state): There was a substantial extent of between-person differences in the average within-person association between perceived stress and negative affect (0.13)
    • cov((Intercept),stress_state: The correlation between the random intercept and random slope was 0.53, which indicates that those who had higher intercept values for negative affect were also more likely to have stronger positive associations between negative affect and perceived stress.

We can also get confidence intervals for the fixed and random effects. Depending on model complexity, the confint() can sometimes take awhile to run

# Get confidence intervals for both fixed and random effects
confint(model1_fit)
## Computing profile confidence intervals ...
##                                   2.5 %      97.5 %
## .sig01                       0.40389523  0.52247837
## .sig02                       0.27635443  0.77129749
## .sig03                       0.25230527  0.45069551
## .sigma                       0.60962930  0.66244352
## (Intercept)                  2.60530435  2.78468853
## day                         -0.08058845 -0.05099124
## stress_trait_c               0.88249219  1.19390742
## stress_state                 0.67339998  0.85449747
## stress_trait_c:stress_state -0.03788643  0.34703329

The labels for the random effects are not entirely intuitive (e.g., sig01, sig02, sig03, sigma) - especially for models with several random effects.

In the above output:

  • sig01 = random effect CI for random intercept (given in SD units - need to square the values if you want the variances)
  • sig02 = correlation between the random intercept and random slope (for stress_state)
  • sig03 = random effect CI for random slope (stress_state; given in SD units - need to square the values if you want the variances)
  • sigma = residual error CI (given in SD units - need to square the values if you want the variances)

In general, the output given from the confint() will give you the confidence intervals (given in SD units - will need to square these values if you want the variances) for the first random effect (e.g., the random intercept). Then it will give you the correlations between that random effect and all other random effects it is correlated with. The next row of CI values will correspond to the second random effect (e.g., the random slope for stress_state), with the following rows pertaining to all random effects it is correlated with (but not include any correlations already given - because for example, the correlation for the random intercept and random slope for stress_state was already given). The last row pertaining to random effects is always called sigma, and pertains to the residual error (given in SD units - will need to square these values if you want the variances).

Next, we can plot between-person associations.

ggplot(data=AMIB_imeans, aes(x=stress_trait, y=negaff_trait, group=factor(id)), legend=FALSE) +
  geom_point(colour="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
  xlab("Trait Stress") + ylab("Trait Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=16),
        axis.text=element_text(size=12),
        plot.title=element_text(size=16, hjust=.5)) +
  ggtitle("Between-Person Association Plot\nTrait Stress & Negative Affect")

Plotting predicted within-person associations.

ggplot(data=daily_long, aes(x=stress_state, y=pred_m1, group=factor(id), colour="gray"), legend=FALSE) +
  geom_smooth(method=lm, se=FALSE, fullrange=FALSE, lty=1, size=.5, color="gray40") +
  geom_smooth(aes(group=1), method=lm, se=FALSE, fullrange=FALSE, lty=1, size=2, color="blue") +
  xlab("Stress State") + ylab("Predicted State Negative Affect") +
  theme_classic() +
  theme(axis.title=element_text(size=18),
        axis.text=element_text(size=14),
        plot.title=element_text(size=18, hjust=.5)) +
  ggtitle("Within-Person Association Plot\nPerceived Stress & Negative Affect")
## Warning: Removed 20 rows containing non-finite values (stat_smooth).

## Warning: Removed 20 rows containing non-finite values (stat_smooth).

Adding another Person-level Predictor

We now add trait neuroticism to the model, to see if and how between-person differences in neuroticism are related to stress reactivity.

Running the expanded model.

# fit model
model2_fit <- lmer(formula = negaff ~ 1 + day + stress_trait_c + 
                      bfi_n_c + stress_trait_c:bfi_n_c +
                      stress_state + stress_state:stress_trait_c + 
                      stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c + 
                      (1 + stress_state|id),
                    data=daily_long,
                    na.action=na.exclude)
summary(model2_fit)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## negaff ~ 1 + day + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +  
##     stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +  
##     stress_state:stress_trait_c:bfi_n_c + (1 + stress_state |      id)
##    Data: daily_long
## 
## REML criterion at convergence: 3161.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4271 -0.6011 -0.0749  0.5045  4.4732 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr
##  id       (Intercept)  0.1955   0.4422       
##           stress_state 0.1238   0.3518   0.51
##  Residual              0.4040   0.6356       
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                                       Estimate Std. Error         df
## (Intercept)                          2.690e+00  4.556e-02  3.944e+02
## day                                 -6.545e-02  7.572e-03  1.247e+03
## stress_trait_c                       9.695e-01  7.878e-02  1.835e+02
## bfi_n_c                              1.543e-01  3.917e-02  1.809e+02
## stress_state                         7.687e-01  4.682e-02  1.673e+02
## stress_trait_c:bfi_n_c               3.715e-02  7.832e-02  1.824e+02
## stress_trait_c:stress_state          1.254e-01  1.015e-01  1.692e+02
## bfi_n_c:stress_state                 7.595e-02  4.845e-02  1.549e+02
## stress_trait_c:bfi_n_c:stress_state -3.167e-02  1.024e-01  1.722e+02
##                                     t value Pr(>|t|)    
## (Intercept)                          59.055  < 2e-16 ***
## day                                  -8.644  < 2e-16 ***
## stress_trait_c                       12.307  < 2e-16 ***
## bfi_n_c                               3.939 0.000117 ***
## stress_state                         16.418  < 2e-16 ***
## stress_trait_c:bfi_n_c                0.474 0.635819    
## stress_trait_c:stress_state           1.235 0.218477    
## bfi_n_c:stress_state                  1.568 0.119005    
## stress_trait_c:bfi_n_c:stress_state  -0.309 0.757561    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) day    strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## day         -0.576                                                  
## strss_trt_c  0.003  0.005                                           
## bfi_n_c     -0.008  0.007 -0.224                                    
## stress_stat  0.204  0.004  0.000  0.001                             
## strss_t_:__ -0.179  0.008  0.008  0.040 -0.054                      
## strss_tr_:_  0.042 -0.073  0.248 -0.056 -0.087  0.003               
## bf_n_c:str_ -0.033  0.058 -0.058  0.258  0.016  0.009  -0.244       
## strs__:__:_ -0.061  0.032  0.004  0.008 -0.233  0.243  -0.103 -0.059
# save predicted scores
daily_long$pred_m2 <- predict(model2_fit)
  • Fixed Effects:
    • (Intercept): The average value for negative affect is 2.69
    • day: For every unit increase in day, negative affect is expected to decrease by -0.07 (p = 0)
    • stress_trait_c: On average, individuals with higher perceived stress tended experience higher negative affect, 0.97, which is statistically sifnificantly different from zero (p = 0).
    • bfi_n_c: On average, individuals with higher neuroticism also tended to experience higher negative affect (0.15, p = 0)
    • stress_state : On days when individuals’ perceived stress was higher than usual, their negative affect also tended to be higher than usual, 0.77, which was statistically significantly different from zero (p = 0).
    • stress_trait_c:bfi_n_c: trait neuroticism did not moderate the association between-person association between stress and negative affect (0.04, p = 0.64)
    • stress_trait_c:stress_state: trait stress does not moderate the association between state stress and negative affect (0.13, p = 0.22)
    • bfi_n_c:stress_state: trait neuroticism does not moderate the within-person association between stress and negative affect (0.08, p = 0.12)
    • stress_trait_c:bfi_n_c:stress_state: neuroticism does not moderate the association between trait stress and stressor reactivity (-0.03, p = 0.76)
  • Random Effects:
    • sd((Intercept)): There was a substantial extent of between-person differences in the average value of negative affect (0.2)
    • sd(stress_state): There was a substantial extent of between-person differences in the average within-person association between perceived stress and negative affect (0.12)
    • cov((Intercept),stress_state: The correlation between the random intercept and random slope was 0.51, which indicates that those who had higher intercept values for negative affect were also more likely to have stronger positive associations between negative affect and perceived stress.

Conclusion

This tutorial illustrated application of the basic multilevel model in cases where the outcome variable is assumed normally distributed.

Good luck! =:-]