This tutorial illustrates set-up and fitting of a generalized multilevel model to expereince sampling data where the outcome variable is a count variable. The Poisson multilevel model illustrated here can be adapted easily to accommodate other types of outcomes (e.g., to a logistic multilevel model for analysis of a binary outcome variable). Our example make use of The AMIB Data, a multiple time-scale data set that has been shared for teaching purposes. Parallel to the tutorial on the basic multilevel model, this tutorial treats the daily measure of Negative Affect as a count variable. Note however that while the basic model was fit using the lme() function in the nlme library, the generalized model is being fit using the glmer() function in the lme4 library.
A. Overview of model
B. Set up the Generalized Multilevel Model (Poisson MLM for count data)
C. Interpretation of results
D. Display (data visualization) of the result
The basic linear multilevel model is written as
\[y_{it} = \beta_{0i} + \beta_{1i}x_{it}\] where \(\beta_{1i}\) is individual i’s level of
\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]
where \[e_{it} \tilde{} N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})\]
\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where where \(I\) is the identity matrix (diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]
The generalized linear multilevel model is an extension of linear multilevel models that allows that response variables from different distributions besides Gaussian (see also http://www.ats.ucla.edu/stat/mult_pkg/glmm.htm). To do this, we introduce a link function.
Let the linear predictor (the right hand side of the regression equation) be called \(\mathbf{\eta}\). That is (writing the above model in a general way common in statistics), \[\mathbf{\eta = X\beta + Z\gamma}\] Thus, \(\mathbf{\eta}\) is the combination of the fixed and random effects excluding the residuals.
We introduce a generic link function, \(g(\cdot)\), that relates the outcome \(\mathbf{y}\) to the linear predictor \(\mathbf{\eta}\).
\(g(\cdot)\) = link function
\(h(\cdot)=g^{−1}(\cdot)\) = inverse link function
We use these link functions to formalize that the conditional expectation of \(\mathbf{y}\) (conditional because it is the expected value depending on the level of the predictors) is
\[g(E(\mathbf{y}))=\mathbf{\eta}\] So, basically the link function “transforms” the outcome variable \(\mathbf{y}\) into a normal outcome.
We could also model the expectation of \(\mathbf{y}\) as \[E(\mathbf{y})=h(\mathbf{\eta})=\mathbf{\mu}\]
With \(\mathbf{y}\) itself equal to
\[\mathbf{y}=h(\mathbf{\eta}) + \mathbf{e}\]
There are many different link functions. In this tutorial we focus on the Poisson family, which is useful for modeling outcome varaibles that are measured as counts.
Count Outcomes For a count outcome, we use a log link function and the probability mass function (PMF) for the Poisson distribution. These are:
\[g(\cdot) = log_{e}(\cdot)\] \[h(\cdot)=e^{(\cdot)}\] \[PMF = Pr(X = k) = \frac{\lambda^ke^-\lambda}{k!}\] \[E(X)=\lambda\] \[Var(X)=\lambda\]
Model Fitting When we use an “identity” link function, we are in our usual specification of means and variances for the normal (Gaussian) distribution. “Generalized” is good becasue it covers the usual case and expands to other situations (binary, count, etc.). However, the estimation algorithms are different when working on the more general problem, and so the usual (normal continuous) case can be slow when using the software for generalized linear modeling. Stick with the non-generalized software for the normal case, and use the specialized programs when needed.
Model Interpretation After inclusion of the link function - the modeling proceeds as usual. Interpretation of results is similar. On the linearized metric (after using the link function) interpretation is done in a standard way - interpreting significance and sign of parameters. However, for substantive interpretation, it is often easier to back transform the results to the original metric. For example, in a random effects logistic model, one might want to talk about the probability of an event given some specific values of the predictors. Likewise in a poisson (count) model, one might want to talk about the expected count rather than the expected log count. These transformations complicate matters because they are nonlinear and so even random intercepts no longer play a strictly additive role and instead can have a multiplicative effect. Good explanation can be found, among other places, at http://www.ats.ucla.edu/stat/mult_pkg/glmm.htm, and a rich resource of practical questions is here http://glmm.wikidot.com/faq.
Loading Libraries
Loading libraries used in this script.
library(psych) # for describing the data
library(plyr) #for data manipulation
library(ggplot2) # for data visualization
library(lme4) #for multilevel models
Loading Data
Our example makes use of the person-level and day-level (EMA-type) AMIB data files. Specifically, we make use of three varaibles:
Outcome: daily negative affect count, obtained through transformation of negaff
in the day-level file;
Predictor: daily stress, obtained through reverse coding of pss
in the day-level file;
Person-level predictor: trait neuroticism, bfi_n
in the person-level file
Loading person-level file and subsetting to variables of interest
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_persons_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath),header=TRUE)
#subsetting to variables of interest
AMIB_persons <- AMIB_persons[ ,c("id","bfi_n")]
Loading day-level file (T = 8) and subsetting to variables of interest.
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_daily_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_daily <- read.csv(file=url(filepath),header=TRUE)
#subsetting to variables of interest
AMIB_daily <- AMIB_daily[ ,c("id","day","negaff","pss")]
Data Preparation
For our Poisson illustration we convert negaff
to a count variable - which must have only integer values. Our new variable is called negaffcount
.
#making a negaff count variable that ranges from 0 to 60
AMIB_daily$negaffcount <- (round(AMIB_daily$negaff-1,1)*10)
#describing new variable
table(AMIB_daily$negaffcount)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## 39 45 42 49 56 56 82 69 58 57 72 51 65 47 26 50 47 40 39 52 38 38 37 32 20
## 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 48 49 51
## 22 27 22 17 14 12 10 13 10 9 10 9 5 8 6 9 6 6 2 1 2 3 3 3 2
## 54 59
## 1 2
describe(AMIB_daily$negaffcount)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1441 14.47 10.41 12 13.38 10.38 0 59 59 0.96 0.77
## se
## X1 0.27
#histogram of the count variable
ggplot(data=AMIB_daily, aes(x=negaffcount)) +
geom_histogram(fill="white", color="black")
We also reverse code pss
to a continuous stress
variable.
#reverse coding the pss variable into a new stress variable
AMIB_daily$stress <- 4 - AMIB_daily$pss
#describing new variable
describe(AMIB_daily$stress)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1445 1.39 0.68 1.25 1.36 0.74 0 4 4 0.35 0.13
## se
## X1 0.02
#histogram
ggplot(data=AMIB_daily, aes(x=stress)) +
geom_histogram(fill="white", color="black") +
labs(x = "Stress (high = more stressed")
As usual we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the daily variable stress
is split into two varaibles: stress_trait
is the sample-mean centered between-person component, and stress_state
is the person-centered within-person component.
Making trait variables.
#calculating intraindividual means
#Alternative approach using dplyr
#AMIB_imeans <- AMIB_daily %>% group_by(id) %>% summarize(stress_trait=mean(stress, na.rm=TRUE))
AMIB_imeans <- ddply(AMIB_daily, "id", summarize,
stress_trait = mean(stress, na.rm=TRUE))
describe(AMIB_imeans)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## stress_trait 2 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## stress_trait 2.38 -0.04 -0.23 0.03
#merging into person-level file
AMIB_persons <- merge(AMIB_persons, AMIB_imeans, by="id")
#make centered versions of the person-level scores
AMIB_persons$bfi_n_c <- scale(AMIB_persons$bfi_n,center=TRUE,scale=FALSE)
AMIB_persons$stress_trait_c <- scale(AMIB_persons$stress_trait,center=TRUE,scale=FALSE)
#describe person-level data
describe(AMIB_persons)
## vars n mean sd median trimmed mad min max
## id 1 190 318.29 130.44 321.50 318.99 151.23 101.00 532.00
## bfi_n 2 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00
## stress_trait 3 190 1.40 0.48 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 4 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02
## stress_trait_c 5 190 0.00 0.48 0.01 0.00 0.51 -1.21 1.17
## range skew kurtosis se
## id 431.00 -0.04 -1.09 9.46
## bfi_n 4.00 -0.09 -0.82 0.07
## stress_trait 2.38 -0.04 -0.23 0.03
## bfi_n_c 4.00 -0.09 -0.82 0.07
## stress_trait_c 2.38 -0.04 -0.23 0.03
Making state variables in long data
#merging person-level data into daily data
daily_long <- merge(AMIB_daily,AMIB_persons,by="id")
#calculating state variables
daily_long$stress_state <- daily_long$stress - daily_long$stress_trait
#describing data
describe(daily_long)
## vars n mean sd median trimmed mad min max
## id 1 1458 322.53 129.08 324.00 324.20 151.23 101.00 532.00
## day 2 1458 3.48 2.30 3.00 3.47 2.97 0.00 7.00
## negaff 3 1441 2.45 1.04 2.20 2.34 1.04 1.00 6.90
## pss 4 1445 2.61 0.68 2.75 2.64 0.74 0.00 4.00
## negaffcount 5 1441 14.47 10.41 12.00 13.38 10.38 0.00 59.00
## stress 6 1445 1.39 0.68 1.25 1.36 0.74 0.00 4.00
## bfi_n 7 1458 2.97 0.96 3.00 2.98 1.48 1.00 5.00
## stress_trait 8 1458 1.39 0.47 1.41 1.39 0.51 0.19 2.56
## bfi_n_c 9 1458 -0.01 0.96 0.02 0.00 1.48 -1.98 2.02
## stress_trait_c 10 1458 -0.01 0.47 0.01 -0.01 0.51 -1.21 1.17
## stress_state 11 1445 0.00 0.49 -0.03 -0.02 0.46 -1.75 2.12
## range skew kurtosis se
## id 431.00 -0.07 -1.06 3.38
## day 7.00 0.01 -1.24 0.06
## negaff 5.90 0.96 0.77 0.03
## pss 4.00 -0.35 0.13 0.02
## negaffcount 59.00 0.96 0.77 0.27
## stress 4.00 0.35 0.13 0.02
## bfi_n 4.00 -0.08 -0.79 0.03
## stress_trait 2.38 -0.08 -0.21 0.01
## bfi_n_c 4.00 -0.08 -0.79 0.03
## stress_trait_c 2.38 -0.08 -0.21 0.01
## stress_state 3.88 0.36 0.79 0.01
Great! Now the data are all prepared as needed.
Outlining the substantive inquiry.
As we did previously, we define stress reactivty (a person-level dynamic characteristic) as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is the intraindividual covariation of daily negative affect and daily stress. This time, though, we explicitly model negative affect as a count variable.
Formally,
\[log_{e}\left( y_{it} \right) = \beta_{0i} + \beta_{1i}x_{it}\]
where the outcome (left side of equation) has been “transformed” using the log link function, \(g(\cdot) = log_{e}(\cdot)\), and where \(\beta_{1i}\) is individual i’s level of stress reactivity.
Lets first look at a subset of persons to see what stress reactivity looks like.
#faceted plot
ggplot(data=daily_long[which(daily_long$id <= 125),], aes(x=stress_state,y=negaffcount)) +
geom_point() + #(position=position_jitter(h=.025)) +
stat_smooth(method="glm", method.args = list(family="poisson"), fullrange=TRUE) +
xlab("Stress") + ylab("Negative Affect (Count)") +
scale_x_continuous(limits=c(-2,2),breaks=seq(-2,2,by=1)) +
scale_y_continuous(limits=c(0,60), breaks=c(0,30,60)) +
facet_wrap( ~ id)
Note the nonlinearity in the associations captured by the Poisson regression, and the between-person differences in the shape of the association.
Running the models
We use the glmer()
function in the lme4
package.
To start, lets fit the usual unconditional means model …
#unconditional means model
model0_fit <- glmer(formula = negaffcount ~ 1 + (1|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model0_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: negaffcount ~ 1 + (1 | id)
## Data: daily_long
##
## AIC BIC logLik deviance df.resid
## 12556.4 12567.0 -6276.2 12552.4 1439
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.8837 -1.4131 -0.3372 1.0990 8.9100
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.3014 0.549
## Number of obs: 1441, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.55757 0.04065 62.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output is similar to what we are used to - although, note that there is no residual term (as per the equation above). Remember that the outcome is \(log(negaffcount_{it})\).
OK - lets add the predictors, stress_trait
and stress_level._state
to examine stress reactivity.
#simple model
model1_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
stress_state + stress_state:stress_trait_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model1_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula:
## negaffcount ~ 1 + stress_trait_c + stress_state + stress_state:stress_trait_c +
## (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik deviance df.resid
## 10558.8 10595.7 -5272.4 10544.8 1431
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6742 -1.0652 -0.1746 0.8470 7.3430
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1591 0.3989
## stress_state 0.1500 0.3874 -0.34
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.50209 0.03018 82.898 < 2e-16 ***
## stress_trait_c 0.83509 0.06396 13.057 < 2e-16 ***
## stress_state 0.55984 0.03469 16.137 < 2e-16 ***
## stress_trait_c:stress_state -0.29855 0.07564 -3.947 7.91e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ strss_
## strss_trt_c -0.032
## stress_stat -0.305 0.021
## strss_tr_:_ 0.021 -0.300 -0.139
#other useful outputs:
#confint(model1_fit) # 95% CI for the coefficients
#exp(coef(model1_fit)) # exponentiated coefficients
#exp(confint(model1_fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(model1_fit, type="response") # predicted values
#residuals(model1_fit, type="deviance") # residuals
Interpreting and Plotting the Results Interpretation occurs in the usual way for Poisson models.
Within this model the intercept \(\gamma_{00}\) is the expected ‘log’ of negative affect for a prototypical person on a prototyical day. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(2.50209)
## [1] 12.20798
Therefore, the prototype person is expected to have negative affect count of 12.21 on the protoypical (average stress) day.
Similarly, \(\gamma_{10}\) is the protoypical within-person association between daily stress and log negative affect. We exponentiate to obtain an estimate on the original ‘count’ scale.
exp(0.55983)
## [1] 1.750375
On the prototypical day, a one unit increase in daily stress is associated with a 1.75 increase in negative affect count.
Lets look at predicted scores. This can be done in two ways.
1. We can predict in the link
scale (for us that is log odds), or
2. We can predict in the response
scale (for us that is count).
#obtaining predicted scores for individuals
daily_long$pred_m1log <- predict(model1_fit, type="link")
daily_long$pred_m1original <- predict(model1_fit, type="response")
#create new data for prototype with average person-level predictor
newdata <- daily_long
newdata$stress_trait_c <- 0
#obtaining predicted scores for prototype
daily_long$proto_m1log <- predict(model1_fit, newdata, type="link", re.form=~0)
daily_long$proto_m1original <- predict(model1_fit, newdata, type="response", re.form=~0)
Plotting predicted within-person associations.
#Log scale predictions
ggplot(data = daily_long, aes(x = stress_state, y = pred_m1log, group = id)) +
geom_line(color="black") +
geom_point(aes(x = stress_state, y = proto_m1log), color="red",size=2) +
xlab("Stress State (centered)") + ylab("Predicted Log Negative Affect")
#Count scale predictions
ggplot(data = daily_long, aes(x = stress_state, y = pred_m1original, group = id)) +
geom_line(color="black") +
geom_point(aes(x = stress_state, y = proto_m1original), color="red",size=2) +
xlab("Stress State (centered)") + ylab("Predicted Negative Affect (Count)")
Adding another Person-level Predictor
As in the basic multilevel model example, we can add trait neuroticism to the model, to see if and how between-person differences in neuroticism are related to stress reactivity.
Running the expanded model.
#simple model
model2_fit <- glmer(formula = negaffcount ~ 1 + stress_trait_c +
bfi_n_c + stress_trait_c:bfi_n_c +
stress_state + stress_state:stress_trait_c +
stress_state:bfi_n_c + stress_state:stress_trait_c:bfi_n_c +
(1 + stress_state|id),
family="poisson",
data=daily_long,
na.action=na.exclude)
summary(model2_fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula:
## negaffcount ~ 1 + stress_trait_c + bfi_n_c + stress_trait_c:bfi_n_c +
## stress_state + stress_state:stress_trait_c + stress_state:bfi_n_c +
## stress_state:stress_trait_c:bfi_n_c + (1 + stress_state | id)
## Data: daily_long
##
## AIC BIC logLik deviance df.resid
## 10537.8 10595.8 -5257.9 10515.8 1427
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6747 -1.0714 -0.1799 0.8393 7.3669
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.1384 0.3720
## stress_state 0.1476 0.3841 -0.36
## Number of obs: 1438, groups: id, 190
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.52086 0.02893 87.142 < 2e-16
## stress_trait_c 0.78492 0.06161 12.740 < 2e-16
## bfi_n_c 0.12743 0.03046 4.184 2.87e-05
## stress_state 0.55100 0.03494 15.768 < 2e-16
## stress_trait_c:bfi_n_c -0.19233 0.06263 -3.071 0.00214
## stress_trait_c:stress_state -0.31526 0.07722 -4.083 4.45e-05
## bfi_n_c:stress_state 0.01425 0.03636 0.392 0.69518
## stress_trait_c:bfi_n_c:stress_state 0.10430 0.07927 1.316 0.18826
##
## (Intercept) ***
## stress_trait_c ***
## bfi_n_c ***
## stress_state ***
## stress_trait_c:bfi_n_c **
## stress_trait_c:stress_state ***
## bfi_n_c:stress_state
## stress_trait_c:bfi_n_c:stress_state
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strs__ bf_n_c strss_ st__:__ st__:_ bf__:_
## strss_trt_c -0.023
## bfi_n_c -0.011 -0.213
## stress_stat -0.321 0.017 0.002
## strss_t_:__ -0.204 -0.045 -0.002 0.061
## strss_tr_:_ 0.017 -0.316 0.062 -0.103 0.021
## bf_n_c:str_ 0.002 0.065 -0.326 -0.014 0.011 -0.183
## strs__:__:_ 0.059 0.021 0.011 -0.170 -0.316 -0.128 -0.103
We see that, while neuroticism is related to level of negative affect count, and moderates the association between stress_trait and negative affect count, neuroticism does not moderate the within-person association between stress_state and negative affect count. That is, neuroticism is not related to stress reactivity in these data.
This tutorial illustrated application of the generalized multilevel model in cases where the outcome variable is not normally distributed. Our example focused on a count variable, but the setp applies for other types of variables as well (e.g., binomial, Gamma). We encourage taking some with the number of random effects included in these models, and with interpretation. Converting results back and forth using link functions and inverse link functions is complicated, but well worth the effort for interpretation and communication. Readers are most appreciative!
Good luck! =:-]