## Overview

In Chapter 3 we illustrated how intraindividual covariation is examined within the multilevel modeling framework. We now build on that foundation in various ways. In particular, this tutorial demonstrates how the generalized multilevel model is used when the outcome variable is binary (or Poisson). Using link functions, the generalized model provides opportunity to articulate and test hypotheses about another set of dynamic characteristics using experience sampling data.

Much of the push into “ecological momentary assessment”, as one type of experience sampling study design, was done in health area. Often, the variables of interest are behaviors or states that are categorical - often binary (e.g., sick vs. not-sick, adherence vs. non-adherence) or ordinal counts (e.g., number of drinks, number of cigarettes). When these variables are the outcome variables, we need to use a generalized linear multilevel model.

After some explanantion of the modeling framework, we will work through a few examples.
The first example follows directly from Bolger & Laurenceau (2013) Chapter 6. The second example is drawn from the AMIB data.

## Outline

In this session we cover …

A. Introduction to the Generalized Multilevel Model
B. Modeling Binary Outcome
Example 1: Categorical Outcomes Dataset from Bolger & Laurenceau (2013) Chapter 6
C. Modeling Count Outcome
Example 2: Poisson Outcome from AMIB Dataset

*Note that we switch from the nlme package to the lme4 package. The functions work similarly, but the coding is slightly different.

library(psych)
library(ggplot2)
library(lme4)
library(gtools)
library(plyr)
library(nlme)
library(MASS)
library(glmmTMB)

## A. Introduction to the Generalized Multilevel Model

The basic linear multilevel model is written as

$y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}$ where $$\beta_{1i}$$ is individual i’s level of … -ity, and where

$\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}$ $\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}$

where $e_{it} \tilde{} N(0,\mathbf{R})$, and $\mathbf{U}_{i} \tilde{} N(0,\mathbf{G})$

$\mathbf{R} = \mathbf{I}\sigma^{2}_{e}$, where where $$I$$ is the identity matrix (diagonal matrix of 1s) and $$\sigma^{2}_{e}$$ is the residual (within-person) variance.

$\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]$

The generalized linear multilevel model is an extension of linear multilevel models that allows that response variables from different distributions besides Gaussian (see also http://www.ats.ucla.edu/stat/mult_pkg/glmm.htm). To do this, we introduce a link function.

Let the linear predictor (the right hand side of the equation) be called $$\mathbf{\eta}$$. $$\mathbf{\eta}$$ is the combination of the fixed and random effects excluding the residuals. Writing the above model in a general way common in statistics $\mathbf{\eta = X\beta + Z\gamma}$

We introduce a generic link function, $$g(\cdot)$$, that relates the outcome $$\mathbf{y}$$ to the linear predictor $$\mathbf{\eta}$$.
$$g(\cdot)$$ = link function
$$h(\cdot)=g^{−1}(\cdot)$$ = inverse link function
We use these link functions to formalize that the conditional expectation of $$\mathbf{y}$$ (conditional because it is the expected value depending on the level of the predictors) is

$g(E(\mathbf{y}))=\mathbf{\eta}$ So, basically the link function “transforms” the outcome variable $$\mathbf{y}$$ into a normal outcome.

We could also model the expectation of $$\mathbf{y}$$ as $E(\mathbf{y})=h(\mathbf{\eta})=\mathbf{\mu}$

With $$\mathbf{y}$$ itself equal to

$\mathbf{y}=h(\mathbf{\eta}) + \mathbf{e}$

## B. Modeling Binary Outcome

### Example 1: Categorical Outcomes Dataset from Bolger & Laurenceau (2013) Chapter 6

For the first example, we use data from the book. As explained in section 6.1 (p. 106-107), the dataset is from a daily diary study of couples during what was for them a typical 4-week period. Each day both partners in each couple provided reports in the morning and in the evening. For this illustration we use a categorical outcome. In particular, following the example in the book, we examine evening reports of conflict from the view of the male partner, testing whether the anger/irritability of the female partner at the beginning of a given day increased the risk of conflicts later in the day.

#### Prelim - Reading in Repeated Measures Data

The data can be downloaded from the book’s website (http://www.intensivelongitudinal.com/ch6/ch6R.zip) … We’ve reposted it for ease of running these scripts. Thank you to DRs. Bolger and Laurenceau.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/categorical.csv"
#read in the .csv file using the url() function
daily <- read.csv(file="categorical.csv",header=TRUE)

Lets have a quick look at the data file and the descriptives.

#data structure
head(daily,10)
##    id time     time7c pconf lpconf    lpconfc     amang     amangc
## 1   1    2 -1.8756988     0      0 -0.1568773 0.4166667 -0.0697026
## 2   1    3 -1.7275506     0      0 -0.1568773 0.0000000 -0.4863693
## 3   1    4 -1.5794025     0      0 -0.1568773 0.0000000 -0.4863693
## 4   1    5 -1.4312543     0      0 -0.1568773 0.0000000 -0.4863693
## 5   1    6 -1.2831062     1      0 -0.1568773 0.0000000 -0.4863693
## 6   1    7 -1.1349580     0      1  0.8431227 0.0000000 -0.4863693
## 7   1    8 -0.9868099     0      0 -0.1568773 0.0000000 -0.4863693
## 8   1    9 -0.8386617     0      0 -0.1568773 0.0000000 -0.4863693
## 9   1   10 -0.6905136     0      0 -0.1568773 0.0000000 -0.4863693
## 10  1   11 -0.5423654     0      0 -0.1568773 0.0000000 -0.4863693
##       amangcb    amangcw
## 1  -0.4709372  0.4012346
## 2  -0.4709372 -0.0154321
## 3  -0.4709372 -0.0154321
## 4  -0.4709372 -0.0154321
## 5  -0.4709372 -0.0154321
## 6  -0.4709372 -0.0154321
## 7  -0.4709372 -0.0154321
## 8  -0.4709372 -0.0154321
## 9  -0.4709372 -0.0154321
## 10 -0.4709372 -0.0154321
#unique ids
unique(daily$id) ## [1] 1 3 5 6 9 10 11 12 13 14 15 16 17 18 19 20 21 ## [18] 22 23 24 26 27 30 31 32 33 34 36 37 38 42 43 44 45 ## [35] 46 48 55 57 59 60 65 69 70 71 72 74 75 76 78 79 80 ## [52] 81 83 87 89 91 92 93 95 100 102 #unique occasions unique(daily$time)
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [24] 25 26 27 28
#descriptives
describe(daily)
##         vars    n  mean    sd median trimmed   mad   min    max  range
## id         1 1345 45.61 30.19  38.00   44.44 37.06  1.00 102.00 101.00
## time       2 1345 14.66  7.76  14.00   14.60 10.38  2.00  28.00  26.00
## time7c     3 1345  0.00  1.15  -0.10   -0.01  1.54 -1.88   1.98   3.85
## pconf      4 1345  0.14  0.35   0.00    0.06  0.00  0.00   1.00   1.00
## lpconf     5 1345  0.16  0.36   0.00    0.07  0.00  0.00   1.00   1.00
## lpconfc    6 1345  0.00  0.36  -0.16   -0.09  0.00 -0.16   0.84   1.00
## amang      7 1345  0.49  1.11   0.00    0.22  0.00  0.00  10.00  10.00
## amangc     8 1345  0.00  1.11  -0.49   -0.27  0.00 -0.49   9.51  10.00
## amangcb    9 1345  0.00  0.49  -0.18   -0.10  0.27 -0.47   1.61   2.08
## amangcw   10 1345  0.00  1.00  -0.17   -0.15  0.27 -2.10   9.40  11.50
##         skew kurtosis   se
## id      0.28    -1.29 0.82
## time    0.06    -1.18 0.21
## time7c  0.06    -1.18 0.03
## pconf   2.02     2.09 0.01
## lpconf  1.88     1.55 0.01
## lpconfc 1.88     1.55 0.01
## amang   3.96    21.05 0.03
## amangc  3.96    21.05 0.03
## amangcb 1.72     2.32 0.01
## amangcw 3.78    22.51 0.03

The variables of interest are:
id = couple ID number
time = diary day
pconf = male partner’s evening report of the occurrence of conflict that day
amang = female partner’s morning report of anger/irritability

Lets look at the data for each couple (a version of Figure 6.2)

#faceted plot
ggplot(data=daily, aes(x=amang,y=pconf)) +
geom_point() + #(position=position_jitter(h=.025)) +
xlab("Female Partner Morning Anger") +
ylab("Male Partner Evening Report of Conflict") +
scale_x_continuous(limits=c(0,10),breaks=c(0,5,10)) +
scale_y_continuous(limits=c(0,1), breaks=c(0,1)) +
facet_wrap( ~ id)

OK - Lets briefly consider the substantive framework of our inquiry. We define a person-level dynamic characteristic, anger sensitivity as the extent to which anger influences risk of conflict. Anger sensitivity is defined as the intraindividual covariation of pconf and amang.

Formally,
$log\left(\frac{p(y_{it})}{1-p(y_{it})}\right) = \beta_{0i} + \beta_{1i}x_{it}$

where the outcome (left side of equation) has been “transformed” using the logistic link function, $$g(\cdot) = log_{e}\left(\frac{p}{1−p}\right)$$, and where $$\beta_{1i}$$ is individual i’s level of anger sensitivity.

#### Setting up for Intraindividual Covariation with Binary Outcome (GMLM)

As usual we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. In these data, this has already been done. Specifically, the variable amang has been split into two varaibles: amangcb is the sample-mean centered between-person component, and amangcw is the person-centered within-person component. Covariates include lpconfc which is a centered variable that indicates yesterday’s conflict report, and time7c which is a centered variable that indicates passage of time in weeks.

The lme4 package contains tools for fitting generalized linear mixed effects models (GLLM), in particular the glmer() function.

Note that lme() (from the nlme package) and glmer() have different coding setups.

Usage:
glmer(formula, data = NULL, family = gaussian, control = glmerControl(), start = NULL, verbose = 0L, nAGQ = 1L, subset, weights, na.action, offset, contrasts = NULL, mustart, etastart, devFunOnly = FALSE, …)

formula = a two-sided linear formula object describing both the fixed-effects and fixed-effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (“|”) separating expressions for design matrices from grouping factors.

For illustration of the code, lets fit the usual unconditional means model …

um.fit <- glmer(formula = pconf ~ 1 + (1|id),
data=daily,
family="binomial",
na.action=na.exclude)
summary(um.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + (1 | id)
##    Data: daily
##
##      AIC      BIC   logLik deviance df.resid
##   1098.7   1109.1   -547.3   1094.7     1343
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -0.6792 -0.4056 -0.3746 -0.3175  3.1767
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.3132   0.5596
## Number of obs: 1345, groups:  id, 61
##
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -1.8625     0.1117  -16.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output is similar to what we are used to - although, note that there is no residual term (as per the equation above).

OK - lets add the predictor, amangcw to examine anger sensitivity.

#simple model
model1.fit <- glmer(formula = pconf ~ 1 + amangcw +
(1|id),
data=daily,
family="binomial",
na.action=na.exclude)
summary(model1.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + amangcw + (1 | id)
##    Data: daily
##
##      AIC      BIC   logLik deviance df.resid
##   1090.6   1106.2   -542.3   1084.6     1342
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -0.8490 -0.4081 -0.3614 -0.3040  4.0245
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.3244   0.5695
## Number of obs: 1345, groups:  id, 61
##
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.88160    0.11345 -16.586  < 2e-16 ***
## amangcw      0.21986    0.06605   3.329 0.000872 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr)
## amangcw -0.099
#other useful outputs:
#confint(fit) # 95% CI for the coefficients
#exp(coef(fit)) # exponentiated coefficients
#exp(confint(fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(fit, type="response") # predicted values
#residuals(fit, type="deviance") # residuals

Interpretation occurs in the usual way for logistic models.

# Odds Ratio Estimates
exp(fixef(model1.fit))
## (Intercept)     amangcw
##   0.1523457   1.2459069
#with confidence intervals
exp(cbind(OR = fixef(model1.fit), confint(model1.fit, method="Wald")[-1,]))
##                    OR     2.5 %    97.5 %
## (Intercept) 0.1523457 0.1219735 0.1902808
## amangcw     1.2459069 1.0946212 1.4181015

Lets look at predicted scores. This can be done in two ways.
1. We can predict in the link scale (for us that is log odds), or
2. We can predict in the response scale (for us that is probability).

daily$pred.m1logit <- predict(model1.fit, type="link") daily$pred.m1prob <- predict(model1.fit, type="response")

Plotting predicted scores.

#Logit predictions
ggplot(data=daily, aes(x=amangcw,y=pred.m1logit, group=id)) +
geom_line() +
xlab("Female Morning Anger (centered)") +
ylab("Predicted Log Odds")