Overview

In Chapter 3 we illustrated how intraindividual covariation is examined within the multilevel modeling framework. We now build on that foundation. In particular, this tutorial demonstrates how the generalized multilevel model is used when the outcome variable is binary or a count (Poisson). Using link functions, the generalized model provides an opportunity to articulate and test hypotheses about another set of dynamic characteristics using experience sampling data.

Much of the push toward “ecological momentary assessment”, one type of experience sampling study design, came from the health domain. Often, the variables of interest are behaviors or states that are categorical - typically binary (e.g., sick vs. not-sick, adherence vs. non-adherence) or counts (e.g., number of drinks, number of cigarettes). When these variables are the outcome variables, we need to use a generalized linear multilevel model.

After some explanation of the modeling framework, we will work through a few examples.
The first example follows directly from Bolger & Laurenceau (2013) Chapter 6. The second example is drawn from the AMIB data.

Outline

In this session we cover …

A. Introduction to the Generalized Multilevel Model
B. Modeling Binary Outcome
Example 1: Categorical Outcomes Dataset from Bolger & Laurenceau (2013) Chapter 6
C. Modeling Count Outcome
Example 2: Poisson Outcome from AMIB Dataset

Prelim - Loading libraries used in this script.

Note that we switch from the nlme package to the lme4 package. The functions work similarly, but the coding is slightly different.

library(psych)
library(ggplot2)
library(lme4)
library(gtools)
library(plyr)
library(nlme)
library(MASS)
library(glmmTMB)

A. Introduction to the Generalized Multilevel Model

The basic linear multilevel model is written as

\[y_{it} = \beta_{0i} + \beta_{1i}x_{it} + e_{it}\] where \(\beta_{1i}\) is individual i’s level of the dynamic characteristic of interest (e.g., a sensitivity or reactivity), and where

\[\beta_{0i} = \gamma_{00} + \gamma_{01}z_{i} + u_{0i}\] \[\beta_{1i} = \gamma_{10} + \gamma_{11}z_{i} + u_{1i}\]

where \[e_{it} \sim N(0,\mathbf{R})\], and \[\mathbf{U}_{i} \sim N(0,\mathbf{G})\]

\[\mathbf{R} = \mathbf{I}\sigma^{2}_{e}\], where \(\mathbf{I}\) is the identity matrix (a diagonal matrix of 1s) and \(\sigma^{2}_{e}\) is the residual (within-person) variance.

\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0} & \sigma_{u0u1} \\ \sigma_{u1u0} & \sigma^{2}_{u1} \end{array}\right]\]

The generalized linear multilevel model is an extension of the linear multilevel model that allows response variables from distributions other than the Gaussian (see also http://www.ats.ucla.edu/stat/mult_pkg/glmm.htm). To do this, we introduce a link function.

Let the linear predictor (the right hand side of the equation) be called \(\mathbf{\eta}\). \(\mathbf{\eta}\) is the combination of the fixed and random effects excluding the residuals. Writing the above model in a general way common in statistics \[\mathbf{\eta = X\beta + Z\gamma}\]

We introduce a generic link function, \(g(\cdot)\), that relates the outcome \(\mathbf{y}\) to the linear predictor \(\mathbf{\eta}\).
\(g(\cdot)\) = link function
\(h(\cdot)=g^{−1}(\cdot)\) = inverse link function
We use these link functions to formalize that the conditional expectation of \(\mathbf{y}\) (conditional because it is the expected value depending on the level of the predictors) is

\[g(E(\mathbf{y}))=\mathbf{\eta}\] So, basically, the link function “transforms” the expected value of the outcome variable \(\mathbf{y}\) onto a scale where the familiar linear model applies.

We could also model the expectation of \(\mathbf{y}\) as \[E(\mathbf{y})=h(\mathbf{\eta})=\mathbf{\mu}\]

With \(\mathbf{y}\) itself equal to

\[\mathbf{y}=h(\mathbf{\eta}) + \mathbf{e}\]
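
As a small concrete illustration (a sketch using base R's qlogis() and plogis(), separate from the models fit below), for a binary outcome the logit link and its inverse map between the probability scale and the linear-predictor (log-odds) scale:

#logit link: maps a probability to the linear-predictor (log-odds) scale
qlogis(0.25)
#inverse link: maps a linear-predictor value back to a probability
plogis(qlogis(0.25))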

B. Modeling Binary Outcome

Example 1: Categorical Outcomes Dataset from Bolger & Laurenceau (2013) Chapter 6

For the first example, we use data from the book. As explained in section 6.1 (p. 106-107), the dataset is from a daily diary study of couples during what was for them a typical 4-week period. Each day both partners in each couple provided reports in the morning and in the evening. For this illustration we use a categorical outcome. In particular, following the example in the book, we examine evening reports of conflict from the view of the male partner, testing whether the anger/irritability of the female partner at the beginning of a given day increased the risk of conflicts later in the day.

Prelim - Reading in Repeated Measures Data

The data can be downloaded from the book’s website (http://www.intensivelongitudinal.com/ch6/ch6R.zip). We’ve reposted it for ease of running these scripts. Thank you to Drs. Bolger and Laurenceau.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/categorical.csv"
#read in the .csv file using the url() function
daily <- read.csv(file=url(filepath), header=TRUE)

Let's have a quick look at the data file and the descriptives.

#data structure
head(daily,10)
##    id time     time7c pconf lpconf    lpconfc     amang     amangc
## 1   1    2 -1.8756988     0      0 -0.1568773 0.4166667 -0.0697026
## 2   1    3 -1.7275506     0      0 -0.1568773 0.0000000 -0.4863693
## 3   1    4 -1.5794025     0      0 -0.1568773 0.0000000 -0.4863693
## 4   1    5 -1.4312543     0      0 -0.1568773 0.0000000 -0.4863693
## 5   1    6 -1.2831062     1      0 -0.1568773 0.0000000 -0.4863693
## 6   1    7 -1.1349580     0      1  0.8431227 0.0000000 -0.4863693
## 7   1    8 -0.9868099     0      0 -0.1568773 0.0000000 -0.4863693
## 8   1    9 -0.8386617     0      0 -0.1568773 0.0000000 -0.4863693
## 9   1   10 -0.6905136     0      0 -0.1568773 0.0000000 -0.4863693
## 10  1   11 -0.5423654     0      0 -0.1568773 0.0000000 -0.4863693
##       amangcb    amangcw
## 1  -0.4709372  0.4012346
## 2  -0.4709372 -0.0154321
## 3  -0.4709372 -0.0154321
## 4  -0.4709372 -0.0154321
## 5  -0.4709372 -0.0154321
## 6  -0.4709372 -0.0154321
## 7  -0.4709372 -0.0154321
## 8  -0.4709372 -0.0154321
## 9  -0.4709372 -0.0154321
## 10 -0.4709372 -0.0154321
#unique ids
unique(daily$id)
##  [1]   1   3   5   6   9  10  11  12  13  14  15  16  17  18  19  20  21
## [18]  22  23  24  26  27  30  31  32  33  34  36  37  38  42  43  44  45
## [35]  46  48  55  57  59  60  65  69  70  71  72  74  75  76  78  79  80
## [52]  81  83  87  89  91  92  93  95 100 102
#unique occasions
unique(daily$time)
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [24] 25 26 27 28
#descriptives
describe(daily)
##         vars    n  mean    sd median trimmed   mad   min    max  range
## id         1 1345 45.61 30.19  38.00   44.44 37.06  1.00 102.00 101.00
## time       2 1345 14.66  7.76  14.00   14.60 10.38  2.00  28.00  26.00
## time7c     3 1345  0.00  1.15  -0.10   -0.01  1.54 -1.88   1.98   3.85
## pconf      4 1345  0.14  0.35   0.00    0.06  0.00  0.00   1.00   1.00
## lpconf     5 1345  0.16  0.36   0.00    0.07  0.00  0.00   1.00   1.00
## lpconfc    6 1345  0.00  0.36  -0.16   -0.09  0.00 -0.16   0.84   1.00
## amang      7 1345  0.49  1.11   0.00    0.22  0.00  0.00  10.00  10.00
## amangc     8 1345  0.00  1.11  -0.49   -0.27  0.00 -0.49   9.51  10.00
## amangcb    9 1345  0.00  0.49  -0.18   -0.10  0.27 -0.47   1.61   2.08
## amangcw   10 1345  0.00  1.00  -0.17   -0.15  0.27 -2.10   9.40  11.50
##         skew kurtosis   se
## id      0.28    -1.29 0.82
## time    0.06    -1.18 0.21
## time7c  0.06    -1.18 0.03
## pconf   2.02     2.09 0.01
## lpconf  1.88     1.55 0.01
## lpconfc 1.88     1.55 0.01
## amang   3.96    21.05 0.03
## amangc  3.96    21.05 0.03
## amangcb 1.72     2.32 0.01
## amangcw 3.78    22.51 0.03

The variables of interest are:
id = couple ID number
time = diary day
pconf = male partner’s evening report of the occurrence of conflict that day (0 = no conflict, 1 = conflict)
amang = female partner’s morning report of anger/irritability

Let's look at the data for each couple (a version of Figure 6.2).

#faceted plot
ggplot(data=daily, aes(x=amang,y=pconf)) +
  geom_point() + #(position=position_jitter(h=.025)) + 
  xlab("Female Partner Morning Anger") + 
  ylab("Male Partner Evening Report of Conflict") + 
  scale_x_continuous(limits=c(0,10),breaks=c(0,5,10)) + 
  scale_y_continuous(limits=c(0,1), breaks=c(0,1)) +
  facet_wrap( ~ id)

OK - Let's briefly consider the substantive framework of our inquiry. We define a person-level dynamic characteristic, anger sensitivity, as the extent to which anger influences the risk of conflict. Anger sensitivity is operationalized as the intraindividual covariation of pconf and amang.

Formally,
\[log\left(\frac{p(y_{it})}{1-p(y_{it})}\right) = \beta_{0i} + \beta_{1i}x_{it}\]

where the outcome (left side of equation) has been “transformed” using the logit (log-odds) link function, \(g(\cdot) = log_{e}\left(\frac{p}{1-p}\right)\), and where \(\beta_{1i}\) is individual i’s level of anger sensitivity.
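
Applying the inverse link function \(h(\cdot)\) gives the equivalent expression on the probability scale, \[p(y_{it}) = \frac{e^{\beta_{0i} + \beta_{1i}x_{it}}}{1 + e^{\beta_{0i} + \beta_{1i}x_{it}}}\] which is what we use later when converting estimates back into probabilities.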

Setting up for Intraindividual Covariation with Binary Outcome (GMLM)

As usual, we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. In these data, this has already been done. Specifically, the variable amang has been split into two variables: amangcb is the sample-mean-centered between-person component, and amangcw is the person-centered within-person component. Covariates include lpconfc, a centered variable indicating the previous day’s conflict report, and time7c, a centered variable indicating the passage of time in weeks.

The lme4 package contains tools for fitting generalized linear mixed effects models (GLMMs), in particular the glmer() function.

Note that lme() (from the nlme package) and glmer() have different coding setups.

Usage:
glmer(formula, data = NULL, family = gaussian, control = glmerControl(), start = NULL, verbose = 0L, nAGQ = 1L, subset, weights, na.action, offset, contrasts = NULL, mustart, etastart, devFunOnly = FALSE, …)

formula = a two-sided linear formula object describing both the fixed-effects and random-effects parts of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (“|”) separating expressions for design matrices from grouping factors.
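
For orientation, here is a side-by-side sketch of the two coding styles (using hypothetical variables y, x, and grouping variable id in a data frame dat; these lines are not run here):

#nlme::lme - random effects are given in a separate 'random' argument
#lme(fixed = y ~ 1 + x, random = ~ 1 + x | id, data = dat, na.action = na.exclude)
#lme4::glmer - random effects go inside the formula, and a 'family' argument is added
#glmer(y ~ 1 + x + (1 + x | id), data = dat, family = "binomial", na.action = na.exclude)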

For illustration of the code, let's fit the usual unconditional means model …

um.fit <- glmer(formula = pconf ~ 1 + (1|id), 
              data=daily,
              family="binomial",
              na.action=na.exclude)
summary(um.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + (1 | id)
##    Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##   1098.7   1109.1   -547.3   1094.7     1343 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.6792 -0.4056 -0.3746 -0.3175  3.1767 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.3132   0.5596  
## Number of obs: 1345, groups:  id, 61
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.8625     0.1117  -16.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output is similar to what we are used to - although note that there is no residual variance term (as per the equations above).
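
One quantity sometimes computed from the unconditional means model is an intraclass correlation. For a logistic multilevel model the level-1 variance is not estimated; by convention it is fixed at \(\pi^2/3\), the variance of the standard logistic distribution. A sketch under that convention:

#between-person (random intercept) variance
sigma2.u0 <- as.numeric(VarCorr(um.fit)$id[1])
#ICC on the latent logit scale, with the level-1 variance fixed at pi^2/3
sigma2.u0/(sigma2.u0 + (pi^2)/3)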

OK - let's add the predictor, amangcw, to examine anger sensitivity.

#simple model 
model1.fit <- glmer(formula = pconf ~ 1 + amangcw + 
                      (1|id), 
                    data=daily,
                    family="binomial",
                    na.action=na.exclude)
summary(model1.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + amangcw + (1 | id)
##    Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##   1090.6   1106.2   -542.3   1084.6     1342 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.8490 -0.4081 -0.3614 -0.3040  4.0245 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.3244   0.5695  
## Number of obs: 1345, groups:  id, 61
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.88160    0.11345 -16.586  < 2e-16 ***
## amangcw      0.21986    0.06605   3.329 0.000872 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr)
## amangcw -0.099
#other useful outputs:
#confint(model1.fit) # 95% CI for the coefficients
#exp(fixef(model1.fit)) # exponentiated fixed-effects coefficients (odds ratios)
#exp(confint(model1.fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(model1.fit, type="response") # predicted values
#residuals(model1.fit, type="deviance") # residuals

Interpretation occurs in the usual way for logistic models.

# Odds Ratio Estimates
exp(fixef(model1.fit))
## (Intercept)     amangcw 
##   0.1523457   1.2459069
#with confidence intervals 
exp(cbind(OR = fixef(model1.fit), confint(model1.fit, method="Wald")[-1,]))
##                    OR     2.5 %    97.5 %
## (Intercept) 0.1523457 0.1219735 0.1902808
## amangcw     1.2459069 1.0946212 1.4181015
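
For example, the odds ratio for amangcw indicates that a one-unit within-person increase in the female partner’s morning anger is associated with about 1.25 times the odds of the male partner reporting a conflict that evening.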

Let's look at predicted scores. This can be done in two ways.
1. We can predict in the link scale (for us that is log odds), or
2. We can predict in the response scale (for us that is probability).

daily$pred.m1logit <- predict(model1.fit, type="link")
daily$pred.m1prob <- predict(model1.fit, type="response")

Plotting predicted scores.

#Logit predictions
ggplot(data=daily, aes(x=amangcw,y=pred.m1logit, group=id)) +
  geom_line() + 
  xlab("Female Morning Anger (centered)") + 
  ylab("Predicted Log Odds") 

#Probability predictions
ggplot(data=daily, aes(x=amangcw,y=pred.m1prob, group=id)) +
  geom_line() + 
  xlab("Female Morning Anger (centered)") + 
  ylab("Predicted Probability") 

ggplot(data=daily, aes(x=amangcw,y=pred.m1prob)) +
  geom_point() + #(position=position_jitter(h=.025)) + 
  xlab("Female Partner Morning Anger (centered)") + 
  ylab("Male Partner Evening Report of Conflict") + 
  #scale_x_continuous(limits=c(0,10),breaks=c(0,5,10)) + 
  scale_y_continuous(limits=c(0,1), breaks=c(0,1)) +
  facet_wrap( ~ id)

And let's add the prototypical curve to the plot.

#create new data with range of values for the predictor
protodata <- data.frame(id = 999,
                        amangcw = seq(-2.5,+7.5,.1))
protodata$pred.m1logit <- predict(model1.fit, newdata=protodata, type="link", 
                                 allow.new.levels = TRUE)
protodata$pred.m1prob <- predict(model1.fit, newdata=protodata, type="response", 
                                 allow.new.levels = TRUE)

#Plotting log-odds predictions
ggplot(data=daily, aes(x=amangcw,y=pred.m1logit, group=id)) +
  geom_line() + 
  geom_line(data=protodata, color="red", size=2) +
  xlab("Female Morning Anger (centered)") + 
  ylab("Predicted Probability") 

#Plotting Probability predictions
ggplot(data=daily, aes(x=amangcw,y=pred.m1prob, group=id)) +
  geom_line() + 
  geom_line(data=protodata, color="red", size=2) +
  xlab("Female Morning Anger (centered)") + 
  ylab("Predicted Probability") +
  scale_y_continuous(limits=c(0,1), breaks=c(0,.2,.4,.6,.8,1))

Note that the intercept can be interpreted in terms of the expected probability of conflict when the predictor (here, amangcw) equals 0. In our case that is derived from the intercept term, -1.881603, which is in logits. Conversion to probability follows from \[\gamma_{00} = log_{e}\left(\frac{p}{1-p}\right)\] Solving for \(p\) gives \(p = \frac{e^{\gamma_{00}}}{1+e^{\gamma_{00}}}\), which is easily calculated in R using the inv.logit() function in the gtools package.

inv.logit(fixef(model1.fit)[1])
## (Intercept) 
##   0.1322049
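
Equivalently, the conversion can be done by hand, or with base R’s plogis() (the inverse logit), without the gtools dependency:

#inverse logit by hand: p = exp(logit)/(1 + exp(logit))
exp(fixef(model1.fit)[1])/(1 + exp(fixef(model1.fit)[1]))
#or using base R
plogis(fixef(model1.fit)[1])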

Ok - now let's expand the model to include all the predictors and covariates (amangcw, amangcb, lpconfc, time7c). This is the model from Section 6.3.

model2.fit <- glmer(formula = pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + 
                      (1|id), 
                    data=daily,
                    family="binomial",
                    na.action=na.exclude)
summary(model2.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1 | id)
##    Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##   1084.8   1116.0   -536.4   1072.8     1339 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7981 -0.4224 -0.3533 -0.2893  4.5619 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.2475   0.4975  
## Number of obs: 1345, groups:  id, 61
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.90161    0.10894 -17.456  < 2e-16 ***
## amangcw      0.21565    0.06738   3.200  0.00137 ** 
## amangcb     -0.22437    0.22399  -1.002  0.31649    
## lpconfc      0.31826    0.20892   1.523  0.12767    
## time7c      -0.19222    0.07073  -2.718  0.00657 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) amngcw amngcb lpcnfc
## amangcw -0.108                     
## amangcb  0.050 -0.103              
## lpconfc  0.006 -0.083  0.052       
## time7c   0.132 -0.004 -0.027  0.102

Note that these results correspond to the Mplus model in the book - that is, a model without the autocorrelation. There does not seem to be an easy way to get the autocorrelation into glmer() (see http://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html for workarounds).

Let's see if we can find a solution that matches the output in Table 6.1, using the glmmPQL() function in the MASS package.

model2.check <- glmmPQL(fixed =  pconf ~ 1 + amangcw + amangcb + lpconfc + time7c, 
                    random = ~ 1 | id,
                    family = binomial, 
                    data = daily,
                    na.action=na.exclude)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
summary(model2.check)
## Linear mixed-effects model fit by maximum likelihood
##  Data: daily 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:   0.5035396 0.9643594
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: pconf ~ 1 + amangcw + amangcb + lpconfc + time7c 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept) -1.8514003 0.10364066 1281 -17.863648  0.0000
## amangcw      0.2114809 0.06436036 1281   3.285887  0.0010
## amangcb     -0.2165962 0.21947589   59  -0.986879  0.3277
## lpconfc      0.2936809 0.19092390 1281   1.538209  0.1242
## time7c      -0.1901806 0.06820738 1281  -2.788271  0.0054
##  Correlation: 
##         (Intr) amngcw amngcb lpcnfc
## amangcw -0.094                     
## amangcb  0.046 -0.103              
## lpconfc -0.075 -0.066  0.050       
## time7c   0.125 -0.001 -0.028  0.092
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.8350503 -0.4451203 -0.3698122 -0.3025580  4.6395007 
## 
## Number of Observations: 1345
## Number of Groups: 61
model3.fit <- glmmPQL(fixed =  pconf ~ 1 + amangcw + amangcb + lpconfc + time7c, 
                    random = ~ 1 | id,
                    correlation = corAR1(),
                    family = binomial, 
                    data = daily,
                    na.action=na.exclude)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
summary(model3.fit)
## Linear mixed-effects model fit by maximum likelihood
##  Data: daily 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:   0.4335377 0.9882222
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | id 
##  Parameter estimate(s):
##        Phi 
## -0.1087779 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: pconf ~ 1 + amangcw + amangcb + lpconfc + time7c 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept) -1.8594942 0.09418881 1281 -19.742199  0.0000
## amangcw      0.2024783 0.06571314 1281   3.081246  0.0021
## amangcb     -0.1789475 0.19704554   59  -0.908153  0.3675
## lpconfc      0.9631149 0.17788526 1281   5.414248  0.0000
## time7c      -0.1637469 0.06324930 1281  -2.588912  0.0097
##  Correlation: 
##         (Intr) amngcw amngcb lpcnfc
## amangcw -0.087                     
## amangcb  0.039 -0.097              
## lpconfc -0.171 -0.058  0.053       
## time7c   0.107  0.005 -0.033  0.088
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.9883691 -0.4155742 -0.3478463 -0.2916223  4.5127137 
## 
## Number of Observations: 1345
## Number of Groups: 61
VarCorr(model3.fit)
## id = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.1879549 0.4335377
## Residual    0.9765831 0.9882222

That is pretty close to the SAS output in the chapter.

Next, we try the glmmTMB() function in the glmmTMB package.

model2.check <- glmmTMB(formula = pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1|id),
                    family = binomial, 
                    data = daily
                    #na.action=na.exclude
                    )
summary(model2.check)
##  Family: binomial  ( logit )
## Formula:          
## pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1 | id)
## Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##   1084.8   1116.0   -536.4   1072.8     1339 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.2479   0.4979  
## Number of obs: 1345, groups:  id, 61
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.90220    0.10967 -17.344  < 2e-16 ***
## amangcw      0.21574    0.06766   3.188  0.00143 ** 
## amangcb     -0.22451    0.22519  -0.997  0.31876    
## lpconfc      0.31824    0.21019   1.514  0.13002    
## time7c      -0.19231    0.07122  -2.700  0.00693 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Setting time as a factor
daily$timefactor <- factor(daily$time)
levels(daily$timefactor)
##  [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [15] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
#Including the autoregression
model3.fit <- glmmTMB(formula = pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1|id) + ar1(timefactor+0|id),
                    family = binomial, 
                    data = daily)
summary(model3.fit)
##  Family: binomial  ( logit )
## Formula:          
## pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1 | id) +  
##     ar1(timefactor + 0 | id)
## Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##    615.6    657.2   -299.8    599.6     1337 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name         Variance Std.Dev. Corr 
##  id     (Intercept)     8.354  2.89         
##  id.1   timefactor2  2281.452 47.76         
##         timefactor3  2281.452 47.76    -0.02
##         ...
##         timefactor28 2281.452 47.76         
##  [timefactor2 through timefactor28 all have variance 2281.452 (Std.Dev. 47.76); the estimated AR(1) correlation between adjacent days is -0.02. The full 27 x 27 correlation matrix is omitted here.]
## Number of obs: 1345, groups:  id, 61
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -12.3764     0.9532 -12.984   <2e-16 ***
## amangcw       0.2628     0.3300   0.796    0.426    
## amangcb      -0.1846     0.9157  -0.202    0.840    
## lpconfc       0.8038     1.4310   0.562    0.574    
## time7c       -0.2073     0.3626  -0.572    0.567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It runs, but there appears to be something wrong with how the model is parameterized (we also tried timefactor-1). This needs a more careful look … https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html
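
One possibility worth exploring (a sketch only; we have not verified it against the book’s results) is to drop the separate random intercept, since the ar1() term already induces within-person dependence across days and the two terms may be hard to separate:

#AR(1) structure only, without the separate random intercept (untested alternative)
model3.alt <- glmmTMB(formula = pconf ~ 1 + amangcw + amangcb + lpconfc + time7c +
                        ar1(timefactor + 0 | id),
                      family = binomial,
                      data = daily)
#summary(model3.alt)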

C. Modeling Count Outcome

Example 2: Poisson Outcome in AMIB Dataset

For the second example, we use data from the AMIB example.

Prelim - Reading in Repeated Measures Data

We read the data directly from the online dataset …

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBbrief_raw_daily1.csv"
#read in the .csv file using the url() function
daily <- read.csv(file=url(filepath),header=TRUE)

#Little bit of clean-up
var.names.daily <- tolower(colnames(daily))
colnames(daily)<-var.names.daily
names(daily)
##  [1] "id"      "day"     "date"    "slphrs"  "weath"   "lteq"    "pss"    
##  [8] "se"      "swls"    "evalday" "posaff"  "negaff"  "temp"    "hum"    
## [15] "wind"    "bar"     "prec"

We use the long file.

We consider two variables: negative affect (negaff) and a new variable, stresslevel (derived from pss).

For our Poisson illustration we convert negaff into a count variable - one that takes only non-negative integer values. Our new variable is called negaffcount.

#making a negaff count variable by shifting negaff to start at 0 and multiplying by 10
daily$negaffcount <- (round(daily$negaff-1,1)*10)
table(daily$negaffcount)
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
## 39 45 42 49 56 56 82 69 58 57 72 51 65 47 26 50 47 40 39 52 38 38 37 32 20 
## 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 48 49 51 
## 22 27 22 17 14 12 10 13 10  9 10  9  5  8  6  9  6  6  2  1  2  3  3  3  2 
## 54 59 
##  1  2
#histogram of the count variable
ggplot(data=daily, aes(x=negaffcount)) +
  geom_histogram(fill="white", color="black") + 
  labs(x = "Negative Affect (count)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 17 rows containing non-finite values (stat_bin).

We also reverse code pss into a stresslevel variable (higher scores = more stressed).

#reverse coding the pss variable into a new stresslevel variable
daily$stresslevel <- 4 - daily$pss
describe(daily$stresslevel)
##    vars    n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 1445 1.39 0.68   1.25    1.36 0.74   0   4     4 0.35     0.13
##      se
## X1 0.02
ggplot(data=daily, aes(x=stresslevel)) +
  geom_histogram(fill="white", color="black") +
  labs(x = "Stress (high = more stressed")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 13 rows containing non-finite values (stat_bin).

Let's look at the data for a subset of persons.

#faceted plot
ggplot(data=daily[which(daily$id <= 125),], aes(x=stresslevel,y=negaffcount)) +
  geom_point() + #(position=position_jitter(h=.025)) + 
  xlab("Stress Level") + 
  ylab("Negative Affect (Count)") + 
  scale_x_continuous(limits=c(0,4),breaks=seq(0,4,by=1)) + 
  scale_y_continuous(limits=c(0,60), breaks=c(0,30,60)) +
  stat_smooth(method="glm", method.args = list(family="poisson"), fullrange=TRUE) +
  facet_wrap( ~ id)
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 190 rows containing missing values (geom_smooth).

Setting up the substantive inquiry: defining the dynamic characteristic of interest

As we did previously, we define a person-level dynamic characteristic, stress reactivity, as the extent to which a person’s level of daily negative affect is related to daily stress level. Stress reactivity is operationalized as the intraindividual covariation of daily negative affect and daily stress. This time, though, we explicitly model negative affect as a count variable.

Formally,
\[log_{e}\left(E(y_{it})\right) = \beta_{0i} + \beta_{1i}x_{it}\]

where the expected outcome (left side of equation) has been “transformed” using the log link function, \(g(\cdot) = log_{e}(\cdot)\), and where \(\beta_{1i}\) is individual i’s level of stress reactivity.
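
Applying the inverse link (exponentiating both sides) shows that the model is multiplicative on the count scale, \[E(y_{it}) = e^{\beta_{0i} + \beta_{1i}x_{it}} = e^{\beta_{0i}}\left(e^{\beta_{1i}}\right)^{x_{it}}\] so exponentiated coefficients act as multiplicative factors (rate ratios) on the expected count.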

Setting up for Intraindividual Covariation with Count Outcome (GMLM)

As usual, we split the predictor into “trait” (between-person differences) and “state” (within-person deviations) components. Specifically, the variable stresslevel is split into two variables: imean.stresslevel is each person’s mean across days (which we center at the sample mean to create imean.stresslevel.c, the between-person component), and stresslevel.state is the person-centered within-person component.

#calculating intraindividual means
daily.imeans <- ddply(daily, "id", summarize, imean.stresslevel=mean(stresslevel, na.rm=TRUE))
                                              
#centering the person-level variables
daily.imeans$imean.stresslevel.c <- scale(daily.imeans$imean.stresslevel,center=TRUE,scale=FALSE)
describe(daily.imeans)
##                     vars   n   mean     sd median trimmed    mad    min
## id                     1 190 318.29 130.44 321.50  318.99 151.23 101.00
## imean.stresslevel      2 190   1.40   0.48   1.41    1.39   0.51   0.19
## imean.stresslevel.c    3 190   0.00   0.48   0.01    0.00   0.51  -1.21
##                        max  range  skew kurtosis   se
## id                  532.00 431.00 -0.04    -1.09 9.46
## imean.stresslevel     2.56   2.38 -0.04    -0.23 0.03
## imean.stresslevel.c   1.17   2.38 -0.04    -0.23 0.03
#merging intraindividual means into long data
daily <- merge(daily,daily.imeans,by="id")

#calculating state variables
daily$stresslevel.state <- daily$stresslevel - daily$imean.stresslevel

We again use the glmer() function in the lme4 package.

To start, let's fit the usual unconditional means model …

um.fit <- glmer(formula = negaffcount ~ 1 + (1|id), 
              data=daily,
              family="poisson",
              na.action=na.exclude)
summary(um.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: negaffcount ~ 1 + (1 | id)
##    Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##  12556.4  12567.0  -6276.2  12552.4     1439 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.8837 -1.4131 -0.3372  1.0990  8.9100 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.3014   0.549   
## Number of obs: 1441, groups:  id, 190
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.55757    0.04065   62.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output is similar to what we are used to - although note that there is no residual variance term (as per the equations above).
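
Before interpreting a Poisson model it can also be worth a rough check for overdispersion. One common heuristic (a sketch, not a formal test) is the ratio of the Pearson chi-square to the residual degrees of freedom, which should be near 1 if the Poisson mean-variance assumption is reasonable:

#rough overdispersion check: values much greater than 1 suggest overdispersion
pearson.res <- residuals(um.fit, type="pearson")
sum(pearson.res^2, na.rm=TRUE)/df.residual(um.fit)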

OK - let's add the predictors, imean.stresslevel.c and stresslevel.state (and their interaction), to examine stress reactivity.

#simple model 
model1.fit <- glmer(formula = negaffcount ~ 1 + imean.stresslevel.c + stresslevel.state + stresslevel.state:imean.stresslevel.c + 
                      (1 + stresslevel.state|id), 
                    data=daily,
                    family="poisson",
                    na.action=na.exclude)
summary(model1.fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: 
## negaffcount ~ 1 + imean.stresslevel.c + stresslevel.state + stresslevel.state:imean.stresslevel.c +  
##     (1 + stresslevel.state | id)
##    Data: daily
## 
##      AIC      BIC   logLik deviance df.resid 
##  10558.8  10595.7  -5272.4  10544.8     1431 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6742 -1.0652 -0.1746  0.8470  7.3430 
## 
## Random effects:
##  Groups Name              Variance Std.Dev. Corr 
##  id     (Intercept)       0.1591   0.3989        
##         stresslevel.state 0.1500   0.3874   -0.34
## Number of obs: 1438, groups:  id, 190
## 
## Fixed effects:
##                                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            2.50209    0.03018   82.90  < 2e-16
## imean.stresslevel.c                    0.83509    0.06396   13.06  < 2e-16
## stresslevel.state                      0.55983    0.03469   16.14  < 2e-16
## imean.stresslevel.c:stresslevel.state -0.29855    0.07564   -3.95 7.91e-05
##                                          
## (Intercept)                           ***
## imean.stresslevel.c                   ***
## stresslevel.state                     ***
## imean.stresslevel.c:stresslevel.state ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) imn.s. strss.
## imn.strssl. -0.032              
## strsslvl.st -0.305  0.021       
## imn.strs.:.  0.021 -0.300 -0.139
#other useful outputs:
#confint(model1.fit) # 95% CI for the coefficients
#exp(fixef(model1.fit)) # exponentiated fixed-effects coefficients (rate ratios)
#exp(confint(model1.fit, method="Wald")) # 95% CI for exponentiated coefficients
#predict(model1.fit, type="response") # predicted values
#residuals(model1.fit, type="deviance") # residuals

Interpretation occurs in the usual way for Poisson models.

Within this model the intercept \(\gamma_{00}\) is the expected ‘log’ of negative affect for a prototypical person on a prototypical day. We exponentiate to obtain an estimate on the original ‘count’ scale.

exp(2.50209)
## [1] 12.20798

Therefore, the prototypical person is expected to have a negative affect count of 12.21 on the prototypical (average stress) day.

Similarly, \(\gamma_{10}\) is the prototypical within-person association between stress and log negative affect. We exponentiate to obtain an estimate on the original ‘count’ scale.

exp(0.55983)
## [1] 1.750375

For the prototypical person, a one-unit within-person increase in stress level is associated with a 0.56 increase in log negative affect - that is, the expected negative affect count is multiplied by about 1.75.

Let's look at predicted scores. This can be done in two ways.
1. We can predict in the link scale (for us that is the log of the expected count), or
2. We can predict in the response scale (for us that is the expected count).

daily$pred.m1log <- predict(model1.fit, type="link")
daily$pred.m1original <- predict(model1.fit, type="response")

Plotting predicted scores.

#Log scale predictions
ggplot(data=daily, aes(x=stresslevel.state,y=pred.m1log, group=id)) +
  geom_line() + 
  xlab("Stress State (centered)") + 
  ylab("Predicted Log Negative Affect") 
## Warning: Removed 16 rows containing missing values (geom_path).

#Count scale predictions
ggplot(data=daily, aes(x=stresslevel.state,y=pred.m1original, group=id)) +
  geom_line() + 
  xlab("Stress State (centered)") + 
  ylab("Predicted Negative Affect (count)") 
## Warning: Removed 16 rows containing missing values (geom_path).

And let's add the prototypical curve to the plot.

#create new data with range of values for the predictor
protodata <- data.frame(id = 999,
                        stresslevel.state = seq(-2.5,+2.5,.1),
                        imean.stresslevel.c = rep(0,51))
#not sure why these two are not working
# protodata$pred.m1log <- predict(model1.fit, newdata=protodata, type="link", 
#                                  allow.new.levels = TRUE)
# protodata$pred.m1original <- predict(model1.fit, newdata=protodata, type="response", 
#                                  allow.new.levels = TRUE)
 

#A work-around
protodata$pred.m1log <- 2.50209 + 
  0.83509*protodata$imean.stresslevel.c + 
  0.55983*protodata$stresslevel.state + 
  -0.29855*protodata$imean.stresslevel.c*protodata$stresslevel.state
protodata$pred.m1original <- exp(protodata$pred.m1log)
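
One thing to try (a sketch; we have not diagnosed the exact cause of the problem above) is to request population-level predictions with re.form = NA, which drops the random effects entirely and should closely reproduce the hand-computed values:

#population-level (fixed-effects only) predictions; re.form=NA excludes the random effects
#protodata$pred.m1log <- predict(model1.fit, newdata=protodata, re.form=NA, type="link")
#protodata$pred.m1original <- predict(model1.fit, newdata=protodata, re.form=NA, type="response")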

#Plotting Log scale predictions
ggplot(data=daily, aes(x=stresslevel.state,y=pred.m1log, group=id)) +
  geom_line() + 
  geom_line(data=protodata, color="red", size=2) +
  xlab("Stress State (centered)") + 
  ylab("Predicted Log Negative Affect")  
## Warning: Removed 16 rows containing missing values (geom_path).

#Plotting Count scale predictions
ggplot(data=daily, aes(x=stresslevel.state,y=pred.m1original, group=id)) +
  geom_line() + 
  geom_line(data=protodata, color="red", size=2) +
  xlab("Stress State (centered)") + 
  ylab("Predicted Negative Affect (count)")
## Warning: Removed 16 rows containing missing values (geom_path).

Note that in describing these results, it helps to be explicit about whether each estimate is on the link (log) scale or the response (count) scale.

OK - so we have built up the facility to work with non-normal outcomes (e.g., binary, count). Just be careful with the number of random effects, and with interpretation. Converting results back and forth using link functions and inverse link functions is complicated, but well worth the effort for interpretation and communication. Readers are most appreciative!

Good luck! =:-]