Overview

In the previous tutorials we covered how the multilevel model is used to examine intraindividual covariability. In this tutorial, we outline how an extension, the multilevel model with heterogeneous variance can be used to examine differences in intraindividual variability - which we had previously done in a 2-step way using the iSD.

Outline

  1. Introduction to The Variance Heterogeneity Model
  2. A Series of Analysis Models
  3. Empirical Examples
  4. Conclusion

Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(nlme)

1. Introduction to the Variance Heterogeneity Model

We have used two “separate” sets of methods to examine …
1. Intraindividual Variation (calculation of within-person summaries; iSD, iEntropy, iMSSD, etc. following Ram & Gerstorf, 2009)
2. Intraindividual Covariation (multilevel models - following Bolger & Laurenceau, 2013)

In 1., our interest was in the interindividual differences in within-person variability in an outcome variable, \(\sigma_{yi}^2\). In 2., our interest was in the interindividual differences in the within-person association between a predictor and outcome \(\beta_{1i}\).

Here we outline a framework where these two interests can be combined together into a single model …. Multilevel Model with Heterogeneous Variance.

Like with multilevel modeling itself, the framework is referred to by different names in different fields. The larger class of models includes models referred to as … Variance Heterogeneity models, Location-Scale models, Generalized Additive Models for Location, Scale and Shape (GAMLSS, http://www.gamlss.org). It is not totally clear if there are some nuanced differences among the various different models. However, as far as I can tell, all of these model variants are applied to longitudinal panel-type data (multiple persons, multiple occassions).

A parallel form of the model exists for time-series data, ARCH models (auto-regressive models with conditional heteroscedasticity; https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity). There are lots of variants there including GARCH, NGARCH, IGARCH, EGARCH, GARCH-M, QGARCH, TGARCH. All of these model variants are applied to time-series data (single entity, many many occassions). I am under the impression that not many people have attempted to form links between these two different traditions, but have not yet looked into the literature closely.

Much of what we explore today is covered in Hedeker, Mermelstein, Berbaum, & Campbell (2009), Hedeker & Mermelstein (2007), and Hoffman (2007). Some more recent extensions are in Hedeker, Mermelstein & Demirtas (2012), and Rast, Hofer, & Sparks (2012).

Some Basic Definitions

A few definitions from Wikipedia (which may or may not be useful today, but are part of our larger effort to understand how we can make use of all these models).

Location parameter: https://en.wikipedia.org/wiki/Location_parameter
In statistics, a location family is a class of probability distributions that is parametrized by a scalar- or vector-valued parameter \(x_0\), which determines the “location” or shift of the distribution. Examples of location parameters include the mean, the median, and the mode.

Scale parameters: https://en.wikipedia.org/wiki/Scale_parameter
In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions. The larger the scale parameter, the more spread out the distribution.

The normal distribution has two parameters: a location parameter \(\mu\) and a scale parameter \(\sigma\). In practice the normal distribution is often parameterized in terms of the squared scale \(\sigma^2\), which corresponds to the variance of the distribution.

Here you can see location and scale of the distribution changing over time (image from http://www.gamlss.org)

Shape parameters: https://en.wikipedia.org/wiki/Shape_parameter
A shape parameter is any parameter of a probability distribution that is neither a location parameter nor a scale parameter (nor a function of either or both of these only, such as a rate parameter). Such a parameter must affect the shape of a distribution rather than simply shifting it (as a location parameter does) or stretching/shrinking it (as a scale parameter does). Distributions with a shape parameter include skew normal, gamma, Weibull, ExGaussian distributions.

The Usual (Homogeneous Variance) Multilevel Model

Typically, the multilevel models we use (and that are covered in B&L) make a homogeneity of variance assumption. For example, lets look at the basic “unconditional means” model.

\[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + u_{0i}\]

where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\].

For clarity, lets write out the full variance covariance matrix of the between-person residuals, \(\mathbf{G}\). In this case it is just a 1 x 1 matrix. \[\mathbf{G} = \left[\begin{array} {r} \sigma^{2}_{u0} \end{array}\right]\] Embedded here is the assumption that the between-person variance characterizes the entire sample (population). There are no subgroups. Later, we will relax this assumption, and allow that \(\sigma^{2}_{u0g}\) can differ across “groups” of individuals (e.g., Males, Females).

For clarity, lets also write out the full variance covariance matrix of the within-person residuals, \(\mathbf{R}\) (spanning across the t = 8 days of observation in the example data), \[\mathbf{R} = \mathbf{I} \left[\begin{array} {r} \sigma^{2}_{e} \end{array}\right] = \left[\begin{array} {rrrrrrrr} \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} \end{array}\right]\], The homoscedasticity of errors applies across persons and occasions. There are no subgroups. Later, we will relax this assumption, and allow that \(\sigma^{2}_{eg}\) can differ across g “groups” of individuals (e.g., males, females).

Here is a little plot of some simulated data that follow this model.

From this plot we can see …
* between-person differences in \(u_{0i}\),
* homogeneity of within-person variance \(\sigma^{2}_{e}\)

The Heterogenous Variance Multilevel Model

The objective of this model is to relax the homogeneity of variance assumption, and we can do this at both the between-person level, allowing for group or individual differences in \(\sigma^{2}_{u0g}\), and/or at the within-person level, allowing for group or individual differences in \(\sigma^{2}_{eg}\). We expand the basic “unconditional means” model to illustrate.

\[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + u_{0i}\] where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\]. That is all the same as usual.

But now lets add some complexity. Rather than the previous assumption that \(\sigma^{2}_{u0g} = \sigma^{2}_{u0}\), lets allow that \(\sigma^{2}_{u0g} = f(g_{i})\), where \(g_{i}\) is a grouping variable (Male, Female).
This allows that \(\sigma^{2}_{u0g=1} \neq \sigma^{2}_{u0g=2}\).

Formally, the full variance covariance matrix of the between-person residuals is now,
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0g=1} & 0 \\ 0 & \sigma^{2}_{u0g=2} \end{array}\right]\] There are now, explicitly, two variances \(\sigma^{2}_{u0g=1}\) and \(\sigma^{2}_{u0g=2}\), with the off-diagonal covariance being zero because individuals are only in one group. Lets look at some example data.

Note how the extent of between-person differences in \(u_{0i}\) is different in the g = 1 = red and g = 2 = green groups. Visually, the difference in the group variances map to
\[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0g=1}=0 & 0 \\ 0 & \sigma^{2}_{u0g=2}=1 \end{array}\right]\]

2. A Series of Analysis Models

Let’s set up an analysis model. We use the same lme() function from the nlme package that we have been using, and simply make some adjustments (some are similar to what we did for the dyadic models).

Specific for the heterogeneity in between-person differences we make adjustments to the random = formula part of the script.

First, the usual variance homogeneity model.

This is a model of “common” between-person variance and “common” within person variance, where “common” means that these variances are equivalent across groups, \(\sigma^{2}_{u0g} = \sigma^{2}_{u0}\).

# Common models for between and within person variance
model.01 = lme(fixed = y_ti ~ 1,  
               random = ~ 1 | id,
                       data = sim_nogrow,
                   method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
##  Data: sim_nogrow 
##        AIC     BIC    logLik
##   2351.889 2368.69 -1172.944
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:    1.271246 0.3797654
## 
## Fixed effects: y_ti ~ 1 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 0.1258579 0.1274079 1900 0.9878346  0.3234
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.489280944 -0.612660509  0.004013223  0.621295100  4.750794513 
## 
## Number of Observations: 2000
## Number of Groups: 100

Now, wihtout changing the model, we are going to write the random = formula as a list(). The exact same model is … where pdSymm() indicates that the random effects are structured as a positive-definite Symmetric matrix.

# Common models for between and within person variance
model.01 = lme(fixed = y_ti ~ 1,  
               random = list(id = pdSymm(form = ~ 1)),
                       data = sim_nogrow,
                   method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
##  Data: sim_nogrow 
##        AIC     BIC    logLik
##   2351.889 2368.69 -1172.944
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:    1.271246 0.3797654
## 
## Fixed effects: y_ti ~ 1 
##                 Value Std.Error   DF   t-value p-value
## (Intercept) 0.1258579 0.1274079 1900 0.9878346  0.3234
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.489280944 -0.612660509  0.004013223  0.621295100  4.750794513 
## 
## Number of Observations: 2000
## Number of Groups: 100

Looking specifically at the variance structures …

VarCorr(model.01)
## id = pdSymm(1) 
##             Variance  StdDev   
## (Intercept) 1.6160662 1.2712459
## Residual    0.1442217 0.3797654

We see that \(\sigma^{2}_{u0g} = \sigma^{2}_{u0} = 1.6160662\),
and that \(\sigma^{2}_{eg} = \sigma^{2}_{e} = 0.1442217\).

Second, heterogeneity in between-peson residuals (Level-2 random effects).

This is a model of heterogeneous between-person variance and common within person variance. Note that in addition to changing the form = we also explicitly specify that the \(\mathbf{G}\) matrix should be diagonal with the pdDiag() specification (positive-definite Diagonal). This gets us the set-up we want (see above), \[\mathbf{G} = \left[\begin{array} {rr} \sigma^{2}_{u0g=0} & 0 \\ 0 & \sigma^{2}_{u0g=1} \end{array}\right]\]

# Heterogeneity for between variance and common within person variance
model.01b = lme(fixed = y_ti ~ 1,  
                random = list(id = pdDiag(form = ~ factor(group_i))),
                        data = sim_nogrow,
                    method = 'REML')
summary(model.01b)
## Linear mixed-effects model fit by REML
##  Data: sim_nogrow 
##        AIC      BIC    logLik
##   2104.694 2127.096 -1048.347
## 
## Random effects:
##  Formula: ~factor(group_i) | id
##  Structure: Diagonal
##          (Intercept) factor(group_i)1  Residual
## StdDev: 2.447866e-05         1.798439 0.3780477
## 
## Fixed effects: y_ti ~ 1 
##                     Value  Std.Error   DF     t-value p-value
## (Intercept) -0.0002883882 0.01194176 1900 -0.02414955  0.9807
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.525619386 -0.631708514  0.003199987  0.624580177  4.787642278 
## 
## Number of Observations: 2000
## Number of Groups: 100

Looking specifically at the variance and “correlation” structures …

VarCorr(model.01b)
## id = pdDiag(factor(group_i)) 
##                  Variance     StdDev      
## (Intercept)      5.992050e-10 2.447866e-05
## factor(group_i)1 3.234381e+00 1.798439e+00
## Residual         1.429200e-01 3.780477e-01

We see that …
\(\sigma^{2}_{u0g=0} = 0.00\), \(\sigma^{2}_{u0g=1} = 3.23\),
and that \(\sigma^{2}_{eg} = \sigma^{2}_{e} = 0.149\).

Ok - so hopefully the idea of variance heterogeneity in the between-person random effects is clear.

Lets turn to the within-person random effects.

Third, heterogeneity in within-person residual variance (Level-1 random effects).

This is a model of heterogeneous between-person variance and heterogeneous within person variance. We do this using the weights = formula in the same way that we did for the dyadic models, in order that we can get the desired structure … \[\mathbf{R} = \left[\begin{array} {rr} \sigma^{2}_{eg=0} & 0 \\ 0 & \sigma^{2}_{eg=1} \end{array}\right]\]

Again note the diagonal structure, meaning that we need the weights =part of the code, but we do not need the correlation = part of the code we used in the dyadic models.

Our expanded model is specified as …

# Heterogeneity for both between and within person variance
model.01c = lme(fixed = y_ti ~ 1,  
                random = list(id = pdDiag(form = ~ factor(group_i))),
                weights = varIdent(form = ~ 1 | factor(group_i)),
                        data = sim_nogrow,
                    method = 'REML')
summary(model.01c)
## Linear mixed-effects model fit by REML
##  Data: sim_nogrow 
##        AIC      BIC   logLik
##   2005.756 2033.758 -997.878
## 
## Random effects:
##  Formula: ~factor(group_i) | id
##  Structure: Diagonal
##          (Intercept) factor(group_i)1  Residual
## StdDev: 4.500199e-05         1.797808 0.3145904
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | factor(group_i) 
##  Parameter estimates:
##       0       1 
## 1.00000 1.38244 
## Fixed effects: y_ti ~ 1 
##                     Value   Std.Error   DF     t-value p-value
## (Intercept) -0.0004595558 0.009940638 1900 -0.04623001  0.9631
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.058631812 -0.651976599  0.003777448  0.649732254  4.157940315 
## 
## Number of Observations: 2000
## Number of Groups: 100

The between-person variances are given by the upper portion of …

VarCorr(model.01c)
## id = pdDiag(factor(group_i)) 
##                  Variance     StdDev      
## (Intercept)      2.025179e-09 4.500199e-05
## factor(group_i)1 3.232113e+00 1.797808e+00
## Residual         9.896710e-02 3.145904e-01

For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev is here ..

summary(model.01c)$sigma
## [1] 0.3145904

and the weights are here … (yeah, I’m not sure why they are so hard to extract)

coef(model.01c$modelStruct$varStruct, unconstrained=FALSE)
##       1 
## 1.38244

So the estimated Residual variance for group = 0, \(\sigma^{2}_{eg=0}\) is … (true value = .10)

(summary(model.01c)$sigma*1.0000)^2
## [1] 0.0989671

and the estimated variance for group = 1, \(\sigma^{2}_{eg=1}\) is … (true value = .20)

(summary(model.01c)$sigma*coef(model.01c$modelStruct$varStruct, uncons=FALSE))^2
##       1 
## 0.18914

Yay! - now we expanded the original model to include both heterogenous between-person variances and heterogeneous within-person variances.

To formally test if they heterogenous variances are needed we would do a log-likelihood test, e.g., using the anova() function.

anova(model.01, model.01b)
##           Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model.01      1  3 2351.889 2368.690 -1172.944                        
## model.01b     2  4 2104.694 2127.096 -1048.347 1 vs 2 249.1944  <.0001
anova(model.01b, model.01c)
##           Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model.01b     1  4 2104.694 2127.096 -1048.347                        
## model.01c     2  5 2005.756 2033.758  -997.878 1 vs 2 100.9382  <.0001

Those tests indicate that the more complex models are significantly better for these data.

Continuous Predictor of Heterogenous Variance

In the above we accommodated group (i.e., categorical) differences in between-person and within-person variances. We can also model the variances as a function of a continuous variable. For example, \[\sigma^{2}_{ei} = f(x_{i})\], where \(x_{i}\) is an individual differences variable (e.g., sex, neuroticism).

Lets look at what some data might look like.

Formally, the model is \[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + \gamma_{01}x_i + u_{0i}\]

where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\].

We assume the between-person residuals in intercepts are homogenous, \(\mathbf{G}\). In this case it is just a 1 x 1 matrix. \[\mathbf{G} = \left[\begin{array} {r} \sigma^{2}_{u0} \end{array}\right]\] and the within-person residuals are heterogeneous, so that
\[\mathbf{R} = \mathbf{I} \left[\begin{array} {r} \sigma^{2}_{ei} \end{array}\right]\] where, here in lme() … see ?varExp \[\sigma^{2}_{ei} = \alpha_{0}^{2}exp(2\alpha_{1}x_i)\] (Yes, the details of the exact implementations are confusing, and not very clear in any book or paper. One has to dig a bit to find that in lme() there is a multiplication by 2. A good, practical overview piece is needed. )

Our expanded model changes the specification in the weights = option to …

# Continuous predictor for within person variance, common between-person differences
model.02a = lme(fixed = y_ti ~ 1 + x_i,  
                random = list(id = pdSymm(form = ~ 1)),
                weights = varExp(form = ~ x_i),
                        data = sim_nogrow[which(sim_nogrow$id <=100),],
                    method = 'REML')
summary(model.02a)
## Linear mixed-effects model fit by REML
##  Data: sim_nogrow[which(sim_nogrow$id <= 100), ] 
##       AIC     BIC    logLik
##   41418.5 41446.5 -20704.25
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:  0.01894279 0.3058026
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~x_i 
##  Parameter estimates:
##     expon 
## 0.2003149 
## Fixed effects: y_ti ~ 1 + x_i 
##                Value  Std.Error   DF   t-value p-value
## (Intercept) 0.034557 0.07712814 1900   0.44804  0.6542
## x_i         4.008449 0.01950011   98 205.56030  0.0000
##  Correlation: 
##     (Intr)
## x_i -0.775
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.101613649 -0.653626272 -0.002955521  0.668324110  4.117217647 
## 
## Number of Observations: 2000
## Number of Groups: 100

The residual between-person variances are given by the upper portion of …

VarCorr(model.02a)
## id = pdSymm(1) 
##             Variance     StdDev    
## (Intercept) 0.0003588295 0.01894279
## Residual    0.0935152258 0.30580259

Specifically, residual varaince in intercepts unrelated to $x_i* are … (true value = 0)

VarCorr(model.02a)[1]
## [1] "0.0003588295"

For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual Std Deviation, \(\alpha_{0}\) is here … (true value for \(\alpha_{0}^2\) = 0.1)

summary(model.02a)$sigma
## [1] 0.3058026

And the parameter in the exponential variance function is here … (true value = 0.2)

coef(model.02a$modelStruct$varStruct, uncons=FALSE)
##     expon 
## 0.2003149

So for a given value of x_i we expect the residual within-person variance to be … \[\sigma^{2}_{ei} = \alpha_{0}^{2}exp(2\alpha_{1}x_i)\]

x_i <- as.matrix(1:50)
varpred <- summary(model.02a)$sigma^2 * exp(coef(2*model.02a$modelStruct$varStruct, uncons=FALSE)*x_i)
head(varpred,10)
##            [,1]
##  [1,] 0.1395962
##  [2,] 0.2083843
##  [3,] 0.3110687
##  [4,] 0.4643524
##  [5,] 0.6931688
##  [6,] 1.0347379
##  [7,] 1.5446201
##  [8,] 2.3057543
##  [9,] 3.4419485
## [10,] 5.1380190

And how does that compare to the true values in our simulation?

head(personframe,10)
##    x_i group_i b0_i   sig2_ei
## 1    1       0    4 0.1491825
## 2    2       0    8 0.2225541
## 3    3       0   12 0.3320117
## 4    4       0   16 0.4953032
## 5    5       0   20 0.7389056
## 6    6       0   24 1.1023176
## 7    7       0   28 1.6444647
## 8    8       0   32 2.4532530
## 9    9       0   36 3.6598234
## 10  10       0   40 5.4598150

A bit underestimated, but pretty close.

We can make a plot of how the residual variances are related to the predictor

#make a data frame with predictions and log predictions
plotvars <- as.data.frame(cbind(x_i,varpred))
names(plotvars) <- c("x_i","varpred")
plotvars$logvarpred <- log(varpred)
#plotting
ggplot(data = plotvars[1:10,], aes(x=x_i,y = varpred), legend=FALSE) +
  geom_point() +
  geom_line() + 
  xlab("x") + 
  ylab("Predicted IIV (residual variance)") + #ylim(-5,5) +
  scale_x_continuous(breaks=seq(0,10,by=1)) 

Side note:

Why did we do this with an exponential function? Same as Poisson, the predicted scores should not go below zero (we cannot have negative variances) So we transform to obtain a linear equation. \[log(\sigma^{2}_{ei}) = \alpha_{0}^{2} + 2\alpha_{1}x_i\] In the log space we get …

ggplot(data = plotvars[1:10,], aes(x=x_i,y = logvarpred), legend=FALSE) +
  #geom_rect(mapping=aes(xmin=day-.5, xmax=day+.5, ymin=1, ymax=7, fill=stress), alpha=0.8) +
  geom_point() +
  geom_line() + 
  xlab("x") + 
  ylab("Predicted IIV (residual variance)") + #ylim(-5,5) +
  scale_x_continuous(breaks=seq(0,10,by=1)) 

Interim Thoughts …

Now have the possibility to relax the homogeneity of variance assumption. And this leads us into a world where we may be able to do some more precise testing of how between-person difference characteristics are related to IIV.
We have to be careful with how we construct the models, and although lme() does not make it super easy to test for significance we can probably figure all the details out. Notably, lme() actually allows quite a few different variance functions (varFunc), so there is a rich set of potential models to be explored. The most detailed information I have found so far on how to do this in R is Chapter 7 of Galecki, A. & Burzykowski, T. (2013). Linear mixed-effects models using R. A step-by-step approach. Springer, New York. There are a number of papers that need to be written on this topic, and how these models can be applied to experience sampling data.

3. Empirical Examples

Let’s try a real data example. We use an expanded set of data from AMIB (I’ve added some between-person variables in another file).

Read in the data

#Daily data
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBbrief_raw_daily1.csv"
#read in the .csv file using the url() function
daily <- read.csv(file=url(filepath),header=TRUE)

#Little bit of clean-up
var.names.daily <- tolower(colnames(daily))
colnames(daily)<-var.names.daily

#subsetting data
daily1 <- daily[,c("id","day","posaff","negaff")]

#Person data
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIB_person1.csv"
#read in the .csv file using the url() function
person <- read.csv(file=url(filepath),header=TRUE)

#make a centered variable
person$bfi_Nc <- person$bfi_N - mean(person$bfi_N, na.rm=TRUE)
#making a factor variable
person$sex_f <- factor(person$sex, labels = c("male","female"))
#str(person$sex_f)

#Merging person and daily together
data.all <- merge(person,daily1,by="id")

# Examine first few rows of the data set
head(data.all,10)
##     id sex bfi_N     bfi_Nc  sex_f day posaff negaff
## 1  101   1     2 -0.9815789 female   0    3.9    3.0
## 2  101   1     2 -0.9815789 female   1    3.8    2.3
## 3  101   1     2 -0.9815789 female   2    5.1    1.0
## 4  101   1     2 -0.9815789 female   3    5.6    1.3
## 5  101   1     2 -0.9815789 female   4    4.3    1.1
## 6  101   1     2 -0.9815789 female   5    3.9    1.0
## 7  101   1     2 -0.9815789 female   6    5.1    1.2
## 8  101   1     2 -0.9815789 female   7    4.8    1.1
## 9  102   1     2 -0.9815789 female   0    6.3    1.4
## 10 102   1     2 -0.9815789 female   1    7.0    1.6

Ok - lets consider two individual differences (notice that we do this by subsetting to one row per person):
Sex (factor version)

table(data.all[which(data.all$day==0),]$sex_f)
## 
##   male female 
##     65    125

Neuroticism

describe(data.all[which(data.all$day==0), c("bfi_N","bfi_Nc")])
##        vars   n mean   sd median trimmed  mad   min  max range  skew
## bfi_N     1 190 2.98 0.96   3.00    3.00 1.48  1.00 5.00     4 -0.09
## bfi_Nc    2 190 0.00 0.96   0.02    0.02 1.48 -1.98 2.02     4 -0.09
##        kurtosis   se
## bfi_N     -0.82 0.07
## bfi_Nc    -0.82 0.07

Formulate Hypotheses.

We can formulate some simple hypotheses about variance heterogeneity based on sterotypes and construct definitions.
A. Females have more variable emotions (e.g., positive affect) than Males.
B. Individuals higher in Neuroticism have higher “intrinsic IIV” in negative affect.

Research Question A: Females have more variable emotions (e.g., positive affect) than Males

We first run a homogenous variance model.

Model 1: Common models for between-person and within-person variance
model.A1 = lme(fixed = posaff ~ 1 + sex_f,
               random = list(id = pdSymm(form = ~ 1)),
               data = data.all,
               na.action = na.exclude,
               method = 'REML')
summary(model.A1)
## Linear mixed-effects model fit by REML
##  Data: data.all 
##        AIC      BIC    logLik
##   4036.608 4057.695 -2014.304
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:   0.6342518 0.8807278
## 
## Fixed effects: posaff ~ 1 + sex_f 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  4.352704 0.08889986 1251 48.96187   0e+00
## sex_ffemale -0.369189 0.10938668  188 -3.37509   9e-04
##  Correlation: 
##             (Intr)
## sex_ffemale -0.813
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.34222753 -0.57183337  0.05917556  0.59744941  3.67294434 
## 
## Number of Observations: 1441
## Number of Groups: 190

The between-person variances are given by the upper portion of …

VarCorr(model.A1)
## id = pdSymm(1) 
##             Variance  StdDev   
## (Intercept) 0.4022753 0.6342518
## Residual    0.7756815 0.8807278

Specifically, variance of the intercepts is 0.40.

We then run heterogeneous variance models.

Model 2: Heterogeneous variance for between-person and common within-person variance
model.A2 = lme(fixed = posaff ~ 1 + sex_f,
               random = list(id = pdDiag(form = ~ 0 + sex_f)),
               data = data.all,
               na.action = na.exclude,
               method = 'REML')
summary(model.A2)
## Linear mixed-effects model fit by REML
##  Data: data.all 
##       AIC      BIC    logLik
##   4038.51 4064.869 -2014.255
## 
## Random effects:
##  Formula: ~0 + sex_f | id
##  Structure: Diagonal
##         sex_fmale sex_ffemale Residual
## StdDev: 0.6521647   0.6249233 0.880727
## 
## Fixed effects: posaff ~ 1 + sex_f 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  4.352339 0.09089551 1251 47.88288   0.000
## sex_ffemale -0.368757 0.11058864  188 -3.33450   0.001
##  Correlation: 
##             (Intr)
## sex_ffemale -0.822
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.33591533 -0.57396576  0.06172524  0.60158443  3.67149248 
## 
## Number of Observations: 1441
## Number of Groups: 190

The between-person variances are given by the upper portion of …

VarCorr(model.A2)
## id = pdDiag(0 + sex_f) 
##             Variance  StdDev   
## sex_fmale   0.4253188 0.6521647
## sex_ffemale 0.3905291 0.6249233
## Residual    0.7756801 0.8807270

Hmmm… the variances are …
\(\sigma^{2}_{u0g=male} = 0.425\),
\(\sigma^{2}_{u0g=female} = 0.391\),
These are not so different. Lets check the relative fits of the two models

anova(model.A1,model.A2)
##          Model df      AIC      BIC    logLik   Test    L.Ratio p-value
## model.A1     1  4 4036.608 4057.695 -2014.304                          
## model.A2     2  5 4038.510 4064.869 -2014.255 1 vs 2 0.09745325  0.7549

Not different - we go with the more parsimonious model for the between-person variance.

Model 3: Common between-person variance and heterogeneous within-person variance
model.A3 = lme(fixed = posaff ~ 1 + sex_f,
               random = list(id = pdSymm(form = ~ 1)),
               weights = varIdent(form = ~ 1 | sex_f),
               data = data.all,
               na.action = na.exclude,
               method = 'REML')
summary(model.A3)
## Linear mixed-effects model fit by REML
##  Data: data.all 
##        AIC     BIC    logLik
##   4027.781 4054.14 -2008.891
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:   0.6355109 0.9196675
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | sex_f 
##  Parameter estimates:
##   female     male 
## 1.000000 0.868614 
## Fixed effects: posaff ~ 1 + sex_f 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  4.351398 0.08738773 1251 49.79415   0e+00
## sex_ffemale -0.367695 0.10856625  188 -3.38682   9e-04
##  Correlation: 
##             (Intr)
## sex_ffemale -0.805
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.47441843 -0.57332917  0.06120628  0.60932794  3.51348142 
## 
## Number of Observations: 1441
## Number of Groups: 190

The between-person variances are given by the upper portion of …

VarCorr(model.A3)
## id = pdSymm(1) 
##             Variance  StdDev   
## (Intercept) 0.4038741 0.6355109
## Residual    0.8457884 0.9196675

The common between-person variance is …
\(\sigma^{2}_{u0} = 0.404\),

For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev is here ..

summary(model.A3)$sigma
## [1] 0.9196675

and the weights are here … (yeah, I’m not sure why they are so hard to extract) And be careful about the labels - they are easily missed.

coef(model.A3$modelStruct$varStruct, unconstrained=FALSE)
##     male 
## 0.868614

So the estimated Residual variance for group = female, \(\sigma^{2}_{eg=female}\) is …

(summary(model.A3)$sigma*1.0000)^2
## [1] 0.8457884

and the estimated variance for group = male, \(\sigma^{2}_{eg=male}\) is …

(summary(model.A3)$sigma*coef(model.A3$modelStruct$varStruct, uncons=FALSE))^2
##      male 
## 0.6381392

Those look like they may be different, with females having greater within-person variance than males.

Lets check the relative fits of the two models before deciding about our hypothesis.

anova(model.A1,model.A3)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model.A1     1  4 4036.608 4057.695 -2014.304                        
## model.A3     2  5 4027.781 4054.140 -2008.891 1 vs 2 10.82627   0.001

Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that there is a significant difference in IIV between males and females. In this sample, females have more IIV in daily positive affect than males.

Research Question B: Individuals higher in Neuroticism have higher “intrinsic IIV” in negative affect

We again start with a base model.

Model 1: Common models for between-person and within-person variance
model.B1 = lme(fixed = negaff ~ 1 + bfi_Nc,
               random = list(id = pdSymm(form = ~ 1)),
               data = data.all,
               na.action = na.exclude,
               method = 'REML')
summary(model.B1)
## Linear mixed-effects model fit by REML
##  Data: data.all 
##        AIC      BIC    logLik
##   3820.718 3841.805 -1906.359
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept)  Residual
## StdDev:   0.6055919 0.8139183
## 
## Fixed effects: negaff ~ 1 + bfi_Nc 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept) 2.4640809 0.04913707 1251 50.14709       0
## bfi_Nc      0.2642412 0.05149343  188  5.13155       0
##  Correlation: 
##        (Intr)
## bfi_Nc 0.006 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.8045055 -0.6157753 -0.1600364  0.4738359  3.8266991 
## 
## Number of Observations: 1441
## Number of Groups: 190

The between-person variances are given by the upper portion of …

VarCorr(model.B1)
## id = pdSymm(1) 
##             Variance  StdDev   
## (Intercept) 0.3667416 0.6055919
## Residual    0.6624629 0.8139183

We skip Model 2 as its not clear how a continuous predictor works in this specific case.

Model 3: Common between-person variance and heterogeneous within-person variance
model.B3 = lme(fixed = negaff ~ 1 + bfi_Nc,
               random = list(id = pdSymm(form = ~ 1)),
               weights = varExp(form = ~ bfi_Nc),
               data = data.all,
               na.action = na.exclude,
               method = 'REML')
summary(model.B3)
## Linear mixed-effects model fit by REML
##  Data: data.all 
##        AIC      BIC    logLik
##   3798.319 3824.677 -1894.159
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:   0.6031915 0.807518
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~bfi_Nc 
##  Parameter estimates:
##      expon 
## 0.09838318 
## Fixed effects: negaff ~ 1 + bfi_Nc 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept) 2.4634419 0.04900460 1251 50.26960       0
## bfi_Nc      0.2617456 0.05133223  188  5.09905       0
##  Correlation: 
##        (Intr)
## bfi_Nc 0.042 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.8296045 -0.6260128 -0.1640791  0.4841943  4.0175758 
## 
## Number of Observations: 1441
## Number of Groups: 190

The between-person variances are given by the upper portion of …

VarCorr(model.B3)
## id = pdSymm(1) 
##             Variance  StdDev   
## (Intercept) 0.3638400 0.6031915
## Residual    0.6520853 0.8075180

For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev, \(\sqrt{\alpha_{0}}\) is here …

summary(model.B3)$sigma
## [1] 0.807518

And the parameter in the exponential variance function is here …

coef(model.B3$modelStruct$varStruct, uncons=FALSE)
##      expon 
## 0.09838318

So for a given value of bfi_Nc we expect the residual within-person variance to be …

bfi_Nc <- seq(-2.5,2.5,.1)
varpred <- summary(model.B3)$sigma^2 * exp(coef(2*model.B3$modelStruct$varStruct, uncons=FALSE)*bfi_Nc)

We can make a plot of how the residual variances are related to the predictor

#make a data frame with predictions and back-transform centered variable to original scale.
plotvars <- as.data.frame(cbind(bfi_Nc,varpred))
names(plotvars) <- c("bfi_Nc","varpred")
plotvars$bfi_N <- plotvars$bfi_Nc + mean(person$bfi_N, na.rm=TRUE)
#plotting
ggplot(data = plotvars, aes(x=bfi_N,y = varpred), legend=FALSE) +
  geom_point() +
  geom_line() + 
  xlab("BFI Neuroticism") + 
  ylab("Predicted IIV in Daily NA (residual variance)") + ylim(0,1)

Lets check the relative fits of the two models before deciding if there is actually a relation between Neuroticism and between-person differences in intraindividual variability.

anova(model.B1,model.B3)
##          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## model.B1     1  4 3820.718 3841.805 -1906.359                        
## model.B3     2  5 3798.319 3824.677 -1894.159 1 vs 2 24.39873  <.0001

Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that higher Neuroticism is related to higher IIV in daily Negative Affect.

4. Conclusion

Ok - we have attempted to push into the variance heterogeneity models. These models are not yet widely used in the analysis of experience sampling data. There are still a variety of details that must be worked out and understood before publishing analyses with these models. For example, we still have to figure out how to display the raw data and the results in ways that facilitate intuitive understanding.
But, it is all within reach!

Be Careful! It is easy to get into trouble! Make sure to do some simulations so that you are confident in your interpretation of the parameters. It is not always 2*straightforward.