In the previous tutorials we covered how the multilevel model is used to examine intraindividual covariability. In this tutorial, we outline how an extension, the multilevel model with heterogeneous variance can be used to examine differences in intraindividual variability - which we had previously done in a 2-step way using the iSD.
library(psych)
library(ggplot2)
library(nlme)
We have used two “separate” sets of methods to examine …
1. Intraindividual Variation (calculation of within-person summaries; iSD, iEntropy, iMSSD, etc. following Ram & Gerstorf, 2009)
2. Intraindividual Covariation (multilevel models - following Bolger & Laurenceau, 2013)
In 1., our interest was in the interindividual differences in within-person variability in an outcome variable, \(\sigma_{yi}^2\). In 2., our interest was in the interindividual differences in the within-person association between a predictor and outcome \(\beta_{1i}\).
Here we outline a framework where these two interests can be combined together into a single model …. Multilevel Model with Heterogeneous Variance.
Like with multilevel modeling itself, the framework is referred to by different names in different fields. The larger class of models includes models referred to as … Variance Heterogeneity models, Location-Scale models, Generalized Additive Models for Location, Scale and Shape (GAMLSS, http://www.gamlss.org). It is not totally clear if there are some nuanced differences among the various different models. However, as far as I can tell, all of these model variants are applied to longitudinal panel-type data (multiple persons, multiple occassions).
A parallel form of the model exists for time-series data, ARCH models (auto-regressive models with conditional heteroscedasticity; https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity). There are lots of variants there including GARCH, NGARCH, IGARCH, EGARCH, GARCH-M, QGARCH, TGARCH. All of these model variants are applied to time-series data (single entity, many many occassions). I am under the impression that not many people have attempted to form links between these two different traditions, but have not yet looked into the literature closely.
Much of what we explore today is covered in Hedeker, Mermelstein, Berbaum, & Campbell (2009), Hedeker & Mermelstein (2007), and Hoffman (2007). Some more recent extensions are in Hedeker, Mermelstein & Demirtas (2012), and Rast, Hofer, & Sparks (2012).
A few definitions from Wikipedia (which may or may not be useful today, but are part of our larger effort to understand how we can make use of all these models).
Location parameter: https://en.wikipedia.org/wiki/Location_parameter
In statistics, a location family is a class of probability distributions that is parametrized by a scalar- or vector-valued parameter \(x_0\), which determines the “location” or shift of the distribution. Examples of location parameters include the mean, the median, and the mode.
Scale parameters: https://en.wikipedia.org/wiki/Scale_parameter
In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions. The larger the scale parameter, the more spread out the distribution.
The normal distribution has two parameters: a location parameter \(\mu\) and a scale parameter \(\sigma\). In practice the normal distribution is often parameterized in terms of the squared scale \(\sigma^2\), which corresponds to the variance of the distribution.
Here you can see location and scale of the distribution changing over time (image from http://www.gamlss.org)
Shape parameters: https://en.wikipedia.org/wiki/Shape_parameter
A shape parameter is any parameter of a probability distribution that is neither a location parameter nor a scale parameter (nor a function of either or both of these only, such as a rate parameter). Such a parameter must affect the shape of a distribution rather than simply shifting it (as a location parameter does) or stretching/shrinking it (as a scale parameter does). Distributions with a shape parameter include skew normal, gamma, Weibull, ExGaussian distributions.
Typically, the multilevel models we use (and that are covered in B&L) make a homogeneity of variance assumption. For example, lets look at the basic “unconditional means” model.
\[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + u_{0i}\]
where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\].
For clarity, lets write out the full variance covariance matrix of the between-person residuals, \(\mathbf{G}\). In this case it is just a 1 x 1 matrix. \[\mathbf{G} = \left[\begin{array} {r} \sigma^{2}_{u0} \end{array}\right]\] Embedded here is the assumption that the between-person variance characterizes the entire sample (population). There are no subgroups. Later, we will relax this assumption, and allow that \(\sigma^{2}_{u0g}\) can differ across “groups” of individuals (e.g., Males, Females).
For clarity, lets also write out the full variance covariance matrix of the within-person residuals, \(\mathbf{R}\) (spanning across the t = 8 days of observation in the example data), \[\mathbf{R} = \mathbf{I} \left[\begin{array} {r} \sigma^{2}_{e} \end{array}\right] = \left[\begin{array} {rrrrrrrr} \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e} \end{array}\right]\], The homoscedasticity of errors applies across persons and occasions. There are no subgroups. Later, we will relax this assumption, and allow that \(\sigma^{2}_{eg}\) can differ across g “groups” of individuals (e.g., males, females).
Here is a little plot of some simulated data that follow this model.
From this plot we can see …
* between-person differences in \(u_{0i}\),
* homogeneity of within-person variance \(\sigma^{2}_{e}\)
The objective of this model is to relax the homogeneity of variance assumption, and we can do this at both the between-person level, allowing for group or individual differences in \(\sigma^{2}_{u0g}\), and/or at the within-person level, allowing for group or individual differences in \(\sigma^{2}_{eg}\). We expand the basic “unconditional means” model to illustrate.
\[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + u_{0i}\] where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\]. That is all the same as usual.
But now lets add some complexity. Rather than the previous assumption that \(\sigma^{2}_{u0g} = \sigma^{2}_{u0}\), lets allow that \(\sigma^{2}_{u0g} = f(g_{i})\), where \(g_{i}\) is a grouping variable (Male, Female).
This allows that \(\sigma^{2}_{u0g=1} \neq \sigma^{2}_{u0g=2}\).
Formally, the full variance covariance matrix of the between-person residuals is now,
\[\mathbf{G} =
\left[\begin{array}
{rr}
\sigma^{2}_{u0g=1} & 0 \\
0 & \sigma^{2}_{u0g=2}
\end{array}\right]\] There are now, explicitly, two variances \(\sigma^{2}_{u0g=1}\) and \(\sigma^{2}_{u0g=2}\), with the off-diagonal covariance being zero because individuals are only in one group. Lets look at some example data.
Note how the extent of between-person differences in \(u_{0i}\) is different in the g = 1 = red and g = 2 = green groups. Visually, the difference in the group variances map to
\[\mathbf{G} =
\left[\begin{array}
{rr}
\sigma^{2}_{u0g=1}=0 & 0 \\
0 & \sigma^{2}_{u0g=2}=1
\end{array}\right]\]
Let’s set up an analysis model. We use the same lme()
function from the nlme
package that we have been using, and simply make some adjustments (some are similar to what we did for the dyadic models).
Specific for the heterogeneity in between-person differences we make adjustments to the random = formula
part of the script.
This is a model of “common” between-person variance and “common” within person variance, where “common” means that these variances are equivalent across groups, \(\sigma^{2}_{u0g} = \sigma^{2}_{u0}\).
# Common models for between and within person variance
model.01 = lme(fixed = y_ti ~ 1,
random = ~ 1 | id,
data = sim_nogrow,
method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
## Data: sim_nogrow
## AIC BIC logLik
## 2351.889 2368.69 -1172.944
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 1.271246 0.3797654
##
## Fixed effects: y_ti ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 0.1258579 0.1274079 1900 0.9878346 0.3234
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.489280944 -0.612660509 0.004013223 0.621295100 4.750794513
##
## Number of Observations: 2000
## Number of Groups: 100
Now, wihtout changing the model, we are going to write the random = formula
as a list()
. The exact same model is … where pdSymm()
indicates that the random effects are structured as a positive-definite Symmetric matrix.
# Common models for between and within person variance
model.01 = lme(fixed = y_ti ~ 1,
random = list(id = pdSymm(form = ~ 1)),
data = sim_nogrow,
method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
## Data: sim_nogrow
## AIC BIC logLik
## 2351.889 2368.69 -1172.944
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 1.271246 0.3797654
##
## Fixed effects: y_ti ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 0.1258579 0.1274079 1900 0.9878346 0.3234
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.489280944 -0.612660509 0.004013223 0.621295100 4.750794513
##
## Number of Observations: 2000
## Number of Groups: 100
Looking specifically at the variance structures …
VarCorr(model.01)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 1.6160662 1.2712459
## Residual 0.1442217 0.3797654
We see that \(\sigma^{2}_{u0g} = \sigma^{2}_{u0} = 1.6160662\),
and that \(\sigma^{2}_{eg} = \sigma^{2}_{e} = 0.1442217\).
This is a model of heterogeneous between-person variance and common within person variance. Note that in addition to changing the form =
we also explicitly specify that the \(\mathbf{G}\) matrix should be diagonal with the pdDiag()
specification (positive-definite Diagonal). This gets us the set-up we want (see above), \[\mathbf{G} =
\left[\begin{array}
{rr}
\sigma^{2}_{u0g=0} & 0 \\
0 & \sigma^{2}_{u0g=1}
\end{array}\right]\]
# Heterogeneity for between variance and common within person variance
model.01b = lme(fixed = y_ti ~ 1,
random = list(id = pdDiag(form = ~ factor(group_i))),
data = sim_nogrow,
method = 'REML')
summary(model.01b)
## Linear mixed-effects model fit by REML
## Data: sim_nogrow
## AIC BIC logLik
## 2104.694 2127.096 -1048.347
##
## Random effects:
## Formula: ~factor(group_i) | id
## Structure: Diagonal
## (Intercept) factor(group_i)1 Residual
## StdDev: 2.447866e-05 1.798439 0.3780477
##
## Fixed effects: y_ti ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) -0.0002883882 0.01194176 1900 -0.02414955 0.9807
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.525619386 -0.631708514 0.003199987 0.624580177 4.787642278
##
## Number of Observations: 2000
## Number of Groups: 100
Looking specifically at the variance and “correlation” structures …
VarCorr(model.01b)
## id = pdDiag(factor(group_i))
## Variance StdDev
## (Intercept) 5.992050e-10 2.447866e-05
## factor(group_i)1 3.234381e+00 1.798439e+00
## Residual 1.429200e-01 3.780477e-01
We see that …
\(\sigma^{2}_{u0g=0} = 0.00\), \(\sigma^{2}_{u0g=1} = 3.23\),
and that \(\sigma^{2}_{eg} = \sigma^{2}_{e} = 0.149\).
Ok - so hopefully the idea of variance heterogeneity in the between-person random effects is clear.
Lets turn to the within-person random effects.
This is a model of heterogeneous between-person variance and heterogeneous within person variance. We do this using the weights =
formula in the same way that we did for the dyadic models, in order that we can get the desired structure … \[\mathbf{R} =
\left[\begin{array}
{rr}
\sigma^{2}_{eg=0} & 0 \\
0 & \sigma^{2}_{eg=1}
\end{array}\right]\]
Again note the diagonal structure, meaning that we need the weights =
part of the code, but we do not need the correlation =
part of the code we used in the dyadic models.
Our expanded model is specified as …
# Heterogeneity for both between and within person variance
model.01c = lme(fixed = y_ti ~ 1,
random = list(id = pdDiag(form = ~ factor(group_i))),
weights = varIdent(form = ~ 1 | factor(group_i)),
data = sim_nogrow,
method = 'REML')
summary(model.01c)
## Linear mixed-effects model fit by REML
## Data: sim_nogrow
## AIC BIC logLik
## 2005.756 2033.758 -997.878
##
## Random effects:
## Formula: ~factor(group_i) | id
## Structure: Diagonal
## (Intercept) factor(group_i)1 Residual
## StdDev: 4.500199e-05 1.797808 0.3145904
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | factor(group_i)
## Parameter estimates:
## 0 1
## 1.00000 1.38244
## Fixed effects: y_ti ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) -0.0004595558 0.009940638 1900 -0.04623001 0.9631
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.058631812 -0.651976599 0.003777448 0.649732254 4.157940315
##
## Number of Observations: 2000
## Number of Groups: 100
The between-person variances are given by the upper portion of …
VarCorr(model.01c)
## id = pdDiag(factor(group_i))
## Variance StdDev
## (Intercept) 2.025179e-09 4.500199e-05
## factor(group_i)1 3.232113e+00 1.797808e+00
## Residual 9.896710e-02 3.145904e-01
For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev is here ..
summary(model.01c)$sigma
## [1] 0.3145904
and the weights are here … (yeah, I’m not sure why they are so hard to extract)
coef(model.01c$modelStruct$varStruct, unconstrained=FALSE)
## 1
## 1.38244
So the estimated Residual variance for group = 0, \(\sigma^{2}_{eg=0}\) is … (true value = .10)
(summary(model.01c)$sigma*1.0000)^2
## [1] 0.0989671
and the estimated variance for group = 1, \(\sigma^{2}_{eg=1}\) is … (true value = .20)
(summary(model.01c)$sigma*coef(model.01c$modelStruct$varStruct, uncons=FALSE))^2
## 1
## 0.18914
Yay! - now we expanded the original model to include both heterogenous between-person variances and heterogeneous within-person variances.
To formally test if they heterogenous variances are needed we would do a log-likelihood test, e.g., using the anova()
function.
anova(model.01, model.01b)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.01 1 3 2351.889 2368.690 -1172.944
## model.01b 2 4 2104.694 2127.096 -1048.347 1 vs 2 249.1944 <.0001
anova(model.01b, model.01c)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.01b 1 4 2104.694 2127.096 -1048.347
## model.01c 2 5 2005.756 2033.758 -997.878 1 vs 2 100.9382 <.0001
Those tests indicate that the more complex models are significantly better for these data.
In the above we accommodated group (i.e., categorical) differences in between-person and within-person variances. We can also model the variances as a function of a continuous variable. For example, \[\sigma^{2}_{ei} = f(x_{i})\], where \(x_{i}\) is an individual differences variable (e.g., sex, neuroticism).
Lets look at what some data might look like.
Formally, the model is \[y_{it} = \beta_{0i} + e_{it}\] \[\beta_{0i} = \gamma_{00} + \gamma_{01}x_i + u_{0i}\]
where \[\mathbf{e_{it}} \sim \mathcal{N}(0,\mathbf{R})\], and \[\mathbf{u_{i}} \sim \mathcal{N}(0,\mathbf{G})\].
We assume the between-person residuals in intercepts are homogenous, \(\mathbf{G}\). In this case it is just a 1 x 1 matrix. \[\mathbf{G} =
\left[\begin{array}
{r}
\sigma^{2}_{u0}
\end{array}\right]\] and the within-person residuals are heterogeneous, so that
\[\mathbf{R} =
\mathbf{I}
\left[\begin{array}
{r}
\sigma^{2}_{ei}
\end{array}\right]\] where, here in lme()
… see ?varExp
\[\sigma^{2}_{ei} = \alpha_{0}^{2}exp(2\alpha_{1}x_i)\] (Yes, the details of the exact implementations are confusing, and not very clear in any book or paper. One has to dig a bit to find that in lme()
there is a multiplication by 2. A good, practical overview piece is needed. )
Our expanded model changes the specification in the weights =
option to …
# Continuous predictor for within person variance, common between-person differences
model.02a = lme(fixed = y_ti ~ 1 + x_i,
random = list(id = pdSymm(form = ~ 1)),
weights = varExp(form = ~ x_i),
data = sim_nogrow[which(sim_nogrow$id <=100),],
method = 'REML')
summary(model.02a)
## Linear mixed-effects model fit by REML
## Data: sim_nogrow[which(sim_nogrow$id <= 100), ]
## AIC BIC logLik
## 41418.5 41446.5 -20704.25
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.01894279 0.3058026
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~x_i
## Parameter estimates:
## expon
## 0.2003149
## Fixed effects: y_ti ~ 1 + x_i
## Value Std.Error DF t-value p-value
## (Intercept) 0.034557 0.07712814 1900 0.44804 0.6542
## x_i 4.008449 0.01950011 98 205.56030 0.0000
## Correlation:
## (Intr)
## x_i -0.775
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.101613649 -0.653626272 -0.002955521 0.668324110 4.117217647
##
## Number of Observations: 2000
## Number of Groups: 100
The residual between-person variances are given by the upper portion of …
VarCorr(model.02a)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.0003588295 0.01894279
## Residual 0.0935152258 0.30580259
Specifically, residual varaince in intercepts unrelated to $x_i* are … (true value = 0)
VarCorr(model.02a)[1]
## [1] "0.0003588295"
For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual Std Deviation, \(\alpha_{0}\) is here … (true value for \(\alpha_{0}^2\) = 0.1)
summary(model.02a)$sigma
## [1] 0.3058026
And the parameter in the exponential variance function is here … (true value = 0.2)
coef(model.02a$modelStruct$varStruct, uncons=FALSE)
## expon
## 0.2003149
So for a given value of x_i we expect the residual within-person variance to be … \[\sigma^{2}_{ei} = \alpha_{0}^{2}exp(2\alpha_{1}x_i)\]
x_i <- as.matrix(1:50)
varpred <- summary(model.02a)$sigma^2 * exp(coef(2*model.02a$modelStruct$varStruct, uncons=FALSE)*x_i)
head(varpred,10)
## [,1]
## [1,] 0.1395962
## [2,] 0.2083843
## [3,] 0.3110687
## [4,] 0.4643524
## [5,] 0.6931688
## [6,] 1.0347379
## [7,] 1.5446201
## [8,] 2.3057543
## [9,] 3.4419485
## [10,] 5.1380190
And how does that compare to the true values in our simulation?
head(personframe,10)
## x_i group_i b0_i sig2_ei
## 1 1 0 4 0.1491825
## 2 2 0 8 0.2225541
## 3 3 0 12 0.3320117
## 4 4 0 16 0.4953032
## 5 5 0 20 0.7389056
## 6 6 0 24 1.1023176
## 7 7 0 28 1.6444647
## 8 8 0 32 2.4532530
## 9 9 0 36 3.6598234
## 10 10 0 40 5.4598150
A bit underestimated, but pretty close.
We can make a plot of how the residual variances are related to the predictor
#make a data frame with predictions and log predictions
plotvars <- as.data.frame(cbind(x_i,varpred))
names(plotvars) <- c("x_i","varpred")
plotvars$logvarpred <- log(varpred)
#plotting
ggplot(data = plotvars[1:10,], aes(x=x_i,y = varpred), legend=FALSE) +
geom_point() +
geom_line() +
xlab("x") +
ylab("Predicted IIV (residual variance)") + #ylim(-5,5) +
scale_x_continuous(breaks=seq(0,10,by=1))
Why did we do this with an exponential function? Same as Poisson, the predicted scores should not go below zero (we cannot have negative variances) So we transform to obtain a linear equation. \[log(\sigma^{2}_{ei}) = \alpha_{0}^{2} + 2\alpha_{1}x_i\] In the log space we get …
ggplot(data = plotvars[1:10,], aes(x=x_i,y = logvarpred), legend=FALSE) +
#geom_rect(mapping=aes(xmin=day-.5, xmax=day+.5, ymin=1, ymax=7, fill=stress), alpha=0.8) +
geom_point() +
geom_line() +
xlab("x") +
ylab("Predicted IIV (residual variance)") + #ylim(-5,5) +
scale_x_continuous(breaks=seq(0,10,by=1))
Now have the possibility to relax the homogeneity of variance assumption. And this leads us into a world where we may be able to do some more precise testing of how between-person difference characteristics are related to IIV.
We have to be careful with how we construct the models, and although lme()
does not make it super easy to test for significance we can probably figure all the details out. Notably, lme()
actually allows quite a few different variance functions (varFunc
), so there is a rich set of potential models to be explored. The most detailed information I have found so far on how to do this in R is Chapter 7 of Galecki, A. & Burzykowski, T. (2013). Linear mixed-effects models using R. A step-by-step approach. Springer, New York. There are a number of papers that need to be written on this topic, and how these models can be applied to experience sampling data.
Let’s try a real data example. We use an expanded set of data from AMIB (I’ve added some between-person variables in another file).
#Daily data
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBbrief_raw_daily1.csv"
#read in the .csv file using the url() function
daily <- read.csv(file=url(filepath),header=TRUE)
#Little bit of clean-up
var.names.daily <- tolower(colnames(daily))
colnames(daily)<-var.names.daily
#subsetting data
daily1 <- daily[,c("id","day","posaff","negaff")]
#Person data
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIB_person1.csv"
#read in the .csv file using the url() function
person <- read.csv(file=url(filepath),header=TRUE)
#make a centered variable
person$bfi_Nc <- person$bfi_N - mean(person$bfi_N, na.rm=TRUE)
#making a factor variable
person$sex_f <- factor(person$sex, labels = c("male","female"))
#str(person$sex_f)
#Merging person and daily together
data.all <- merge(person,daily1,by="id")
# Examine first few rows of the data set
head(data.all,10)
## id sex bfi_N bfi_Nc sex_f day posaff negaff
## 1 101 1 2 -0.9815789 female 0 3.9 3.0
## 2 101 1 2 -0.9815789 female 1 3.8 2.3
## 3 101 1 2 -0.9815789 female 2 5.1 1.0
## 4 101 1 2 -0.9815789 female 3 5.6 1.3
## 5 101 1 2 -0.9815789 female 4 4.3 1.1
## 6 101 1 2 -0.9815789 female 5 3.9 1.0
## 7 101 1 2 -0.9815789 female 6 5.1 1.2
## 8 101 1 2 -0.9815789 female 7 4.8 1.1
## 9 102 1 2 -0.9815789 female 0 6.3 1.4
## 10 102 1 2 -0.9815789 female 1 7.0 1.6
Ok - lets consider two individual differences (notice that we do this by subsetting to one row per person):
Sex (factor version)
table(data.all[which(data.all$day==0),]$sex_f)
##
## male female
## 65 125
Neuroticism
describe(data.all[which(data.all$day==0), c("bfi_N","bfi_Nc")])
## vars n mean sd median trimmed mad min max range skew
## bfi_N 1 190 2.98 0.96 3.00 3.00 1.48 1.00 5.00 4 -0.09
## bfi_Nc 2 190 0.00 0.96 0.02 0.02 1.48 -1.98 2.02 4 -0.09
## kurtosis se
## bfi_N -0.82 0.07
## bfi_Nc -0.82 0.07
We can formulate some simple hypotheses about variance heterogeneity based on sterotypes and construct definitions.
A. Females have more variable emotions (e.g., positive affect) than Males.
B. Individuals higher in Neuroticism have higher “intrinsic IIV” in negative affect.
We first run a homogenous variance model.
model.A1 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdSymm(form = ~ 1)),
data = data.all,
na.action = na.exclude,
method = 'REML')
summary(model.A1)
## Linear mixed-effects model fit by REML
## Data: data.all
## AIC BIC logLik
## 4036.608 4057.695 -2014.304
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6342518 0.8807278
##
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.352704 0.08889986 1251 48.96187 0e+00
## sex_ffemale -0.369189 0.10938668 188 -3.37509 9e-04
## Correlation:
## (Intr)
## sex_ffemale -0.813
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.34222753 -0.57183337 0.05917556 0.59744941 3.67294434
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
VarCorr(model.A1)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.4022753 0.6342518
## Residual 0.7756815 0.8807278
Specifically, variance of the intercepts is 0.40.
We then run heterogeneous variance models.
model.A2 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdDiag(form = ~ 0 + sex_f)),
data = data.all,
na.action = na.exclude,
method = 'REML')
summary(model.A2)
## Linear mixed-effects model fit by REML
## Data: data.all
## AIC BIC logLik
## 4038.51 4064.869 -2014.255
##
## Random effects:
## Formula: ~0 + sex_f | id
## Structure: Diagonal
## sex_fmale sex_ffemale Residual
## StdDev: 0.6521647 0.6249233 0.880727
##
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.352339 0.09089551 1251 47.88288 0.000
## sex_ffemale -0.368757 0.11058864 188 -3.33450 0.001
## Correlation:
## (Intr)
## sex_ffemale -0.822
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.33591533 -0.57396576 0.06172524 0.60158443 3.67149248
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
VarCorr(model.A2)
## id = pdDiag(0 + sex_f)
## Variance StdDev
## sex_fmale 0.4253188 0.6521647
## sex_ffemale 0.3905291 0.6249233
## Residual 0.7756801 0.8807270
Hmmm… the variances are …
\(\sigma^{2}_{u0g=male} = 0.425\),
\(\sigma^{2}_{u0g=female} = 0.391\),
These are not so different. Lets check the relative fits of the two models
anova(model.A1,model.A2)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.A1 1 4 4036.608 4057.695 -2014.304
## model.A2 2 5 4038.510 4064.869 -2014.255 1 vs 2 0.09745325 0.7549
Not different - we go with the more parsimonious model for the between-person variance.
model.A3 = lme(fixed = posaff ~ 1 + sex_f,
random = list(id = pdSymm(form = ~ 1)),
weights = varIdent(form = ~ 1 | sex_f),
data = data.all,
na.action = na.exclude,
method = 'REML')
summary(model.A3)
## Linear mixed-effects model fit by REML
## Data: data.all
## AIC BIC logLik
## 4027.781 4054.14 -2008.891
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6355109 0.9196675
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | sex_f
## Parameter estimates:
## female male
## 1.000000 0.868614
## Fixed effects: posaff ~ 1 + sex_f
## Value Std.Error DF t-value p-value
## (Intercept) 4.351398 0.08738773 1251 49.79415 0e+00
## sex_ffemale -0.367695 0.10856625 188 -3.38682 9e-04
## Correlation:
## (Intr)
## sex_ffemale -0.805
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.47441843 -0.57332917 0.06120628 0.60932794 3.51348142
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
VarCorr(model.A3)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.4038741 0.6355109
## Residual 0.8457884 0.9196675
The common between-person variance is …
\(\sigma^{2}_{u0} = 0.404\),
For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev is here ..
summary(model.A3)$sigma
## [1] 0.9196675
and the weights are here … (yeah, I’m not sure why they are so hard to extract) And be careful about the labels - they are easily missed.
coef(model.A3$modelStruct$varStruct, unconstrained=FALSE)
## male
## 0.868614
So the estimated Residual variance for group = female, \(\sigma^{2}_{eg=female}\) is …
(summary(model.A3)$sigma*1.0000)^2
## [1] 0.8457884
and the estimated variance for group = male, \(\sigma^{2}_{eg=male}\) is …
(summary(model.A3)$sigma*coef(model.A3$modelStruct$varStruct, uncons=FALSE))^2
## male
## 0.6381392
Those look like they may be different, with females having greater within-person variance than males.
Lets check the relative fits of the two models before deciding about our hypothesis.
anova(model.A1,model.A3)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.A1 1 4 4036.608 4057.695 -2014.304
## model.A3 2 5 4027.781 4054.140 -2008.891 1 vs 2 10.82627 0.001
Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that there is a significant difference in IIV between males and females. In this sample, females have more IIV in daily positive affect than males.
We again start with a base model.
model.B1 = lme(fixed = negaff ~ 1 + bfi_Nc,
random = list(id = pdSymm(form = ~ 1)),
data = data.all,
na.action = na.exclude,
method = 'REML')
summary(model.B1)
## Linear mixed-effects model fit by REML
## Data: data.all
## AIC BIC logLik
## 3820.718 3841.805 -1906.359
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6055919 0.8139183
##
## Fixed effects: negaff ~ 1 + bfi_Nc
## Value Std.Error DF t-value p-value
## (Intercept) 2.4640809 0.04913707 1251 50.14709 0
## bfi_Nc 0.2642412 0.05149343 188 5.13155 0
## Correlation:
## (Intr)
## bfi_Nc 0.006
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.8045055 -0.6157753 -0.1600364 0.4738359 3.8266991
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
VarCorr(model.B1)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.3667416 0.6055919
## Residual 0.6624629 0.8139183
We skip Model 2 as its not clear how a continuous predictor works in this specific case.
model.B3 = lme(fixed = negaff ~ 1 + bfi_Nc,
random = list(id = pdSymm(form = ~ 1)),
weights = varExp(form = ~ bfi_Nc),
data = data.all,
na.action = na.exclude,
method = 'REML')
summary(model.B3)
## Linear mixed-effects model fit by REML
## Data: data.all
## AIC BIC logLik
## 3798.319 3824.677 -1894.159
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6031915 0.807518
##
## Variance function:
## Structure: Exponential of variance covariate
## Formula: ~bfi_Nc
## Parameter estimates:
## expon
## 0.09838318
## Fixed effects: negaff ~ 1 + bfi_Nc
## Value Std.Error DF t-value p-value
## (Intercept) 2.4634419 0.04900460 1251 50.26960 0
## bfi_Nc 0.2617456 0.05133223 188 5.09905 0
## Correlation:
## (Intr)
## bfi_Nc 0.042
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.8296045 -0.6260128 -0.1640791 0.4841943 4.0175758
##
## Number of Observations: 1441
## Number of Groups: 190
The between-person variances are given by the upper portion of …
VarCorr(model.B3)
## id = pdSymm(1)
## Variance StdDev
## (Intercept) 0.3638400 0.6031915
## Residual 0.6520853 0.8075180
For the within-person variances we need to do some multiplication (and squaring to get variances). The Residual StdDev, \(\sqrt{\alpha_{0}}\) is here …
summary(model.B3)$sigma
## [1] 0.807518
And the parameter in the exponential variance function is here …
coef(model.B3$modelStruct$varStruct, uncons=FALSE)
## expon
## 0.09838318
So for a given value of bfi_Nc
we expect the residual within-person variance to be …
bfi_Nc <- seq(-2.5,2.5,.1)
varpred <- summary(model.B3)$sigma^2 * exp(coef(2*model.B3$modelStruct$varStruct, uncons=FALSE)*bfi_Nc)
We can make a plot of how the residual variances are related to the predictor
#make a data frame with predictions and back-transform centered variable to original scale.
plotvars <- as.data.frame(cbind(bfi_Nc,varpred))
names(plotvars) <- c("bfi_Nc","varpred")
plotvars$bfi_N <- plotvars$bfi_Nc + mean(person$bfi_N, na.rm=TRUE)
#plotting
ggplot(data = plotvars, aes(x=bfi_N,y = varpred), legend=FALSE) +
geom_point() +
geom_line() +
xlab("BFI Neuroticism") +
ylab("Predicted IIV in Daily NA (residual variance)") + ylim(0,1)
Lets check the relative fits of the two models before deciding if there is actually a relation between Neuroticism and between-person differences in intraindividual variability.
anova(model.B1,model.B3)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.B1 1 4 3820.718 3841.805 -1906.359
## model.B3 2 5 3798.319 3824.677 -1894.159 1 vs 2 24.39873 <.0001
Yes! There is a significant difference in fit between the models. The heterogeneity in within-person variance is needed. We conclude that higher Neuroticism is related to higher IIV in daily Negative Affect.
Ok - we have attempted to push into the variance heterogeneity models. These models are not yet widely used in the analysis of experience sampling data. There are still a variety of details that must be worked out and understood before publishing analyses with these models. For example, we still have to figure out how to display the raw data and the results in ways that facilitate intuitive understanding.
But, it is all within reach!
Be Careful! It is easy to get into trouble! Make sure to do some simulations so that you are confident in your interpretation of the parameters. It is not always 2*straightforward.