A few years ago, I decided that it might be useful to introduce the models for repeated measures data (e.g., RM ANOVA, RM MANOVA, Growth models) along a “continuum”. The idea is that this presentation, wherein the models are all linked together, simplifies the landscape and makes the similarities and differences among these models easier to identify and understand. In this tutorial, we tie the models together in a multilevel framework, working from RM ANOVA through RM MANOVA to Growth Models.
For our examples, we use 3-occasion WISC data that are equally spaced.
Load the repeated measures data
############################
####### Reading in the Data
############################
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/wisc3raw.csv"
#read in the .csv file using the url() function
wisc3raw <- read.csv(file=url(filepath),header=TRUE)
Subsetting to the variables of interest: 3-occasion equally spaced repeated measures + a person-level “grouping” variable (+ an id variable). Specifically, we include the id, verb2, verb4, verb6, and grad variables.
#Creating a list of variables of interest
varnames <- c("id","verb2","verb4","verb6","grad")
#sub-setting the columns of interest
wiscsub <- wisc3raw[ ,varnames]
#Looking at the sample-level descriptives for the chosen variables
#the describe() function is in the psych package
library(psych)
describe(wiscsub)
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | 1 | 204 | 102.5000000 | 59.0338886 | 102.500 | 102.5000000 | 75.612600 | 1.00 | 204.00 | 203.00 | 0.0000000 | -1.2176609 | 4.1331989 |
| verb2 | 2 | 204 | 25.4153431 | 6.1063772 | 25.980 | 25.4026220 | 6.567918 | 5.95 | 39.85 | 33.90 | -0.0639307 | -0.3445494 | 0.4275319 |
| verb4 | 3 | 204 | 32.6077451 | 7.3198841 | 32.820 | 32.4240244 | 7.183197 | 12.60 | 52.84 | 40.24 | 0.2317998 | -0.0776524 | 0.5124944 |
| verb6 | 4 | 204 | 43.7499020 | 10.6650511 | 42.545 | 43.4560976 | 11.297412 | 17.35 | 72.59 | 55.24 | 0.2356459 | -0.3605210 | 0.7467029 |
| grad | 5 | 204 | 0.2254902 | 0.4189328 | 0.000 | 0.1585366 | 0.000000 | 0.00 | 1.00 | 1.00 | 1.3040954 | -0.3007374 | 0.0293312 |
Multilevel modeling analyses typically require a long data set, so we reshape the data from wide to long.
#reshaping wide to long
verblong <- reshape(data=wiscsub,
varying=c("verb2","verb4","verb6"),
timevar="grade", idvar="id",
direction="long", sep="")
#reordering for convenience
verblong <- verblong[order(verblong$id,verblong$grade),c("id","grade","verb","grad")]
#looking at dataset
head(verblong,12)
| | id | grade | verb | grad |
|---|---|---|---|---|
| 1.2 | 1 | 2 | 26.98 | 0 |
| 1.4 | 1 | 4 | 39.61 | 0 |
| 1.6 | 1 | 6 | 55.64 | 0 |
| 2.2 | 2 | 2 | 14.38 | 0 |
| 2.4 | 2 | 4 | 21.92 | 0 |
| 2.6 | 2 | 6 | 37.81 | 0 |
| 3.2 | 3 | 2 | 33.51 | 1 |
| 3.4 | 3 | 4 | 34.30 | 1 |
| 3.6 | 3 | 6 | 50.18 | 1 |
| 4.2 | 4 | 2 | 28.39 | 1 |
| 4.4 | 4 | 4 | 42.16 | 1 |
| 4.6 | 4 | 6 | 44.72 | 1 |
For clarity, let's consider the basic information representation of the 3-occasion repeated measures data. In particular, data (even non-repeated measures data) are summarized (at the sample level) by a means vector and a variance-covariance matrix …
#mean vector (from wide data)
meanvector <- sapply(wiscsub[ ,c("verb2","verb4","verb6")], mean, na.rm=TRUE)
meanvector
## verb2 verb4 verb6
## 25.41534 32.60775 43.74990
#variance-covariance matrix (from wide data)
varcovmatrix <- cov(wiscsub[ ,c("verb2","verb4","verb6")], use="pairwise.complete.obs")
varcovmatrix
## verb2 verb4 verb6
## verb2 37.28784 33.81957 47.40488
## verb4 33.81957 53.58070 62.25489
## verb6 47.40488 62.25489 113.74332
Let's make a note of this vector and matrix (i.e., on the board) as we will come back to them. Making visual counterparts can also be extremely useful - especially for facilitating higher-level conversations in a research group.
Basic Sample-level descriptions in visual form.
#boxplot by grade (from long data)
#note that the time variable has been converted to a factor = categorical
library(ggplot2)
ggplot(data=verblong, aes(x=factor(grade), y=verb)) +
geom_boxplot(notch = TRUE) +
stat_summary(fun.y="mean", geom="point", shape=23, size=3, fill="white") +
labs(x = "Grade", y = "Verbal Ability")
#correlations across grade
#in the psych library (from wide data)
pairs.panels(wiscsub[,c("verb2","verb4","verb6")])
Always be careful about the scaling of the x- and y-axes in these plots.
OK - now we have some basics in place.
One additional recoding for convenience … (we will treat this in more detail later) is to center and scale our time variable. This gives us a specific 0 point and an intuitive 0,1,2 scale that is useful for our didactic purposes.
#checking the current values of the grade variable
unique(verblong$grade)
## [1] 2 4 6
#Making a new time variable that recenters and rescales grade
# from 2,4,6 to 0,1,2
verblong$time0 <- (verblong$grade-2)/2
unique(verblong$time0)
## [1] 0 1 2
| | id | grade | verb | grad | time0 |
|---|---|---|---|---|---|
| 1.2 | 1 | 2 | 26.98 | 0 | 0 |
| 1.4 | 1 | 4 | 39.61 | 0 | 1 |
| 1.6 | 1 | 6 | 55.64 | 0 | 2 |
| 2.2 | 2 | 2 | 14.38 | 0 | 0 |
| 2.4 | 2 | 4 | 21.92 | 0 | 1 |
| 2.6 | 2 | 6 | 37.81 | 0 | 2 |
| 3.2 | 3 | 2 | 33.51 | 1 | 0 |
| 3.4 | 3 | 4 | 34.30 | 1 | 1 |
| 3.6 | 3 | 6 | 50.18 | 1 | 2 |
| 4.2 | 4 | 2 | 28.39 | 1 | 0 |
| 4.4 | 4 | 4 | 42.16 | 1 | 1 |
| 4.6 | 4 | 6 | 44.72 | 1 | 2 |
Plotting the raw data along this new time variable.
#plotting intraindividual change RAW DATA
ggplot(data = verblong, aes(x = time0, y = verb, group = id)) +
ggtitle("Raw Data") +
geom_point() +
geom_line() +
xlab("Time") +
ylab("WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
Note that the time variable in this plot has NOT been converted to a factor. It is a continuous variable.
We now frame up a series of models using the multilevel framework. Some good general resources … http://www.ats.ucla.edu/stat/r/seminars/Repeated_Measures/repeated_measures.htm
http://www.statpower.net/Content/GCM/Lectures/SW07.pdf
https://www.zoology.ubc.ca/~schluter/R/fit-model/
The presentation of all of these models here is an attempt to integrate traditions that are typically kept separate or pitted against each other. In reality, they are just a few examples of the many, many possible models that exist. Each model is useful in specific situations.
The objective of all of our analyses is to ‘deconstruct’ the data into meaningful/interpretable pieces. Each ‘model’ does it in a different way, with different assumptions.
We can “judge” the models based on
1. how well they articulate and test our theory, and/or
2. how well they recover the data (evaluated as misfit to the mean vector and var-cov summaries).
Recall that the regression model may be compactly written as … \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + \boldsymbol{e} \]
We make it into a multilevel regression model by adding an additional complication … separation into between-person and within-person components. Note: this is the very same distinction that is made in traditional presentations of ANOVA when examining between-person factors and within-person factors.
The general model becomes … (Please note and forgive that I do not have all the equations in place in a common form - am still working out a notation that works across all the models/traditions in a way I like)
\[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \]
To fit all of the models to the data, we make use of the nlme package. It is also possible to use the lme4 package.
library(nlme)
library(lme4)
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
##
## lmList
Let's start with an unconditional means model.
Writing out the equation …
\[y_{it} = 1\beta_{0i} + e_{it}\] \[\beta_{0i} = 1\gamma_{00} + u_{0i}\] Note the explicit inclusion of the “silent” vector of 1s.
If we do the substitution of one equation into the other we get … \[y_{it} = 1\gamma_{00} + 1u_{0i} + e_{it}\]
where \(e_{it} \sim N(0,\sigma^{2}_{e})\), and \(u_{0i} \sim N(0,\psi_{u0})\).
For clarity, let's write out the full variance covariance matrix of the within-person residuals (spanning across the T = 3 repeated measures), \[ \sigma^{2}_{e}\boldsymbol{I} = \sigma^{2}_{e} \left[\begin{array} {rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right] = \left[\begin{array} {rrr} \sigma^{2}_{e} & 0 & 0 \\ 0 & \sigma^{2}_{e} & 0 \\ 0 & 0 & \sigma^{2}_{e} \end{array}\right]\] This is the homoscedasticity of errors assumption.
And, the full variance covariance matrix of the between-person residuals, \[\left[\begin{array} {r} \psi_{u0} \end{array}\right] = E\left[u_{0i}u_{0i}'\right] \]
and through some replacement (e.g., replacing \(u_{i}\) with \(b_{i}\), and the vectors of \(1\)s with \(X_{i}\) and \(Z_{i}\)) we can get to the more general notation often used in the statistics literature … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \] the equation we started with, or closer to my usual multilevel notation … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\gamma} + \boldsymbol{Z_{i}}\boldsymbol{u_{i}} + \boldsymbol{e_{i}} \] Let's code the model.
um_fit <- lme(fixed= verb ~ 1,
random= ~ 1|id,
data=verblong,
na.action = na.exclude)
summary(um_fit)
## Linear mixed-effects model fit by REML
## Data: verblong
## AIC BIC logLik
## 4682.66 4695.906 -2338.33
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 4.406063 10.277
##
## Fixed effects: verb ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 33.92433 0.5174359 408 65.56238 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.9626206 -0.6924771 -0.1481960 0.5511044 3.1026127
##
## Number of Observations: 612
## Number of Groups: 204
# um_fit2 <- lmer(verb ~ 1 + (1|id),
# data=verblong,
# na.action = na.exclude)
#
# summary(um_fit2)
\[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} X_{0} \\ X_{1} \\ X_{2} \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] + \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]
In this unconditional means model the X and Z design matrices are vectors of 1s, so we get …
\[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] + \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].
Let’s make the implied mean vector. We extract the fixed effects from the model using fixef(), specifically the contents of the \(\beta\) matrix.
## (Intercept)
## 33.92433
## [,1]
## [1,] 33.92433
Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 1 …
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]
## [,1]
## [1,] 33.92433
## [2,] 33.92433
## [3,] 33.92433
Note this is the overall (grand) mean.
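The steps above can be sketched in a few lines of matrix algebra. A minimal sketch (the estimate is hard-coded from the fixef(um_fit) output above so the snippet stands alone; in practice it would come from fixef() directly):

```r
#fixed effect (grand mean), hard-coded from fixef(um_fit) above
beta <- matrix(33.92433, nrow=1, ncol=1)
#design matrix for the fixed effects: a 3 x 1 column of 1s
X <- matrix(1, nrow=3, ncol=1)
#model implied mean vector = X %*% beta
meanvector_um <- X %*% beta
meanvector_um
```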
Making the implied variance-covariance matrix.
## id = pdLogChol(1)
## Variance StdDev
## (Intercept) 19.41339 4.406063
## Residual 105.61668 10.276997
From this, we need to create the model implied variance-covariance.
The model for person \(i\) is … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \] where \(\boldsymbol{Z_{i}}\) is the random effects regressor matrix (usually a subset of \(\boldsymbol{X_{i}}\)); \(\boldsymbol{\beta}\) contains the fixed effects; \(\boldsymbol{b_{i}}\) contains the random effects, which are \(\sim \mathcal{N}(0,\mathbf{\Psi})\); and \(\boldsymbol{e_{i}}\) are errors that are \(\sim \mathcal{N}(0,\sigma^2\mathbf{\Lambda_{i}})\).
The “standard assumption” is that \(\mathbf{\Lambda_{i}} = \mathbf{I}\) (homogeneity of errors)
So, in order to reconstruct the implied variance-covariances, we need to find \(\mathbf{\Psi}\), \(\mathbf{\sigma^2}\), and do some multiplication.
Let's formalize this a bit …
We subtract the “means” from both sides … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] - \left[\begin{array} {r} X_{0} \\ X_{1} \\ X_{2} \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]
So on the left side we now have de-meaned scores … on the right we have a between-portion part and a within-person part … \[\left[\begin{array} {r} Y^*_{i0} \\ Y^*_{i1} \\ Y^*_{i2} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]
Let's calculate the var-cov matrix …
\[\left[\begin{array} {r} Y^*_{i0} \\ Y^*_{i1} \\ Y^*_{i2} \end{array}\right] \left[\begin{array} {r} Y^*_{i0} & Y^*_{i1} & Y^*_{i2} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] \left[\begin{array} {r} Z_{0} & Z_{1} & Z_{2} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right] \left[\begin{array} {r} e_{i0} & e_{i1} & e_{i2} \end{array}\right]\]
By definition we get …
\[\left[\begin{array} {r} \hat\Sigma \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} \Psi \end{array}\right] \left[\begin{array} {r} Z_{0} & Z_{1} & Z_{2} \end{array}\right] + \left[\begin{array} {r} \sigma^2_{e} \end{array}\right] \left[\begin{array} {r} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]\]
## [,1]
## [1,] 19.41339
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [,1] [,2] [,3]
## [1,] 19.41339 19.41339 19.41339
## [2,] 19.41339 19.41339 19.41339
## [3,] 19.41339 19.41339 19.41339
Which in correlation units implies …
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
## [3,] 1 1 1
… pure between-person stability … exactly as intended!
## [1] 105.6167
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
## [,1] [,2] [,3]
## [1,] 105.6167 0.0000 0.0000
## [2,] 0.0000 105.6167 0.0000
## [3,] 0.0000 0.0000 105.6167
… homogeneity and uncorrelated errors … exactly as intended!
## [,1] [,2] [,3]
## [1,] 125.03007 19.41339 19.41339
## [2,] 19.41339 125.03007 19.41339
## [3,] 19.41339 19.41339 125.03007
Notice the Compound Symmetry structure!
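The pieces above combine as \(\hat\Sigma = Z \Psi Z' + \sigma^{2}_{e} I\). A minimal sketch, with the variance estimates hard-coded from the VarCorr(um_fit) output above so the snippet stands alone:

```r
#between-person intercept variance (from VarCorr(um_fit))
Psi <- matrix(19.41339, nrow=1, ncol=1)
#random effects design matrix: a 3 x 1 column of 1s
Z <- matrix(1, nrow=3, ncol=1)
#within-person residual variance (from VarCorr(um_fit))
sigma2e <- 105.61668
#model implied variance-covariance: Z Psi Z' + sigma2e * I
Sigma_um <- Z %*% Psi %*% t(Z) + sigma2e * diag(3)
Sigma_um
```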
Together with the implied mean vector, we have the entire picture provided by ALL components of the model. Amazing!
Recall what the observed mean and var-cov were …
## verb2 verb4 verb6
## 25.41534 32.60775 43.74990
## verb2 verb4 verb6
## verb2 37.28784 33.81957 47.40488
## verb4 33.81957 53.58070 62.25489
## verb6 47.40488 62.25489 113.74332
For fun, let's look at the misfit to the data (observed matrix - model implied matrix)
## [,1]
## [1,] -8.508987
## [2,] -1.316585
## [3,] 9.825572
## verb2 verb4 verb6
## verb2 -87.74223 14.40618 27.99149
## verb4 14.40618 -71.44937 42.84150
## verb6 27.99149 42.84150 -11.28675
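The mean misfit above can be sketched directly (observed and implied means hard-coded from the earlier outputs so the snippet stands alone):

```r
#observed mean vector (from the wide data, computed earlier)
meanvector <- c(25.41534, 32.60775, 43.74990)
#model implied mean vector: the grand mean repeated 3 times
impliedmean <- rep(33.92433, 3)
#misfit of the means: observed - implied
meanvector - impliedmean
```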
Fit is not so good … formally …
Let's make a picture of the implied model.
#Calculating predicted scores from the models
verblong$pred_um <- predict(um_fit)
#Making the prototype from the implied means
proto_um <- data.frame(cbind(c(1000,1000,1000),c(0,1,2),meanvector_um))
names(proto_um) <- c("id","time0","pred_um")
#plotting implied individual scores
ggplot(data = verblong, aes(x = time0, y = pred_um, group = id)) +
ggtitle("Unconditional Means Model") +
geom_point() +
geom_line() +
geom_point(data=proto_um, color="red", size=2) +
geom_line(data=proto_um, color="red", size=1) +
xlab("Time") +
ylab("WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
OK - now let's move to an RM ANOVA by adding in effects for categorical time.
Recall, we must tell R that “time0” is a categorical variable using factor()
## Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...
Here is our RM ANOVA model with time as within-person factor
timecat_fit <- lme(fixed= verb ~ 1 + time0,
random= ~ 1|id,
data=verblong,
na.action = na.exclude)
summary(timecat_fit)
## Linear mixed-effects model fit by REML
## Data: verblong
## AIC BIC logLik
## 4013.176 4035.235 -2001.588
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 6.915667 4.514145
##
## Fixed effects: verb ~ 1 + time0
## Value Std.Error DF t-value p-value
## (Intercept) 25.415343 0.5782155 406 43.95480 0
## time01 7.192402 0.4469670 406 16.09157 0
## time02 18.334559 0.4469670 406 41.01994 0
## Correlation:
## (Intr) time01
## time01 -0.387
## time02 -0.387 0.500
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.61263084 -0.54466998 -0.02451831 0.48627468 3.50536334
##
## Number of Observations: 612
## Number of Groups: 204
# timecat_fit2 <- lmer(verb ~ 1 + time0 + (1|id),
# data=verblong,
# na.action = na.exclude)
# summary(timecat_fit2)
Remember that interpretation is with respect to time0, with the first category serving as the reference (the default intercept), which is …
## Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...
So what is the implied representation of the basic information (means and var-cov)?
Let's be explicit with our model … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right] \left[\begin{array} {r} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{array}\right] + \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].
Making the implied mean vector.
## (Intercept) time01 time02
## 25.415343 7.192402 18.334559
beta <- matrix(c(fixef(timecat_fit)[1], fixef(timecat_fit)[2], fixef(timecat_fit)[3]),
nrow=3, ncol=1)
beta
## [,1]
## [1,] 25.415343
## [2,] 7.192402
## [3,] 18.334559
Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 3 …
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 1 0
## [3,] 1 0 1
Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]
## [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990
See the differences in the means across levels of time0.
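A sketch of that multiplication (estimates hard-coded from the fixef(timecat_fit) output above so the snippet stands alone):

```r
#fixed effects, hard-coded from fixef(timecat_fit) above
beta <- matrix(c(25.415343, 7.192402, 18.334559), nrow=3, ncol=1)
#design matrix: intercept plus dummy codes for time01 and time02
X <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 0, 1), nrow=3, ncol=3, byrow=TRUE)
#model implied mean vector
meanvector_timecat <- X %*% beta
meanvector_timecat
```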
Making the implied variance-covariance matrix.
## id = pdLogChol(1)
## Variance StdDev
## (Intercept) 47.82645 6.915667
## Residual 20.37751 4.514145
From this, we need to create the model implied variance-covariance.
## [,1]
## [1,] 47.82645
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [,1] [,2] [,3]
## [1,] 47.82645 47.82645 47.82645
## [2,] 47.82645 47.82645 47.82645
## [3,] 47.82645 47.82645 47.82645
## [1] 20.37751
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
So the within-person residual/“error” structure is …
## [,1] [,2] [,3]
## [1,] 20.37751 0.00000 0.00000
## [2,] 0.00000 20.37751 0.00000
## [3,] 0.00000 0.00000 20.37751
As before - homogeneity and uncorrelated errors
## [,1] [,2] [,3]
## [1,] 68.20396 47.82645 47.82645
## [2,] 47.82645 68.20396 47.82645
## [3,] 47.82645 47.82645 68.20396
Notice the Compound Symmetry structure!
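Again the implied matrix is \(Z \Psi Z' + \sigma^{2}_{e} I\); a sketch with the estimates hard-coded from the VarCorr(timecat_fit) output above:

```r
#between-person intercept variance (from VarCorr(timecat_fit))
Psi <- matrix(47.82645, nrow=1, ncol=1)
#random effects design matrix: a 3 x 1 column of 1s
Z <- matrix(1, nrow=3, ncol=1)
#within-person residual variance (from VarCorr(timecat_fit))
sigma2e <- 20.37751
#model implied variance-covariance
Sigma_timecat <- Z %*% Psi %*% t(Z) + sigma2e * diag(3)
Sigma_timecat
```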
For fun, let's look at the misfit (observed matrix - model implied)
## [,1]
## [1,] 2.486900e-14
## [2,] -2.131628e-14
## [3,] -5.684342e-14
## verb2 verb4 verb6
## verb2 -30.9161173 -14.00688 -0.4215677
## verb4 -14.0068803 -14.62326 14.4284384
## verb6 -0.4215677 14.42844 45.5393553
The means are perfect; the var-cov is not so good … particularly the variances.
Let's make a picture of the implied model.
#Calculating predicted scores from the models
verblong$pred_timecat <- predict(timecat_fit)
#Making the prototype from the implied means
proto_timecat <- data.frame(cbind(c(1000,1000,1000),c(0,1,2),meanvector_timecat))
names(proto_timecat) <- c("id","time0","pred_timecat")
#need to convert time0 back into a continuous variable for plotting as intraindividual change
verblong$time0 <- as.numeric(unclass(verblong$time0))-1
#plotting implied individual scores
ggplot(data = verblong, aes(x = time0, y = pred_timecat, group = id)) +
ggtitle("RM ANOVA (time categorical) Model") +
geom_point() +
geom_line() +
geom_point(data=proto_timecat, color="red", size=2) +
geom_line(data=proto_timecat, color="red", size=1, linetype=2) +
xlab("Time") +
ylab("WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
Notice that the implied lines are all parallel. This is a model of mean differences.
Can also see what the residuals look like.
#Calculating residual scores from the models
verblong$resid_timecat <- residuals(timecat_fit)
#plotting implied individual scores
ggplot(data = verblong, aes(x = time0, y = resid_timecat, group = id)) +
ggtitle("RM ANOVA (time categorical) Model Residuals") +
geom_point() +
xlab("Time") +
ylab("WISC Verbal Score") +
xlim(0,2)
Note that the points are not connected - this is to highlight the model and the implication that the within-person residuals are deviations from the occasion-specific mean (NOT from a trajectory). Perhaps the implied trajectory should also NOT have connecting lines.
Note also the heteroskedasticity of the residuals.
OK - now let's adjust the RM ANOVA error structure to get an RM MANOVA.
The “multivariate” part of the MANOVA has to do with relaxing the assumptions on the error structure (making it more flexible than compound symmetry).
We are going to do this in two steps. Recall, we must tell R that “time0” is a categorical variable using factor()
## Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...
Here is our RM ANOVA model with time as a within-person factor. The error structure is compound symmetry with heterogeneous variances.
timecathet_fit <- lme(fixed= verb ~ 1 + time0,
random= ~ 1|id,
weights=varIdent(form=~1|time0),
data=verblong,
na.action = na.exclude)
summary(timecathet_fit)
## Linear mixed-effects model fit by REML
## Data: verblong
## AIC BIC logLik
## 3930.253 3961.136 -1958.127
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 6.086682 2.871709
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | time0
## Parameter estimates:
## 0 1 2
## 1.000000 1.254186 2.341456
## Fixed effects: verb ~ 1 + time0
## Value Std.Error DF t-value p-value
## (Intercept) 25.415343 0.4712021 406 53.93724 0
## time01 7.192402 0.3225105 406 22.30130 0
## time02 18.334559 0.5119102 406 35.81597 0
## Correlation:
## (Intr) time01
## time01 -0.266
## time02 -0.168 0.245
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.24995336 -0.55415412 -0.05694858 0.51324627 2.99087738
##
## Number of Observations: 612
## Number of Groups: 204
So what is the implied representation of the basic information? Making the implied mean vector.
## (Intercept) time01 time02
## 25.415343 7.192402 18.334559
beta <- matrix(c(fixef(timecathet_fit)[1], fixef(timecathet_fit)[2], fixef(timecathet_fit)[3]),
nrow=3, ncol=1)
beta
## [,1]
## [1,] 25.415343
## [2,] 7.192402
## [3,] 18.334559
Create the model design matrix, X, for the fixed effects. In this model this is a matrix of order 3 x 3 … (remember there are 3 fixed effects in this model)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 1 0
## [3,] 1 0 1
Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]
## [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990
See the differences in the means across levels of time0. This is exactly the same as in the last model
## id = pdLogChol(1)
## Variance StdDev
## (Intercept) 37.047702 6.086682
## Residual 8.246713 2.871709
#parse the between-person variances
Psi <- matrix(c(as.numeric(VarCorr(timecathet_fit)[1])),
nrow=1,ncol=1)
Psi
## [,1]
## [1,] 37.0477
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
So the implied variance covariance of the between-person random effects for the three repeated measures is:
## [,1] [,2] [,3]
## [1,] 37.0477 37.0477 37.0477
## [2,] 37.0477 37.0477 37.0477
## [3,] 37.0477 37.0477 37.0477
Now for the within-person residual var-cov.
#parse the residual/"error" variance-covariance
sigma <- as.numeric(VarCorr(timecathet_fit)[4]) #note this is standard deviation
sigma
## [1] 2.871709
We have another step now with the heterogeneous variances. Get the heterogeneous weights for the residual SD from the summary(timecathet_fit) above.
# Variance function:
# Structure: Different standard deviations per stratum
# Formula: ~1 | time0
# Parameter estimates:
# 0 1 2
# 1.000000 1.254186 2.341456
Pulling from the output.
sigmahet <- sigma * (diag(c(1.000000, 1.254186, 2.341456),
nrow=3,ncol=3))
#Calculate the implied residual error variance-covariance
sigmahet^2 #note that sigma is squared here, after doing all the multiplication.
## [,1] [,2] [,3]
## [1,] 8.246713 0.00000 0.00000
## [2,] 0.000000 12.97193 0.00000
## [3,] 0.000000 0.00000 45.21191
Finally calculating the model implied between- + within var-cov structure
## [,1] [,2] [,3]
## [1,] 45.29441 37.04770 37.04770
## [2,] 37.04770 50.01964 37.04770
## [3,] 37.04770 37.04770 82.25961
Note the heterogeneity along the diagonal
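A sketch of the full combination (between-person part plus heterogeneous within-person part, with estimates hard-coded from the timecathet_fit output above so the snippet stands alone):

```r
#between-person intercept variance (from VarCorr(timecathet_fit))
Psi <- matrix(37.0477, nrow=1, ncol=1)
#random effects design matrix: a 3 x 1 column of 1s
Z <- matrix(1, nrow=3, ncol=1)
#residual SD and per-occasion weights (from summary(timecathet_fit))
sigma <- 2.871709
sigmahet <- sigma * diag(c(1.000000, 1.254186, 2.341456), nrow=3, ncol=3)
#implied between- + within-person variance-covariance
Sigma_het <- Z %*% Psi %*% t(Z) + sigmahet^2
Sigma_het
```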
For fun, let's look at the misfit (observed matrix - model implied)
## [,1]
## [1,] 4.973799e-14
## [2,] 1.065814e-13
## [3,] 8.526513e-14
## verb2 verb4 verb6
## verb2 -8.006572 -3.228132 10.35718
## verb4 -3.228132 3.561067 25.20719
## verb6 10.357180 25.207186 31.48370
Getting better … formally
| | call | Model | df | AIC | BIC | logLik | Test | L.Ratio | p-value |
|---|---|---|---|---|---|---|---|---|---|
| timecat_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, na.action = na.exclude) | 1 | 5 | 4013.176 | 4035.235 | -2001.588 | | NA | NA |
| timecathet_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, weights = varIdent(form = ~1 \| time0), na.action = na.exclude) | 2 | 7 | 3930.253 | 3961.136 | -1958.127 | 1 vs 2 | 86.92304 | 0 |
The heterogeneity was an improvement
OK - now let's adjust the RM MANOVA error structure all the way to fully UNSTRUCTURED.
Fully Unstructured residual error structure
timecatunst_fit <- lme(fixed= verb ~ 1 + time0,
random= ~ 1|id,
weights=varIdent(form=~1|time0),
correlation=corSymm(form=~1|id),
data=verblong,
na.action = na.exclude)
summary(timecatunst_fit)
## Linear mixed-effects model fit by REML
## Data: verblong
## AIC BIC logLik
## 3869.06 3913.178 -1924.53
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 5.506836 2.638686
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1 2
## 2 0.275
## 3 0.709 0.725
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | time0
## Parameter estimates:
## 0 1 2
## 1.000000 1.827573 3.461330
## Fixed effects: verb ~ 1 + time0
## Value Std.Error DF t-value p-value
## (Intercept) 25.415343 0.4275323 406 59.44660 0
## time01 7.192402 0.3374451 406 21.31428 0
## time02 18.334559 0.5249722 406 34.92482 0
## Correlation:
## (Intr) time01
## time01 -0.118
## time02 0.221 0.507
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.24169075 -0.62480430 -0.06272641 0.63543916 3.06613520
##
## Number of Observations: 612
## Number of Groups: 204
So what is the implied representation of the basic information? Making the implied mean vector.
## (Intercept) time01 time02
## 25.415343 7.192402 18.334559
## [,1]
## [1,] 25.415343
## [2,] 7.192402
## [3,] 18.334559
Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 3 …
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 1 0
## [3,] 1 0 1
Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]
## [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990
See the differences in the means across levels of time0. This is exactly the same as in the last model
Now for the variance-covariances. First the between- part.
## id = pdLogChol(1)
## Variance StdDev
## (Intercept) 30.325248 5.506836
## Residual 6.962664 2.638686
#creating the model implied variance-covariance
#parse the between-person variances
Psi <- matrix(c(as.numeric(VarCorr(timecatunst_fit)[1])),
nrow=1,ncol=1)
Psi
## [,1]
## [1,] 30.32525
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
So the implied variance covariance of the between-person random effects for the three repeated measures is:
## [,1] [,2] [,3]
## [1,] 30.32525 30.32525 30.32525
## [2,] 30.32525 30.32525 30.32525
## [3,] 30.32525 30.32525 30.32525
Same as before.
Now it gets a bit more complicated … because we have a fully unstructured matrix for the within-person residual variance-covariance.
#parse the residual/"error" variance-covariance
sigma <- as.numeric(VarCorr(timecatunst_fit)[4]) #note this is standard deviation
#From the summary(timecatunst_fit) above
# Variance function:
# Structure: Different standard deviations per stratum
# Formula: ~1 | time0
# Parameter estimates:
# 0 1 2
# 1.000000 1.827573 3.461330
sigmaunst <- sigma * (diag(c(1.000000, 1.827573, 3.461330),
nrow=3,ncol=3))
#From the summary(timecatunst_fit) above
# Correlation Structure: General
# Formula: ~1 | id
# Parameter estimate(s):
# Correlation:
# 1 2
# 2 0.275
# 3 0.709 0.725
cormatrixunst <- matrix(c(1.000, 0.275, 0.709,
0.275, 1.000, 0.725,
0.709, 0.725, 1.000),
nrow=3,ncol=3)
#Pre and post multiply by SDs to convert Correlation matrix into
#Covariance matrix
covresidunst <- sigmaunst %*% cormatrixunst %*% t(sigmaunst)
covresidunst
## [,1] [,2] [,3]
## [1,] 6.962664 3.499314 17.08695
## [2,] 3.499314 23.255458 31.93237
## [3,] 17.086955 31.932371 83.41832
Note that it is fully “UNSTRUCTURED”.
Finally, calculate the implied between- + within- variance-covariances.
## [,1] [,2] [,3]
## [1,] 37.28791 33.82456 47.41220
## [2,] 33.82456 53.58071 62.25762
## [3,] 47.41220 62.25762 113.74357
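A sketch of this final combination (Psi and covresidunst hard-coded from the outputs above so the snippet stands alone):

```r
#between-person intercept variance (from VarCorr(timecatunst_fit))
Psi <- matrix(30.32525, nrow=1, ncol=1)
#random effects design matrix: a 3 x 1 column of 1s
Z <- matrix(1, nrow=3, ncol=1)
#within-person residual var-cov (covresidunst, computed above)
covresidunst <- matrix(c( 6.962664,  3.499314, 17.086955,
                          3.499314, 23.255458, 31.932371,
                         17.086955, 31.932371, 83.418320), nrow=3, ncol=3)
#implied between- + within-person variance-covariance
Sigma_unst <- Z %*% Psi %*% t(Z) + covresidunst
Sigma_unst
```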
For fun, let's look at the misfit (observed matrix - model implied)
## [,1]
## [1,] 1.598721e-13
## [2,] 7.105427e-14
## [3,] 1.563194e-13
## verb2 verb4 verb6
## verb2 -6.906778e-05 -4.991762e-03 -0.0073203738
## verb4 -4.991762e-03 -2.480221e-06 -0.0027309696
## verb6 -7.320374e-03 -2.730970e-03 -0.0002526546
THAT’S PERFECT … we have modeled the data perfectly!
The formal improvement …
| | call | Model | df | AIC | BIC | logLik | Test | L.Ratio | p-value |
|---|---|---|---|---|---|---|---|---|---|
| timecathet_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, weights = varIdent(form = ~1 \| time0), na.action = na.exclude) | 1 | 7 | 3930.253 | 3961.136 | -1958.127 | | NA | NA |
| timecatunst_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, correlation = corSymm(form = ~1 \| id), weights = varIdent(form = ~1 \| time0), na.action = na.exclude) | 2 | 10 | 3869.060 | 3913.178 | -1924.530 | 1 vs 2 | 67.19366 | 0 |
The unstructured bit was useful. Note that we have simply re-represented the original data in a different way.
Now let's go to the growth model … where we add some restrictions back in.
First we have to convert the “time0” variable back to continuous.
verblong$time0 <- as.numeric(unclass(verblong$time0))-1
#Note that we had to take care here to get the desired coding.
str(verblong$time0)
## num [1:612] 0 1 2 0 1 2 0 1 2 0 ...
Now let's fit the model. Remember this is different because time0 is continuous. And our design matrices are also different.
lingrowth_fit <- lme(fixed= verb ~ 1 + time0,
random= ~ 1 + time0|id,
#weights=varIdent(form=~1|time0), #note this is commented out
#correlation=corSymm(form=~1|id), #note this is commented out
data=verblong,
na.action = na.exclude)
summary(lingrowth_fit)
## Linear mixed-effects model fit by REML
## Data: verblong
## AIC BIC logLik
## 3919.721 3946.202 -1953.861
##
## Random effects:
## Formula: ~1 + time0 | id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.588971 (Intr)
## time0 2.542526 0.944
## Residual 3.896441
##
## Fixed effects: verb ~ 1 + time0
## Value Std.Error DF t-value p-value
## (Intercept) 24.757051 0.4065067 407 60.90195 0
## time0 9.167279 0.2624878 407 34.92460 0
## Correlation:
## (Intr)
## time0 0.157
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.77592608 -0.52613946 -0.03287538 0.53033062 2.97854634
##
## Number of Observations: 612
## Number of Groups: 204
# lingrowth_fit2 <- lmer(verb ~ 1 + time0 + (1 + time0|id),
# data=verblong,
# na.action = na.exclude)
#
# summary(lingrowth_fit2)
So what is the implied representation of the basic information?
Let's be explicit with our model … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{array}\right] \left[\begin{array} {r} \beta_{0} \\ \beta_{1} \end{array}\right] + \left[\begin{array} {r} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{array}\right] \left[\begin{array} {r} u_{0i} \\ u_{1i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].
Making the implied mean vector.
## (Intercept) time0
## 24.757051 9.167279
## [,1]
## [1,] 24.757051
## [2,] 9.167279
This is different from before.
Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 2 …
there are now 2 fixed effects (vs 3 fixed effects in prior models).
## [,1] [,2]
## [1,] 1 0
## [2,] 1 1
## [3,] 1 2
Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]
## [,1]
## [1,] 24.75705
## [2,] 33.92433
## [3,] 43.09161
See the differences in the means across specific choices of time0 in the design matrix. This is different from the previous models.
Now, again, there is some more complication in extracting the implied between-person covariance matrix.
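To verify this by hand, we can reproduce the multiplication with the estimates printed above (hard-coded here so the snippet stands alone):

```r
# Fixed-effects estimates from the summary output above (hard-coded)
betas <- matrix(c(24.757051, 9.167279), nrow = 2, ncol = 1)

# Fixed-effects design matrix X (intercept + linear time)
X <- matrix(c(1, 0,
              1, 1,
              1, 2), nrow = 3, ncol = 2, byrow = TRUE)

# Model-implied mean vector: X %*% beta
meanvector <- X %*% betas
meanvector
```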
## id = pdLogChol(1 + time0)
## Variance StdDev Corr
## (Intercept) 21.05865 4.588971 (Intr)
## time0 6.46444 2.542526 0.944
## Residual 15.18225 3.896441
#parse the between-person standard deviations
btwn_SDs <- diag(c(as.numeric(VarCorr(lingrowth_fit)[4]),
as.numeric(VarCorr(lingrowth_fit)[5])),
nrow=2,ncol=2)
btwn_SDs
## [,1] [,2]
## [1,] 4.588971 0.000000
## [2,] 0.000000 2.542526
#parse the between-person correlations
btwn_Corr <- matrix(c(1.0, as.numeric(VarCorr(lingrowth_fit)[8]),
as.numeric(VarCorr(lingrowth_fit)[8]), 1.0),
nrow=2,ncol=2)
btwn_Corr
## [,1] [,2]
## [1,] 1.000 0.944
## [2,] 0.944 1.000
#Pre and post multiply the correlation matrix by the SD matrix to get the between-person var-cov matrix
Psi <- btwn_SDs %*% btwn_Corr %*% t(btwn_SDs)
Psi
## [,1] [,2]
## [1,] 21.05865 11.014194
## [2,] 11.01419 6.464438
Create the model design matrix for the random effects. Note there are two random effects now, so this is a 3 x 2 matrix …
## [,1] [,2]
## [1,] 1 0
## [2,] 1 1
## [3,] 1 2
So the implied variance-covariance of the between-person random effects for the three repeated measures is:
## [,1] [,2] [,3]
## [1,] 21.05865 32.07285 43.08704
## [2,] 32.07285 49.55148 67.03011
## [3,] 43.08704 67.03011 90.97318
Different from before. Note the increasing variance and covariance.
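As a stand-alone check, this matrix arises from pre- and post-multiplying Psi by the random-effects design matrix Z (estimates hard-coded from the output above):

```r
# Between-person covariance matrix Psi (from the estimates above)
Psi <- matrix(c(21.058650, 11.014194,
                11.014194,  6.464438), nrow = 2, ncol = 2)

# Random-effects design matrix Z (intercept + linear time)
Z <- matrix(c(1, 0,
              1, 1,
              1, 2), nrow = 3, ncol = 2, byrow = TRUE)

# Implied between-person var-cov of the three repeated measures
btwn_cov <- Z %*% Psi %*% t(Z)
btwn_cov
```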
Now for the within-person error variance-covariance.
#parse the residual/"error" standard deviaion
sigma <- as.numeric(VarCorr(lingrowth_fit)[6]) #note this is standard deviation
sigma
## [1] 3.896441
#parse the multipliers (in this case they are all 1.0),
sigmahomogeneous <- sigma * (diag(c(1.0, 1.0, 1.0),
nrow=3,ncol=3))
sigmahomogeneous
## [,1] [,2] [,3]
## [1,] 3.896441 0.000000 0.000000
## [2,] 0.000000 3.896441 0.000000
## [3,] 0.000000 0.000000 3.896441
#parse the correlations (in this case they are uncorrelated)
cormatrixstructure <- matrix(c(1.000, 0.000, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 1.000),
nrow=3,ncol=3)
#Compute the residual/"error" var-cov matrix
covresid <- sigmahomogeneous %*% cormatrixstructure %*% t(sigmahomogeneous)
#Notice that sigma gets squared in the multiplication, to become sigma^2 = variance
covresid
## [,1] [,2] [,3]
## [1,] 15.18225 0.00000 0.00000
## [2,] 0.00000 15.18225 0.00000
## [3,] 0.00000 0.00000 15.18225
Note the homogeneity and lack of correlation of the residual errors.
Finally, calculate the implied variance-covariance matrix.
## [,1] [,2] [,3]
## [1,] 36.24091 32.07285 43.08704
## [2,] 32.07285 64.73373 67.03011
## [3,] 43.08704 67.03011 106.15544
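The full matrix can be checked by hand as Z Psi Z' + sigma^2 I (estimates hard-coded from the output above so the snippet stands alone):

```r
# Components from the estimates above (hard-coded for a stand-alone check)
Psi <- matrix(c(21.058650, 11.014194,
                11.014194,  6.464438), nrow = 2, ncol = 2)
Z <- matrix(c(1, 0,
              1, 1,
              1, 2), nrow = 3, ncol = 2, byrow = TRUE)
sigma2 <- 3.896441^2  # residual variance

# Implied total var-cov: between-person part + within-person part
implied_cov <- Z %*% Psi %*% t(Z) + sigma2 * diag(3)
implied_cov
```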
For fun, let's look at the misfit (observed matrix - model implied)
## [,1]
## [1,] 0.6582925
## [2,] -1.3165850
## [3,] 0.6582925
Just a bit off. Linear is not quite perfect.
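The mean misfit above can be reproduced from the sample means in the descriptives (observed minus model-implied, values hard-coded from earlier output):

```r
# Sample means of verb2, verb4, verb6 from describe() earlier
observed_means <- c(25.4153431, 32.6077451, 43.7499020)

# Model-implied means from the linear growth model
implied_means <- c(24.757051, 33.924330, 43.091609)

# Misfit: observed - implied
observed_means - implied_means
```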
## verb2 verb4 verb6
## verb2 1.046935 1.746721 4.317840
## verb4 1.746721 -11.153030 -4.775224
## verb6 4.317840 -4.775224 7.587879
Pretty good.
Formal comparison of the models.
|  | call | Model | df | AIC | BIC | logLik | Test | L.Ratio | p-value |
|---|---|---|---|---|---|---|---|---|---|
| timecathet_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, weights = varIdent(form = ~1 \| time0), na.action = na.exclude) | 1 | 7 | 3930.253 | 3961.136 | -1958.127 | NA | NA |  |
| lingrowth_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 + time0 \| id, na.action = na.exclude) | 2 | 6 | 3919.721 | 3946.202 | -1953.861 | 1 vs 2 | 8.532182 | 0.0034892 |
Linear Growth is better than Compound Symmetry with Heterogeneous Variances by AIC & BIC!
|  | call | Model | df | AIC | BIC | logLik | Test | L.Ratio | p-value |
|---|---|---|---|---|---|---|---|---|---|
| timecatunst_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 \| id, correlation = corSymm(form = ~1 \| id), weights = varIdent(form = ~1 \| time0), na.action = na.exclude) | 1 | 10 | 3869.060 | 3913.178 | -1924.530 | NA | NA |  |
| lingrowth_fit | lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 + time0 \| id, na.action = na.exclude) | 2 | 6 | 3919.721 | 3946.202 | -1953.861 | 1 vs 2 | 58.66148 | 0 |
Linear growth is NOT as good as RM MANOVA with a fully UNSTRUCTURED residual structure. However, the growth model may, in this case, be closer to a specific theory of change. It also provides something different from a "test of mean differences": a test of a specific kind of "within-person" change.
Maybe a couple of plots to see the intraindividual change implications …
#Calculating predicted scores from two models
verblong$pred_timecatunst <- predict(timecatunst_fit)
verblong$pred_lingrowth <- predict(lingrowth_fit)
#Making the prototypes from the implied means
proto_timecatunst <- data.frame(cbind(c(1000,1000,1000),c(0,1,2),meanvector_timecatunst))
names(proto_timecatunst) <- c("id","time0","pred_timecatunst")
proto_lingrowth <- data.frame(cbind(c(1000,1000,1000),c(0,1,2),meanvector_lingrowth))
names(proto_lingrowth) <- c("id","time0","pred_lingrowth")
#plotting intraindividual change RAW DATA
ggplot(data = verblong, aes(x = time0, y = verb, group = id)) +
ggtitle("Raw Data") +
geom_point() +
geom_line() +
xlab("Time") +
ylab("WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
#plotting intraindividual change TIMECATUNST
ggplot(data = verblong, aes(x = time0, y = pred_timecatunst, group = id)) +
ggtitle("Predicted Unstructured RM MANOVA") +
geom_point() +
geom_line() +
geom_point(data=proto_timecatunst, color="red", size=2) +
geom_line(data=proto_timecatunst, color="red", size=1) +
xlab("Time") +
ylab("Predicted WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
#plotting intraindividual change LINGROWTH
ggplot(data = verblong, aes(x = time0, y = pred_lingrowth, group = id)) +
ggtitle("Predicted Linear Growth") +
geom_point() +
geom_line() +
geom_point(data=proto_lingrowth, color="red", size=2) +
geom_line(data=proto_lingrowth, color="red", size=1) +
xlab("Time") +
ylab("Predicted WISC Verbal Score") +
ylim(0,100) + xlim(0,2)
Now that was satisfying!
=:-]
Yay for MATRIX ALGEBRA! Yay for models just being a bunch of components that reproduce means and var-covs!