1 Overview

A few years ago, I decided that it might be useful to introduce the models for repeated measures data (e.g., RM ANOVA, RM MANOVA, Growth models) along a “continuum”. The idea is that this presentation, wherein the models are all linked together, simplifies the whole world - and makes the similarities and differences among these models easier to identify and understand. In this tutorial, we tie the models together in a multilevel framework, working from RM ANOVA through RM MANOVA to Growth Models.

2 Data Preparation and Description

For our examples, we use 3-occasion WISC data that are equally spaced.

Load the repeated measures data

Subsetting to the variables of interest. 3-occasion equally spaced repeated measures + a person-level “grouping” variable (+ an id variable). Specifically, we include the id, verb2, verb4, verb6, and grad variables.

      vars   n        mean         sd  median     trimmed       mad   min    max  range       skew   kurtosis        se
id       1 204 102.5000000 59.0338886 102.500 102.5000000 75.612600  1.00 204.00 203.00  0.0000000 -1.2176609 4.1331989
verb2    2 204  25.4153431  6.1063772  25.980  25.4026220  6.567918  5.95  39.85  33.90 -0.0639307 -0.3445494 0.4275319
verb4    3 204  32.6077451  7.3198841  32.820  32.4240244  7.183197 12.60  52.84  40.24  0.2317998 -0.0776524 0.5124944
verb6    4 204  43.7499020 10.6650511  42.545  43.4560976 11.297412 17.35  72.59  55.24  0.2356459 -0.3605210 0.7467029
grad     5 204   0.2254902  0.4189328   0.000   0.1585366  0.000000  0.00   1.00   1.00  1.3040954 -0.3007374 0.0293312

Multilevel modeling analyses typically require a long data set. So, we reshape the data from wide to long.

     id grade  verb grad
1.2   1     2 26.98    0
1.4   1     4 39.61    0
1.6   1     6 55.64    0
2.2   2     2 14.38    0
2.4   2     4 21.92    0
2.6   2     6 37.81    0
3.2   3     2 33.51    1
3.4   3     4 34.30    1
3.6   3     6 50.18    1
4.2   4     2 28.39    1
4.4   4     4 42.16    1
4.6   4     6 44.72    1
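The reshape itself can be sketched as follows (in Python/pandas rather than the tutorial's R; the four example rows above serve as toy input, and the column name `occ` is ours):

```python
import pandas as pd

# Wide data: one row per person, one column per occasion (values from the table above)
wide = pd.DataFrame({
    "id":    [1, 2, 3, 4],
    "verb2": [26.98, 14.38, 33.51, 28.39],
    "verb4": [39.61, 21.92, 34.30, 42.16],
    "verb6": [55.64, 37.81, 50.18, 44.72],
    "grad":  [0, 0, 1, 1],
})

# Long data: one row per person-occasion
long = wide.melt(id_vars=["id", "grad"], value_vars=["verb2", "verb4", "verb6"],
                 var_name="occ", value_name="verb")
long["grade"] = long["occ"].str.replace("verb", "").astype(int)  # 2, 4, 6
long = long.sort_values(["id", "grade"]).reset_index(drop=True)[["id", "grade", "verb", "grad"]]
```

Each person now contributes three rows, one per occasion, matching the long table above.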

For clarity, let's consider the basic information representation of the 3-occasion repeated measures data. In particular, data (even non-repeated measures data) are summarized (at the sample-level) as …
Means Vector,
Variance-Covariance Matrix

##    verb2    verb4    verb6 
## 25.41534 32.60775 43.74990
##          verb2    verb4     verb6
## verb2 37.28784 33.81957  47.40488
## verb4 33.81957 53.58070  62.25489
## verb6 47.40488 62.25489 113.74332
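These two summaries are just column means and a covariance matrix. As a sketch (in Python/numpy rather than the tutorial's R, and using only the four example persons shown above rather than the full WISC data):

```python
import numpy as np

# The 3-occasion scores for the four example persons shown earlier
# (a toy stand-in for the full 204-person WISC data)
verb = np.array([
    [26.98, 39.61, 55.64],
    [14.38, 21.92, 37.81],
    [33.51, 34.30, 50.18],
    [28.39, 42.16, 44.72],
])

mean_vector = verb.mean(axis=0)           # sample means, one per occasion
cov_matrix  = np.cov(verb, rowvar=False)  # 3 x 3 variance-covariance matrix (n-1 denominator)
```

With the full 204-person data, the same two calculations produce the vector and matrix printed above.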

Let's make a note of this vector and matrix (i.e., on the board) as we will come back to them. Making visual counterparts can also be extremely useful - especially for facilitating higher-level conversations in a research group.
Basic Sample-level descriptions in visual form.

Always be careful about the scaling of the x- and y-axes in these plots.

OK - now we have some basics in place.

One additional recoding for convenience … (we will treat this in more detail later) is to center and scale our time variable. This gives us a specific 0 point and an intuitive 0,1,2 scale that is useful for our didactic purposes.

## [1] 2 4 6
## [1] 0 1 2
     id grade  verb grad time0
1.2   1     2 26.98    0     0
1.4   1     4 39.61    0     1
1.6   1     6 55.64    0     2
2.2   2     2 14.38    0     0
2.4   2     4 21.92    0     1
2.6   2     6 37.81    0     2
3.2   3     2 33.51    1     0
3.4   3     4 34.30    1     1
3.6   3     6 50.18    1     2
4.2   4     2 28.39    1     0
4.4   4     4 42.16    1     1
4.6   4     6 44.72    1     2
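The recode is a simple linear transformation of grade; a minimal sketch (in Python):

```python
# Center-and-scale the time variable: grade {2, 4, 6} -> time0 {0, 1, 2}
# i.e., subtract the first grade and divide by the spacing between occasions
grades = [2, 4, 6]
time0 = [(g - 2) // 2 for g in grades]
```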

Plotting the raw data along this new time variable.

Note that the time variable in this plot has NOT been converted to a factor. It is a continuous variable.

3 Introducing the Multilevel Model

We now frame up a series of models using the multilevel framework. Some good general resources … http://www.ats.ucla.edu/stat/r/seminars/Repeated_Measures/repeated_measures.htm
http://www.statpower.net/Content/GCM/Lectures/SW07.pdf
https://www.zoology.ubc.ca/~schluter/R/fit-model/

The presentation of all of these models here is an attempt to integrate traditions that are typically kept separate or pitted against each other. In reality, they are just a few examples of the many, many possible models that exist. Each model is useful in specific situations.

The objective of all of our analyses is to ‘deconstruct’ the data into meaningful/interpretable pieces. Each ‘model’ does it in a different way, with different assumptions.

We can “judge” the models based on
1. how well they articulate and test our theory, and/or
2. how well they recover the data (evaluated as misfit to the mean vector and var-cov summaries).

Recall that the regression model may be compactly written as … \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + \boldsymbol{e} \]

We make it into a multilevel regression model by adding an additional complication … separation into between-person and within-person components. Note: this is the very same distinction that is made in traditional presentations of ANOVA when examining between-person factors and within-person factors.

The general model becomes … (Please note and forgive that I do not have all the equations in place in a common form - I am still working out a notation that works across all the models/traditions in a way I like)

\[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \]

To fit all of the models to the data, we make use of the nlme package. It is also possible to use the lme4 package.

## Loading required package: lme4
## Loading required package: Matrix
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
## 
##     lmList

Let's start with an unconditional means model.

Writing out the equation …

\[y_{it} = 1\beta_{0i} + e_{it}\] \[\beta_{0i} = 1\gamma_{00} + u_{0i}\] Note the explicit inclusion of the “silent” vector of 1s.

If we do the substitution of one equation into the other we get … \[y_{it} = 1\gamma_{00} + 1u_{0i} + e_{it}\]

where \[e_{it} \sim N(0,\sigma^{2}_{e})\], and \[u_{0i} \sim N(0,\psi_{u0})\].

For clarity, let's write out the full variance-covariance matrix of the within-person residuals (spanning across the T = 3 repeated measures), \[ \sigma^{2}_{e}\boldsymbol{I} = \sigma^{2}_{e} \left[\begin{array} {rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right] = \left[\begin{array} {rrr} \sigma^{2}_{e} & 0 & 0 \\ 0 & \sigma^{2}_{e} & 0 \\ 0 & 0 & \sigma^{2}_{e} \end{array}\right]\]. This is the homoscedasticity of errors assumption.

And, the full variance covariance matrix of the between-person residuals, \[\left[\begin{array} {r} \psi_{u0} \end{array}\right] = \mathrm{E}\left[u_{0i}u_{0i}'\right] \]

and through some replacement (e.g., replacing \(u_{i}\) with \(b_{i}\), and the vectors of 1s with \(X_{i}\) and \(Z_{i}\)) we can get to … the more general notation often used in the statistics literature … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \] the equation we started with, or closer to my usual multilevel notation … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\gamma} + \boldsymbol{Z_{i}}\boldsymbol{u_{i}} + \boldsymbol{e_{i}} \] Let’s code the model.

## Linear mixed-effects model fit by REML
##  Data: verblong 
##       AIC      BIC   logLik
##   4682.66 4695.906 -2338.33
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:    4.406063   10.277
## 
## Fixed effects: verb ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 33.92433 0.5174359 408 65.56238       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.9626206 -0.6924771 -0.1481960  0.5511044  3.1026127 
## 
## Number of Observations: 612
## Number of Groups: 204
  1. What is the implied representation of the basic information? Let's re-create it from the equations … (these should be collected into a matrix) Writing the equations in script takes a while … not all worked out yet. Sorry.

\[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} X_{0} \\ X_{1} \\ X_{2} \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] + \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]

In this unconditional means model the X and Z design matrices are vectors of 1s, so we get …

\[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] + \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].

Let’s make the implied mean vector. We extract the fixed effects from the model using fixef(), specifically the contents of the \(\beta\) matrix.

## (Intercept) 
##    33.92433
##          [,1]
## [1,] 33.92433

Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 1 …

##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]

##          [,1]
## [1,] 33.92433
## [2,] 33.92433
## [3,] 33.92433

Note this is the overall (grand) mean.
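This multiplication can be verified directly; a sketch (in Python/numpy rather than the tutorial's R, with the fixed-effect estimate taken from the output above):

```python
import numpy as np

# Fixed effect from the unconditional means model output: gamma_00 = 33.92433
beta = np.array([[33.92433]])

# Design matrix X for the 3 occasions: a column of 1s (3 x 1)
X = np.ones((3, 1))

implied_means = X @ beta  # every occasion gets the grand mean
```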

Making the implied variance-covariance matrix.

## id = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept)  19.41339  4.406063
## Residual    105.61668 10.276997

From this, we need to create the model implied variance-covariance.

The model for person \(i\) is … \[ \boldsymbol{Y_{i}} = \boldsymbol{X_{i}}\boldsymbol{\beta} + \boldsymbol{Z_{i}}\boldsymbol{b_{i}} + \boldsymbol{e_{i}} \] where \(\boldsymbol{Z_{i}}\) is the random effects regressor matrix (usually a subset of \(\boldsymbol{X_{i}}\)); \(\boldsymbol{\beta}\) contains the fixed effects; \(\boldsymbol{b_{i}}\) contains the random effects that are \[ \sim \mathcal{N}(0,\mathbf{\Psi})\] and \(\boldsymbol{e_{i}}\) are errors that are \[ \sim \mathcal{N}(0,\mathbf{\sigma^2}\mathbf{\Lambda_{i}})\]

The “standard assumption” is that \(\mathbf{\Lambda_{i}} = \mathbf{I}\) (homogeneity of errors)

So, in order to reconstruct the implied variance-covariances, we need to find \(\mathbf{\Psi}\), \(\mathbf{\sigma^2}\), and do some multiplication.

Let's formalize this a bit …

We subtract the “means” from both sides … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] - \left[\begin{array} {r} X_{0} \\ X_{1} \\ X_{2} \end{array}\right] \left[\begin{array} {r} \beta_{0} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]

So on the left side we now have de-meaned scores … on the right we have a between-portion part and a within-person part … \[\left[\begin{array} {r} Y^*_{i0} \\ Y^*_{i1} \\ Y^*_{i2} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\]

Let's calculate the var-cov matrix …

\[\left[\begin{array} {r} Y^*_{i0} \\ Y^*_{i1} \\ Y^*_{i2} \end{array}\right] \left[\begin{array} {r} Y^*_{i0} & Y^*_{i1} & Y^*_{i2} \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] \left[\begin{array} {r} Z_{0} & Z_{1} & Z_{2} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right] \left[\begin{array} {r} e_{i0} & e_{i1} & e_{i2} \end{array}\right]\]

Taking expectations (by definition, and because the random effects and errors are uncorrelated, so their cross-products drop out), we get …

\[\left[\begin{array} {r} \hat\Sigma \end{array}\right] = \left[\begin{array} {r} Z_{0} \\ Z_{1} \\ Z_{2} \end{array}\right] \left[\begin{array} {r} \Psi \end{array}\right] \left[\begin{array} {r} Z_{0} & Z_{1} & Z_{2} \end{array}\right] + \left[\begin{array} {r} \sigma^2_{e} \end{array}\right] \left[\begin{array} {r} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]\]

  1. Parse the between-person variances from the model output.
##          [,1]
## [1,] 19.41339
  2. Create the model design matrix, Z, for the random effects. In this model this is a matrix of order 3 x 1 …
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1
  3. So, the implied variance-covariance matrix of the between-person random effects for the three occasions is:
##          [,1]     [,2]     [,3]
## [1,] 19.41339 19.41339 19.41339
## [2,] 19.41339 19.41339 19.41339
## [3,] 19.41339 19.41339 19.41339

Which in correlation units implies …

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1

… pure between-person stability … exactly as intended!

  4. Let's now parse the residual/“error” variance-covariance structure.
## [1] 105.6167
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##          [,1]     [,2]     [,3]
## [1,] 105.6167   0.0000   0.0000
## [2,]   0.0000 105.6167   0.0000
## [3,]   0.0000   0.0000 105.6167

… homogeneity and uncorrelated errors … exactly as intended!

  5. Finally - we put the between- and within- pieces together … to calculate the implied variance-covariances.
##           [,1]      [,2]      [,3]
## [1,] 125.03007  19.41339  19.41339
## [2,]  19.41339 125.03007  19.41339
## [3,]  19.41339  19.41339 125.03007

Notice the Compound Symmetry structure!
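The whole reconstruction \(Z \Psi Z' + \sigma^2_e I\) can be checked with a few lines (a Python/numpy sketch rather than the tutorial's R; variable names are ours, estimates from the model output above):

```python
import numpy as np

# Estimates from the unconditional means model output
psi    = np.array([[19.41339]])  # between-person intercept variance
sigma2 = 105.61668               # within-person residual variance

# Random-effects design matrix Z: a column of 1s (3 x 1)
Z = np.ones((3, 1))

# Implied var-cov: Z Psi Z' + sigma2 * I  (compound symmetry)
implied_cov = Z @ psi @ Z.T + sigma2 * np.eye(3)
```

A constant covariance off the diagonal and a constant (larger) variance on the diagonal: compound symmetry.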

Together with the implied mean vector, we have the entire picture provided by ALL components of the model. Amazing!

Recall what the observed mean and var-cov were …

##    verb2    verb4    verb6 
## 25.41534 32.60775 43.74990
##          verb2    verb4     verb6
## verb2 37.28784 33.81957  47.40488
## verb4 33.81957 53.58070  62.25489
## verb6 47.40488 62.25489 113.74332

For fun, let's look at the misfit to the data (observed matrix - model implied matrix)

##           [,1]
## [1,] -8.508987
## [2,] -1.316585
## [3,]  9.825572
##           verb2     verb4     verb6
## verb2 -87.74223  14.40618  27.99149
## verb4  14.40618 -71.44937  42.84150
## verb6  27.99149  42.84150 -11.28675

Fit is not so good … formally …

Let's make a picture of the implied model.

4 Repeated Measures ANOVA

OK - now let's move to an RM ANOVA by adding in effects for categorical time.

Recall, we must tell R that “time0” is a categorical variable using factor()

##  Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...

Here is our RM ANOVA model with time as a within-person factor.

## Linear mixed-effects model fit by REML
##  Data: verblong 
##        AIC      BIC    logLik
##   4013.176 4035.235 -2001.588
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:    6.915667 4.514145
## 
## Fixed effects: verb ~ 1 + time0 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 25.415343 0.5782155 406 43.95480       0
## time01       7.192402 0.4469670 406 16.09157       0
## time02      18.334559 0.4469670 406 41.01994       0
##  Correlation: 
##        (Intr) time01
## time01 -0.387       
## time02 -0.387  0.500
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.61263084 -0.54466998 -0.02451831  0.48627468  3.50536334 
## 
## Number of Observations: 612
## Number of Groups: 204

Remember that interpretation is with respect to time0, with the first category serving as the default (intercept) level, which is …

##  Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...

So what is the implied representation of the basic information (means and var-cov)?

Lets be explcit with our model … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right] \left[\begin{array} {r} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{array}\right] + \left[\begin{array} {r} 1 \\ 1 \\ 1 \end{array}\right] \left[\begin{array} {r} u_{0i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].

Making the implied mean vector.

## (Intercept)      time01      time02 
##   25.415343    7.192402   18.334559
##           [,1]
## [1,] 25.415343
## [2,]  7.192402
## [3,] 18.334559

Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 3 …

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    1    1    0
## [3,]    1    0    1

Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]

##          [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990

See the differences in the means across levels of time0.
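The dummy-coded multiplication can be verified directly (a Python/numpy sketch rather than the tutorial's R; estimates from the RM ANOVA output above):

```python
import numpy as np

# Fixed effects from the RM ANOVA output: intercept and two time-contrast effects
beta = np.array([25.415343, 7.192402, 18.334559])

# Dummy-coded design matrix for the 3 occasions (reference category = time0 "0")
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1]])

implied_means = X @ beta  # intercept, intercept + effect1, intercept + effect2
```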

Making the implied variance-covariance matrix.

## id = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 47.82645 6.915667
## Residual    20.37751 4.514145

From this, we need to create the model implied variance-covariance.

  1. Parse the between-person variances from the model output.
##          [,1]
## [1,] 47.82645
  2. Create the model design matrix for the random effects. In this model this is a matrix of order 3 x 1 …
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1
  3. So, the implied variance-covariance matrix of the between-person random effects for the three occasions is:
##          [,1]     [,2]     [,3]
## [1,] 47.82645 47.82645 47.82645
## [2,] 47.82645 47.82645 47.82645
## [3,] 47.82645 47.82645 47.82645
  4. Parse the residual/“error” variance-covariance.
## [1] 20.37751
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

So the within-person residual/“error” structure is …

##          [,1]     [,2]     [,3]
## [1,] 20.37751  0.00000  0.00000
## [2,]  0.00000 20.37751  0.00000
## [3,]  0.00000  0.00000 20.37751

As before - homogeneity and uncorrelated errors.

  5. Finally, calculate the implied variance-covariances of the total model.
##          [,1]     [,2]     [,3]
## [1,] 68.20396 47.82645 47.82645
## [2,] 47.82645 68.20396 47.82645
## [3,] 47.82645 47.82645 68.20396

Notice the Compound Symmetry structure!

For fun, let's look at the misfit (observed matrix - model implied)

##               [,1]
## [1,]  2.486900e-14
## [2,] -2.131628e-14
## [3,] -5.684342e-14
##             verb2     verb4      verb6
## verb2 -30.9161173 -14.00688 -0.4215677
## verb4 -14.0068803 -14.62326 14.4284384
## verb6  -0.4215677  14.42844 45.5393553

Means are perfect, var-cov not so good … particularly on variances

Let's make a picture of the implied model.

Notice that the implied lines are all parallel. This is a model of mean differences.

Can also see what the residuals look like.

Note, that the points are not connected - this is to highlight the model and the implication that the within-person residuals are from the occasion-specific mean (NOT from a trajectory). Perhaps the implied trajectory should also NOT have connecting lines.
Note also the heteroskedasticity of the residuals.

5 Repeated Measures MANOVA

OK - now let's adjust the RM ANOVA error structure to get an RM MANOVA.
The “multivariate” part of the MANOVA has to do with relaxing the assumption on the error structure (more flexible than compound symmetry).

We are going to do this in two steps. Recall, we must tell R that “time0” is a categorical variable using factor()

##  Factor w/ 3 levels "0","1","2": 1 2 3 1 2 3 1 2 3 1 ...

Here is our RM ANOVA model with time as a within-person factor. The error structure is compound symmetry with heterogeneous variances.

## Linear mixed-effects model fit by REML
##  Data: verblong 
##        AIC      BIC    logLik
##   3930.253 3961.136 -1958.127
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:    6.086682 2.871709
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time0 
##  Parameter estimates:
##        0        1        2 
## 1.000000 1.254186 2.341456 
## Fixed effects: verb ~ 1 + time0 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 25.415343 0.4712021 406 53.93724       0
## time01       7.192402 0.3225105 406 22.30130       0
## time02      18.334559 0.5119102 406 35.81597       0
##  Correlation: 
##        (Intr) time01
## time01 -0.266       
## time02 -0.168  0.245
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.24995336 -0.55415412 -0.05694858  0.51324627  2.99087738 
## 
## Number of Observations: 612
## Number of Groups: 204

So what is the implied representation of the basic information? Making the implied mean vector.

## (Intercept)      time01      time02 
##   25.415343    7.192402   18.334559
##           [,1]
## [1,] 25.415343
## [2,]  7.192402
## [3,] 18.334559

Create the model design matrix, X, for the fixed effects. In this model this is a matrix of order 3 x 3 … (remember there are 3 fixed effects in this model)

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    1    1    0
## [3,]    1    0    1

Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]

##          [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990

See the differences in the means across levels of time0. This is exactly the same as in the last model

## id = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept) 37.047702 6.086682
## Residual     8.246713 2.871709
##         [,1]
## [1,] 37.0477
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

So the implied variance-covariance of the between-person random effects for the three repeated measures is:

##         [,1]    [,2]    [,3]
## [1,] 37.0477 37.0477 37.0477
## [2,] 37.0477 37.0477 37.0477
## [3,] 37.0477 37.0477 37.0477

Now for the within-person residual var-cov.

## [1] 2.871709

We have another step now with the heterogeneous variances. Get the heterogeneous weights for the residual SD from the summary(timecathet_fit) above.

Pulling from the output.

##          [,1]     [,2]     [,3]
## [1,] 8.246713  0.00000  0.00000
## [2,] 0.000000 12.97193  0.00000
## [3,] 0.000000  0.00000 45.21191
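The occasion-specific variances above are the baseline residual variance scaled by the squared weights; a quick check (a Python/numpy sketch rather than the tutorial's R; numbers from the output above):

```python
import numpy as np

# From the heterogeneous-variances model output
resid_var = 8.246713                                  # residual variance at time0 = 0
weights   = np.array([1.000000, 1.254186, 2.341456])  # relative residual SDs per occasion

# Occasion-specific residual variances: baseline variance times squared weights,
# on the diagonal of an otherwise-zero (uncorrelated errors) matrix
within_cov = np.diag(resid_var * weights**2)
```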

Finally, calculating the model implied between- plus within- var-cov structure.

##          [,1]     [,2]     [,3]
## [1,] 45.29441 37.04770 37.04770
## [2,] 37.04770 50.01964 37.04770
## [3,] 37.04770 37.04770 82.25961

Note the heterogeneity along the diagonal

For fun, let's look at the misfit (observed matrix - model implied)

##              [,1]
## [1,] 4.973799e-14
## [2,] 1.065814e-13
## [3,] 8.526513e-14
##           verb2     verb4    verb6
## verb2 -8.006572 -3.228132 10.35718
## verb4 -3.228132  3.561067 25.20719
## verb6 10.357180 25.207186 31.48370

Getting better … formally

timecat_fit:    lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, na.action = na.exclude)
timecathet_fit: lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, weights = varIdent(form = ~1 | time0), na.action = na.exclude)

               Model df      AIC      BIC    logLik   Test  L.Ratio p-value
timecat_fit        1  5 4013.176 4035.235 -2001.588
timecathet_fit     2  7 3930.253 3961.136 -1958.127 1 vs 2 86.92304       0

The heterogeneity was an improvement
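The comparison statistics can be recovered from the log-likelihoods by hand; a quick check (a Python sketch; numbers from the two summaries above):

```python
# Log-likelihoods and parameter counts from the two model summaries
loglik_cs,  k_cs  = -2001.588, 5   # RM ANOVA (compound symmetry)
loglik_het, k_het = -1958.127, 7   # heterogeneous-variances model

# Likelihood ratio test statistic: twice the logLik difference,
# compared to a chi-square on (k_het - k_cs) degrees of freedom
lrt = 2 * (loglik_het - loglik_cs)

# AIC from first principles: -2 logLik + 2 * (number of parameters)
aic_het = -2 * loglik_het + 2 * k_het
```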

OK - now let's relax the RM MANOVA error structure all the way, to fully UNSTRUCTURED.

Fully Unstructured residual error structure

## Linear mixed-effects model fit by REML
##  Data: verblong 
##       AIC      BIC   logLik
##   3869.06 3913.178 -1924.53
## 
## Random effects:
##  Formula: ~1 | id
##         (Intercept) Residual
## StdDev:    5.506836 2.638686
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2    
## 2 0.275      
## 3 0.709 0.725
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time0 
##  Parameter estimates:
##        0        1        2 
## 1.000000 1.827573 3.461330 
## Fixed effects: verb ~ 1 + time0 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 25.415343 0.4275323 406 59.44660       0
## time01       7.192402 0.3374451 406 21.31428       0
## time02      18.334559 0.5249722 406 34.92482       0
##  Correlation: 
##        (Intr) time01
## time01 -0.118       
## time02  0.221  0.507
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.24169075 -0.62480430 -0.06272641  0.63543916  3.06613520 
## 
## Number of Observations: 612
## Number of Groups: 204

So what is the implied representation of the basic information? Making the implied mean vector.

## (Intercept)      time01      time02 
##   25.415343    7.192402   18.334559
##           [,1]
## [1,] 25.415343
## [2,]  7.192402
## [3,] 18.334559

Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 3 …

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    1    1    0
## [3,]    1    0    1

Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]

##          [,1]
## [1,] 25.41534
## [2,] 32.60775
## [3,] 43.74990

See the differences in the means across levels of time0. This is exactly the same as in the last model

Now for the variance-covariances. First the between- part.

## id = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept) 30.325248 5.506836
## Residual     6.962664 2.638686
##          [,1]
## [1,] 30.32525
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

So the implied variance-covariance of the between-person random effects for the three repeated measures is:

##          [,1]     [,2]     [,3]
## [1,] 30.32525 30.32525 30.32525
## [2,] 30.32525 30.32525 30.32525
## [3,] 30.32525 30.32525 30.32525

Same as before.

Now it gets a bit more complicated … because we have a fully unstructured matrix for the within-person residual variance-covariance.

##           [,1]      [,2]     [,3]
## [1,]  6.962664  3.499314 17.08695
## [2,]  3.499314 23.255458 31.93237
## [3,] 17.086955 31.932371 83.41832

Note that it is fully “UNSTRUCTURED”.
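This matrix combines the residual SD, the per-occasion weights, and the estimated correlations as \(D R D\); a check (a Python/numpy sketch rather than the tutorial's R; numbers from the output above):

```python
import numpy as np

# From the unstructured-model output: residual SD, SD weights, and error correlations
resid_sd = 2.638686
weights  = np.array([1.000000, 1.827573, 3.461330])  # relative residual SDs per occasion
R = np.array([[1.000, 0.275, 0.709],
              [0.275, 1.000, 0.725],
              [0.709, 0.725, 1.000]])                # estimated error correlation matrix

# Within-person var-cov: D R D, where D holds the occasion-specific SDs
D = np.diag(resid_sd * weights)
within_cov = D @ R @ D
```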

Finally, calculate the implied between- + within- variance-covariances.

##          [,1]     [,2]      [,3]
## [1,] 37.28791 33.82456  47.41220
## [2,] 33.82456 53.58071  62.25762
## [3,] 47.41220 62.25762 113.74357

For fun, let's look at the misfit (observed matrix - model implied)

##              [,1]
## [1,] 1.598721e-13
## [2,] 7.105427e-14
## [3,] 1.563194e-13
##               verb2         verb4         verb6
## verb2 -6.906778e-05 -4.991762e-03 -0.0073203738
## verb4 -4.991762e-03 -2.480221e-06 -0.0027309696
## verb6 -7.320374e-03 -2.730970e-03 -0.0002526546

THAT’S PERFECT … we have modeled the data perfectly!

formal improvement

timecathet_fit:  lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, weights = varIdent(form = ~1 | time0), na.action = na.exclude)
timecatunst_fit: lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, correlation = corSymm(form = ~1 | id), weights = varIdent(form = ~1 | time0), na.action = na.exclude)

                Model df      AIC      BIC    logLik   Test  L.Ratio p-value
timecathet_fit      1  7 3930.253 3961.136 -1958.127
timecatunst_fit     2 10 3869.060 3913.178 -1924.530 1 vs 2 67.19366       0

The unstructured bit was useful. Note that we have simply re-represented the original data in a different way.

6 Growth Model

Now let's go to the growth model … where we add some restrictions back in.

First we have to convert the “time0” variable back to continuous.

##  num [1:612] 0 1 2 0 1 2 0 1 2 0 ...

Now let's fit the model. Remember this is different because time0 is continuous. And our design matrices are also different.

## Linear mixed-effects model fit by REML
##  Data: verblong 
##        AIC      BIC    logLik
##   3919.721 3946.202 -1953.861
## 
## Random effects:
##  Formula: ~1 + time0 | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 4.588971 (Intr)
## time0       2.542526 0.944 
## Residual    3.896441       
## 
## Fixed effects: verb ~ 1 + time0 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 24.757051 0.4065067 407 60.90195       0
## time0        9.167279 0.2624878 407 34.92460       0
##  Correlation: 
##       (Intr)
## time0 0.157 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.77592608 -0.52613946 -0.03287538  0.53033062  2.97854634 
## 
## Number of Observations: 612
## Number of Groups: 204

So what is the implied representation of the basic information?

Let's be explicit with our model … \[\left[\begin{array} {r} Y_{i0} \\ Y_{i1} \\ Y_{i2} \end{array}\right] = \left[\begin{array} {r} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{array}\right] \left[\begin{array} {r} \beta_{0} \\ \beta_{1} \end{array}\right] + \left[\begin{array} {r} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{array}\right] \left[\begin{array} {r} u_{0i} \\ u_{1i} \end{array}\right] + \left[\begin{array} {r} e_{i0} \\ e_{i1} \\ e_{i2} \end{array}\right]\].

Making the implied mean vector.

## (Intercept)       time0 
##   24.757051    9.167279
##           [,1]
## [1,] 24.757051
## [2,]  9.167279

This is different than before.
Create the model design matrix for the fixed effects. In this model this is a matrix of order 3 x 2 …
there are now 2 fixed effects (vs 3 fixed effects in prior models).

##      [,1] [,2]
## [1,]    1    0
## [2,]    1    1
## [3,]    1    2

Creating the model implied mean vector through multiplication \[ \mathbf{Y} = \mathbf{X}\mathbf{\beta} + 0 + 0\]

##          [,1]
## [1,] 24.75705
## [2,] 33.92433
## [3,] 43.09161

See the differences in the means across specific choices of time0 in the design matrix. This is different than the previous models.
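The implied means now fall on a straight line in time0; a check (a Python/numpy sketch rather than the tutorial's R; estimates from the growth model output above):

```python
import numpy as np

# Fixed effects from the linear growth model output: intercept and slope
beta = np.array([24.757051, 9.167279])

# Design matrix with continuous time0 = 0, 1, 2 (3 x 2)
X = np.array([[1, 0],
              [1, 1],
              [1, 2]])

implied_means = X @ beta  # a straight line: intercept + slope * time0
```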

Now again some more complication in getting out the implied between-person covariance matrix.

## id = pdLogChol(1 + time0) 
##             Variance StdDev   Corr  
## (Intercept) 21.05865 4.588971 (Intr)
## time0        6.46444 2.542526 0.944 
## Residual    15.18225 3.896441
##          [,1]     [,2]
## [1,] 4.588971 0.000000
## [2,] 0.000000 2.542526
##       [,1]  [,2]
## [1,] 1.000 0.944
## [2,] 0.944 1.000
##          [,1]      [,2]
## [1,] 21.05865 11.014194
## [2,] 11.01419  6.464438

Create the model design matrix. Note there are two random effects now, so this is a 3 x 2 …

##      [,1] [,2]
## [1,]    1    0
## [2,]    1    1
## [3,]    1    2

So the implied variance-covariance of the between-person random effects for the three repeated measures is:

##          [,1]     [,2]     [,3]
## [1,] 21.05865 32.07285 43.08704
## [2,] 32.07285 49.55148 67.03011
## [3,] 43.08704 67.03011 90.97318

Different than before. Note the increasing variance and covariance.
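The growing spread comes from \(Z \Psi Z'\) with time in \(Z\); a check (a Python/numpy sketch rather than the tutorial's R; estimates from the output above, jumping ahead to add the residual variance for the total implied matrix):

```python
import numpy as np

# Random-effect SDs, their correlation, and residual variance from the growth model output
sd_int, sd_slope, corr = 4.588971, 2.542526, 0.944
sigma2 = 15.18225

# Between-person covariance matrix Psi = D R D
D   = np.diag([sd_int, sd_slope])
R   = np.array([[1.0, corr], [corr, 1.0]])
Psi = D @ R @ D

# Random-effects design matrix Z (same as X here: intercept + time0)
Z = np.array([[1, 0],
              [1, 1],
              [1, 2]])

between_cov = Z @ Psi @ Z.T                     # grows with time0
implied_cov = between_cov + sigma2 * np.eye(3)  # total implied var-cov
```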

Now for the within-person error variance-covariance.

## [1] 3.896441
##          [,1]     [,2]     [,3]
## [1,] 3.896441 0.000000 0.000000
## [2,] 0.000000 3.896441 0.000000
## [3,] 0.000000 0.000000 3.896441
##          [,1]     [,2]     [,3]
## [1,] 15.18225  0.00000  0.00000
## [2,]  0.00000 15.18225  0.00000
## [3,]  0.00000  0.00000 15.18225

Note the homogeneity and uncorrelatedness of the residual errors.

Finally, calculate the implied variance-covariances.

##          [,1]     [,2]      [,3]
## [1,] 36.24091 32.07285  43.08704
## [2,] 32.07285 64.73373  67.03011
## [3,] 43.08704 67.03011 106.15544

For fun, let's look at the misfit (observed matrix - model implied)

##            [,1]
## [1,]  0.6582925
## [2,] -1.3165850
## [3,]  0.6582925

Just a bit off. Linear is not quite perfect.

##          verb2      verb4     verb6
## verb2 1.046935   1.746721  4.317840
## verb4 1.746721 -11.153030 -4.775224
## verb6 4.317840  -4.775224  7.587879

Pretty good.

Formal comparison of the models.

timecathet_fit: lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, weights = varIdent(form = ~1 | time0), na.action = na.exclude)
lingrowth_fit:  lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 + time0 | id, na.action = na.exclude)

               Model df      AIC      BIC    logLik   Test  L.Ratio   p-value
timecathet_fit     1  7 3930.253 3961.136 -1958.127
lingrowth_fit      2  6 3919.721 3946.202 -1953.861 1 vs 2 8.532182 0.0034892

Linear Growth is better than Compound Symmetry with Heterogeneous Variances by AIC & BIC!

timecatunst_fit: lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 | id, correlation = corSymm(form = ~1 | id), weights = varIdent(form = ~1 | time0), na.action = na.exclude)
lingrowth_fit:   lme.formula(fixed = verb ~ 1 + time0, data = verblong, random = ~1 + time0 | id, na.action = na.exclude)

                Model df      AIC      BIC    logLik   Test  L.Ratio p-value
timecatunst_fit     1 10 3869.060 3913.178 -1924.530
lingrowth_fit       2  6 3919.721 3946.202 -1953.861 1 vs 2 58.66148       0

Linear growth is NOT as good as RM MANOVA with fully UNSTRUCTURED residual structure. However, the growth model may, in this case, be closer to a specific theory of change! And also provides something different than “test of mean differences”. It provides a test of a specific kind of “within-person” change.

Maybe a couple of plots to see the intraindividual change implications …

Now that was satisfying!
=:-]

7 Conclusion

Yay for MATRIX ALGEBRA! Yay for models just being a bunch of components that reproduce means and var-covs!