1 Overview

A few years ago, I decided that it might be useful to introduce the models for repeated measures data (e.g., RM ANOVA, RM MANOVA, Growth models) along a “continuum”. The idea is that this presentation, wherein the models are all linked together, simplifies the whole world - and makes the similarities and differences among these models easier to identify and understand. I do not have a totally streamlined implementation of my vision for presenting all of this yet, but am giving it a shot. In Part 1, we work through “traditional” RM ANOVA set-ups and briefly touch on the multilevel implementation. In Part 2, we work in the multilevel framework, starting with RM ANOVA, through RM MANOVA, and into Growth Models.

1.1 Outline for Part 1

1. Data Preparation and Description
2. Analysis of Variance (ANOVA)
3. Repeated Measures Analysis of Variance (RM ANOVA)
4. RM ANOVA in the Multilevel Framework

library(psych)
library(ggplot2)
library(ez)
library(car)
library(nlme) #for mixed effects models

2 Data Preparation and Description

For our examples, we use 3-occasion WISC data that are equally spaced.

############################
############################
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/wisc3raw.csv"
#read in the .csv file using the url() function
wisc3raw <- read.csv(file=url(filepath),header=TRUE)

Subsetting to the variables of interest. 3-occasion equally spaced repeated measures + a person-level “grouping” variable (+ an id variable). Specifically, we include the id, verb2, verb4, verb6, and grad variables.

#Creating a list of variables of interest
#sub-setting the columns of interest
wiscsub <- wisc3raw[ ,varnames]

#Looking at the sample-level descriptives for the chosen variables
describe(wiscsub)
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 204 102.5000000 59.0338886 102.500 102.5000000 75.612600 1.00 204.00 203.00 0.0000000 -1.2176609 4.1331989
verb2 2 204 25.4153431 6.1063772 25.980 25.4026220 6.567918 5.95 39.85 33.90 -0.0639307 -0.3445494 0.4275319
verb4 3 204 32.6077451 7.3198841 32.820 32.4240244 7.183197 12.60 52.84 40.24 0.2317998 -0.0776524 0.5124944
verb6 4 204 43.7499020 10.6650511 42.545 43.4560976 11.297412 17.35 72.59 55.24 0.2356459 -0.3605210 0.7467029
grad 5 204 0.2254902 0.4189328 0.000 0.1585366 0.000000 0.00 1.00 1.00 1.3040954 -0.3007374 0.0293312

Multilevel modeling analyses typically require a long data set. So, we also reshape from wide to long in order to have a long data set.

#reshaping wide to long
verblong <- reshape(data = wiscsub,
varying = c("verb2","verb4","verb6"),
timevar = "grade", idvar = "id",
direction = "long", sep = "")

#reordering for convenience
verblong <- verblong[order(verblong$id, verblong$grade), c("id", "grade", "verb", "grad")]

#looking at dataset
head(verblong,12)
1.2 1 2 26.98 0
1.4 1 4 39.61 0
1.6 1 6 55.64 0
2.2 2 2 14.38 0
2.4 2 4 21.92 0
2.6 2 6 37.81 0
3.2 3 2 33.51 1
3.4 3 4 34.30 1
3.6 3 6 50.18 1
4.2 4 2 28.39 1
4.4 4 4 42.16 1
4.6 4 6 44.72 1

For clarity, lets consider the basic information representation of the 3-occasion repeated measures data. In particular, data (even non-repeated measures data) are summarized (at the sample-level) as …
Means Vector,
Variances-Covariance Matrix

#mean vector (from wide data)
meanvector <- sapply(wiscsub[ , c("verb2", "verb4", "verb6")], mean, na.rm=TRUE)
meanvector
##    verb2    verb4    verb6
## 25.41534 32.60775 43.74990
#variance-covariance matrix (from wide data)
varcovmatrix <- cov(wiscsub[ , c("verb2", "verb4", "verb6")], use = "pairwise.complete.obs")
varcovmatrix
##          verb2    verb4     verb6
## verb2 37.28784 33.81957  47.40488
## verb4 33.81957 53.58070  62.25489
## verb6 47.40488 62.25489 113.74332

Making visual counterparts can also be extremely useful - especially for faciliating higher-level conversations in a research group.
Basic Sample-level descriptions in visual form.

#boxplot by grade (from long data)
#note that the time variable has been converted to a factor = categorical
ggplot(data = verblong, aes(x = factor(grade), y = verb)) +
geom_boxplot(notch = TRUE) +
stat_summary(fun.y = "mean", geom = "point", shape = 23, size = 3, fill = "white") +
labs(x = "Grade", y = "Verbal Ability")

#correlations across grade
#in the psych library (from wide data)
pairs.panels(wiscsub[, c("verb2", "verb4", "verb6")])

#require(GGally)
#ggpairs(wiscsub[ ,c("verb2","verb4","verb6")])

Always be careful about the scaling of the x- and y-axes in these plots. Side note: What would these correlations be if we had sampled different children at each grade (i.e., in a full between-person design)?

The objective of all of our analyses is to ‘deconstruct’ this information into meaningful/interpretable pieces. Each ‘model’ does it in a different way, with different assumptions.

Key practical differences among models, and within the R analysis framework, have to do with how variables are conceived and treated. For example, the time variable can be considered as having been measured on either a categorical scale or a continuous scale. Different conceptions of time suggest use of different models, and different interpretations and inferences.

We can check how specific variables are ‘coded’ (‘structured’) within R. Note that you control R. It is possible, and often necessary, to change the structure of an object. You can tell R to do what you want!

Let’s check how R is holding our variables - the ‘structure’ of the objects.

str(verblong)
## 'data.frame':    612 obs. of  4 variables:
##  $id : int 1 1 1 2 2 2 3 3 3 4 ... ##$ grade: num  2 4 6 2 4 6 2 4 6 2 ...
##  $verb : num 27 39.6 55.6 14.4 21.9 ... ##$ grad : int  0 0 0 0 0 0 1 1 1 1 ...

And let’s look at some specific variables.
The “id” variable …

str(verblong$id) ## int [1:612] 1 1 1 2 2 2 3 3 3 4 ... We see that “verblong$id” is a vector of integers, and has mean …

mean(verblong$id) ## [1] 102.5 But, we actually treat “id” as a factor, a categorical variable with N = 204 levels. Let’s make this explicit. verblong$id <- factor(verblong$id) str(verblong$id)
##  Factor w/ 204 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...

Now, R will treat “id” as a categorical variable. This has consequences. For example, it is no longer appropriate to calculate an arithmetic mean.

mean(verblong$id) ## Warning in mean.default(verblong$id): argument is not numeric or logical:
## returning NA
## [1] NA

str(verblong$grade) ## num [1:612] 2 4 6 2 4 6 2 4 6 2 ... Numeric. Interesting … (and good to know!). Now lets get into the models. We start with a traditional ANOVA framework … 3 Analysis of Variance (ANOVA) In the ANOVA modeling world, we are interested in mean differences across levels of one or more factors. Research questions are phrased as, for example, … Does the mean verbal ability differ between children with mothers who did or did not graduate from high school? Does the mean verbal ability differ between children across grade levels? For review, lets do a one-way ANOVA with a between-person factor. This is not our real situation, so we make a FAKE example Preliminaries: In the traditional ANOVA world we generally treat time as a categorical variable (think experimental design, with random assigment to groups - the full between-person design we considered earlier). We must tell R that “grade” is a categorical variable = factor() str(verblong$grade)
##  num [1:612] 2 4 6 2 4 6 2 4 6 2 ...
verblong$grade <- factor(verblong$grade)

str(verblong$grade) ## Factor w/ 3 levels "2","4","6": 1 2 3 1 2 3 1 2 3 1 ... We must also tell R that “id” is a categorical variable = factor() verblong$id <- factor(verblong$id) Let’s attach the data for convenience. (Be careful doing this! For good programming practice, whenever you make an attach(), ALSO make a detach() and carry it with you) attach(verblong) Let’s first pretend that these are not repeated measures and do a One-Way ANOVA model_foneway <- aov(verb ~ grade, data = verblong) summary(model_foneway) ## Df Sum Sq Mean Sq F value Pr(>F) ## grade 2 34818 17409 255.3 <2e-16 *** ## Residuals 609 41536 68 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Check that the degrees of freedom make sense! In our case we have 612 complete records, which we are pretending are from 612 different children. Signficance of the overall test pushes us to do follow-up tests … TukeyHSD(model_foneway, "grade") ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = verb ~ grade, data = verblong) ## ##$grade
##          diff       lwr       upr p adj
## 4-2  7.192402  5.271206  9.113598     0
## 6-2 18.334559 16.413362 20.255755     0
## 6-4 11.142157  9.220960 13.063353     0

Or as a set of pairwise t.tests, with adjustment of p-values …

with(verblong, pairwise.t.test(verb, grade,
p.adjust.method = "holm", paired = TRUE))
##
##  Pairwise comparisons using paired t tests
##
##
##   2      4
## 4 <2e-16 -
## 6 <2e-16 <2e-16
##
## P value adjustment method: holm

Okay - that was for review about one-way ANOVA (with a between-person factor) …

4 Repeated Measures Analysis of Variance (RM ANOVA)

Really, we should better accommodate the repeated measures design. The “grade” variable is a within-person factor.

Let’s run the one-way RM ANOVA

model_aov1 <- aov(verb ~ grade + Error(id), data = verblong)
summary(model_aov1)
##
## Error: id
##            Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 203  33263   163.9
##
## Error: Within
##            Df Sum Sq Mean Sq F value Pr(>F)
## grade       2  34818   17409   854.3 <2e-16 ***
## Residuals 406   8273      20
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Error: id
#            Df Sum Sq Mean Sq F value Pr(>F)
# Residuals 203  33263   163.9
#
# Error: Within
#            Df Sum Sq Mean Sq F value Pr(>F)
# grade       2  34818   17409   854.3 <2e-16 ***
# Residuals 406   8273      20
# ---

ANOVA models are typically reported along with the means. Get a table of all means (e.g., grand mean, level means, cell means)

model.tables(model_aov1, type = "means", se = TRUE)
## Warning in model.tables.aovlist(model_aov1, type = "means", se = TRUE): SEs
## for type 'means' are not yet implemented
## Tables of means
## Grand mean
##
## 33.92433
##
##     2     4     6
## 25.42 32.61 43.75
model.tables(model_aov1, type = "effects", se = TRUE)
## Tables of effects
##
##      2      4      6
## -8.509 -1.317  9.826
##
## Standard errors of effects
##         0.3161
## replic.    204

Note that formally, here, the standard error being reported is for the mean difference effect, not for the means themselves.

After doing the RMANOVA, the significance of the “grade”" effect suggests that we should locate where the difference is …

How should we do the contrasts between levels of the within-person factor?
BASED ON THEORY!!
(if we are running all possible contrasts, we are in an exploratory world)
Studies are always designed with a purpose (i.e., based on some research questions), so there is usually some theory available. Hopefully, going back to the research question, there were some hypotheses about how mean levels of verbal ability would differ across levels.

But, … we may also take a more data-driven approach. This puts us in an “exploratory” analysis world. Let’s see …
Well, we know that these are ordered, equal-interval repeated measures …
We also know that mathematical functions for change are often based on polynomials (linear, quadratic, cubic, …) …

Let’s go back and tell R that “grade” is an ordered factor. We need this, or the polynomial contrasts won’t work.

R examines the values and uses them to determine the ordering of the levels. The values for “grade” are 2 4 6, so R gets them in the correct order. However, if the values had been words, R would have ordered them alphabetically - BE CAREFUL!

(to reorder use the levels option of factor(), see e.g., http://stackoverflow.com/questions/18413756/re-ordering-factor-levels-in-data-frame)

verblong$grade <- factor(verblong$grade, ordered = TRUE)
str(verblong$grade) ## Ord.factor w/ 3 levels "2"<"4"<"6": 1 2 3 1 2 3 1 2 3 1 ... It seems that it is a good idea to set the default contrasts as below, these are orthogonal contrasts and will give a better sphericity test. options(contrasts = c("contr.helmert", "contr.poly")) Now let’s run the RM Anova model again … RM Anova with 3 equidistant occasions model_aov2 <- aov(verb ~ grade + Error(id), data = verblong, contrasts = c("contr.helmert", "contr.poly")) summary(model_aov2) ## ## Error: id ## Df Sum Sq Mean Sq F value Pr(>F) ## Residuals 203 33263 163.9 ## ## Error: Within ## Df Sum Sq Mean Sq F value Pr(>F) ## grade 2 34818 17409 854.3 <2e-16 *** ## Residuals 406 8273 20 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 This is all the same as above. But the ordering of grade facilitates construction of polynomial contrasts. To get the polynomial contrasts, we use the summary function and the split= option to “split out” the degrees of the polynomial. Note: in the code, the words “Linear” and “Quadratic” are just convenient labels summary(model_aov2, split = list(grade = list("Linear"=1, "Quadratic"=2))) ## ## Error: id ## Df Sum Sq Mean Sq F value Pr(>F) ## Residuals 203 33263 163.9 ## ## Error: Within ## Df Sum Sq Mean Sq F value Pr(>F) ## grade 2 34818 17409 854.33 < 2e-16 *** ## grade: Linear 1 34288 34288 1682.64 < 2e-16 *** ## grade: Quadratic 1 530 530 26.03 5.17e-07 *** ## Residuals 406 8273 20 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Side note: For the contrasts, the df SS and MS values agree with SAS, but the F values do not. SAS decomposes the Residual SS to get a separate error term for each degree of freedom. R uses the single Residual SS and Residual MS in each F test. Not sure whether the SAS or R approach is “best” What did the analysis do? What contrasts did it use? … In the background, R assigned the polynomial contrasts of degree 3 to the grade factor (which is an Ord.factor w/ 3 levels “2”<“4”<“6”) contrasts(grade) <- contr.poly(3) contrasts(grade) ## .L .Q ## 2 -7.071068e-01 0.4082483 ## 4 -7.850462e-17 -0.8164966 ## 6 7.071068e-01 0.4082483 # .L .Q # 2 -7.071068e-01 0.4082483 # 4 -7.850462e-17 -0.8164966 # 6 7.071068e-01 0.4082483 These are the orthonormal polynomial contrasts. I assume that contrast codes are covered better in other courses. When I need them for specific applications, I look them up. Note: In making these contrasts, R does not know the true “distances” between the levels - it only knows the levels are ordered. We are making an interpretation that the 2, 4, 6 indicate equivalent distances. In all cases we must consider what assumptions are embedded in our interpretation of time variables - and try to make them explicit. The Sphericity Assumption RMANOVA models make a strong assumption about the variance-covariance structure (we will get to the specifics further down). We need to test for Sphericity = Mauchly’s criterion (Yes, someone built an R function that does the test, and the traditional corrections - in the ez package. #from the ez pacakge ezANOVA(data = verblong, dv = .(verb), within = .(grade), wid = .(id), type = 3, detailed = TRUE) ##$ANOVA
##        Effect DFn DFd       SSn       SSd         F             p p<.05
## 1 (Intercept)   1 203 704326.42 33262.940 4298.4253 1.423624e-138     *
## 2       grade   2 406  34818.34  8273.268  854.3326 3.211724e-146     *
##         ges
## 1 0.9443112
## 2 0.4560087
##
## $Mauchly's Test for Sphericity ## Effect W p p<.05 ## 2 grade 0.7790063 1.110806e-11 * ## ##$Sphericity Corrections
##   Effect      GGe        p[GG] p[GG]<.05       HFe         p[HF] p[HF]<.05
## 2  grade 0.819005 2.49192e-120         * 0.8247685 3.732699e-121         *

Annotated output below …

# $ANOVA # Effect DFn DFd SSn SSd F p p<.05 ges = generalized eta square, effect size measure # 1 (Intercept) 1 203 704326.42 33262.940 4298.4253 1.423624e-138 * 0.9443112 # 2 grade 2 406 34818.34 8273.268 854.3326 3.211724e-146 * 0.4560087 ### The above test assumes sphericity #$Mauchly's Test for Sphericity
#   Effect         W            p p<.05
# 2  grade 0.7790063 1.110806e-11     *
### This indicates that sphericity was violated
#
#   $Sphericity Corrections # Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 # 2 grade 0.819005 2.49192e-120 * 0.8247685 3.732699e-121 * ### Correction for sphericity ### GGe = Greenhouse-Geisser epsilon (in this case still significant) ### HFe = Huynh-Feldt epsilon (in this case still significant) ### generally if epsilon > .8 then violation not too severe This next block of code goes into some RM ANOVA procedures that I am now skipping. #Lets do this same set of contrasts a different way; #Starting with the wide data #Make some new variables - #Trend scores, calcualted using orthonormal effect codes (for 3 equally spaced occasions) #This is basically transforming our data wiscsub$verbave <- ( 1*wiscsub$verb2 + 1*wiscsub$verb4 + 1*wiscsub$verb6)/sqrt(3) #flat line wiscsub$verblin <- (-1*wiscsub$verb2 + 0*wiscsub$verb4 + 1*wiscsub$verb6)/sqrt(2) #straight line wiscsub$verbqua <- ( 1*wiscsub$verb2 - 2*wiscsub$verb4 + 1*wiscsub$verb6)/sqrt(6) #curved line describe(wiscsub[ ,c("verbave","verblin","verbqua")]) # vars n mean sd median trimmed mad min max range skew kurtosis se # verbave 1 204 58.76 12.80 57.88 58.52 14.58 20.73 91.80 71.07 0.10 -0.44 0.90 # verblin 2 204 12.96 5.30 12.71 12.76 4.85 1.61 28.14 26.54 0.38 -0.06 0.37 # verbqua 3 204 1.61 3.56 1.76 1.59 3.15 -6.57 11.79 18.36 0.14 -0.08 0.25 #Setting up the models #Using the car package to get Type III sum of squares # For average model_ave <- lm(verbave ~ 1, data = wiscsub) summary(model_ave) #this gives us regression results Anova(model_ave, type="III") #this gives us anova results # For linear trend scores model_lin <- lm(verblin ~ 1, data = wiscsub) summary(model_lin) #this gives us regression results Anova(model_lin, type="III") #this gives us anova results # For quadratic trend scores model_qua <- lm(verbqua ~ 1, data = wiscsub) summary(model_qua) #this gives us regression results Anova(model_qua, type="III") #this gives us anova results #Notes on above - the Sum of Squares is the same as above analysis #The F-tests are different - becasue the residual is treated differently (separate) as noted above There is also a shorter way using Anova() from car package … Prelim: check what is form of the time variable - needs to be an ordered factor. str(verblong$grade)
##  Ord.factor w/ 3 levels "2"<"4"<"6": 1 2 3 1 2 3 1 2 3 1 ...

Anova() analysis:

#from car package
model_quick <- lm(verb ~ C(grade, contr.poly), data = verblong)
summary(model_quick)  #but note that these are t-tests and not F-tests
##
## Call:
## lm(formula = verb ~ C(grade, contr.poly), data = verblong)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -26.3999  -5.8524  -0.1076   4.9842  28.8401
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)             33.9243     0.3338 101.621  < 2e-16 ***
## C(grade, contr.poly).L  12.9645     0.5782  22.422  < 2e-16 ***
## C(grade, contr.poly).Q   1.6125     0.5782   2.789  0.00546 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.259 on 609 degrees of freedom
## Multiple R-squared:  0.456,  Adjusted R-squared:  0.4542
## F-statistic: 255.3 on 2 and 609 DF,  p-value: < 2.2e-16
Anova(model_quick, type="III") #This lumps them all together into 1 test
Sum Sq Df F value Pr(>F)
(Intercept) 704326.42 1 10326.7683 0
C(grade, contr.poly) 34818.34 2 255.2516 0
Residuals 41536.21 609 NA NA

Ok - that is all good - although does not give us any information about (or formal tests of) the sphericity assumption.

5 RM ANOVA in the Multilevel Framework

Let’s move towards a “modern” multilevel framework. I write “modern”" here because all of my colleagues who are doing RMANOVAs these days, mostly with data from laboratory experiments, are placing them within the multilevel framework. Some good resources … http://www.ats.ucla.edu/stat/r/seminars/Repeated_Measures/repeated_measures.htm http://www.statpower.net/Content/GCM/Lectures/SW07.pdf https://www.zoology.ubc.ca/~schluter/R/fit-model/

Here, we make use of the nlme (linear and nonlinear mixed effects) package. We run the typical model with the gls() function.

We set the model up with the usual Compound Symmetry error structure

Preliminary: First, lets remove the ordedness of grade. Note however, that R will place the intercept at the first value.

verblong$grade <- factor(verblong$grade, ordered = FALSE)
str(verblong\$grade)
##  Factor w/ 3 levels "2","4","6": 1 2 3 1 2 3 1 2 3 1 ...

Ok - and now run the RMANOVA model.

gls_fit1 <- gls(verb ~ grade,
correlation = corCompSymm(form = ~1|id),
data = verblong)

summary(gls_fit1)
## Generalized least squares fit by REML
##   Data: verblong
##       AIC      BIC   logLik
##   4016.76 4038.819 -2003.38
##
## Correlation Structure: Compound symmetry
##  Formula: ~1 | id
##  Parameter estimate(s):
##       Rho
## 0.7012269
##
## Coefficients:
##                Value Std.Error  t-value p-value
## (Intercept) 33.92433 0.5174359 65.56238       0
## grade1       3.59620 0.2234835 16.09157       0
## grade2       4.91279 0.1290283 38.07527       0
##
##  Correlation:
##
## Standardized residuals:
##         Min          Q1         Med          Q3         Max
## -3.19666804 -0.70864605 -0.01303162  0.60352054  3.49214250
##
## Residual standard error: 8.258568
## Degrees of freedom: 612 total; 609 residual
anova(gls_fit1)
numDF F-value p-value
(Intercept) 1 4298.4253 0

Let’s look at the model implied means vector.

#Make a fake dataset

#evaluate the model at those values
predict(gls_fit1, newdata = evaldata)
## [1] 25.41534 32.60775 43.74990
## attr(,"label")
## [1] "Predicted values"

Let’s look at the model implied variance-covariance matrix (note this is a difficult to find function)

getVarCov(gls_fit1)
## Marginal variance covariance matrix
##        [,1]   [,2]   [,3]
## [1,] 68.204 47.826 47.826
## [2,] 47.826 68.204 47.826
## [3,] 47.826 47.826 68.204
##   Standard Deviations: 8.2586 8.2586 8.2586

The “spherical” assumption - Compound Symmetry structure embedded in this model is evident.

How do these compare with the observed means and variance-covariances.

#mean vector of observed data
meanvector
##    verb2    verb4    verb6
## 25.41534 32.60775 43.74990
#variance-covariance matrix of observed data
varcovmatrix
##          verb2    verb4     verb6
## verb2 37.28784 33.81957  47.40488
## verb4 33.81957 53.58070  62.25489
## verb6 47.40488 62.25489 113.74332

We see that the means vector is recovered precisely, but that the variance-covariances are not.

With this basic backround/review in place, we move into Part 2: A multilevel approach from the ground up - embedded with some Matrix Algebra.

Don’t forget the detach()!

detach(verblong)