Overview

In this tutorial we walk through the very basics of testing measurement invariance in the context if a longitudinal factor model. This is not a comprehensive coverage, just something to get started. This is a one-off done as part of a guest lecture.

Outline

In this session we cover …

A. Introduction to Factorial Invariance
B. Configural Invariance Model C. Weak Invariance Model D. Strong Invariance Model E. Comparisons

Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(lavaan)  #for fitting structural equation models
library(semPlot)  #for automatically making diagrams 

A. Introduction to the Common Factor Model

Recall from last time that the basic factor analysis model can be written as a matrix equation …

\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \]

where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.

We can rewrite the model in terms of variance-covariance and mean expectations. For example, the expected covariance matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\), becomes

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.

and the expected p x 1 mean vector, $ $ becomes \[ \boldsymbol{\mu} = \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{\alpha} \]

where \(\boldsymbol{\tau}\) is a p x 1 vector of manifest variable means, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, and \(\boldsymbol{\alpha}\) is a q x 1 vector of latent variable means.

We can then extend the model to multipe-group or multiple-occasion settings with a group or occasion-specific subscript, so that (e.g., for the two-occasion case)

\[ \boldsymbol{\Sigma_g} = \boldsymbol{\Lambda_g}\boldsymbol{\Psi_g}\boldsymbol{\Lambda_g}' + \boldsymbol{\Theta_g} \] and \[ \boldsymbol{\mu_g} = \boldsymbol{\tau_g} + \boldsymbol{\Lambda_g}\boldsymbol{\alpha_g} \]

Different levels of measurement invariance are established by testing (or requiring) that various matrices are equal.

Specifically, …
For Configural Invaraince we establish that the structure of \(\boldsymbol{\Lambda_g}\) is equivalent across occasions.

For Weak Invariance we additionally establish that the factor loadings are equivalent across occasions, \(\boldsymbol{\Lambda_g} = \boldsymbol{\Lambda}\).

For Strong Invariance we additionally establish that the manifest means are equivalent across occasions, \(\boldsymbol{\tau_g} = \boldsymbol{\tau}\). and

For Strict Invariance we additionally establish that the residaul variances are also equivalent across occasions \(\boldsymbol{\Theta_g} = \boldsymbol{\Theta}\).

Measurement Invariance testsing is usually conduced within a Structural Equation Modeling (SEM) framework. Here we illustrate how this may be done using the lavaan package. Lots of good information and instruction (including about invariance testing - in a multiple-group setting) can be found on the package website … http://lavaan.ugent.be.

Prelim - Reading in Repeated Measures Data

For this example, we use classic data from Osborne & Sudick (1972) that has been the basis for many examples in Jack McArdle and colleagues’ papers, courses, and workshops.
The data were obtained from 204 participants (rows) who completed a vareity of cognitive ability tests that are part of the WISC.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/WISC_MIexample.csv"
#read in the .csv file using the url() function
dat <- read.csv(file=url(filepath),header=TRUE)

Prelim - Descriptives

Lets have a quick look at the data file and the descriptives.

#data structure
head(dat,10)
##    id  info1  comp1  simi1  voca1  info6  comp6  simi6  voca6
## 1   1 31.287 25.627 22.932 22.215 69.883 44.424 68.045 51.162
## 2   2 13.801 14.787  7.581 15.373 41.871 44.862 33.897 37.741
## 3   3 34.970 34.675 28.052 26.841 60.424 50.260 35.844 55.477
## 4   4 24.795 31.391  8.208 20.197 52.865 42.669 45.802 35.987
## 5   5 25.263 30.263 15.977 35.417 67.368 86.654 72.368 60.417
## 6   6 15.402 23.399 11.453 20.560 46.437 52.956 22.537 47.716
## 7   7 15.380 -1.253  2.318 13.004 53.977 36.341 39.912 35.373
## 8   8 19.883  6.704 14.160 14.868 26.901 33.020 21.679 22.763
## 9   9 12.632 13.847 10.276 23.224 54.737 40.163 36.591 54.803
## 10 10 23.690 25.446 11.416 21.786 45.119 52.232 53.508 53.929
#descriptives (means, sds)
describe(dat[,-1])
##       vars   n  mean    sd median trimmed   mad   min   max range  skew
## info1    1 204 19.78  6.12  19.13   19.77  5.53  1.04 34.97 33.93 -0.06
## comp1    2 204 21.80  9.74  21.90   21.98 10.32 -1.25 46.49 47.74 -0.13
## simi1    3 204 14.90  7.56  14.54   14.57  7.83 -4.52 37.76 42.28  0.39
## voca1    4 204 20.40  6.29  19.84   20.11  6.52  2.77 39.66 36.89  0.37
## info6    5 204 48.51 12.79  46.34   47.80 13.27 24.76 80.00 55.24  0.47
## comp6    6 204 45.17 12.97  44.53   45.22 12.32  0.19 86.65 86.47 -0.01
## simi6    7 204 41.30 14.52  39.44   40.68 14.42 14.29 77.96 63.67  0.39
## voca6    8 204 44.45 11.05  43.80   44.43 12.48 18.17 69.87 51.70  0.01
##       kurtosis   se
## info1     0.17 0.43
## comp1    -0.49 0.68
## simi1    -0.14 0.53
## voca1     0.02 0.44
## info6    -0.65 0.90
## comp6     0.65 0.91
## simi6    -0.38 1.02
## voca6    -0.63 0.77
#correlation matrix
round(cor(dat[,-1]),2)
##       info1 comp1 simi1 voca1 info6 comp6 simi6 voca6
## info1  1.00  0.51  0.54  0.56  0.47  0.36  0.44  0.49
## comp1  0.51  1.00  0.45  0.58  0.40  0.36  0.40  0.46
## simi1  0.54  0.45  1.00  0.44  0.35  0.29  0.33  0.38
## voca1  0.56  0.58  0.44  1.00  0.55  0.44  0.54  0.60
## info6  0.47  0.40  0.35  0.55  1.00  0.62  0.67  0.75
## comp6  0.36  0.36  0.29  0.44  0.62  1.00  0.53  0.70
## simi6  0.44  0.40  0.33  0.54  0.67  0.53  1.00  0.66
## voca6  0.49  0.46  0.38  0.60  0.75  0.70  0.66  1.00
#visusal correlation matrix
corrplot(cor(dat[,-1]), order = "original", tl.col='black', tl.cex=.75) 

B. Configural Invariance Model

Configural invariance model

Model Specification

configural <- '
# Define the latent factors.
verb_1 =~ NA*info1 + lambda1*info1 + voca1 + comp1 + simi1
verb_6 =~ NA*info6 + lambda1*info6 + voca6 + comp6 + simi6

# Intercepts
info1 ~ i1*1
voca1 ~ 1
comp1 ~ 1
simi1 ~ 1
info6 ~ i1*1
voca6 ~ 1
comp6 ~ 1
simi6 ~ 1

# Unique Variances
info1 ~~ info1
voca1 ~~ voca1
comp1 ~~ comp1
simi1 ~~ simi1
info6 ~~ info6
voca6 ~~ voca6
comp6 ~~ comp6
simi6 ~~ simi6

# Latent Variable Means
verb_1 ~ 0*1
verb_6 ~ 1

# Latent Variable Variances and Covariance
verb_1 ~~ 1*verb_1
verb_6 ~~ verb_6
verb_1 ~~ verb_6
'

Model Estimation and Interpretation

#Model fitting
fit_configural <- cfa(configural, data = dat, mimic = "mplus")
summary(fit_configural, fit.measures = TRUE)
## lavaan (0.5-23.1097) converged normally after  99 iterations
## 
##   Number of observations                           204
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               25.968
##   Degrees of freedom                                19
##   P-value (Chi-square)                           0.131
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              847.740
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.991
##   Tucker-Lewis Index (TLI)                       0.987
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -5601.115
##   Loglikelihood unrestricted model (H1)      -5588.131
## 
##   Number of free parameters                         25
##   Akaike (AIC)                               11252.229
##   Bayesian (BIC)                             11335.182
##   Sample-size adjusted Bayesian (BIC)        11255.975
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.042
##   90 Percent Confidence Interval          0.000  0.079
##   P-value RMSEA <= 0.05                          0.588
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.031
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   verb_1 =~                                           
##     info1   (lmb1)    4.451    0.400   11.137    0.000
##     voca1             5.039    0.396   12.728    0.000
##     comp1             6.850    0.637   10.745    0.000
##     simi1             4.590    0.520    8.821    0.000
##   verb_6 =~                                           
##     info6   (lmb1)    4.451    0.400   11.137    0.000
##     voca6             4.102    0.453    9.057    0.000
##     comp6             4.006    0.489    8.194    0.000
##     simi6             4.551    0.545    8.346    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   verb_1 ~~                                           
##     verb_6            1.837    0.215    8.558    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .info1     (i1)   19.776    0.427   46.273    0.000
##    .voca1            20.396    0.439   46.416    0.000
##    .comp1            21.797    0.680   32.036    0.000
##    .simi1            14.903    0.528   28.223    0.000
##    .info6     (i1)   19.776    0.427   46.273    0.000
##    .voca6            17.970    1.844    9.747    0.000
##    .comp6            19.317    2.299    8.404    0.000
##    .simi6            11.922    2.516    4.738    0.000
##     verb_1            0.000                           
##     verb_6            6.455    0.606   10.649    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .info1            17.448    2.240    7.789    0.000
##    .voca1            13.999    2.133    6.563    0.000
##    .comp1            47.511    5.754    8.257    0.000
##    .simi1            35.810    4.031    8.884    0.000
##    .info6            47.096    6.432    7.322    0.000
##    .voca6            23.267    4.182    5.564    0.000
##    .comp6            73.850    8.388    8.805    0.000
##    .simi6            88.920   10.354    8.588    0.000
##     verb_1            1.000                           
##     verb_6            5.834    1.167    4.997    0.000
#Model Diagram 
semPaths(fit_configural, what="est", 
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
## Warning in qgraph(Edgelist, labels = nLab, bidirectional = Bidir, directed
## = Directed, : The following arguments are not documented and likely not
## arguments of qgraph and thus ignored: loopRotation; residuals; residScale;
## residEdge; CircleEdgeEnd