This tutorial will demonstrate multilevel and structural equation modeling approaches to linear growth models with time invariant covariates.
This tutorial provides line-by-line code for a linear model with time invariant covariates using the following R packages: 1. Multilevel modeling with nlme
and lmer
2. SEM modeling with lavaan
Load required libraries and read in the data. We will use data in long format for the multilevel approaches and data in wide format for SEM approaches.
We will use data in the long format for the MLM programs and data in the wide format for the SEM model.
library(nlme)
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
##
## lmList
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_long.csv"
#read in the .csv file using the url() function
nlsy_math_long <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_long)
## id female lb_wght anti_k1 math grade occ age men spring anti
## 1 201 1 0 0 38 3 2 111 0 1 0
## 2 201 1 0 0 55 5 3 135 1 1 0
## 3 303 1 0 1 26 2 2 121 0 1 2
## 4 303 1 0 1 33 5 3 145 0 1 2
## 5 2702 0 0 0 56 2 2 100 NA 1 0
## 6 2702 0 0 0 58 4 3 125 NA 1 2
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_wide.csv"
#read in the .csv file using the url() function
nlsy_math_wide <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_wide)
## id female lb_wght anti_k1 math2 math3 math4 math5 math6 math7 math8
## 1 201 1 0 0 NA 38 NA 55 NA NA NA
## 2 303 1 0 1 26 NA NA 33 NA NA NA
## 3 2702 0 0 0 56 NA 58 NA NA NA 80
## 4 4303 1 0 0 NA 41 58 NA NA NA NA
## 5 5002 0 0 4 NA NA 46 NA 54 NA 66
## 6 5005 1 0 0 35 NA 50 NA 60 NA 59
## age2 age3 age4 age5 age6 age7 age8 men2 men3 men4 men5 men6 men7 men8
## 1 NA 111 NA 135 NA NA NA NA 0 NA 1 NA NA NA
## 2 121 NA NA 145 NA NA NA 0 NA NA 0 NA NA NA
## 3 100 NA 125 NA NA NA 173 NA NA NA NA NA NA NA
## 4 NA 115 135 NA NA NA NA NA 0 0 NA NA NA NA
## 5 NA NA 117 NA 145 NA 167 NA NA NA NA NA NA NA
## 6 93 NA 115 NA 139 NA 163 0 NA 0 NA 0 NA 1
## spring2 spring3 spring4 spring5 spring6 spring7 spring8 anti2 anti3
## 1 NA 1 NA 1 NA NA NA NA 0
## 2 1 NA NA 1 NA NA NA 2 NA
## 3 1 NA 1 NA NA NA 1 0 NA
## 4 NA 0 1 NA NA NA NA NA 1
## 5 NA NA 1 NA 0 NA 1 NA NA
## 6 0 NA 1 NA 1 NA 1 2 NA
## anti4 anti5 anti6 anti7 anti8
## 1 NA 0 NA NA NA
## 2 NA 2 NA NA NA
## 3 2 NA NA NA 2
## 4 2 NA NA NA NA
## 5 4 NA 1 NA 5
## 6 2 NA 1 NA 3
The level-1 equation is: \[math_{ti}=b_{1i}+b_{2i}~(grade_{ti}-2)+u_{ti}\] This is a linear growth model with the intercept centered at second grade (when measurement commenced).
The level-2 equations are: \[b_{1i}=\beta_{01}+\beta_{11}LowBirthWght_i+\beta_{21}AntiK1_i+d_{1i}\] \[b_{2i}=\beta_{02}+\beta_{12}LowBirthWght_i+\beta_{22}AntiK1_i+d_{2i}\] These allow low birth weight and antisocial behaviors from kindergarten to first grade to have fixed effects on the slope and intercept.
All together, the multilevel model becomes: \[math_{ti}=(\beta_{01}+\beta_{11}LowBirthWght_i+\beta_{21}AntiK1_i+d_{1i})+\\(\beta_{02}+\beta_{12}LowBirthWght_i+\beta_{22}AntiK1_i+d_{2i})*(grade_{ti}-2)+u_{ti}\]
nlme
(in the nlme
library)lg.math.nlme <- nlme(math ~ (beta_01 + beta_11*lb_wght + beta_21*anti_k1 + d_1i) +
(beta_02 + beta_12*lb_wght + beta_22*anti_k1 +d_2i)*(grade-2),
data=nlsy_math_long,
fixed=beta_01+beta_11+beta_21+beta_02+beta_12+beta_22~1,
random=d_1i+d_2i~1,
group=~id,
start=c(beta_01=30, beta_11=0, beta_21=0, beta_02=4, beta_12=0, beta_22=0),
na.action=na.exclude)
summary(lg.math.nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ (beta_01 + beta_11 * lb_wght + beta_21 * anti_k1 + d_1i) + (beta_02 + beta_12 * lb_wght + beta_22 * anti_k1 + d_2i) * (grade - 2)
## Data: nlsy_math_long
## AIC BIC logLik
## 15943.14 16000.2 -7961.57
##
## Random effects:
## Formula: list(d_1i ~ 1, d_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## d_1i 7.9413032 d_1i
## d_2i 0.8445021 -0.012
## Residual 6.0213598
##
## Fixed effects: beta_01 + beta_11 + beta_21 + beta_02 + beta_12 + beta_22 ~ 1
## Value Std.Error DF t-value p-value
## beta_01 36.28987 0.4969763 1284 73.02132 0.0000
## beta_11 -2.71621 1.2953400 1284 -2.09691 0.0362
## beta_21 -0.55088 0.2327789 1284 -2.36655 0.0181
## beta_02 4.31516 0.1207519 1284 35.73574 0.0000
## beta_12 0.62464 0.3335737 1284 1.87256 0.0614
## beta_22 -0.01929 0.0589360 1284 -0.32731 0.7435
## Correlation:
## bet_01 bet_11 bet_21 bet_02 bet_12
## beta_11 -0.194
## beta_21 -0.671 -0.025
## beta_02 -0.529 0.096 0.358
## beta_12 0.091 -0.532 0.026 -0.168
## beta_22 0.343 0.026 -0.529 -0.660 -0.055
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.080147132 -0.525141518 -0.008659959 0.530785841 2.534566917
##
## Number of Observations: 2221
## Number of Groups: 932
lmer
(in the lme4
library)nlsy_math_long$grade_c2 <- nlsy_math_long$grade-2
lg.math.lmer <- lmer(math ~ 1 + grade_c2 + lb_wght + anti_k1 + I(grade_c2*lb_wght) + I(grade_c2*anti_k1) + (1 + grade_c2 | id),
data = nlsy_math_long,
REML = FALSE,
na.action = na.exclude)
summary(lg.math.lmer)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## math ~ 1 + grade_c2 + lb_wght + anti_k1 + I(grade_c2 * lb_wght) +
## I(grade_c2 * anti_k1) + (1 + grade_c2 | id)
## Data: nlsy_math_long
##
## AIC BIC logLik deviance df.resid
## 15943.1 16000.2 -7961.6 15923.1 2211
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.08015 -0.52514 -0.00866 0.53079 2.53457
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 63.0644 7.9413
## grade_c2 0.7132 0.8445 -0.01
## Residual 36.2568 6.0214
## Number of obs: 2221, groups: id, 932
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 36.28987 0.49630 73.12
## grade_c2 4.31516 0.12059 35.78
## lb_wght -2.71621 1.29359 -2.10
## anti_k1 -0.55088 0.23246 -2.37
## I(grade_c2 * lb_wght) 0.62464 0.33312 1.88
## I(grade_c2 * anti_k1) -0.01929 0.05886 -0.33
##
## Correlation of Fixed Effects:
## (Intr) grd_c2 lb_wgh ant_k1 I(_2*l
## grade_c2 -0.529
## lb_wght -0.194 0.096
## anti_k1 -0.671 0.358 -0.025
## I(grd_2*l_) 0.091 -0.168 -0.532 0.026
## I(gr_2*a_1) 0.343 -0.660 0.026 -0.529 -0.055
lavaan
The path model for a linear growth model with time invariant covariates is given in Figure 5.2 (p.103).
Specify the lavaan model for linear growth with time invariant covariates
lavaan.model<-'#latent variable definitions
#intercept
eta1 =~ 1*math2+
1*math3+
1*math4+
1*math5+
1*math6+
1*math7+
1*math8
#linear slope
eta2 =~ 0*math2+
1*math3+
2*math4+
3*math5+
4*math6+
5*math7+
6*math8
#factor variances
eta1 ~~ eta1
eta2 ~~ eta2
#factor covariance
eta1 ~~ eta2
#manifest variances (set equal by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
#latent means (freely estimated)
eta1 ~ 1
eta2 ~ 1
#manifest means (fixed to zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
#Time invariant covaraite
#regression of time-invariant covariate on intercept and slope factors
eta1~lb_wght+anti_k1
eta2~lb_wght+anti_k1
#variance of TIV covariates
lb_wght ~~ lb_wght
anti_k1 ~~ anti_k1
#covariance of TIV covaraites
lb_wght ~~ anti_k1
#means of TIV covariates (freely estimated)
lb_wght ~ 1
anti_k1 ~ 1'
lavaan.fit<-sem(lavaan.model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml",
fixed.x = FALSE)
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: some observed variances are (at least) a
## factor 1000 times larger than others; use varTable(fit) to investigate
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lavaan.fit, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after 118 iterations
##
## Number of observations 933
##
## Number of missing patterns 61
##
## Estimator ML
## Minimum Function Test Statistic 220.221
## Degrees of freedom 39
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 892.616
## Degrees of freedom 36
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.788
## Tucker-Lewis Index (TLI) 0.805
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -9785.085
## Loglikelihood unrestricted model (H1) -9674.975
##
## Number of free parameters 15
## Akaike (AIC) 19600.171
## Bayesian (BIC) 19672.747
## Sample-size adjusted Bayesian (BIC) 19625.108
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.071
## 90 Percent Confidence Interval 0.062 0.080
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.097
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## eta1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
## eta2 =~
## math2 0.000
## math3 1.000
## math4 2.000
## math5 3.000
## math6 4.000
## math7 5.000
## math8 6.000
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## eta1 ~
## lb_wght -2.716 1.294 -2.099 0.036
## anti_k1 -0.551 0.232 -2.369 0.018
## eta2 ~
## lb_wght 0.625 0.333 1.873 0.061
## anti_k1 -0.019 0.059 -0.327 0.743
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## eta1 ~~
## eta2 -0.078 1.145 -0.068 0.945
## lb_wght ~~
## anti_k1 0.007 0.014 0.548 0.584
##
## Intercepts:
## Estimate Std.Err Z-value P(>|z|)
## eta1 36.290 0.497 73.052 0.000
## eta2 4.315 0.122 35.420 0.000
## math2 0.000
## math3 0.000
## math4 0.000
## math5 0.000
## math6 0.000
## math7 0.000
## math8 0.000
## lb_wght 0.080 0.009 9.031 0.000
## anti_k1 1.454 0.050 29.216 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## eta1 63.064 5.609 11.242 0.000
## eta2 0.713 0.326 2.185 0.029
## math2 (thet) 36.257 1.868 19.406 0.000
## math3 (thet) 36.257 1.868 19.406 0.000
## math4 (thet) 36.257 1.868 19.406 0.000
## math5 (thet) 36.257 1.868 19.406 0.000
## math6 (thet) 36.257 1.868 19.406 0.000
## math7 (thet) 36.257 1.868 19.406 0.000
## math8 (thet) 36.257 1.868 19.406 0.000
## lb_wght 0.074 0.003 21.599 0.000
## anti_k1 2.312 0.107 21.599 0.000
To determine whether the addition of low birth weight and kindergarten antisocial behaviors is useful for model fit, we can fit the linear growth model with regression effects set to zero
lavaan.model1<-'#latent variable definitions
#intercept
eta1 =~ 1*math2+
1*math3+
1*math4+
1*math5+
1*math6+
1*math7+
1*math8
#linear slope
eta2 =~ 0*math2+
1*math3+
2*math4+
3*math5+
4*math6+
5*math7+
6*math8
#factor variances
eta1 ~~ eta1
eta2 ~~ eta2
#factor covariance
eta1 ~~ eta2
#manifest variances (set equal by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
#latent means (freely estimated)
eta1 ~ 1
eta2 ~ 1
#manifest means (fixed to zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
#Time invariant covaraite
#regression of time-invariant covariate on intercept and slope factors
eta1~0*lb_wght+0*anti_k1
eta2~0*lb_wght+0*anti_k1
#variance of TIV covariates
lb_wght ~~ lb_wght
anti_k1 ~~ anti_k1
#covariance of TIV covaraites
lb_wght ~~ anti_k1
#means of TIV covariates (freely estimated)
lb_wght ~ 1
anti_k1 ~ 1'
lavaan.fit1<-sem(lavaan.model1,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml",
fixed.x = FALSE)
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: some observed variances are (at least) a
## factor 1000 times larger than others; use varTable(fit) to investigate
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lavaan.fit1, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after 92 iterations
##
## Number of observations 933
##
## Number of missing patterns 61
##
## Estimator ML
## Minimum Function Test Statistic 234.467
## Degrees of freedom 43
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 892.616
## Degrees of freedom 36
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.776
## Tucker-Lewis Index (TLI) 0.813
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -9792.208
## Loglikelihood unrestricted model (H1) -9674.975
##
## Number of free parameters 11
## Akaike (AIC) 19606.416
## Bayesian (BIC) 19659.638
## Sample-size adjusted Bayesian (BIC) 19624.703
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.069
## 90 Percent Confidence Interval 0.061 0.078
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.103
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## eta1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
## eta2 =~
## math2 0.000
## math3 1.000
## math4 2.000
## math5 3.000
## math6 4.000
## math7 5.000
## math8 6.000
##
## Regressions:
## Estimate Std.Err Z-value P(>|z|)
## eta1 ~
## lb_wght 0.000
## anti_k1 0.000
## eta2 ~
## lb_wght 0.000
## anti_k1 0.000
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## eta1 ~~
## eta2 -0.181 1.150 -0.158 0.875
## lb_wght ~~
## anti_k1 0.007 0.014 0.548 0.584
##
## Intercepts:
## Estimate Std.Err Z-value P(>|z|)
## eta1 35.267 0.355 99.229 0.000
## eta2 4.339 0.088 49.136 0.000
## math2 0.000
## math3 0.000
## math4 0.000
## math5 0.000
## math6 0.000
## math7 0.000
## math8 0.000
## lb_wght 0.080 0.009 9.031 0.000
## anti_k1 1.454 0.050 29.216 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## eta1 64.562 5.659 11.408 0.000
## eta2 0.733 0.327 2.238 0.025
## math2 (thet) 36.230 1.867 19.410 0.000
## math3 (thet) 36.230 1.867 19.410 0.000
## math4 (thet) 36.230 1.867 19.410 0.000
## math5 (thet) 36.230 1.867 19.410 0.000
## math6 (thet) 36.230 1.867 19.410 0.000
## math7 (thet) 36.230 1.867 19.410 0.000
## math8 (thet) 36.230 1.867 19.410 0.000
## lb_wght 0.074 0.003 21.599 0.000
## anti_k1 2.312 0.107 21.599 0.000
The chi-square difference test (obtained using ANOVA) indicates that the time invariant predictors are useful to the model. For a more in-depth discussion of model fit and comparison, see pages 110-111 in Growth Modeling.
anova(lavaan.fit,lavaan.fit1)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## lavaan.fit 39 19600 19673 220.22
## lavaan.fit1 43 19606 19660 234.47 14.245 4 0.006552 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1