Overview

This tutorial will demonstrate multilevel and structural equation modeling approaches to linear growth models with time invariant covariates.

Outline

This tutorial provides line-by-line code for a linear model with time invariant covariates using the following R packages: 1. Multilevel modeling with nlme and lmer 2. SEM modeling with lavaan

Preliminaries: Libraries and Data

Load required libraries and read in the data. We will use data in long format for the multilevel approaches and data in wide format for SEM approaches.

We will use data in the long format for the MLM programs and data in the wide format for the SEM model.

library(nlme)
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
## 
##     lmList
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_long.csv"
#read in the .csv file using the url() function
nlsy_math_long <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_long)
##     id female lb_wght anti_k1 math grade occ age men spring anti
## 1  201      1       0       0   38     3   2 111   0      1    0
## 2  201      1       0       0   55     5   3 135   1      1    0
## 3  303      1       0       1   26     2   2 121   0      1    2
## 4  303      1       0       1   33     5   3 145   0      1    2
## 5 2702      0       0       0   56     2   2 100  NA      1    0
## 6 2702      0       0       0   58     4   3 125  NA      1    2
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_wide.csv"
#read in the .csv file using the url() function
nlsy_math_wide <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_wide)
##     id female lb_wght anti_k1 math2 math3 math4 math5 math6 math7 math8
## 1  201      1       0       0    NA    38    NA    55    NA    NA    NA
## 2  303      1       0       1    26    NA    NA    33    NA    NA    NA
## 3 2702      0       0       0    56    NA    58    NA    NA    NA    80
## 4 4303      1       0       0    NA    41    58    NA    NA    NA    NA
## 5 5002      0       0       4    NA    NA    46    NA    54    NA    66
## 6 5005      1       0       0    35    NA    50    NA    60    NA    59
##   age2 age3 age4 age5 age6 age7 age8 men2 men3 men4 men5 men6 men7 men8
## 1   NA  111   NA  135   NA   NA   NA   NA    0   NA    1   NA   NA   NA
## 2  121   NA   NA  145   NA   NA   NA    0   NA   NA    0   NA   NA   NA
## 3  100   NA  125   NA   NA   NA  173   NA   NA   NA   NA   NA   NA   NA
## 4   NA  115  135   NA   NA   NA   NA   NA    0    0   NA   NA   NA   NA
## 5   NA   NA  117   NA  145   NA  167   NA   NA   NA   NA   NA   NA   NA
## 6   93   NA  115   NA  139   NA  163    0   NA    0   NA    0   NA    1
##   spring2 spring3 spring4 spring5 spring6 spring7 spring8 anti2 anti3
## 1      NA       1      NA       1      NA      NA      NA    NA     0
## 2       1      NA      NA       1      NA      NA      NA     2    NA
## 3       1      NA       1      NA      NA      NA       1     0    NA
## 4      NA       0       1      NA      NA      NA      NA    NA     1
## 5      NA      NA       1      NA       0      NA       1    NA    NA
## 6       0      NA       1      NA       1      NA       1     2    NA
##   anti4 anti5 anti6 anti7 anti8
## 1    NA     0    NA    NA    NA
## 2    NA     2    NA    NA    NA
## 3     2    NA    NA    NA     2
## 4     2    NA    NA    NA    NA
## 5     4    NA     1    NA     5
## 6     2    NA     1    NA     3

MLM model

The level-1 equation is: \[math_{ti}=b_{1i}+b_{2i}~(grade_{ti}-2)+u_{ti}\] This is a linear growth model with the intercept centered at second grade (when measurement commenced).

The level-2 equations are: \[b_{1i}=\beta_{01}+\beta_{11}LowBirthWght_i+\beta_{21}AntiK1_i+d_{1i}\] \[b_{2i}=\beta_{02}+\beta_{12}LowBirthWght_i+\beta_{22}AntiK1_i+d_{2i}\] These allow low birth weight and antisocial behaviors from kindergarten to first grade to have fixed effects on the slope and intercept.

All together, the multilevel model becomes: \[math_{ti}=(\beta_{01}+\beta_{11}LowBirthWght_i+\beta_{21}AntiK1_i+d_{1i})+\\(\beta_{02}+\beta_{12}LowBirthWght_i+\beta_{22}AntiK1_i+d_{2i})*(grade_{ti}-2)+u_{ti}\]

Fitting the MLM with nlme (in the nlme library)

lg.math.nlme <- nlme(math ~ (beta_01 + beta_11*lb_wght + beta_21*anti_k1 + d_1i) +
                            (beta_02 + beta_12*lb_wght + beta_22*anti_k1 +d_2i)*(grade-2),
        data=nlsy_math_long,
        fixed=beta_01+beta_11+beta_21+beta_02+beta_12+beta_22~1,
        random=d_1i+d_2i~1,
        group=~id,
        start=c(beta_01=30, beta_11=0, beta_21=0, beta_02=4, beta_12=0, beta_22=0),
        na.action=na.exclude)

summary(lg.math.nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ (beta_01 + beta_11 * lb_wght + beta_21 * anti_k1 + d_1i) +      (beta_02 + beta_12 * lb_wght + beta_22 * anti_k1 + d_2i) *          (grade - 2) 
##  Data: nlsy_math_long 
##        AIC     BIC   logLik
##   15943.14 16000.2 -7961.57
## 
## Random effects:
##  Formula: list(d_1i ~ 1, d_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev    Corr  
## d_1i     7.9413032 d_1i  
## d_2i     0.8445021 -0.012
## Residual 6.0213598       
## 
## Fixed effects: beta_01 + beta_11 + beta_21 + beta_02 + beta_12 + beta_22 ~ 1 
##            Value Std.Error   DF  t-value p-value
## beta_01 36.28987 0.4969763 1284 73.02132  0.0000
## beta_11 -2.71621 1.2953400 1284 -2.09691  0.0362
## beta_21 -0.55088 0.2327789 1284 -2.36655  0.0181
## beta_02  4.31516 0.1207519 1284 35.73574  0.0000
## beta_12  0.62464 0.3335737 1284  1.87256  0.0614
## beta_22 -0.01929 0.0589360 1284 -0.32731  0.7435
##  Correlation: 
##         bet_01 bet_11 bet_21 bet_02 bet_12
## beta_11 -0.194                            
## beta_21 -0.671 -0.025                     
## beta_02 -0.529  0.096  0.358              
## beta_12  0.091 -0.532  0.026 -0.168       
## beta_22  0.343  0.026 -0.529 -0.660 -0.055
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.080147132 -0.525141518 -0.008659959  0.530785841  2.534566917 
## 
## Number of Observations: 2221
## Number of Groups: 932

Fitting the MLM using lmer (in the lme4 library)

nlsy_math_long$grade_c2 <- nlsy_math_long$grade-2

lg.math.lmer <- lmer(math ~ 1 + grade_c2 + lb_wght + anti_k1 + I(grade_c2*lb_wght) + I(grade_c2*anti_k1)  + (1 + grade_c2 | id), 
                     data = nlsy_math_long, 
                     REML = FALSE,
                     na.action = na.exclude)

summary(lg.math.lmer)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## math ~ 1 + grade_c2 + lb_wght + anti_k1 + I(grade_c2 * lb_wght) +  
##     I(grade_c2 * anti_k1) + (1 + grade_c2 | id)
##    Data: nlsy_math_long
## 
##      AIC      BIC   logLik deviance df.resid 
##  15943.1  16000.2  -7961.6  15923.1     2211 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.08015 -0.52514 -0.00866  0.53079  2.53457 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 63.0644  7.9413        
##           grade_c2     0.7132  0.8445   -0.01
##  Residual             36.2568  6.0214        
## Number of obs: 2221, groups:  id, 932
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           36.28987    0.49630   73.12
## grade_c2               4.31516    0.12059   35.78
## lb_wght               -2.71621    1.29359   -2.10
## anti_k1               -0.55088    0.23246   -2.37
## I(grade_c2 * lb_wght)  0.62464    0.33312    1.88
## I(grade_c2 * anti_k1) -0.01929    0.05886   -0.33
## 
## Correlation of Fixed Effects:
##             (Intr) grd_c2 lb_wgh ant_k1 I(_2*l
## grade_c2    -0.529                            
## lb_wght     -0.194  0.096                     
## anti_k1     -0.671  0.358 -0.025              
## I(grd_2*l_)  0.091 -0.168 -0.532  0.026       
## I(gr_2*a_1)  0.343 -0.660  0.026 -0.529 -0.055

SEM modeling with lavaan

The path model for a linear growth model with time invariant covariates is given in Figure 5.2 (p.103).

Specify the lavaan model for linear growth with time invariant covariates

lavaan.model<-'#latent variable definitions
            #intercept
              eta1 =~ 1*math2+
                      1*math3+
                      1*math4+
                      1*math5+
                      1*math6+
                      1*math7+
                      1*math8
            #linear slope
              eta2 =~ 0*math2+
                      1*math3+
                      2*math4+
                      3*math5+
                      4*math6+
                      5*math7+
                      6*math8

          #factor variances
            eta1 ~~ eta1
            eta2 ~~ eta2

          #factor covariance
            eta1 ~~ eta2

          #manifest variances (set equal by naming theta)
            math2 ~~ theta*math2
            math3 ~~ theta*math3
            math4 ~~ theta*math4
            math5 ~~ theta*math5
            math6 ~~ theta*math6
            math7 ~~ theta*math7
            math8 ~~ theta*math8

          #latent means (freely estimated)
            eta1 ~ 1
            eta2 ~ 1

          #manifest means (fixed to zero)
            math2 ~ 0*1
            math3 ~ 0*1
            math4 ~ 0*1
            math5 ~ 0*1
            math6 ~ 0*1
            math7 ~ 0*1
            math8 ~ 0*1

        #Time invariant covaraite
          #regression of time-invariant covariate on intercept and slope factors
            eta1~lb_wght+anti_k1
            eta2~lb_wght+anti_k1

        #variance of TIV covariates
            lb_wght ~~ lb_wght
            anti_k1 ~~ anti_k1

        #covariance of TIV covaraites
            lb_wght ~~ anti_k1

        #means of TIV covariates (freely estimated)
            lb_wght ~ 1
            anti_k1 ~ 1'

lavaan.fit<-sem(lavaan.model,
                data = nlsy_math_wide,
                meanstructure = TRUE,
                estimator = "ML",
                missing = "fiml",
                fixed.x = FALSE)
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: some observed variances are (at least) a
## factor 1000 times larger than others; use varTable(fit) to investigate
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lavaan.fit, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after 118 iterations
## 
##   Number of observations                           933
## 
##   Number of missing patterns                        61
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              220.221
##   Degrees of freedom                                39
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              892.616
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.788
##   Tucker-Lewis Index (TLI)                       0.805
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -9785.085
##   Loglikelihood unrestricted model (H1)      -9674.975
## 
##   Number of free parameters                         15
##   Akaike (AIC)                               19600.171
##   Bayesian (BIC)                             19672.747
##   Sample-size adjusted Bayesian (BIC)        19625.108
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.071
##   90 Percent Confidence Interval          0.062  0.080
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.097
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 =~                                             
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
##   eta2 =~                                             
##     math2             0.000                           
##     math3             1.000                           
##     math4             2.000                           
##     math5             3.000                           
##     math6             4.000                           
##     math7             5.000                           
##     math8             6.000                           
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 ~                                              
##     lb_wght          -2.716    1.294   -2.099    0.036
##     anti_k1          -0.551    0.232   -2.369    0.018
##   eta2 ~                                              
##     lb_wght           0.625    0.333    1.873    0.061
##     anti_k1          -0.019    0.059   -0.327    0.743
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 ~~                                             
##     eta2             -0.078    1.145   -0.068    0.945
##   lb_wght ~~                                          
##     anti_k1           0.007    0.014    0.548    0.584
## 
## Intercepts:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     eta1             36.290    0.497   73.052    0.000
##     eta2              4.315    0.122   35.420    0.000
##     math2             0.000                           
##     math3             0.000                           
##     math4             0.000                           
##     math5             0.000                           
##     math6             0.000                           
##     math7             0.000                           
##     math8             0.000                           
##     lb_wght           0.080    0.009    9.031    0.000
##     anti_k1           1.454    0.050   29.216    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     eta1             63.064    5.609   11.242    0.000
##     eta2              0.713    0.326    2.185    0.029
##     math2   (thet)   36.257    1.868   19.406    0.000
##     math3   (thet)   36.257    1.868   19.406    0.000
##     math4   (thet)   36.257    1.868   19.406    0.000
##     math5   (thet)   36.257    1.868   19.406    0.000
##     math6   (thet)   36.257    1.868   19.406    0.000
##     math7   (thet)   36.257    1.868   19.406    0.000
##     math8   (thet)   36.257    1.868   19.406    0.000
##     lb_wght           0.074    0.003   21.599    0.000
##     anti_k1           2.312    0.107   21.599    0.000

To determine whether the addition of low birth weight and kindergarten antisocial behaviors is useful for model fit, we can fit the linear growth model with regression effects set to zero

lavaan.model1<-'#latent variable definitions
            #intercept
              eta1 =~ 1*math2+
                      1*math3+
                      1*math4+
                      1*math5+
                      1*math6+
                      1*math7+
                      1*math8
            #linear slope
              eta2 =~ 0*math2+
                      1*math3+
                      2*math4+
                      3*math5+
                      4*math6+
                      5*math7+
                      6*math8

          #factor variances
            eta1 ~~ eta1
            eta2 ~~ eta2

          #factor covariance
            eta1 ~~ eta2

          #manifest variances (set equal by naming theta)
            math2 ~~ theta*math2
            math3 ~~ theta*math3
            math4 ~~ theta*math4
            math5 ~~ theta*math5
            math6 ~~ theta*math6
            math7 ~~ theta*math7
            math8 ~~ theta*math8

          #latent means (freely estimated)
            eta1 ~ 1
            eta2 ~ 1

          #manifest means (fixed to zero)
            math2 ~ 0*1
            math3 ~ 0*1
            math4 ~ 0*1
            math5 ~ 0*1
            math6 ~ 0*1
            math7 ~ 0*1
            math8 ~ 0*1

        #Time invariant covaraite
          #regression of time-invariant covariate on intercept and slope factors
            eta1~0*lb_wght+0*anti_k1
            eta2~0*lb_wght+0*anti_k1

        #variance of TIV covariates
            lb_wght ~~ lb_wght
            anti_k1 ~~ anti_k1

        #covariance of TIV covaraites
            lb_wght ~~ anti_k1

        #means of TIV covariates (freely estimated)
            lb_wght ~ 1
            anti_k1 ~ 1'

lavaan.fit1<-sem(lavaan.model1,
                data = nlsy_math_wide,
                meanstructure = TRUE,
                estimator = "ML",
                missing = "fiml",
                fixed.x = FALSE)
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: some observed variances are (at least) a
## factor 1000 times larger than others; use varTable(fit) to investigate
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lavaan.fit1, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after  92 iterations
## 
##   Number of observations                           933
## 
##   Number of missing patterns                        61
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              234.467
##   Degrees of freedom                                43
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              892.616
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.776
##   Tucker-Lewis Index (TLI)                       0.813
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -9792.208
##   Loglikelihood unrestricted model (H1)      -9674.975
## 
##   Number of free parameters                         11
##   Akaike (AIC)                               19606.416
##   Bayesian (BIC)                             19659.638
##   Sample-size adjusted Bayesian (BIC)        19624.703
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.069
##   90 Percent Confidence Interval          0.061  0.078
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.103
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 =~                                             
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
##   eta2 =~                                             
##     math2             0.000                           
##     math3             1.000                           
##     math4             2.000                           
##     math5             3.000                           
##     math6             4.000                           
##     math7             5.000                           
##     math8             6.000                           
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 ~                                              
##     lb_wght           0.000                           
##     anti_k1           0.000                           
##   eta2 ~                                              
##     lb_wght           0.000                           
##     anti_k1           0.000                           
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta1 ~~                                             
##     eta2             -0.181    1.150   -0.158    0.875
##   lb_wght ~~                                          
##     anti_k1           0.007    0.014    0.548    0.584
## 
## Intercepts:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     eta1             35.267    0.355   99.229    0.000
##     eta2              4.339    0.088   49.136    0.000
##     math2             0.000                           
##     math3             0.000                           
##     math4             0.000                           
##     math5             0.000                           
##     math6             0.000                           
##     math7             0.000                           
##     math8             0.000                           
##     lb_wght           0.080    0.009    9.031    0.000
##     anti_k1           1.454    0.050   29.216    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     eta1             64.562    5.659   11.408    0.000
##     eta2              0.733    0.327    2.238    0.025
##     math2   (thet)   36.230    1.867   19.410    0.000
##     math3   (thet)   36.230    1.867   19.410    0.000
##     math4   (thet)   36.230    1.867   19.410    0.000
##     math5   (thet)   36.230    1.867   19.410    0.000
##     math6   (thet)   36.230    1.867   19.410    0.000
##     math7   (thet)   36.230    1.867   19.410    0.000
##     math8   (thet)   36.230    1.867   19.410    0.000
##     lb_wght           0.074    0.003   21.599    0.000
##     anti_k1           2.312    0.107   21.599    0.000

The chi-square difference test (obtained using ANOVA) indicates that the time invariant predictors are useful to the model. For a more in-depth discussion of model fit and comparison, see pages 110-111 in Growth Modeling.

anova(lavaan.fit,lavaan.fit1)
## Chi Square Difference Test
## 
##             Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)   
## lavaan.fit  39 19600 19673 220.22                                 
## lavaan.fit1 43 19606 19660 234.47     14.245       4   0.006552 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1