Overview

This tutorial will demonstrate in multilevel modeling and structural equation modeling approaches to growth modeling with a continuous time metric (sometimes referred to as individually varying time metric).

Outline

This tutorial provides code to:

  1. Fit a multilevel growth model with nlme
  2. Fit a growth model to continuous-time data using the time window approach with lavaan

Preliminaries: Load data and libraries

First, load the necessary libraries and import the datafiles from the QuantDev website. We will use data in the long format for the multilevel model, and wide-formatted data for the SEM model.

library(nlme)
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
#read in long data set
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_long.csv"
#read in the .csv file using the url() function
nlsy_math_long <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_long)
##     id female lb_wght anti_k1 math grade occ age men spring anti
## 1  201      1       0       0   38     3   2 111   0      1    0
## 2  201      1       0       0   55     5   3 135   1      1    0
## 3  303      1       0       1   26     2   2 121   0      1    2
## 4  303      1       0       1   33     5   3 145   0      1    2
## 5 2702      0       0       0   56     2   2 100  NA      1    0
## 6 2702      0       0       0   58     4   3 125  NA      1    2
#read in wide data set
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_wide_age.csv"
#read in the .csv file using the url() function
nlsy_math_age <- read.csv(file=url(filepath),header=TRUE)

Multilevel Modeling: Implementation with nlme

The handling of continuous time metrics in the MLM framework is straightforward. Time is incorporated into the model as an independent variable; therefore, times do not have to be equally spaced between persons for the model to be estimated.

Here, we fit a linear growth MLM with age centered at 8. Age in our data set is expressed in months, so we divide by 12 to rescale age to years (rescaling the time metric).

The Level-1 equation is: \[math_{ti}=b_{1i}+b_{2i}\left(\frac{age_{ti}}{12}-8\right)+u_{ti}\] The Level-2 equations are: \[b_{1i}=\beta_1+d_{1i}\] \[b_{2i}=\beta_2+d_{2i}\] The nlme code for this model is given below.

lg.math.age.nlme <- nlme(math~b_1i+b_2i*(age/12-8),
                         data=nlsy_math_long,
                         fixed=b_1i+b_2i~1,
                         random=b_1i+b_2i~1,
                         group=~id,
                         start=c(b_1i=35, b_2i=4),
                         na.action = na.exclude)
summary(lg.math.age.nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: math ~ b_1i + b_2i * (age/12 - 8) 
##  Data: nlsy_math_long 
##        AIC      BIC    logLik
##   15855.14 15889.37 -7921.568
## 
## Random effects:
##  Formula: list(b_1i ~ 1, b_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev    Corr
## b_1i     8.0516838 b_1i
## b_2i     0.8216858 0.2 
## Residual 5.6705748     
## 
## Fixed effects: b_1i + b_2i ~ 1 
##         Value Std.Error   DF   t-value p-value
## b_1i 35.17313 0.3463334 1288 101.55858       0
## b_2i  4.25656 0.0804631 1288  52.90072       0
##  Correlation: 
##      b_1i  
## b_2i -0.453
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.382126561 -0.516299320  0.003860165  0.528128495  2.651445505 
## 
## Number of Observations: 2221
## Number of Groups: 932

These results are the same as those given in Chapter 4 (p. 80).

SEM: Time window approach

Analysis of continuous time metrics is not so straightforward in SEM.

The time window approach is one potential method for analyzing data with individually varying measurement occasions. Essentially, the time window approach aims to closely approximate the individually varying time metric on a discrete scale. For example, this can be achieved by rounding time/age to the nearest half- or quarter-year.

This method is of course still an approximation of time. One can acheive greater precision by using smaller windows, but if the data matrix becomes too sparse, estimation is difficult.

In this example, the time windows are defined as half years.

The lavaan code for the time window approach in this example is given below:

lg.math.age.lavaan <- '
  # latent variable definitions
      #intercept (note intercept is a reserved term)
      eta_1 =~ 1*math70 +
                1*math75 +
                1*math80 +
                1*math85 +
                1*math90 +
                1*math95 +
                1*math100 +
                1*math105 +
                1*math110 +
                1*math115 +
                1*math120 +
                1*math125 +
                1*math130 +
                1*math135 +
                1*math140 +
                1*math145 

      #linear slope (note intercept is a reserved term)
      eta_2 =~ -1*math70 +
                -0.5*math75 +
                0*math80 +
                0.5*math85 +
                1*math90 +
                1.5*math95 +
                2*math100 +
                2.5*math105 +
                3*math110 +
                3.5*math115 +
                4*math120 +
                4.5*math125 +
                5*math130 +
                5.5*math135 +
                6*math140 +
                6.5*math145

  # factor variances
      eta_1 ~~ start(65)*eta_1
      eta_2 ~~ start(.75)*eta_2

  # covariances among factors 
      eta_1 ~~ start(1)*eta_2

  # manifest variances (made equivalent by naming theta)
      math70 ~~ theta*math70
      math75 ~~ theta*math75
      math80 ~~ theta*math80
      math85 ~~ theta*math85
      math90 ~~ theta*math90
      math95 ~~ theta*math95
      math100 ~~ theta*math100
      math105 ~~ theta*math105
      math110 ~~ theta*math110
      math115 ~~ theta*math115
      math120 ~~ theta*math120
      math125 ~~ theta*math125
      math130 ~~ theta*math130
      math135 ~~ theta*math135
      math140 ~~ theta*math140
      math145 ~~ theta*math145
  # manifest means (fixed at zero)
      math70 ~ 0*1
      math75 ~ 0*1
      math80 ~ 0*1
      math85 ~ 0*1
      math90 ~ 0*1
      math95 ~ 0*1
      math100 ~ 0*1
      math105 ~ 0*1
      math110 ~ 0*1
      math115 ~ 0*1
      math120 ~ 0*1
      math125 ~ 0*1
      math130 ~ 0*1
      math135 ~ 0*1
      math140 ~ 0*1
      math145 ~ 0*1

  # factor means (estimated freely)
      eta_1 ~ start(35)*1
      eta_2 ~ start(4)*1
'

We then fit the lavaan model using lavaan’s maximum likelihood estimator and full information maximum likelihood to handle missing values.

#estimating the model using sem() function
lg.math.lavaan_fit <- sem(lg.math.age.lavaan, 
                          data = nlsy_math_age,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lg.math.lavaan_fit, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after  24 iterations
## 
##   Number of observations                           932
## 
##   Number of missing patterns                       143
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              287.849
##   Degrees of freedom                               146
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             1051.227
##   Degrees of freedom                               120
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.848
##   Tucker-Lewis Index (TLI)                       0.875
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -7927.945
##   Loglikelihood unrestricted model (H1)      -7784.021
## 
##   Number of free parameters                          6
##   Akaike (AIC)                               15867.890
##   Bayesian (BIC)                             15896.914
##   Sample-size adjusted Bayesian (BIC)        15877.859
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.032
##   90 Percent Confidence Interval          0.027  0.038
##   P-value RMSEA <= 0.05                          1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.400
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta_1 =~                                            
##     math70            1.000                           
##     math75            1.000                           
##     math80            1.000                           
##     math85            1.000                           
##     math90            1.000                           
##     math95            1.000                           
##     math100           1.000                           
##     math105           1.000                           
##     math110           1.000                           
##     math115           1.000                           
##     math120           1.000                           
##     math125           1.000                           
##     math130           1.000                           
##     math135           1.000                           
##     math140           1.000                           
##     math145           1.000                           
##   eta_2 =~                                            
##     math70           -1.000                           
##     math75           -0.500                           
##     math80            0.000                           
##     math85            0.500                           
##     math90            1.000                           
##     math95            1.500                           
##     math100           2.000                           
##     math105           2.500                           
##     math110           3.000                           
##     math115           3.500                           
##     math120           4.000                           
##     math125           4.500                           
##     math130           5.000                           
##     math135           5.500                           
##     math140           6.000                           
##     math145           6.500                           
## 
## Covariances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##   eta_1 ~~                                            
##     eta_2             1.165    1.015    1.147    0.251
## 
## Intercepts:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     math70            0.000                           
##     math75            0.000                           
##     math80            0.000                           
##     math85            0.000                           
##     math90            0.000                           
##     math95            0.000                           
##     math100           0.000                           
##     math105           0.000                           
##     math110           0.000                           
##     math115           0.000                           
##     math120           0.000                           
##     math125           0.000                           
##     math130           0.000                           
##     math135           0.000                           
##     math140           0.000                           
##     math145           0.000                           
##     eta_1            35.074    0.349  100.626    0.000
##     eta_2             4.229    0.081   51.988    0.000
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)
##     eta_1            65.120    5.562   11.708    0.000
##     eta_2             0.728    0.275    2.646    0.008
##     math70  (thet)   32.219    1.686   19.107    0.000
##     math75  (thet)   32.219    1.686   19.107    0.000
##     math80  (thet)   32.219    1.686   19.107    0.000
##     math85  (thet)   32.219    1.686   19.107    0.000
##     math90  (thet)   32.219    1.686   19.107    0.000
##     math95  (thet)   32.219    1.686   19.107    0.000
##     math100 (thet)   32.219    1.686   19.107    0.000
##     math105 (thet)   32.219    1.686   19.107    0.000
##     math110 (thet)   32.219    1.686   19.107    0.000
##     math115 (thet)   32.219    1.686   19.107    0.000
##     math120 (thet)   32.219    1.686   19.107    0.000
##     math125 (thet)   32.219    1.686   19.107    0.000
##     math130 (thet)   32.219    1.686   19.107    0.000
##     math135 (thet)   32.219    1.686   19.107    0.000
##     math140 (thet)   32.219    1.686   19.107    0.000
##     math145 (thet)   32.219    1.686   19.107    0.000