This tutorial will demonstrate in multilevel modeling and structural equation modeling approaches to growth modeling with a continuous time metric (sometimes referred to as individually varying time metric).
This tutorial provides code to:
First, load the necessary libraries and import the datafiles from the QuantDev website. We will use data in the long format for the multilevel model, and wide-formatted data for the SEM model.
library(nlme)
library(lavaan)
## This is lavaan 0.5-20
## lavaan is BETA software! Please report any bugs.
#read in long data set
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_long.csv"
#read in the .csv file using the url() function
nlsy_math_long <- read.csv(file=url(filepath),header=TRUE)
head(nlsy_math_long)
## id female lb_wght anti_k1 math grade occ age men spring anti
## 1 201 1 0 0 38 3 2 111 0 1 0
## 2 201 1 0 0 55 5 3 135 1 1 0
## 3 303 1 0 1 26 2 2 121 0 1 2
## 4 303 1 0 1 33 5 3 145 0 1 2
## 5 2702 0 0 0 56 2 2 100 NA 1 0
## 6 2702 0 0 0 58 4 3 125 NA 1 2
#read in wide data set
#set filepath
filepath<-"https://quantdev.ssri.psu.edu/sites/qdev/files/nlsy_math_wide_age.csv"
#read in the .csv file using the url() function
nlsy_math_age <- read.csv(file=url(filepath),header=TRUE)
The handling of continuous time metrics in the MLM framework is straightforward. Time is incorporated into the model as an independent variable; therefore, times do not have to be equally spaced between persons for the model to be estimated.
Here, we fit a linear growth MLM with age centered at 8. Age in our data set is expressed in months, so we divide by 12 to rescale age to years (rescaling the time metric).
The Level-1 equation is: \[math_{ti}=b_{1i}+b_{2i}\left(\frac{age_{ti}}{12}-8\right)+u_{ti}\] The Level-2 equations are: \[b_{1i}=\beta_1+d_{1i}\] \[b_{2i}=\beta_2+d_{2i}\] The nlme code for this model is given below.
lg.math.age.nlme <- nlme(math~b_1i+b_2i*(age/12-8),
data=nlsy_math_long,
fixed=b_1i+b_2i~1,
random=b_1i+b_2i~1,
group=~id,
start=c(b_1i=35, b_2i=4),
na.action = na.exclude)
summary(lg.math.age.nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
## Model: math ~ b_1i + b_2i * (age/12 - 8)
## Data: nlsy_math_long
## AIC BIC logLik
## 15855.14 15889.37 -7921.568
##
## Random effects:
## Formula: list(b_1i ~ 1, b_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## b_1i 8.0516838 b_1i
## b_2i 0.8216858 0.2
## Residual 5.6705748
##
## Fixed effects: b_1i + b_2i ~ 1
## Value Std.Error DF t-value p-value
## b_1i 35.17313 0.3463334 1288 101.55858 0
## b_2i 4.25656 0.0804631 1288 52.90072 0
## Correlation:
## b_1i
## b_2i -0.453
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.382126561 -0.516299320 0.003860165 0.528128495 2.651445505
##
## Number of Observations: 2221
## Number of Groups: 932
These results are the same as those given in Chapter 4 (p. 80).
Analysis of continuous time metrics is not so straightforward in SEM.
The time window approach is one potential method for analyzing data with individually varying measurement occasions. Essentially, the time window approach aims to closely approximate the individually varying time metric on a discrete scale. For example, this can be achieved by rounding time/age to the nearest half- or quarter-year.
This method is of course still an approximation of time. One can acheive greater precision by using smaller windows, but if the data matrix becomes too sparse, estimation is difficult.
In this example, the time windows are defined as half years.
The lavaan code for the time window approach in this example is given below:
lg.math.age.lavaan <- '
# latent variable definitions
#intercept (note intercept is a reserved term)
eta_1 =~ 1*math70 +
1*math75 +
1*math80 +
1*math85 +
1*math90 +
1*math95 +
1*math100 +
1*math105 +
1*math110 +
1*math115 +
1*math120 +
1*math125 +
1*math130 +
1*math135 +
1*math140 +
1*math145
#linear slope (note intercept is a reserved term)
eta_2 =~ -1*math70 +
-0.5*math75 +
0*math80 +
0.5*math85 +
1*math90 +
1.5*math95 +
2*math100 +
2.5*math105 +
3*math110 +
3.5*math115 +
4*math120 +
4.5*math125 +
5*math130 +
5.5*math135 +
6*math140 +
6.5*math145
# factor variances
eta_1 ~~ start(65)*eta_1
eta_2 ~~ start(.75)*eta_2
# covariances among factors
eta_1 ~~ start(1)*eta_2
# manifest variances (made equivalent by naming theta)
math70 ~~ theta*math70
math75 ~~ theta*math75
math80 ~~ theta*math80
math85 ~~ theta*math85
math90 ~~ theta*math90
math95 ~~ theta*math95
math100 ~~ theta*math100
math105 ~~ theta*math105
math110 ~~ theta*math110
math115 ~~ theta*math115
math120 ~~ theta*math120
math125 ~~ theta*math125
math130 ~~ theta*math130
math135 ~~ theta*math135
math140 ~~ theta*math140
math145 ~~ theta*math145
# manifest means (fixed at zero)
math70 ~ 0*1
math75 ~ 0*1
math80 ~ 0*1
math85 ~ 0*1
math90 ~ 0*1
math95 ~ 0*1
math100 ~ 0*1
math105 ~ 0*1
math110 ~ 0*1
math115 ~ 0*1
math120 ~ 0*1
math125 ~ 0*1
math130 ~ 0*1
math135 ~ 0*1
math140 ~ 0*1
math145 ~ 0*1
# factor means (estimated freely)
eta_1 ~ start(35)*1
eta_2 ~ start(4)*1
'
We then fit the lavaan model using lavaan’s maximum likelihood estimator and full information maximum likelihood to handle missing values.
#estimating the model using sem() function
lg.math.lavaan_fit <- sem(lg.math.age.lavaan,
data = nlsy_math_age,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")
## Warning in lav_data_full(data = data, group = group, group.label =
## group.label, : lavaan WARNING: due to missing values, some pairwise
## combinations have less than 10% coverage
summary(lg.math.lavaan_fit, fit.measures=TRUE)
## lavaan (0.5-20) converged normally after 24 iterations
##
## Number of observations 932
##
## Number of missing patterns 143
##
## Estimator ML
## Minimum Function Test Statistic 287.849
## Degrees of freedom 146
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 1051.227
## Degrees of freedom 120
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.848
## Tucker-Lewis Index (TLI) 0.875
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7927.945
## Loglikelihood unrestricted model (H1) -7784.021
##
## Number of free parameters 6
## Akaike (AIC) 15867.890
## Bayesian (BIC) 15896.914
## Sample-size adjusted Bayesian (BIC) 15877.859
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.032
## 90 Percent Confidence Interval 0.027 0.038
## P-value RMSEA <= 0.05 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.400
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err Z-value P(>|z|)
## eta_1 =~
## math70 1.000
## math75 1.000
## math80 1.000
## math85 1.000
## math90 1.000
## math95 1.000
## math100 1.000
## math105 1.000
## math110 1.000
## math115 1.000
## math120 1.000
## math125 1.000
## math130 1.000
## math135 1.000
## math140 1.000
## math145 1.000
## eta_2 =~
## math70 -1.000
## math75 -0.500
## math80 0.000
## math85 0.500
## math90 1.000
## math95 1.500
## math100 2.000
## math105 2.500
## math110 3.000
## math115 3.500
## math120 4.000
## math125 4.500
## math130 5.000
## math135 5.500
## math140 6.000
## math145 6.500
##
## Covariances:
## Estimate Std.Err Z-value P(>|z|)
## eta_1 ~~
## eta_2 1.165 1.015 1.147 0.251
##
## Intercepts:
## Estimate Std.Err Z-value P(>|z|)
## math70 0.000
## math75 0.000
## math80 0.000
## math85 0.000
## math90 0.000
## math95 0.000
## math100 0.000
## math105 0.000
## math110 0.000
## math115 0.000
## math120 0.000
## math125 0.000
## math130 0.000
## math135 0.000
## math140 0.000
## math145 0.000
## eta_1 35.074 0.349 100.626 0.000
## eta_2 4.229 0.081 51.988 0.000
##
## Variances:
## Estimate Std.Err Z-value P(>|z|)
## eta_1 65.120 5.562 11.708 0.000
## eta_2 0.728 0.275 2.646 0.008
## math70 (thet) 32.219 1.686 19.107 0.000
## math75 (thet) 32.219 1.686 19.107 0.000
## math80 (thet) 32.219 1.686 19.107 0.000
## math85 (thet) 32.219 1.686 19.107 0.000
## math90 (thet) 32.219 1.686 19.107 0.000
## math95 (thet) 32.219 1.686 19.107 0.000
## math100 (thet) 32.219 1.686 19.107 0.000
## math105 (thet) 32.219 1.686 19.107 0.000
## math110 (thet) 32.219 1.686 19.107 0.000
## math115 (thet) 32.219 1.686 19.107 0.000
## math120 (thet) 32.219 1.686 19.107 0.000
## math125 (thet) 32.219 1.686 19.107 0.000
## math130 (thet) 32.219 1.686 19.107 0.000
## math135 (thet) 32.219 1.686 19.107 0.000
## math140 (thet) 32.219 1.686 19.107 0.000
## math145 (thet) 32.219 1.686 19.107 0.000