This is an illustrative example of using dynr.mi to implement multiple imputation with a vector autoregressive model.
References:
[1] Schermerhorn, A. C., Chow, S.-M., & Cummings, E. M. (2010). Developmental family processes and interparental conflict: Patterns of microlevel influences. Developmental Psychology, 46(4), 869-885.
[2] Ji, L., Chow, S.-M., Schermerhorn, A. C., Jacobson, N. C., & Cummings, E.M. (2018). Handling missing data in the modeling of intensive longitudinal data. Structural Equation Modeling: A Multidisciplinary Journal, 1-22.
[3] Li, Y., Ji, L., Oravecz, Z., Brick, T.R., Hunter, M.D., & Chow, S-M. (2019). dynr.mi: An R program for Multiple Imputation in Dynamic Modeling. International Journal of Computer, Electrical, Automation, Control and Information Engineering, 13(5), 302 - 311.
Loading the libraries used in this script.
library(dynr)
## Loading required package: ggplot2
Getting the simulated data.
data(VARsim)
VARsim$ca <- as.factor(VARsim$ca)
This example is inspired by a published study aimed at exploring the dynamics of inter-parental emotion states and behaviors at the end of conflicts, and associations with child emotions and behaviors during conflicts (Schermerhorn, Chow, & Cummings, 2010). To analyze over-time and lagged dependencies in the couples’ dynamics and possible associations with child-related variable, we considered the following model:
\[ \begin{bmatrix} wp_{i,t} \\ hp_{i,t} \end{bmatrix} = \begin{bmatrix} a & b\\ b_1 & a_1 \end{bmatrix} \begin{bmatrix} wp_{i,t-1}\\ hp_{i,t-1} \end{bmatrix}+ \begin{bmatrix} c & d\\ c_1 & d_1 \end{bmatrix} \begin{bmatrix} ca_{i,t}\\ cn_{i,t} \end{bmatrix}+ \begin{bmatrix} we_{i,t}\\ he_{i,t} \end{bmatrix}\]
\[\begin{bmatrix} we_{i,t}\\ he_{i,t} \end{bmatrix} \sim N( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} v_{wp} & c_{hw}\\ c_{hw} & v_{hp} \end{bmatrix} )\]
where \(wp\)and \(hp\) are two dependent variables representing the emotional ratings of wife and husband respectively; \(ca\) is a covariate representing an aggregate measure of the child’s agentic behavior (e.g. helping out, taking sides, and comforting the parents); \(cn\) is a covariate representing an aggregate measure of the child’s negative emotions and dysregulated behaviors (e.g., anger, sadness, and fear).
The simulated data were generated from the above model. x1 and x2 are two auxiliary varialbes used in the generation of missing data. The data contain 100 subjects (N = 100) and 100 time points (T = 100) for each subject. Missing data were under missing at random (MAR) condtion with around 30% missing rate.
The trajectories of emotional ratings of wife and husband for two families are shown below.
ggplot(data = subset(VARsim,ID>=5 & ID <=6)) +
geom_line(aes(x = Time, y = wp, col = "wp")) +
geom_line(aes(x = Time, y = hp, col = "hp")) +
labs(col="Variable",
x="Time",
y="wp and hp")+
theme_classic() +
#scale_colour_manual(values=c("pink", "purple")) +
facet_wrap( ~ paste0("ID:",ID),dir = "v")
rawdata <- dynr.data(VARsim, id="ID", time="Time",
observed=c("wp","hp"),covariates=c("ca","cn"))
meas <- prep.measurement(
values.load=matrix(c(1,0,
0,1),ncol=2,byrow=T),
params.load=matrix(rep("fixed",4),ncol=2),
state.names=c("wp","hp"),
obs.names=c("wp","hp")
)
formula =list(wp ~ a*wp + b*hp + c*ca + d*cn,
hp ~ a1*hp + b1*wp +c1*ca + d1*cn)
dynm <- prep.formulaDynamics(formula=formula,
startval=c(a = .4, b = -.3, b1=-.2, a1=.3,
c = .3, c1=.3, d=-.5, d1=-.4
), isContinuousTime=FALSE)
initial <- prep.initial(
values.inistate=c(.15,.15),
params.inistate=c('mu_wp', 'mu_hp'),
values.inicov=matrix(c(1,0.1,
0.1,1),byrow=T,ncol=2),
params.inicov=matrix(c("v_11","c_21",
"c_21","v_22"),byrow=T,ncol=2))
mdcov <- prep.noise(
values.latent=matrix(c(1,0.3,
0.3,1),byrow=T,ncol=2),
params.latent=matrix(c("v_wp","c_hw",
"c_hw","v_hp"),byrow=T,ncol=2),
values.observed=diag(rep(0,2)),
params.observed=diag(c('fixed','fixed'),2))
model <- dynr.model(dynamics=dynm, measurement=meas,
noise=mdcov, initial=initial, data=rawdata,
outfile=paste("trial.c",sep=""))
plotFormula(dynrModel = model, ParameterAs = model$param.names, printDyn = TRUE, printMeas = TRUE)
dynr.mi() function is used to implement multiple imputation and parameter estimation procedures. In this particular trial, imp.obs
was set as FALSE
to request a partial MI approach.
Two auxiliary variables (i.e., x1 and x2) were included in the imputation model given their associations with missing data. We also included lagged information for dependent variables up to the previous time point (i.e., lag of 1) in the imputation model.
The number of imputations was set as 5, which is the default number of imputations in the mice() function in MICE package. The number of MCMC iterations in each imputation was set as 10 for the diagnostic purpose.
result <- dynr.mi(model, which.aux=c("x1","x2"),
which.lag=c("wp","hp"), lag=1,
which.lead=NULL, lead=0,
m=5, iter=10,
imp.obs=FALSE, imp.exo=TRUE,
diag = TRUE, Rhat=1.1,
conf.level=0.95,
verbose=FALSE, seed=12345)
truep <- c(a=0.4, b=-0.3, b1=-0.2, a1=0.3,
c=0.3, c1=0.3, d=-0.5, d1=-0.4,
v_wp = 1, c_hw = 0.3, v_hp = 1)
estp <- result$estimation.result[1:11, ]
data.frame(truep, estp)
truep | Estimate | Std..Error | t.value | ci.lower | ci.upper | Pr…t.. | |
---|---|---|---|---|---|---|---|
a | 0.4 | 0.3860217 | 0.0105734 | 36.50869 | 0.3652956 | 0.4067477 | 0 |
b | -0.3 | -0.2967944 | 0.0118591 | -25.02674 | -0.3200406 | -0.2735482 | 0 |
b1 | -0.2 | -0.2060782 | 0.0106906 | -19.27651 | -0.2270340 | -0.1851224 | 0 |
a1 | 0.3 | 0.2872139 | 0.0118920 | 24.15176 | 0.2639030 | 0.3105247 | 0 |
c | 0.3 | 0.2972987 | 0.0186882 | 15.90838 | 0.2606661 | 0.3339312 | 0 |
c1 | 0.3 | 0.2818793 | 0.0184995 | 15.23710 | 0.2456165 | 0.3181421 | 0 |
d | -0.5 | -0.4974293 | 0.0110962 | -44.82897 | -0.5191800 | -0.4756786 | 0 |
d1 | -0.4 | -0.3989903 | 0.0110657 | -36.05641 | -0.4206813 | -0.3772992 | 0 |
v_wp | 1.0 | 1.0463369 | 0.0188458 | 55.52100 | 1.0093953 | 1.0832784 | 0 |
c_hw | 0.3 | 0.3156262 | 0.0149351 | 21.13313 | 0.2863503 | 0.3449020 | 0 |
v_hp | 1.0 | 1.0225720 | 0.0182833 | 55.92941 | 0.9867331 | 1.0584109 | 0 |
With dynr.mi(), we obtained good recovery of parameters of interest.
#Rhat plot
result$Rhat.plot
From the Rhat plot, we can see that 6 iterations in each imputation would be sufficient to obtain stable imputation results.