This document carries the code and output used in Brick and Bailey (submitted). Specifically, we illustrate the use of Matrices of Implied Causation (MICs) using a mediation model and three examples from the literature.
First we need MICr, OpenMx, and also some tools for plotting.
library(MICr)
library(OpenMx)
# Turn on plots if semPlot is available:
drawPlots <- library(semPlot, logical.return=TRUE)
We begin with a simple dummy example:
XYZmodel <- mxModel("XYZ", manifestVars=c("X", "Y", "Z"), type="RAM",
mxPath(from="Z", to="Y", values=.6, labels="\\beta"),
mxPath(from="X", to=c("Y", "Z"), values=c(.2, .8), labels=c("\\lambda", "\\alpha")),
mxMatrix("Iden",3,name="I"))
Whenever we run a model like this, we add a semPaths plot. These models are complex, so the generated plot is rarely a good view of what the model should look like. The figures in the paper itself are a better view of the models, but for generation purposes we create plots here as well. The paper figures were generated with Onyx (see http://onyx.brandmaier.de/ for more), although some were modified for readability.
if(drawPlots) semPaths(XYZmodel)
We can render the MIC for this model directly.
MIC(XYZmodel)
X Y Z
X 1.00 0 0.0
Y 0.68 1 0.6
Z 0.80 0 1.0
The MIC is read like the A matrix in a RAM model: the path from X to Z is in the column for X and the row for Z (here, it is .8). As expected, this shows the total effect, the model-implied causal influence, from each variable to each other variable.
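For intuition, the MIC of a recursive RAM model can be reproduced by hand: if A holds the one-headed (asymmetric) paths, the total effects are (I - A)^-1. A base-R sketch for the XYZ model, under the assumption that MIC() computes standard RAM total effects (the MICr internals may differ):

```r
# Asymmetric path matrix: A[row, col] is the path from col to row.
vars <- c("X", "Y", "Z")
A <- matrix(0, 3, 3, dimnames = list(vars, vars))
A["Y", "X"] <- 0.2   # lambda: X -> Y
A["Z", "X"] <- 0.8   # alpha:  X -> Z
A["Y", "Z"] <- 0.6   # beta:   Z -> Y
mic <- solve(diag(3) - A)  # total effects: (I - A)^-1
mic["Y", "X"]              # 0.2 + 0.8 * 0.6 = 0.68, matching MIC(XYZmodel)
```

The .68 combines the direct path (lambda) with the indirect path through Z (alpha times beta), which is exactly the mediation logic the matrix encodes.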
This can be rendered more succinctly as a MIC Table:
MICTable(XYZmodel)
Table: Implied Causation Table: XYZ
from to XYZ
----- --- -----
X Z 0.80
X Y 0.68
Z Y 0.60
The first example shows three cases from Tomarken & Waller’s 2003 paper. These three models fit identically, but make very different causal predictions.
The first model is a causal chain: X predicts Y, and Y predicts Z.
tomarken1A <- mxModel("Tomarken1A",
type="RAM", manifestVars = c("X", "Y", "Z"), latentVars = c("Ry", "Rz"),
# Latent to manifest paths
mxPath(from="Ry", to="Y", free=FALSE, values=1.0, labels="Ry_to_Y" ),
mxPath(from="Rz", to="Z", free=FALSE, values=1.0, labels="Rz_to_Z" ),
# Manifest chain
mxPath(from="X", to="Y", free=TRUE, values=0.8, labels=c("\\alpha") ),
mxPath(from="Y", to="Z", free=TRUE, values=0.8, labels=c("\\beta") ),
# Variances
mxPath(from=c("Ry", "Rz", "X"),
arrows=2, free=TRUE, values=1.0,
labels=c("V_Ry", "V_Rz", "V_X") )
)
semPaths plots of these models show the differences fairly well.
if(drawPlots) semPaths(tomarken1A)
Model 1B resembles model 1A, but the chain flows in the opposite direction, with Z predicting Y and Y predicting X.
tomarken1B <- mxModel("Tomarken1B",
type="RAM", manifestVars = c("X","Y","Z"), latentVars = c("Rx", "Ry"),
# Latent to manifest paths
mxPath(from="Ry", to="Y", free=FALSE, values=1.0, labels="Ry_to_Y" ),
mxPath(from="Rx", to="X", free=FALSE, values=1.0, labels="Rx_to_X" ),
# Manifest chain
mxPath(from="Z", to="Y", free=TRUE, values=0.8, labels=c("\\beta") ),
mxPath(from="Y", to="X", free=TRUE, values=0.8, labels=c("\\alpha") ),
# Variances
mxPath(from=c("Ry", "Rx", "Z"),
arrows=2, free=TRUE, values=1.0,
labels=c("V_Ry", "V_Rx", "V_Z") )
)
# Throws a warning because the model has not been run.
if(drawPlots) semPaths(tomarken1B)
The last Tomarken example is a fork, with Y predicting both X and Z.
tomarken1C <- mxModel("Tomarken1C",
type="RAM", manifestVars = c("X","Y","Z"), latentVars = c("Rz","Rx"),
# Fork loadings
mxPath(from="Y", to=c("Z","X"), free=TRUE, values=.8, labels=c("\\beta","\\alpha") ),
# Latent residual terms
mxPath(from="Rz", to="Z", free=TRUE, values=1.0, labels="Rz_to_Z" ),
mxPath(from="Rx", to="X", free=TRUE, values=1.0, labels="Rx_to_X" ),
mxPath(from=c("Y", "Rx", "Rz"),
arrows=2, free=FALSE, values=1.0,
labels=c("V_Y", "V_Rx", "V_Rz"))
)
# Throws a warning because the model has not been run.
if(drawPlots) semPaths(tomarken1C)
The MICs can be rendered directly as matrices.
MIC(tomarken1A)
X Y Z Ry Rz
X 1.00 0.0 0 0.0 0
Y 0.80 1.0 0 1.0 0
Z 0.64 0.8 1 0.8 1
Ry 0.00 0.0 0 1.0 0
Rz 0.00 0.0 0 0.0 1
MIC(tomarken1B)
X Y Z Rx Ry
X 1 0.8 0.64 1 0.8
Y 0 1.0 0.80 0 1.0
Z 0 0.0 1.00 0 0.0
Rx 0 0.0 0.00 1 0.0
Ry 0 0.0 0.00 0 1.0
MIC(tomarken1C)
X Y Z Rz Rx
X 1 0.8 0 0 1
Y 0 1.0 0 0 0
Z 0 0.8 1 1 0
Rz 0 0.0 0 1 0
Rx 0 0.0 0 0 1
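The same (I - A)^-1 computation reproduces these matrices, residuals included. A base-R sketch for Tomarken1A (again assuming MIC() returns standard RAM total effects):

```r
# Tomarken1A as a path matrix, including the latent residual terms.
v <- c("X", "Y", "Z", "Ry", "Rz")
A1 <- matrix(0, 5, 5, dimnames = list(v, v))
A1["Y", "X"]  <- 0.8   # alpha
A1["Z", "Y"]  <- 0.8   # beta
A1["Y", "Ry"] <- 1     # residual loadings
A1["Z", "Rz"] <- 1
mic1A <- solve(diag(5) - A1)
mic1A["Z", "X"]    # 0.64: X reaches Z through Y
mic1A["Z", "Ry"]   # 0.80: the residual Ry also reaches Z through Y
```

Note that the residual columns are nonzero wherever a residual is an upstream cause, which is why the MIC matrices above include Ry and Rz entries.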
A more succinct approach is to print a MICTable comparing all three. Because the residuals are not of particular interest, we focus only on the influences from and to X, Y, and Z. Note that this table's sort order differs from the one displayed in the paper.
MICTable(tomarken1A, tomarken1B, tomarken1C, from=c("X", "Y", "Z"), to=c("X", "Y", "Z"))
Table: Implied Causation Table: Tomarken1A, Tomarken1B, Tomarken1C Existence & Sign & Scale
from to Tomarken1A Tomarken1B Tomarken1C
----- --- ----------- ----------- -----------
X Y 0.80 0.00 0.0
Y Z 0.80 0.00 0.8
X Z 0.64 0.00 0.0
Y X 0.00 0.80 0.8
Z X 0.00 0.64 0.0
Z Y 0.00 0.80 0.0
Example 2 comes from Duncan (2007), courtesy of Bailey et al. (2018?). These are three fairly intricate models, and we go one step further by dropping the person-level intercepts.
First, we need the data and the correlation matrix; conveniently, both are in the paper.
means <- c(68.08, 93.09, 107.40, 116.80, 49.91, 73.87, 90.65, 102.87)
times <- c("K", 1:3)
nTimes <- length(times)
manifests <- paste(rep(c("Read", "Math"), each=nTimes), rep(times, 2), sep="")
occasions <- paste(rep(c("OR", "OM"), each=nTimes), rep(times, 2), sep="")
sds <- c(14.17, 18.02, 15.51, 14.82, 12.84, 17.64, 16.66, 15.63)
corMat <- matrix(c(
1.00, 0.79, 0.71, 0.64, 0.73, 0.66, 0.62, 0.59,
0.79, 1.00, 0.86, 0.78, 0.71, 0.73, 0.70, 0.67,
0.71, 0.86, 1.00, 0.85, 0.69, 0.71, 0.74, 0.69,
0.64, 0.78, 0.85, 1.00, 0.67, 0.70, 0.73, 0.72,
0.73, 0.71, 0.69, 0.67, 1.00, 0.83, 0.78, 0.75,
0.66, 0.73, 0.71, 0.70, 0.83, 1.00, 0.86, 0.82,
0.62, 0.70, 0.74, 0.73, 0.78, 0.86, 1.00, 0.88,
0.59, 0.67, 0.69, 0.72, 0.75, 0.82, 0.88, 1.00), nrow=8)
covMat <- diag(sds) %&% corMat
names(means) <- manifests
dimnames(covMat) <- list(manifests, manifests)
dimnames(corMat) <- list(manifests, manifests)
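OpenMx's `%&%` operator is the quadratic product, `A %&% B` = A B A'; with a diagonal matrix of standard deviations, it rescales a correlation matrix into a covariance matrix. A base-R equivalent on a toy 2-by-2 case (pedagogical only, not the study data):

```r
# diag(sd) %*% cor %*% diag(sd) is the base-R spelling of diag(sd) %&% cor.
corToy <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
sdToy  <- c(2, 3)
covToy <- diag(sdToy) %*% corToy %*% diag(sdToy)
covToy   # variances 4 and 9 on the diagonal; covariance 0.5 * 2 * 3 = 3
```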
Our first model fixes the loadings from the latent intercepts at zero, so the intercept factors are present but inert.
ModelCLPM1 <- mxModel("CLPM1", type="RAM", manifestVars = manifests,
latentVars=c("Reading", "Math", occasions),
# OR/OM -> Math/Reading paths
mxPath(from=occasions, to=manifests,
arrows=1, values=1, free=FALSE),
# Occasion variances
mxPath(from=occasions, arrows=2, values=1, free=TRUE,
labels=paste0("V_", occasions), lbound=.1),
# Covariation of error/process
mxPath(from=occasions[1:4], to=occasions[5:8],
arrows=2, free=TRUE, values=.25,
labels=paste0("ORM_", times)),
# AR of Reading
mxPath(from=occasions[1:3], to=occasions[2:4],
arrows=1, free=TRUE, values=.5,
labels=paste0("AR_R_", times[1:3])),
# AR of Math
mxPath(from=occasions[5:7], to=occasions[6:8],
arrows=1, free=TRUE, values=.3,
labels=paste0("AR_M_", times[1:3])),
# Cross-lagged R->M
mxPath(from=occasions[1:3], to=occasions[6:8],
arrows=1, free=TRUE, values=.10,
labels=paste0("CL_R", times[1:3], "_M", times[2:4])),
# Cross-lagged M->R
mxPath(from=occasions[5:7], to=occasions[2:4],
arrows=1, free=TRUE, values=.03,
labels=paste0("CL_M", times[1:3], "_R", times[2:4])),
mxData(type="cor", observed = corMat, numObs = 9612)
)
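The path labels above are assembled with paste0() over the time codes; it is worth checking what they expand to, since a mislabeled path silently creates or merges parameters. For the reading-to-math cross-lags:

```r
times <- c("K", 1:3)  # the integers 1:3 are coerced to character
paste0("CL_R", times[1:3], "_M", times[2:4])
# "CL_RK_M1" "CL_R1_M2" "CL_R2_M3"
```

Each label pairs a predictor occasion with the following outcome occasion, so the lag structure is visible in the parameter names.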
CLPM1 <- mxTryHard(ModelCLPM1)
if(drawPlots) semPaths(CLPM1)
CLPM2 is modified from CLPM1: it adds loadings and a free covariance for the latent intercepts so that they influence the model. The random intercepts are given identical loadings on the manifest measurements across time.
ModelCLPM2 <- mxModel(CLPM1,
# RI factor loadings (equated across time)
mxPath(from="Reading", to=paste0("Read", times),
values=1, free=TRUE, labels="Read_Loadings"),
mxPath(from="Math", to=paste0("Math", times),
values=1, free=TRUE, labels="Math_Loadings"),
# Free RI covariance; RI variances fixed at 1 for identification.
mxPath(from=c("Reading", "Math"),
arrows=2, connect="unique.pairs", free=c(FALSE, TRUE, FALSE),
values=c(1, .8, 1), labels=c("V_R", "CV_RM", "V_M")),
name="CLPM2"
)
CLPM2 <- mxTryHard(ModelCLPM2)
if(drawPlots) semPaths(CLPM2)
Once both models are fitted, we can examine them closely and compare the differences. We are specifically interested in the effects of the kindergarten occasion measurements on the final outcomes.
MICTable(CLPM1, CLPM2, from=c("OMK", "ORK"),
to=paste0(rep(c("Read", "Math"), each=3), 1:3), se=FALSE)
Table: Implied Causation Table: CLPM1, CLPM2
from to CLPM1 CLPM2
----- ------ ---------- ----------
OMK Math1 0.7454506 0.2353760
OMK Math2 0.6010844 0.0928142
ORK Read1 0.5816741 0.4725914
OMK Math3 0.5199614 0.0484702
ORK Read2 0.4458971 0.3108513
OMK Read3 0.3669879 0.0384978
ORK Read3 0.3447392 0.1939866
OMK Read2 0.3399478 0.0446693
OMK Read1 0.2853778 0.0522892
ORK Math3 0.1823172 0.0379360
ORK Math2 0.1764470 0.0646919
ORK Math1 0.1158210 0.0611458
We take CLPM2 and add a few more elements. Specifically, we add three interventions, modeled as latent variables. Note that their effects are approximated from theory or prior work, not statistically estimated from this data set; we fix the intervention loadings during estimation so that they do not influence the rest of the fitting procedure.
ModelIxn<- mxModel("InterventionModel", type="RAM", manifestVars = manifests,
latentVars=c("Reading", "Math", occasions, "XRead", "ReadStudy", "AllStudy"),
# Latent math/reading paths
mxPath(from=occasions, to=manifests, arrows=1, values=1, free=FALSE),
# Occasion variances
mxPath(from=occasions, arrows=2, values=1, free=TRUE,
labels=paste0("V_", occasions), lbound=.1),
# RI Var/Covar. Icpts standardized by definition.
mxPath(from=c("Reading", "Math"),
arrows=2, connect="unique.pairs", values=c(1,.8, 1),
free=c(FALSE, TRUE, FALSE), labels=c("VR", "CRM", "VM")),
# Covariation of error/process
mxPath(from=occasions[1:4], to=occasions[5:8], arrows=2,
free=TRUE, values=.25, labels=paste0("ORM_", times)),
# AR of Reading
mxPath(from=occasions[1:3], to=occasions[2:4], arrows=1,
free=TRUE, values=.5, labels=paste0("AR_R_", times[1:3])),
# AR of Math
mxPath(from=occasions[5:7], to=occasions[6:8], arrows=1,
free=TRUE, values=.3, labels=paste0("AR_M_", times[1:3])),
# Cross-lagged R->M
mxPath(from=occasions[1:3], to=occasions[6:8], arrows=1,
free=TRUE, values=.10,
labels=paste0("CL_R", times[1:3], "_M", times[2:4])),
# Cross-lagged M->R
mxPath(from=occasions[5:7], to=occasions[2:4], arrows=1,
free=TRUE, values=.03,
labels=paste0("CL_M", times[1:3], "_R", times[2:4])),
# RI factor loadings (equated across time)
mxPath(from="Reading", to=paste0("Read", times), arrows=1,
values=1, free=TRUE, labels="Read_Loadings"),
mxPath(from="Math", to=paste0("Math", times), arrows=1,
values=1, free=TRUE, labels="Math_Loadings"),
# Interventions:
mxPath(from="XRead", to="OR1",
values=1, free=FALSE),
mxPath(from="ReadStudy", to=c("OR1", "OR2"),
values=.6, free=FALSE),
mxPath(from="AllStudy", to=c("OM1", "OM2", "OR1", "OR2"),
values=c(.4, .3, .4, .3),
free=FALSE),
mxData(type="cor", observed = corMat, numObs = 9612)
)
Ixn <- mxTryHard(ModelIxn, extraTries = 100)
Again, we can plot this, but it’s sufficiently complicated that it’s difficult to see all the details in the plot.
if(drawPlots) semPaths(Ixn)
Of particular import for this comparison, we can look at the influence of the three interventions based on this model, including standard errors computed using the difference method.
MICTable(Ixn, from=c("XRead", "ReadStudy", "AllStudy"), to=c("Read3"))
Table: Implied Causation Table: InterventionModel
from to InterventionModel InterventionModel_SE
---------- ------ ------------------ ---------------------
ReadStudy Read3 0.5990486 0.0186693
AllStudy Read3 0.4075674 0.0153353
XRead Read3 0.4008333 0.0173369
From here, we can clearly see that for reading at grade 3, the ReadStudy intervention has the largest model-implied causal influence on the outcome. Note once again that this may not be the whole story.
There are cases where a researcher may wish to compare potential interventions on both cost and effectiveness, rather than effectiveness alone. While multiobjective optimization as a whole is beyond the scope of this vignette, the MICr package provides a basic function to allow this type of comparison.
Specifically, the package permits researchers to generate a cost table that lays out the costs of implementation of their predicted interventions, and to generate a plot of the Pareto Frontier, which shows the most effective interventions at each level of cost.
In practice, cost functions can be complicated to compute: effort, employee time, logistics, recruitment, participant buy-in, dropout, and other challenges may all need to be weighed. When all else fails, the total monetary cost (e.g., in thousands of dollars) of applying the intervention to a given number of participants is often used as a general factor. Here we generate a cost function arbitrarily for pedagogical purposes; it should not be interpreted as relating to any real-world intervention.
costFrame <- data.frame(name=c("XRead", "ReadStudy", "AllStudy"), cost=c(100,250,280))
The cost frame and the model can be submitted to the paretoFront() function, which computes and plots the Pareto frontier for a given outcome and cost.
paretoFront(mic=Ixn, outcome="Read3", costs=costFrame)
.DisplayName name cost effect optimal
3 XRead XRead 100 0.4008333 TRUE
2 ReadStudy ReadStudy 250 0.5990486 TRUE
1 AllStudy AllStudy 280 0.4075674 FALSE
The Pareto plot shows cost on the X axis and effect (here, on grade-3 reading) on the Y axis. The line indicates the Pareto frontier: the maximum effect that can be expected at a given level of cost. Following the line to the left from a given cost shows the lowest-cost intervention with that effectiveness. Here, at a cost of 200 the best intervention is XRead, which is also the best at a cost of 100. Among interventions costing 300 or less, ReadStudy has the highest model-implied causal effect on third-grade reading.
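The dominance check behind the frontier is simple enough to sketch in base R. This is a pedagogical reimplementation, not MICr's paretoFront() internals: an intervention is on the frontier unless some other intervention is at least as cheap and at least as effective, and strictly better on one of the two.

```r
# Flag Pareto-optimal options: TRUE if no other option dominates option i.
paretoOptimal <- function(cost, effect) {
  sapply(seq_along(cost), function(i) {
    dominated <- cost <= cost[i] & effect >= effect[i] &
                 (cost < cost[i] | effect > effect[i])
    !any(dominated)
  })
}
# Costs and rounded Read3 effects from the tables above:
paretoOptimal(c(100, 250, 280), c(0.4008, 0.5990, 0.4076))
# TRUE TRUE FALSE, matching paretoFront()'s 'optimal' column
```

AllStudy is dominated because ReadStudy is both cheaper (250 vs. 280) and more effective on this outcome.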
It is worth noting that reading at grade 3 may not be the only outcome of interest. For example, the plot changes drastically if we examine math achievement at grade 3.
paretoFront(mic=Ixn, outcome="Math3", costs=costFrame)
.DisplayName name cost effect optimal
3 XRead XRead 100 0.05521542 TRUE
2 ReadStudy ReadStudy 250 0.04203542 FALSE
1 AllStudy AllStudy 280 0.25852865 TRUE
Researchers should think carefully about which outcomes are important, and potentially consider several cost functions when weighing the results.