This tutorial reviews the Actor-Partner Interdependence Model (APIM; Kashy & Kenny, 2000; Kenny, Kashy, & Cook, 2006), which is often used to examine the association (1) between two constructs for two people using cross-sectional data, or (2) between the same construct from two people across two time points.
In this tutorial, we’ll cover…
nlme
packageBefore we begin, let’s load the libraries we need.
library(ggplot2) #for plotting
library(nlme) #for model fitting
library(psych) #for general functions
library(reshape) #for data management
We’ll be using our trusty WISC data, so we’ll read that data set in.
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/wisc3raw_gender.csv"
#read in the .csv file using the url() function
wisc3raw <- read.csv(file=url(filepath),header=TRUE)
#subset to variables of interest
data <- wisc3raw[, c("id", "verb1", "verb6", "perfo1", "perfo6")]
head(data)
id | verb1 | verb6 | perfo1 | perfo6 |
---|---|---|---|---|
1 | 24.42 | 55.64 | 19.84 | 44.19 |
2 | 12.44 | 37.81 | 5.90 | 40.38 |
3 | 32.43 | 50.18 | 27.64 | 77.72 |
4 | 22.69 | 44.72 | 33.16 | 61.66 |
5 | 28.23 | 70.95 | 27.64 | 64.22 |
6 | 16.06 | 39.94 | 8.45 | 39.08 |
We’ll also make a long version of the data set for later use.
In this tutorial, we are going to examine the association between verbal and performance ability using measures from first grade and sixth grade. We are interested in simultaneously examining whether (1) verbal ability in the first grade is predictive of verbal ability in the sixth grade, (2) performance ability in the first grade is predictive of performance ability in the sixth grade, (3) verbal ability in the first grade is predictive of performance ability in the sixth grade, and (4) performance ability in the first grade is predictive of verbal ability in the sixth grade.
When working with dyads, the above points 1 and 2 are often referred to as “actor effects” and points 3 and 4 are often referred to as “partner effects.”
While this example is not a “traditional dyad” - i.e., two distinguishable people - the analytic processes demonstrated here for bivariate data are applicable to the examination of dyadic data.
Before we run our models, it is often useful to get more familiar with the data via plotting and descriptives statistics.
Let’s begin with descriptive statistics of our four variables of interest: first grade verbal and performance ability, and sixth grade verbal and performance ability.
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 204 | 19.58505 | 5.807695 | 19.335 | 19.49768 | 5.41149 | 3.33 | 35.15 | 31.82 | 0.1301071 | -0.0528376 | 0.40662 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 204 | 43.7499 | 10.66505 | 42.545 | 43.4561 | 11.29741 | 17.35 | 72.59 | 55.24 | 0.2356459 | -0.360521 | 0.7467029 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 204 | 17.97706 | 8.349782 | 17.66 | 17.69116 | 8.295147 | 0 | 46.58 | 46.58 | 0.3510232 | -0.1089726 | 0.5846017 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 204 | 50.93162 | 12.47987 | 51.765 | 51.07073 | 13.26927 | 10.26 | 89.01 | 78.75 | -0.0637532 | 0.1836101 | 0.873766 |
We can see that both the mean and standard deviation of verbal and performance ability increase from first to sixth grade. While this is worth noting, the APIM will not be examining changes in mean levels of verbal and performance ability.
Let’s also examine the correlations (i.e., associations and rank order stability) among the four variables.
## verb1 verb6 perfo1 perfo6
## verb1 1.0000000 0.6541040 0.6101379 0.4779672
## verb6 0.6541040 1.0000000 0.6183155 0.6106694
## perfo1 0.6101379 0.6183155 1.0000000 0.6958321
## perfo6 0.4779672 0.6106694 0.6958321 1.0000000
We can see that all four varibles are relatively nomrally distirbuted.
We can see there are strong, positive correlations both across time within-construct and between-constructs.
Let’s also look at a few individuals to examine the within-person association between verbal and performance ability from the first to sixth grade.
ggplot(data = subset(data_long, id <= 9), aes(x = grade, group = id), legend = FALSE) +
geom_point(aes(x = grade, y = verb), shape = 17, size = 3, color = "purple") +
geom_point(aes(x = grade, y = perfo), shape = 19, size = 3, color = "pink") +
geom_line(aes(x = grade, y = verb), lty = 1, size=1, color = "purple") +
geom_line(aes(x = grade, y = perfo), lty = 1, size=1, color = "pink") +
xlab("Grade") +
ylab("Verbal and Performance Ability") + ylim(0, 80) +
scale_x_continuous(breaks=seq(0, 7, by = 1)) +
theme_classic() +
facet_wrap( ~ id)
Within these nine participants, there is some inter-individual variability. For example, some participants’ verbal and performance ability increase more than others, some participants have higher levels of verbal ability than performance ability (or vice versa) across time. The APIM model is all about rank-order interindividual differences, so these intra-individual change plots are not quite matched to the APIM-type research questions. (see the tutorial on two types of 2-occasion change models).
We have learned to manipulate data from “wide” to “long,” and vice versa. Dyadic/bivariate analyses require further manipulation in order to get the data in the correct format for our analyses. We will walk through the data prep in two steps.
First, we need to create one column that has the information for both outcome variables - i.e., for each person, the verb6 and perfo6 values will alternate. This is almost like repeated measures data, but instead of having multiple time points nested within person, we have multiple (two) variables to nest within person.
data_melt <- reshape::melt(data = data,
id.vars = c("id", "verb1", "perfo1"),
na.rm=FALSE)
head(data_melt)
id | verb1 | perfo1 | variable | value |
---|---|---|---|---|
1 | 24.42 | 19.84 | verb6 | 55.64 |
2 | 12.44 | 5.90 | verb6 | 37.81 |
3 | 32.43 | 27.64 | verb6 | 50.18 |
4 | 22.69 | 33.16 | verb6 | 44.72 |
5 | 28.23 | 27.64 | verb6 | 70.95 |
6 | 16.06 | 8.45 | verb6 | 39.94 |
#rename "variable" and "value" variables to "grade6_variable" and "grade6_outcome"
colnames(data_melt)[4:5] <- c("grade6_variable", "grade6_outcome")
#re-order for convenience
data_melt <- data_melt[order(data_melt$id, data_melt$grade6_variable), ]
#look at updated data set
head(data_melt)
id | verb1 | perfo1 | grade6_variable | grade6_outcome | |
---|---|---|---|---|---|
1 | 1 | 24.42 | 19.84 | verb6 | 55.64 |
205 | 1 | 24.42 | 19.84 | perfo6 | 44.19 |
2 | 2 | 12.44 | 5.90 | verb6 | 37.81 |
206 | 2 | 12.44 | 5.90 | perfo6 | 40.38 |
3 | 3 | 32.43 | 27.64 | verb6 | 50.18 |
207 | 3 | 32.43 | 27.64 | perfo6 | 77.72 |
Second, we need to create two dummy variables (each 0/1) that will be useful in our analyses to “turn on/off” a row (more on this later). We will create a new variable, “verb_on” that indicates which rows are for the “verb6” outcome, We will create a second variable, “perform_on” that indicates which rows are for the “perfo6” oucome.
data_melt$verb_on <- ifelse(data_melt$grade6_variable == "verb6", 1, 0)
data_melt$perform_on <- ifelse(data_melt$grade6_variable == "perfo6", 1, 0)
head(data_melt)
id | verb1 | perfo1 | grade6_variable | grade6_outcome | verb_on | perform_on | |
---|---|---|---|---|---|---|---|
1 | 1 | 24.42 | 19.84 | verb6 | 55.64 | 1 | 0 |
205 | 1 | 24.42 | 19.84 | perfo6 | 44.19 | 0 | 1 |
2 | 2 | 12.44 | 5.90 | verb6 | 37.81 | 1 | 0 |
206 | 2 | 12.44 | 5.90 | perfo6 | 40.38 | 0 | 1 |
3 | 3 | 32.43 | 27.64 | verb6 | 50.18 | 1 | 0 |
207 | 3 | 32.43 | 27.64 | perfo6 | 77.72 | 0 | 1 |
There are many alternative ways to prepare dyadic data. For example, see https://github.com/RandiLGarcia/2day-dyad-workshop/blob/master/Day%201/R%20Code/Day%201-Data%20Restructuring.Rmd). It will depend on how you choose to run your analysis (described further later).
nlme
packageNow that we know a bit more about the data we are working with and have the data prepared in a single-outcome (double-entry) format, we can set up our APIM model. We’ll run this model in the nlme
package.
Specifically, we’ll examine whether:
Before running this full model, we will examine the empty model to determine how much variability there is within- and between-persons. Specifically,
\[Grade6Outcome_{i} = \beta_{0V}VerbOn_{it} + \beta_{0P}PerformOn_{it} + e_{Vi} +e_{Pi}\]
Empty model.
apim_empty <- gls(grade6_outcome ~ -1 + #turnd OFF the intercept
verb_on + #intercept for verbal scores
perform_on, #intercept for performance scores
correlation = corSymm(form= ~1|id),
weights = varIdent(form= ~1|verb_on), #allows for different error terms for the two variables
data = data_melt,
na.action = na.exclude)
summary(apim_empty)
## Generalized least squares fit by REML
## Model: grade6_outcome ~ -1 + verb_on + perform_on
## Data: data_melt
## AIC BIC logLik
## 3063.861 3083.893 -1526.931
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1
## 2 0.611
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | verb_on
## Parameter estimates:
## 1 0
## 1.000000 1.170165
##
## Coefficients:
## Value Std.Error t-value p-value
## verb_on 43.74990 0.7467029 58.59077 0
## perform_on 50.93162 0.8737660 58.28977 0
##
## Correlation:
## verb_n
## perform_on 0.611
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.25897660 -0.72994511 -0.02108775 0.72002989 3.05118321
##
## Residual standard error: 10.66505
## Degrees of freedom: 408 total; 406 residual
We examine the correlation of the verbal and performance error terms to determine the degree of non-independence in the data. We can see that Rho = 0.61, indicating the correlation across dyad members, children who have higher verbal ability tend to also have higher performance ability.
Other things to note in this output…
Both of these expected values correspond to their respective means in the raw data (see descriptives above).
The grade 6 verbal and performance scores also each have estimated “standard error” values. The estimated standard error for Grade 6 verbal scores is 10.67 and the estimated standard error for Grade 6 performance scores is 12.48 = (1.17*10.67). Note that these estiamtes exactly match the respective standard devaitions in the raw data (see descriptives above).
For completeness of reppresenting the data we can get the means and variances of, and correlation between, the two predictors from the raw data.
Next, we are going to run our full APIM model using the two-intercept approach. Specifically,
\[Grade6Outcome_{i} = \beta_{0V}VerbOn_{it} + \beta_{1V}VerbOn_{it}*Verb1_{it} + \beta_{2V}VerbOn_{it}*Perform1_{it} + \beta_{0P}PerformOn_{it} + \beta_{1P}PerformOn_{it}*Perform1_{it} + \beta_{2P}PerformOn_{it}*Verb1_{it} + e_{Vi} +e_{Pi}\]
So when “verb_on” is equal to 0:
\[Grade6Outcome_{i} = \beta_{0P}PerformOn_{it} + \beta_{1P}PerformOn_{it}*Perform1_{it} + \beta_{2P}PerformOn_{it}*Verb1_{it} + e_{Vi} +e_{Pi}\]
and when “perform_on” is equal to 0:
\[Grade6Outcome_{it} = \beta_{0V}VerbOn_{it} + \beta_{1V}VerbOn_{it}*Verb1_{it} + \beta_{2V}VerbOn_{it}*Perform1_{it} + e_{Vi} +e_{Pi}\]
Full model.
apim_full <- gls(grade6_outcome ~ -1 + #turns OFF intercept
verb_on + #verbal intercept
verb_on:verb1 + #verbal "actor" effect
verb_on:perfo1 + #performance "partner" effect
perform_on + #performance intercept
perform_on:perfo1 + #performance "actor" effect
perform_on:verb1, #verbal "partner" effect
correlation = corSymm(form=~1|id),
weights = varIdent(form=~1|verb_on), #allows for different error terms for the two variables
data = data_melt,
na.action = na.exclude)
summary(apim_full)
## Generalized least squares fit by REML
## Model: grade6_outcome ~ -1 + verb_on + verb_on:verb1 + verb_on:perfo1 + perform_on + perform_on:perfo1 + perform_on:verb1
## Data: data_melt
## AIC BIC logLik
## 2879.029 2914.997 -1430.514
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1
## 2 0.312
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | verb_on
## Parameter estimates:
## 1 0
## 1.000000 1.188537
##
## Coefficients:
## Value Std.Error t-value p-value
## verb_on 19.869325 1.8634459 10.662679 0.0000
## perform_on 30.049124 2.2147736 13.567582 0.0000
## verb_on:verb1 0.809886 0.1150893 7.037017 0.0000
## verb_on:perfo1 0.446064 0.0800504 5.572292 0.0000
## perfo1:perform_on 0.962419 0.0951429 10.115507 0.0000
## verb1:perform_on 0.182846 0.1367879 1.336709 0.1821
##
## Correlation:
## verb_n prfrm_ vrb_n:v1 vrb_n:p1 prf1:_
## perform_on 0.312
## verb_on:verb1 -0.738 -0.230
## verb_on:perfo1 -0.034 -0.011 -0.610
## perfo1:perform_on -0.011 -0.034 -0.190 0.312
## verb1:perform_on -0.230 -0.738 0.312 -0.190 -0.610
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.55877996 -0.66034371 -0.02441614 0.62804539 3.69258907
##
## Residual standard error: 7.545255
## Degrees of freedom: 408 total; 402 residual
Let’s interpret the results!
The expected verbal score at grade 6 = 19.87 and the expected performance score at grade 6 = 30.05 when verbal and performance scores at grade 1 are equal to zero.
We’ve walked through one way of running an APIM model in R. There are, of course, lots of alternatives. Here are a few resources if you’d like to learn more about running an APIM model or dyadic data analyses in general.
Thanks for playing!