This tutorial walks through a few helpful initial steps before analysis of experience sampling and EMA data (or any analyses, actually). Specifically, this tutorial demonstrates how to get the data in and obtain initial descriptive statistics and plots that are useful when making decisions about analyses.
Our examples make use of The AMIB Data, a multiple time-scale data set that has been shared for teaching purposes.
This script covers …
A. Loading The AMIB Data
B. Describing some aspects of the data
C. A few data visualizations
Loading libraries used in this script.
library(psych) # for describing the data
library(plyr) #for data manipulation
library(ggplot2) # for data visualization
The first step, and often the most frustrating part of data analysis is … getting the data into the program!
In recent years people are generally exchanging data files in a .csv format as these files port well and can interfce with many different software packages.
Here, we make use of the person-level and interaction-level (EMA-type) AMIB data files, (which can be merged easily using the id and day variables).
Loading person-level file (N = 190)
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_persons_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath),header=TRUE)
Loading interaction-level file (T = many, on average 43 per person).
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_interaction_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_interaction <- read.csv(file=url(filepath),header=TRUE)
Once the data are in, we can begin learning about them.
Subsetting to a few variables.
#subsetting to a few trait variables
AMIB_persons <- AMIB_persons[ ,c("id","sex",
"bfi_o","bfi_c","bfi_e","bfi_a","bfi_n")]
Descriptives of traits using person-level data (sex, personality).
#basic descriptives (using describe() from the psych package)
describe(AMIB_persons)
## vars n mean sd median trimmed mad min max range skew
## id 1 190 318.29 130.44 321.5 318.99 151.23 101.0 532 431.0 -0.04
## sex 2 190 1.66 0.48 2.0 1.70 0.00 1.0 2 1.0 -0.66
## bfi_o 3 190 3.60 0.96 3.5 3.64 0.74 1.0 5 4.0 -0.34
## bfi_c 4 190 3.76 0.85 4.0 3.77 0.74 1.5 5 3.5 -0.12
## bfi_e 5 190 3.38 1.00 3.5 3.40 0.74 1.0 5 4.0 -0.21
## bfi_a 6 190 3.61 0.88 3.5 3.69 0.74 1.0 5 4.0 -0.72
## bfi_n 7 190 2.98 0.96 3.0 3.00 1.48 1.0 5 4.0 -0.09
## kurtosis se
## id -1.09 9.46
## sex -1.57 0.03
## bfi_o -0.48 0.07
## bfi_c -0.90 0.06
## bfi_e -0.58 0.07
## bfi_a 0.14 0.06
## bfi_n -0.82 0.07
#psych::describe(AMIB_persons)
#correlations
cor(AMIB_persons[ ,-1]) #dropping 1st column (id)
## sex bfi_o bfi_c bfi_e bfi_a
## sex 1.00000000 0.06161000 0.03394367 0.18873546 0.04422936
## bfi_o 0.06161000 1.00000000 0.03093131 0.13087435 0.03581446
## bfi_c 0.03394367 0.03093131 1.00000000 -0.01870259 0.08472053
## bfi_e 0.18873546 0.13087435 -0.01870259 1.00000000 0.08558411
## bfi_a 0.04422936 0.03581446 0.08472053 0.08558411 1.00000000
## bfi_n 0.18389358 -0.05557229 -0.02838956 -0.16160302 -0.12293379
## bfi_n
## sex 0.18389358
## bfi_o -0.05557229
## bfi_c -0.02838956
## bfi_e -0.16160302
## bfi_a -0.12293379
## bfi_n 1.00000000
#plot matrix (using describe() from the psych package)
pairs.panels(AMIB_persons[ ,-1])
Note that the person-level descriptes are “cross-sectional”.
Subsetting to a few variables.
ID and TIME variables: “id”,“day”,“interaction”,“timea”
Outcome variables: “partner_gender”,“agval”,“stress”
#subsetting to a few variables
AMIB_interaction <- AMIB_interaction[ ,c("id","day","interaction", "timea",
"partner_gender","agval","stress")]
Often, our analyses will make use of both the repeated measures data and the person-level data. For illustration, we merge them here.
Merging person-level data into repeated meausures interaction-level data.
#merging repeated mesaures and person-level data
interaction_long <- merge(AMIB_interaction, AMIB_persons, by="id")
#Look at first few rows of data
head(interaction_long,10)
## id day interaction timea partner_gender agval stress sex bfi_o bfi_c
## 1 101 1 1 700 0 3 1 2 4 4
## 2 101 1 2 1230 1 8 1 2 4 4
## 3 101 1 3 1245 1 8 1 2 4 4
## 4 101 1 4 1330 1 8 1 2 4 4
## 5 101 1 5 1420 1 6 1 2 4 4
## 6 101 1 6 1445 1 8 1 2 4 4
## 7 101 1 7 1920 1 8 1 2 4 4
## 8 101 1 8 2030 1 8 1 2 4 4
## 9 101 2 9 30 0 9 0 2 4 4
## 10 101 2 10 0 1 8 0 2 4 4
## bfi_e bfi_a bfi_n
## 1 3.5 1.5 2
## 2 3.5 1.5 2
## 3 3.5 1.5 2
## 4 3.5 1.5 2
## 5 3.5 1.5 2
## 6 3.5 1.5 2
## 7 3.5 1.5 2
## 8 3.5 1.5 2
## 9 3.5 1.5 2
## 10 3.5 1.5 2
#checking number of persons
length(unique(interaction_long$id))
## [1] 184
Note that there are only 184 persons in the data now (vs. N = 190 in the person-leevl file). In this case, the discrepancy is becauese there were 6 persons that completed baseline questionnaires, but did not provide any EMA data.
Descriptives of the merged interaction-level and person-level data.
#basic descriptives (using describe() from the psych package)
describe(interaction_long)
## vars n mean sd median trimmed mad min max
## id 1 7568 330.55 122.47 328.0 334.09 145.29 101.0 532
## day 2 7568 3.97 1.99 4.0 3.96 2.97 1.0 7
## interaction 3 7568 23.31 14.74 22.0 22.56 17.79 1.0 56
## timea 4 7500 1496.82 445.29 1500.0 1495.34 489.26 0.0 2800
## partner_gender 5 6884 0.60 0.49 1.0 0.62 0.00 0.0 1
## agval 6 7553 6.68 1.96 7.0 6.92 1.48 1.0 9
## stress 7 7544 1.31 1.33 1.0 1.14 1.48 0.0 5
## sex 8 7568 1.70 0.46 2.0 1.75 0.00 1.0 2
## bfi_o 9 7568 3.60 0.95 3.5 3.64 0.74 1.0 5
## bfi_c 10 7568 3.77 0.85 4.0 3.79 0.74 1.5 5
## bfi_e 11 7568 3.41 0.97 3.5 3.43 0.74 1.0 5
## bfi_a 12 7568 3.61 0.87 3.5 3.69 0.74 1.0 5
## bfi_n 13 7568 3.00 0.96 3.0 3.00 1.48 1.0 5
## range skew kurtosis se
## id 431.0 -0.17 -0.91 1.41
## day 6.0 0.02 -1.24 0.02
## interaction 55.0 0.35 -0.90 0.17
## timea 2800.0 -0.10 -0.14 5.14
## partner_gender 1.0 -0.39 -1.85 0.01
## agval 8.0 -0.95 0.33 0.02
## stress 5.0 0.77 -0.32 0.02
## sex 1.0 -0.85 -1.27 0.01
## bfi_o 4.0 -0.41 -0.24 0.01
## bfi_c 3.5 -0.15 -0.86 0.01
## bfi_e 4.0 -0.24 -0.52 0.01
## bfi_a 4.0 -0.65 -0.02 0.01
## bfi_n 4.0 -0.03 -0.76 0.01
#plot matrix (using describe() from the psych package)
pairs.panels(interaction_long[ ,c("partner_gender","agval","stress")])
BE CAREFUL! These desriptives ignore the nesting of the data (i.e., are a mix of within-person and between-person information).
Among the first steps when describing experience sampling data is describing the number of observations. The procotol was designed for 8 reports per day for 7 days = 56 observations.
Describing compliance.
#counts of days (day 0 to day 8) completed
table(interaction_long$day)
##
## 1 2 3 4 5 6 7
## 1101 1094 1071 1128 1055 1076 1043
#counting number of days and interaction reports completed for each person
compliance_stats <- ddply(interaction_long, "id", summarize,
num_days = length(unique(day)),
num_obs = length(id))
#Alternative using dplyr package
#compliance_stats <- interaction_long %>% group_by(id) %>%summarize(num_days = length(unique(day)))
#looking at counts
describe(compliance_stats)
## vars n mean sd median trimmed mad min max range skew
## id 1 184 321.97 129.72 323.5 323.39 151.23 101 532 431 -0.07
## num_days 2 184 6.82 0.64 7.0 6.99 0.00 2 7 5 -4.66
## num_obs 3 184 41.13 13.62 43.0 42.45 17.79 10 56 46 -0.53
## kurtosis se
## id -1.09 9.56
## num_days 25.30 0.05
## num_obs -0.94 1.00
table(compliance_stats$num_days, useNA = "always")
##
## 2 3 4 5 6 7 <NA>
## 1 1 1 5 11 165 0
table(compliance_stats$num_obs, useNA = "always")
##
## 10 11 13 14 15 16 17 18 19 20 22 23 25 26 28
## 2 1 1 2 2 3 1 4 2 2 1 5 5 2 4
## 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## 4 4 6 2 4 6 3 3 2 1 2 3 5 5 6
## 44 45 46 47 48 49 50 51 52 53 54 55 56 <NA>
## 2 4 3 4 6 2 5 6 4 2 2 7 44 0
#histogram
ggplot(data = compliance_stats, aes(x = num_obs)) +
geom_histogram(fill="white", color="black") +
labs(x = "Number of Observations Completed")
Note that the specific way one does these counts depends on the structure of the initial data file - e.g., whether missed reports included or not (here they were not). The counts can then be summarized for inclusion in a report.
Behavioral science tends to use relational data structures - i.e., spreadsheets. Typically the data are stored/configured as a data frame (a “fancy” matrix) with multiple rows and multiple columns. Two main schema are used to accommodate repeated measures data - “Wide Format” and “Long Format”. Different functions work with different kinds of data input. Thus, it is imperative that one can convert the data back and forth between wide and long formats.
There are lots of ways to do this. One way to go from long to wide is using the reshape()
function (not to be confused with the reshape2 package). See also melt()
and cast()
functions in the reshape2 package.
#trick to get variable names for easy cut-and-paste
dput(names(interaction_long))
## c("id", "day", "interaction", "timea", "partner_gender", "agval",
## "stress", "sex", "bfi_o", "bfi_c", "bfi_e", "bfi_a", "bfi_n")
#rehaping long to wide (using reshape2 package)
interaction_wide <- reshape(data=interaction_long,
timevar=c("interaction"),
idvar=c("id","sex","bfi_o","bfi_c","bfi_e","bfi_a","bfi_n"),
v.names=c("timea","partner_gender","agval","stress"),
direction="wide", sep="_",
drop=c("day")) # to drop variables not needed
#looking at part of data file
head(interaction_wide[,1:10], 10)
## id sex bfi_o bfi_c bfi_e bfi_a bfi_n timea_1 partner_gender_1 agval_1
## 1 101 2 4.0 4.0 3.5 1.5 2.0 700 0 3
## 57 103 2 5.0 3.5 4.0 4.5 2.5 1100 0 9
## 71 104 2 3.0 4.5 3.0 4.5 2.5 700 1 7
## 126 105 2 4.5 3.0 3.5 3.5 3.5 1100 1 7
## 159 106 2 3.0 5.0 3.0 3.5 1.5 1100 1 6
## 205 107 1 4.0 5.0 5.0 4.0 1.5 1110 0 9
## 228 108 2 3.0 5.0 3.5 3.0 4.5 1327 NA 1
## 284 109 1 3.0 3.0 3.5 3.0 4.0 1115 1 9
## 315 110 2 5.0 3.5 3.0 3.5 3.5 1010 0 3
## 340 111 1 2.0 3.0 3.0 3.5 3.5 930 NA 2
Note that the interaction_wide data has 184 rows, and that NAs have been filled in for those with less than 56 reports.
There are lots of ways to do this. One way to go from wide to long is using the reshape()
function (not to be confused with the reshape2 package). See also melt()
and cast()
functions in the reshape2 package.
interaction_longNEW <- reshape(data=interaction_wide,
timevar=c("interaction"),
idvar=c("id","sex","bfi_o","bfi_c","bfi_e","bfi_a","bfi_n"),
direction="long", sep="_")
#reording by id and time for convenient viewing
interaction_longNEW <- interaction_longNEW[order(interaction_longNEW$id, interaction_longNEW$interaction), ]
#looking at part of data file
head(interaction_longNEW, 10)
## id sex bfi_o bfi_c bfi_e bfi_a bfi_n interaction
## 101.2.4.4.3.5.1.5.2.1 101 2 4 4 3.5 1.5 2 1
## 101.2.4.4.3.5.1.5.2.2 101 2 4 4 3.5 1.5 2 2
## 101.2.4.4.3.5.1.5.2.3 101 2 4 4 3.5 1.5 2 3
## 101.2.4.4.3.5.1.5.2.4 101 2 4 4 3.5 1.5 2 4
## 101.2.4.4.3.5.1.5.2.5 101 2 4 4 3.5 1.5 2 5
## 101.2.4.4.3.5.1.5.2.6 101 2 4 4 3.5 1.5 2 6
## 101.2.4.4.3.5.1.5.2.7 101 2 4 4 3.5 1.5 2 7
## 101.2.4.4.3.5.1.5.2.8 101 2 4 4 3.5 1.5 2 8
## 101.2.4.4.3.5.1.5.2.9 101 2 4 4 3.5 1.5 2 9
## 101.2.4.4.3.5.1.5.2.10 101 2 4 4 3.5 1.5 2 10
## timea partner_gender agval stress
## 101.2.4.4.3.5.1.5.2.1 700 0 3 1
## 101.2.4.4.3.5.1.5.2.2 1230 1 8 1
## 101.2.4.4.3.5.1.5.2.3 1245 1 8 1
## 101.2.4.4.3.5.1.5.2.4 1330 1 8 1
## 101.2.4.4.3.5.1.5.2.5 1420 1 6 1
## 101.2.4.4.3.5.1.5.2.6 1445 1 8 1
## 101.2.4.4.3.5.1.5.2.7 1920 1 8 1
## 101.2.4.4.3.5.1.5.2.8 2030 1 8 1
## 101.2.4.4.3.5.1.5.2.9 30 0 9 0
## 101.2.4.4.3.5.1.5.2.10 0 1 8 0
Note that the interaction_longNEW data has 10304 rows, whereas the original data interaction_long had only 7568 rows. This is because NAs were filled in for all of those persons with less than 56 reports.
BE CAREFUL WITH RESHAPE ## THE ARGUMENTS NEEDED ARE NOT ALWAYS STRAIGHTFORWARD. Be sure that the missing data (NAs) have been handled in the intended way.
There are lots of ways to visualize repeated measures data. We encourage to engage in lots of data visualilation and learn as much about the data as possible. Here we provide just a few useful time-series oriented plots using the ggplot()
functions in the ggplot2 package.
#plotting intraindividual change
ggplot(data = interaction_long,
aes(x = interaction, group=id), color=factor(id)) +
guides(color="none") + #to suppress guide
#first variable
geom_line(aes(y=agval, color=factor(interaction_long$id))) +
#geom_point(aes(y=agval), color="black") +
#plot layouts
scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("Affect Valence")
That “blob” is not very informative. Better to consider intraindividual plots - how each individual person changes over time.
#plotting intraindividual change
ggplot(data = interaction_long[which(interaction_long$id == 104), ],
aes(x = interaction, group= id)) +
#first variable
geom_line(aes(y=agval), color="black") +
geom_point(aes(y=agval), color="black") +
#second variable
geom_line(aes(y=stress), color="red") +
geom_point(aes(y=stress), color="red") +
#plot layouts
scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence & Stress") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("ID#104")
Line plots are good for continuous variables. Note however, that the scales for these two variables do overlap, but are not exactly the same.
We can effectively use the background to display categorical variables using the geom_rect()
function within ggplot()
.
#plotting intraindividual change
ggplot(data = interaction_long[which(interaction_long$id == 104), ],
aes(x = interaction, group= id)) +
#categorical variable as background
geom_rect(aes(xmin=interaction-.5,xmax=interaction+.5, ymin=0,ymax=10,
fill=factor(partner_gender)), alpha=0.2) +
#first variable
geom_line(aes(y=agval), color="black") +
geom_point(aes(y=agval), color="black") +
#second variable
geom_line(aes(y=stress), color="red") +
geom_point(aes(y=stress), color="red") +
#plot layouts
scale_fill_manual(name = "Partner Gender",
values = c("blue", "yellow", "gray"),
labels = c("female", "male", "missing")) +
scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence & Stress") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("ID#104")
Of course, we are also interested in interindividual differences. To examine those, we can display multiple persons in separate panels using facet_wrap()
.
#plotting intraindividual change
ggplot(data = interaction_long[which(interaction_long$id <= 106), ],
aes(x = interaction, group= id)) +
#categorical variable as background
geom_rect(aes(xmin=interaction-.5,xmax=interaction+.5, ymin=0,ymax=10,
fill=factor(partner_gender)), alpha=0.2) +
#first variable
geom_line(aes(y=agval), color="black") +
geom_point(aes(y=agval), color="black") +
#second variable
geom_line(aes(y=stress), color="red") +
geom_point(aes(y=stress), color="red") +
#plot layouts
scale_fill_manual(name = "Partner Gender",
values = c("blue", "yellow", "gray"),
labels = c("female", "male", "missing")) +
scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
scale_y_continuous(breaks=c(0,3,6,9), name="Affect Valence & Stress") +
theme_classic() +
theme(axis.title=element_text(size=14),
axis.text=element_text(size=14),
plot.title=element_text(size=14, hjust=.5)) +
ggtitle("ID 101 to 105") +
facet_wrap(~id, ncol=1)
This tutorial covered some basics, including reading in repeated measures data from an experience sampling study, manipulating those data into two different structures (long and wide), assessing the overall characteristics of the data/protocol, and describing the data (descriptives and plots).
We hope this foundational material prompt development of some general strategies that can be applied whenever approaching new longitudinal data. The best part is that, after looking at those plots, we know that subsequent analyses will be super fun!
Thanks for playing!