Overview

This tutorial walks through a few helpful initial steps before analysis of experience sampling and EMA data (or any analyses, actually). Specifically, this tutorial demonstrates how to get the data in and obtain initial descriptive statistics and plots that are useful when making decisions about analyses.

Our examples make use of The AMIB Data, a multiple time-scale data set that has been shared for teaching purposes.

Outline

This script covers …

A. Loading The AMIB Data

B. Describing some aspects of the data

C. A few data visualizations

Preliminaries

Loading libraries used in this script.

library(psych) # for describing the data
library(plyr) #for data manipulation
library(ggplot2) # for data visualization

A. Loading the AMIB Data

The first step, and often the most frustrating part of data analysis is … getting the data into the program!

In recent years people are generally exchanging data files in a .csv format as these files port well and can interfce with many different software packages.

Here, we make use of the person-level and interaction-level (EMA-type) AMIB data files, (which can be merged easily using the id and day variables).

Loading person-level file (N = 190)

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_persons_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_persons <- read.csv(file=url(filepath),header=TRUE)

Loading interaction-level file (T = many, on average 43 per person).

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/AMIBshare_interaction_2019_0501.csv"
#read in the .csv file using the url() function
AMIB_interaction <- read.csv(file=url(filepath),header=TRUE)

B. Describing The AMIB Data

Once the data are in, we can begin learning about them.

Basic descriptives of person-level data

Subsetting to a few variables.

#subsetting to a few trait variables
AMIB_persons <- AMIB_persons[ ,c("id","sex",
                                 "bfi_o","bfi_c","bfi_e","bfi_a","bfi_n")]

Descriptives of traits using person-level data (sex, personality).

#basic descriptives (using describe() from the psych package)
describe(AMIB_persons)

##       vars   n   mean     sd median trimmed    mad   min max range  skew
## id       1 190 318.29 130.44  321.5  318.99 151.23 101.0 532 431.0 -0.04
## sex      2 190   1.66   0.48    2.0    1.70   0.00   1.0   2   1.0 -0.66
## bfi_o    3 190   3.60   0.96    3.5    3.64   0.74   1.0   5   4.0 -0.34
## bfi_c    4 190   3.76   0.85    4.0    3.77   0.74   1.5   5   3.5 -0.12
## bfi_e    5 190   3.38   1.00    3.5    3.40   0.74   1.0   5   4.0 -0.21
## bfi_a    6 190   3.61   0.88    3.5    3.69   0.74   1.0   5   4.0 -0.72
## bfi_n    7 190   2.98   0.96    3.0    3.00   1.48   1.0   5   4.0 -0.09
##       kurtosis   se
## id       -1.09 9.46
## sex      -1.57 0.03
## bfi_o    -0.48 0.07
## bfi_c    -0.90 0.06
## bfi_e    -0.58 0.07
## bfi_a     0.14 0.06
## bfi_n    -0.82 0.07

#psych::describe(AMIB_persons)

#correlations
cor(AMIB_persons[ ,-1]) #dropping 1st column (id)

##              sex       bfi_o       bfi_c       bfi_e       bfi_a
## sex   1.00000000  0.06161000  0.03394367  0.18873546  0.04422936
## bfi_o 0.06161000  1.00000000  0.03093131  0.13087435  0.03581446
## bfi_c 0.03394367  0.03093131  1.00000000 -0.01870259  0.08472053
## bfi_e 0.18873546  0.13087435 -0.01870259  1.00000000  0.08558411
## bfi_a 0.04422936  0.03581446  0.08472053  0.08558411  1.00000000
## bfi_n 0.18389358 -0.05557229 -0.02838956 -0.16160302 -0.12293379
##             bfi_n
## sex    0.18389358
## bfi_o -0.05557229
## bfi_c -0.02838956
## bfi_e -0.16160302
## bfi_a -0.12293379
## bfi_n  1.00000000

#plot matrix (using describe() from the psych package)
pairs.panels(AMIB_persons[ ,-1])

Note that the person-level descriptes are “cross-sectional”.

Basic descriptives of interaction-level data.

Subsetting to a few variables.
ID and TIME variables: “id”,“day”,“interaction”,“timea”
Outcome variables: “partner_gender”,“agval”,“stress”

#subsetting to a few variables
AMIB_interaction <- AMIB_interaction[ ,c("id","day","interaction", "timea", 
                                         "partner_gender","agval","stress")]

Often, our analyses will make use of both the repeated measures data and the person-level data. For illustration, we merge them here.

Merging person-level data into repeated meausures interaction-level data.

#merging repeated mesaures and person-level data
interaction_long <- merge(AMIB_interaction, AMIB_persons, by="id")

#Look at first few rows of data
head(interaction_long,10)

##     id day interaction timea partner_gender agval stress sex bfi_o bfi_c
## 1  101   1           1   700              0     3      1   2     4     4
## 2  101   1           2  1230              1     8      1   2     4     4
## 3  101   1           3  1245              1     8      1   2     4     4
## 4  101   1           4  1330              1     8      1   2     4     4
## 5  101   1           5  1420              1     6      1   2     4     4
## 6  101   1           6  1445              1     8      1   2     4     4
## 7  101   1           7  1920              1     8      1   2     4     4
## 8  101   1           8  2030              1     8      1   2     4     4
## 9  101   2           9    30              0     9      0   2     4     4
## 10 101   2          10     0              1     8      0   2     4     4
##    bfi_e bfi_a bfi_n
## 1    3.5   1.5     2
## 2    3.5   1.5     2
## 3    3.5   1.5     2
## 4    3.5   1.5     2
## 5    3.5   1.5     2
## 6    3.5   1.5     2
## 7    3.5   1.5     2
## 8    3.5   1.5     2
## 9    3.5   1.5     2
## 10   3.5   1.5     2

#checking number of persons
length(unique(interaction_long$id))

## [1] 184

Note that there are only 184 persons in the data now (vs. N = 190 in the person-leevl file). In this case, the discrepancy is becauese there were 6 persons that completed baseline questionnaires, but did not provide any EMA data.

Descriptives of the merged interaction-level and person-level data.

#basic descriptives (using describe() from the psych package)
describe(interaction_long)

##                vars    n    mean     sd median trimmed    mad   min  max
## id                1 7568  330.55 122.47  328.0  334.09 145.29 101.0  532
## day               2 7568    3.97   1.99    4.0    3.96   2.97   1.0    7
## interaction       3 7568   23.31  14.74   22.0   22.56  17.79   1.0   56
## timea             4 7500 1496.82 445.29 1500.0 1495.34 489.26   0.0 2800
## partner_gender    5 6884    0.60   0.49    1.0    0.62   0.00   0.0    1
## agval             6 7553    6.68   1.96    7.0    6.92   1.48   1.0    9
## stress            7 7544    1.31   1.33    1.0    1.14   1.48   0.0    5
## sex               8 7568    1.70   0.46    2.0    1.75   0.00   1.0    2
## bfi_o             9 7568    3.60   0.95    3.5    3.64   0.74   1.0    5
## bfi_c            10 7568    3.77   0.85    4.0    3.79   0.74   1.5    5
## bfi_e            11 7568    3.41   0.97    3.5    3.43   0.74   1.0    5
## bfi_a            12 7568    3.61   0.87    3.5    3.69   0.74   1.0    5
## bfi_n            13 7568    3.00   0.96    3.0    3.00   1.48   1.0    5
##                 range  skew kurtosis   se
## id              431.0 -0.17    -0.91 1.41
## day               6.0  0.02    -1.24 0.02
## interaction      55.0  0.35    -0.90 0.17
## timea          2800.0 -0.10    -0.14 5.14
## partner_gender    1.0 -0.39    -1.85 0.01
## agval             8.0 -0.95     0.33 0.02
## stress            5.0  0.77    -0.32 0.02
## sex               1.0 -0.85    -1.27 0.01
## bfi_o             4.0 -0.41    -0.24 0.01
## bfi_c             3.5 -0.15    -0.86 0.01
## bfi_e             4.0 -0.24    -0.52 0.01
## bfi_a             4.0 -0.65    -0.02 0.01
## bfi_n             4.0 -0.03    -0.76 0.01

#plot matrix (using describe() from the psych package)
pairs.panels(interaction_long[ ,c("partner_gender","agval","stress")])

BE CAREFUL! These desriptives ignore the nesting of the data (i.e., are a mix of within-person and between-person information).

Describing compliance (and missing data)

Among the first steps when describing experience sampling data is describing the number of observations. The procotol was designed for 8 reports per day for 7 days = 56 observations.

Describing compliance.

#counts of days (day 0 to day 8) completed
table(interaction_long$day)

## 
##    1    2    3    4    5    6    7 
## 1101 1094 1071 1128 1055 1076 1043

#counting number of days and interaction reports completed for each person
compliance_stats <-  ddply(interaction_long, "id", summarize,
                           num_days = length(unique(day)),
                           num_obs = length(id))
#Alternative using dplyr package 
#compliance_stats <- interaction_long %>% group_by(id) %>%summarize(num_days = length(unique(day)))

#looking at counts
describe(compliance_stats)

##          vars   n   mean     sd median trimmed    mad min max range  skew
## id          1 184 321.97 129.72  323.5  323.39 151.23 101 532   431 -0.07
## num_days    2 184   6.82   0.64    7.0    6.99   0.00   2   7     5 -4.66
## num_obs     3 184  41.13  13.62   43.0   42.45  17.79  10  56    46 -0.53
##          kurtosis   se
## id          -1.09 9.56
## num_days    25.30 0.05
## num_obs     -0.94 1.00

table(compliance_stats$num_days, useNA = "always")

## 
##    2    3    4    5    6    7 <NA> 
##    1    1    1    5   11  165    0

table(compliance_stats$num_obs, useNA = "always")

## 
##   10   11   13   14   15   16   17   18   19   20   22   23   25   26   28 
##    2    1    1    2    2    3    1    4    2    2    1    5    5    2    4 
##   29   30   31   32   33   34   35   36   37   38   39   40   41   42   43 
##    4    4    6    2    4    6    3    3    2    1    2    3    5    5    6 
##   44   45   46   47   48   49   50   51   52   53   54   55   56 <NA> 
##    2    4    3    4    6    2    5    6    4    2    2    7   44    0

#histogram
ggplot(data = compliance_stats, aes(x = num_obs)) +
  geom_histogram(fill="white", color="black") + 
  labs(x = "Number of Observations Completed")

Note that the specific way one does these counts depends on the structure of the initial data file - e.g., whether missed reports included or not (here they were not). The counts can then be summarized for inclusion in a report.

Long and Wide Format Data

Behavioral science tends to use relational data structures - i.e., spreadsheets. Typically the data are stored/configured as a data frame (a “fancy” matrix) with multiple rows and multiple columns. Two main schema are used to accommodate repeated measures data - “Wide Format” and “Long Format”. Different functions work with different kinds of data input. Thus, it is imperative that one can convert the data back and forth between wide and long formats.

Reshape the data from Long to Wide

There are lots of ways to do this. One way to go from long to wide is using the reshape() function (not to be confused with the reshape2 package). See also melt() and cast() functions in the reshape2 package.

#trick to get variable names for easy cut-and-paste
dput(names(interaction_long))

## c("id", "day", "interaction", "timea", "partner_gender", "agval", 
## "stress", "sex", "bfi_o", "bfi_c", "bfi_e", "bfi_a", "bfi_n")

#rehaping long to wide (using reshape2 package)
interaction_wide <- reshape(data=interaction_long, 
                    timevar=c("interaction"), 
                    idvar=c("id","sex","bfi_o","bfi_c","bfi_e","bfi_a","bfi_n"),
                    v.names=c("timea","partner_gender","agval","stress"),
                    direction="wide", sep="_",
                    drop=c("day")) # to drop variables not needed  

#looking at part of data file
head(interaction_wide[,1:10], 10)

##      id sex bfi_o bfi_c bfi_e bfi_a bfi_n timea_1 partner_gender_1 agval_1
## 1   101   2   4.0   4.0   3.5   1.5   2.0     700                0       3
## 57  103   2   5.0   3.5   4.0   4.5   2.5    1100                0       9
## 71  104   2   3.0   4.5   3.0   4.5   2.5     700                1       7
## 126 105   2   4.5   3.0   3.5   3.5   3.5    1100                1       7
## 159 106   2   3.0   5.0   3.0   3.5   1.5    1100                1       6
## 205 107   1   4.0   5.0   5.0   4.0   1.5    1110                0       9
## 228 108   2   3.0   5.0   3.5   3.0   4.5    1327               NA       1
## 284 109   1   3.0   3.0   3.5   3.0   4.0    1115                1       9
## 315 110   2   5.0   3.5   3.0   3.5   3.5    1010                0       3
## 340 111   1   2.0   3.0   3.0   3.5   3.5     930               NA       2

Note that the interaction_wide data has 184 rows, and that NAs have been filled in for those with less than 56 reports.

Reshape the data from Wide to Long

There are lots of ways to do this. One way to go from wide to long is using the reshape() function (not to be confused with the reshape2 package). See also melt() and cast() functions in the reshape2 package.

interaction_longNEW <- reshape(data=interaction_wide, 
                    timevar=c("interaction"), 
                    idvar=c("id","sex","bfi_o","bfi_c","bfi_e","bfi_a","bfi_n"),
                    direction="long", sep="_")

#reording by id and time for convenient viewing
interaction_longNEW <- interaction_longNEW[order(interaction_longNEW$id, interaction_longNEW$interaction), ]

#looking at part of data file
head(interaction_longNEW, 10)

##                         id sex bfi_o bfi_c bfi_e bfi_a bfi_n interaction
## 101.2.4.4.3.5.1.5.2.1  101   2     4     4   3.5   1.5     2           1
## 101.2.4.4.3.5.1.5.2.2  101   2     4     4   3.5   1.5     2           2
## 101.2.4.4.3.5.1.5.2.3  101   2     4     4   3.5   1.5     2           3
## 101.2.4.4.3.5.1.5.2.4  101   2     4     4   3.5   1.5     2           4
## 101.2.4.4.3.5.1.5.2.5  101   2     4     4   3.5   1.5     2           5
## 101.2.4.4.3.5.1.5.2.6  101   2     4     4   3.5   1.5     2           6
## 101.2.4.4.3.5.1.5.2.7  101   2     4     4   3.5   1.5     2           7
## 101.2.4.4.3.5.1.5.2.8  101   2     4     4   3.5   1.5     2           8
## 101.2.4.4.3.5.1.5.2.9  101   2     4     4   3.5   1.5     2           9
## 101.2.4.4.3.5.1.5.2.10 101   2     4     4   3.5   1.5     2          10
##                        timea partner_gender agval stress
## 101.2.4.4.3.5.1.5.2.1    700              0     3      1
## 101.2.4.4.3.5.1.5.2.2   1230              1     8      1
## 101.2.4.4.3.5.1.5.2.3   1245              1     8      1
## 101.2.4.4.3.5.1.5.2.4   1330              1     8      1
## 101.2.4.4.3.5.1.5.2.5   1420              1     6      1
## 101.2.4.4.3.5.1.5.2.6   1445              1     8      1
## 101.2.4.4.3.5.1.5.2.7   1920              1     8      1
## 101.2.4.4.3.5.1.5.2.8   2030              1     8      1
## 101.2.4.4.3.5.1.5.2.9     30              0     9      0
## 101.2.4.4.3.5.1.5.2.10     0              1     8      0

Note that the interaction_longNEW data has 10304 rows, whereas the original data interaction_long had only 7568 rows. This is because NAs were filled in for all of those persons with less than 56 reports.

BE CAREFUL WITH RESHAPE ## THE ARGUMENTS NEEDED ARE NOT ALWAYS STRAIGHTFORWARD. Be sure that the missing data (NAs) have been handled in the intended way.

C. Data Visualization

There are lots of ways to visualize repeated measures data. We encourage to engage in lots of data visualilation and learn as much about the data as possible. Here we provide just a few useful time-series oriented plots using the ggplot() functions in the ggplot2 package.

Plotting one variable over time for all persons.

#plotting intraindividual change 
ggplot(data = interaction_long,
       aes(x = interaction, group=id), color=factor(id)) +
  guides(color="none") + #to suppress guide
  #first variable
  geom_line(aes(y=agval, color=factor(interaction_long$id))) +
  #geom_point(aes(y=agval), color="black") +
  #plot layouts
  scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence") +  
  theme_classic() +
  theme(axis.title=element_text(size=14),
        axis.text=element_text(size=14),
        plot.title=element_text(size=14, hjust=.5)) +
  ggtitle("Affect Valence")

That “blob” is not very informative. Better to consider intraindividual plots - how each individual person changes over time.

Plotting two variables for one person

#plotting intraindividual change 
ggplot(data = interaction_long[which(interaction_long$id == 104), ],
       aes(x = interaction, group= id)) +
  #first variable
  geom_line(aes(y=agval), color="black") +
  geom_point(aes(y=agval), color="black") +
  #second variable
  geom_line(aes(y=stress), color="red") +
  geom_point(aes(y=stress), color="red") +
  #plot layouts
  scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence & Stress") +  
  theme_classic() +
  theme(axis.title=element_text(size=14),
        axis.text=element_text(size=14),
        plot.title=element_text(size=14, hjust=.5)) +
  ggtitle("ID#104")

Line plots are good for continuous variables. Note however, that the scales for these two variables do overlap, but are not exactly the same.

Including a categorical variable.

We can effectively use the background to display categorical variables using the geom_rect() function within ggplot().

Plotting three variables for one person

#plotting intraindividual change 
ggplot(data = interaction_long[which(interaction_long$id == 104), ],
       aes(x = interaction, group= id)) +
  #categorical variable as background
  geom_rect(aes(xmin=interaction-.5,xmax=interaction+.5, ymin=0,ymax=10,
                fill=factor(partner_gender)), alpha=0.2) +
  #first variable
  geom_line(aes(y=agval), color="black") +
  geom_point(aes(y=agval), color="black") +
  #second variable
  geom_line(aes(y=stress), color="red") +
  geom_point(aes(y=stress), color="red") +
  #plot layouts
  scale_fill_manual(name = "Partner Gender",
                    values = c("blue", "yellow", "gray"), 
                    labels = c("female", "male", "missing")) + 
  scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
  scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7,8,9), name="Affect Valence & Stress") +  
  theme_classic() +
  theme(axis.title=element_text(size=14),
        axis.text=element_text(size=14),
        plot.title=element_text(size=14, hjust=.5)) +
  ggtitle("ID#104")

Plotting multiple individuals

Of course, we are also interested in interindividual differences. To examine those, we can display multiple persons in separate panels using facet_wrap().

#plotting intraindividual change 
ggplot(data = interaction_long[which(interaction_long$id <= 106), ],
       aes(x = interaction, group= id)) +
  #categorical variable as background
  geom_rect(aes(xmin=interaction-.5,xmax=interaction+.5, ymin=0,ymax=10,
                fill=factor(partner_gender)), alpha=0.2) +
  #first variable
  geom_line(aes(y=agval), color="black") +
  geom_point(aes(y=agval), color="black") +
  #second variable
  geom_line(aes(y=stress), color="red") +
  geom_point(aes(y=stress), color="red") +
  #plot layouts
  scale_fill_manual(name = "Partner Gender",
                    values = c("blue", "yellow", "gray"), 
                    labels = c("female", "male", "missing")) + 
  scale_x_continuous(breaks=c(0,8,16,24,32,40,48,56), name="Interaction#") +
  scale_y_continuous(breaks=c(0,3,6,9), name="Affect Valence & Stress") +  
  theme_classic() +
  theme(axis.title=element_text(size=14),
        axis.text=element_text(size=14),
        plot.title=element_text(size=14, hjust=.5)) +
  ggtitle("ID 101 to 105") +
  facet_wrap(~id, ncol=1)

Conclusion

This tutorial covered some basics, including reading in repeated measures data from an experience sampling study, manipulating those data into two different structures (long and wide), assessing the overall characteristics of the data/protocol, and describing the data (descriptives and plots).

We hope this foundational material prompt development of some general strategies that can be applied whenever approaching new longitudinal data. The best part is that, after looking at those plots, we know that subsequent analyses will be super fun!

Thanks for playing!

APA ATI Intensive Longitudinal Data: Session C

Data, Descriptives & Data Viz