Grid-Sequence Analysis Tutorial

Overview

Grid-sequence analysis utilizes repeated-measures dyadic data to examine within-dyad processes and between-dyad differences. Please see the paper “Analyzing Dyadic Data using Grid-Sequence Analysis: Inter-Dyad Differences in Intra-Dyad Dynamics” (Brinberg, et al., in press).

Outline

Introduction to Grid-Sequence Analysis.
Creating State-Space Grids for Each Dyad.
Creating Sequences.
Establishing a Cost Matrix and Sequence Analysis.
Cluster Determination.
Examine Group Differences among Clusters.

0. Introduction to Grid-Sequence Analysis.

Grid-sequence analysis is a descriptive technique to capture within-dyad dynamics and allow for between-dyad comparisons. This analytic technique draws from state-space grids, typically used in developmental psychology (Hollenstein, 2013) and sequence analysis, previously used in sociology and biology (Macindoe & Abbott, 2004).

Grid-sequence analysis combines state-space grids and sequence analysis by:

Tracking a dyad’s movements across a grid.
Converting dyad movements into a univariate sequence.
Clustering dyads with similar movements using sequence analysis.

Load data and needed libraries.

#set filepath for repeated measures of happiness data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/gridsequence_simulation_data.csv" 
data <- read.csv(file=url(filepath),header=TRUE)
head(data)

##   X id time1  outcome1  outcome2
## 1 1  1     1  88.27175 100.00000
## 2 2  1     2  93.70530  88.49227
## 3 3  1     3  97.49972  91.96082
## 4 4  1     4 100.00000 100.00000
## 5 5  1     5  96.65027  84.73661
## 6 6  1     6  96.44185 100.00000

#setwd("~/Desktop")
#set filepath for person-level data (e.g., relationship satisfaction and current health)
filepath1 <- "https://quantdev.ssri.psu.edu/sites/qdev/files/gridsequence_simulation_descriptives.csv"
data1 <- read.csv(file=url(filepath1),header=TRUE)
head(data1)

##   id member  rel_sat   health
## 1  1      1 1.388314 2.747397
## 2  1      2 1.895276 2.032986
## 3  2      1 1.074569 1.548543
## 4  2      2 1.022392 1.843375
## 5  3      1 1.052483 1.883714
## 6  3      2 2.006875 3.357317

#delete first column of each data set (1, 2, ..., n labeling in Excel sheet)

data[1] <- NULL
head(data)

##   id time1  outcome1  outcome2
## 1  1     1  88.27175 100.00000
## 2  1     2  93.70530  88.49227
## 3  1     3  97.49972  91.96082
## 4  1     4 100.00000 100.00000
## 5  1     5  96.65027  84.73661
## 6  1     6  96.44185 100.00000

#loading libraries
library(cluster)
library(ggplot2)
library(reshape)
library(stats)
library(TraMineR)
library(TraMineRextras)

Depending on the format of your data set, some data management may be necessary. The final product should be two data sets:

“data” contains repeated measures of the variable of interest (in this case, happiness). There should be a column that contains a couple-level ID variable, a column the contains a continuous measure of time (in this case, 1-42, given that we measured happiness 42 times over a week), and columns for each member of the dyad’s responses to our happiness item.
“data1” contains person-level, time-invariant variables. These are the variables in which you will test between-group differences in Step 5 of grid-sequence analysis. This data file should include a column for couple-level ID, a column that distinguishes each member of the dyad (in this case labeled “member”), and columns with the between-person/dyad variables of interests (such as relationship satisfaction or subjective health).

1. Creating State-Space Grids for Each Dyad.

We begin creating our state-space grid by setting up cut points. In this example, we create a 4 x4 (16 cell) grid, and use three cut points (at intervals of 25 because our measure of happiness is on a 0-100 scale). Our cut points are only for the sake of demonstration, but these divisions can vary depending on your research question or underlying theory.

#Creating cut points
data$cut1 <- 25
data$cut2 <- 50
data$cut3 <- 75

Next we use these cut points to label the cells on the grid with letters of the English alphabet and create a new variable that contains these letters (labeled as “happy” in our data set “data”). We set the x-axis to be dyad member 1 (who is associated with “outcome1”) and the y-axis to be dyad member 2 (who is associated with “outcome2”). In this case, we label the upper-left hand corner with “A” and the lower-right hand corner with “P,” filling in alphabetically along the way.

#Lettering each cell of the grid
data$happy[data$outcome1 <= data$cut1 & data$outcome2 > data$cut3] <- "A"

data$happy[data$outcome1 > data$cut1 & data$outcome1 <= data$cut2 & data$outcome2 > data$cut3] <- "B"

data$happy[data$outcome1 > data$cut2 & data$outcome1 <= data$cut3 & data$outcome2 > data$cut3] <- "C"

data$happy[data$outcome1 > data$cut3 & data$outcome2 > data$cut3] <- "D" 

data$happy[data$outcome1 <= data$cut1 & data$outcome2 > data$cut2 & data$outcome2 <= data$cut3] <- "E"

data$happy[data$outcome1 > data$cut1 & data$outcome1 <= data$cut2 & data$outcome2 > data$cut2 & data$outcome2 <= data$cut3] <- "F"

data$happy[data$outcome1 > data$cut2 & data$outcome1 <= data$cut3 & data$outcome2 > data$cut2 & data$outcome2 <= data$cut3] <- "G"

data$happy[data$outcome1 > data$cut3 & data$outcome2 > data$cut2 & data$outcome2 <= data$cut3] <- "H"

data$happy[data$outcome1 <= data$cut1 & data$outcome2 > data$cut1 & data$outcome2 <= data$cut2] <- "I"

data$happy[data$outcome1 > data$cut1 & data$outcome1 <= data$cut2 & data$outcome2 > data$cut1 & data$outcome2 <= data$cut2] <- "J"

data$happy[data$outcome1 > data$cut2 & data$outcome1 <= data$cut3 & data$outcome2 > data$cut1 & data$outcome2 <= data$cut2] <- "K"

data$happy[data$outcome1 > data$cut3 & data$outcome2 > data$cut1 & data$outcome2 <= data$cut2] <- "L"

data$happy[data$outcome1 <= data$cut1 & data$outcome2 <= data$cut1] <- "M"

data$happy[data$outcome1 > data$cut1 & data$outcome1 <= data$cut2 & data$outcome2 <= data$cut1] <- "N"

data$happy[data$outcome1 > data$cut2 & data$outcome1 <= data$cut3 & data$outcome2 <= data$cut1] <- "O"

data$happy[data$outcome1 > data$cut3 & data$outcome2 <= data$cut1] <- "P"

A quick check to see what the repeated-measures data looks like:

head(data)

##   id time1  outcome1  outcome2 cut1 cut2 cut3 happy
## 1  1     1  88.27175 100.00000   25   50   75     D
## 2  1     2  93.70530  88.49227   25   50   75     D
## 3  1     3  97.49972  91.96082   25   50   75     D
## 4  1     4 100.00000 100.00000   25   50   75     D
## 5  1     5  96.65027  84.73661   25   50   75     D
## 6  1     6  96.44185 100.00000   25   50   75     D

It is useful to plot each dyad’s movements across the state-space grid to obtain a greater understanding of the data. I’ve attached plots of two different dyad’s (one with low variability and one with high variability).

Example Dyad 1: Low variability.

Example Dyad 2: High variability.

Below is code for a loop to obtain a PDF of each dyad’s state space grid.

all_ids <- unique(data$id)

# I'm going to remove the NAs from all_ids

all_ids <- na.omit(all_ids)

cut1 <- 25
cut2 <- 50
cut3 <- 75


# open the pdf file
pdf('State Space Grid Simulation.pdf', width = 10, height = 7)

for(x in 1:length(all_ids)){
  dyad_id <- all_ids[x]
  data_loop_subset <- subset(data, id == dyad_id)

#let's remove the missing data

data_loop_noNA <- na.omit(data_loop_subset)

setwd("~/Desktop")

# At this point we need to check if there is actually data for the individual and only plot points where there are data for both members of the dyad at each time point.

if(nrow(data_loop_noNA) == 0){}

if(nrow(data_loop_noNA) > 0){

plot_title = paste('Couple ID = ', unique(data_loop_noNA$id), sep = '') #adding a dyad ID label at the top of each plot
grid_plot <-

ggplot(data = data_loop_noNA, aes(x = outcome1, y = outcome2)) +
  xlim(-5,105) + #giving a little extra room (the scale is 0-100) so points of the edge of the scale will show up easily
  ylim(-5,105) + #giving a little extra room (the scale is 0-100) so points of the edge of the scale will show up easily
  ylab('Happiness Partner 2') + #y-axis label
  xlab('Happiness Partner 1') + #x-axis label
  geom_rect(xmin=0,xmax=cut1,ymin=0,ymax=cut1, fill="#000000", alpha=.05) + #the following geom_rect lines of code are assigning colors to each cell of the grid and adjusting their transparency (with the alpha = command)
  geom_rect(xmin=0,xmax=cut1,ymin=cut1,ymax=cut2, fill="#000054", alpha=.05) +
  geom_rect(xmin=0,xmax=cut1,ymin=cut2,ymax=cut3, fill="#0000A8", alpha=.05) +
  geom_rect(xmin=0,xmax=cut1,ymin=cut3,ymax=100, fill="#0000FC", alpha=.05) +
  geom_rect(xmin=cut1,xmax=cut2,ymin=0,ymax=cut1, fill="#540000", alpha=.05) +
  geom_rect(xmin=cut1,xmax=cut2,ymin=cut1,ymax=cut2, fill="#540054", alpha=.05) +
  geom_rect(xmin=cut1,xmax=cut2,ymin=cut2,ymax=cut3, fill="#5400A8", alpha=.05) +
  geom_rect(xmin=cut1,xmax=cut2,ymin=cut3,ymax=100, fill="#5400FC", alpha=.05) +
  geom_rect(xmin=cut2,xmax=cut3,ymin=0,ymax=cut1, fill="#A80000", alpha=.05) +
  geom_rect(xmin=cut2,xmax=cut3,ymin=cut1,ymax=cut2, fill="#A80054", alpha=.05) +
  geom_rect(xmin=cut2,xmax=cut3,ymin=cut2,ymax=cut3, fill="#A800A8", alpha=.05) +
  geom_rect(xmin=cut2,xmax=cut3,ymin=cut3,ymax=100, fill="#A800FC", alpha=.05) +
  geom_rect(xmin=cut3,xmax=100,ymin=0,ymax=cut1, fill="#FC0000", alpha=.05) +
  geom_rect(xmin=cut3,xmax=100,ymin=cut1,ymax=cut2, fill="#FC0054", alpha=.05) +
  geom_rect(xmin=cut3,xmax=100,ymin=cut2,ymax=cut3, fill="#FC00A8", alpha=.05) +
  geom_rect(xmin=cut3,xmax=100,ymin=cut3,ymax=100, fill="#FC00FC", alpha=.05) +
   theme(
    axis.text = element_text(size = 14, color = 'black'),
    axis.title = element_text(size = 18),
    panel.grid.major = element_line(colour = "white"),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank()
) + #adjusting size of text and titles in plot
  geom_vline(xintercept = c(0,25,50,75,100)) + #adding vertical lines to distinguish cells of grid
  geom_hline(yintercept = c(0,25,50,75,100)) + #adding horizontal lines to distinguish cells of grid
    geom_point(colour= "white") + #setting points on plot to white
  geom_path(colour = "white")  + #setting path connecting points to the color white
  ggtitle(plot_title) #adding title to plot
print(grid_plot)
}
}

dev.off()

2. Creating Sequences.

In this step, we re-format the data from long to wide, create an “alphabet” that consists of the letters in our grid, and extract the sequence of cells visited. Helpful comments are included within the code.

#subsetting out the data we need
data_sub <- data[ ,c("id", "time1", "happy")]

#Reformatting data: long to wide.
data_wide <- reshape(data=data_sub, 
                    timevar=c("time1"),            #time variable
                    idvar= c("id"),                #id variable
                    v.names=c("happy"),            #repeated measures variable 
                    direction="wide", sep=".")

head(data_wide)

##     id happy.1 happy.2 happy.3 happy.4 happy.5 happy.6 happy.7 happy.8
## 1    1       D       D       D       D       D       D       D       D
## 43   2       H       D       D       D       D       H       D       D
## 85   3       O       H       H       G       D       L       G       L
## 127  4       I       E       J       N       F       M       K       N
## 169  5       D       C       D       D       C       D       C       C
## 211  6       D       H       H       H       D       D       H       H
##     happy.9 happy.10 happy.11 happy.12 happy.13 happy.14 happy.15 happy.16
## 1         D        D        D        D        D        D        H        D
## 43        D        H        D        H        D        H        H        D
## 85        L        H        G        C        B        D        J        G
## 127       N        M        M        N        N        J        E        J
## 169       C        D        C        C        B        C        C        C
## 211       H        H        H        H        H        D        H        D
##     happy.17 happy.18 happy.19 happy.20 happy.21 happy.22 happy.23
## 1          D        D        D        D        D        D        D
## 43         D        D        D        D        D        D        D
## 85         E        L        G        J        O        O        H
## 127        I        M        N        M        J        A        N
## 169        B        D        C        B        C        D        D
## 211        D        D        H        H        D        D        D
##     happy.24 happy.25 happy.26 happy.27 happy.28 happy.29 happy.30
## 1          D        D        D        D        H        H        D
## 43         H        D        D        D        D        D        D
## 85         K        B        L        D        H        F        D
## 127        N        O        N        N        M        F        F
## 169        C        C        D        D        C        C        C
## 211        D        D        D        D        D        D        D
##     happy.31 happy.32 happy.33 happy.34 happy.35 happy.36 happy.37
## 1          H        D        D        D        D        D        D
## 43         D        D        D        H        D        D        D
## 85         L        J        D        D        C        D        F
## 127        J        I        J        N        M        N        E
## 169        D        D        C        C        B        D        B
## 211        H        H        D        D        H        D        D
##     happy.38 happy.39 happy.40 happy.41 happy.42
## 1          D        D        H        D        D
## 43         D        D        H        D        H
## 85         D        H        D        H        H
## 127        N        M        E        O        E
## 169        D        C        D        C        B
## 211        H        H        H        H        D

# Creating alphabet.

#this object contains the letters that appear in the data set.
gs.alphabet <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P")

#this object allows for more helpful labels if applicable (e.g., negative, neutral, and positive affect).
gs.labels <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P")



#Creating the sequences.

A <- "#0000FC"
B <- "#5400FC"
C <- "#A800FC"
D <- "#FC00FC"
E <- "#0000A8"
F <- "#5400A8"
G <- "#A800A8"
H <- "#FC00A8"
I <- "#000054"
J <- "#540054"
K <- "#A80054"
L <- "#FC0054"
M <- "#000000"
N <- "#540000"
O <- "#A80000"
P <- "#FC0000" #These are assigning colors to each letter. seqdef (the function below) also has default colors.

#this creates an object that contains all of the sequences.

#seqdef is a function in TraMineR
#input: data, columns containing repeated measures data, alphabet, labels, xtstep = steps between tick marks, cpal (colors)
happy.seq <- seqdef(data_wide, var = 2:41, alphabet = gs.alphabet, 
                    labels = gs.labels, xtstep = 6, cpal=c(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P))


#Plotting the sequences.

#seqIplot is a function in TraMineR
#input: sequence object (created above), legend, title
seqIplot(happy.seq, withlegend = FALSE, title="Dyad Happiness")

3. Establishing a Cost Matrix and Sequence Analysis.

Sequence analysis aims to minimize the “cost” of transforming one sequence into another and relies on an optimal matching algorithm. There are costs for inserting, deleting, and substituting letters, as well as costs for missingness. The researcher establishes a cost matrix, and often use standards, such as insertion/deletion costs of 1.0 and missingness costs of half the highest cost within the matrix. There are a number of ways to determine cost, typically, cost is established as the distance between cells. In this case, we use Manhattan (city-block) distance, but other methods (e.g., Euclidian distance) are also possible. The output of sequence analysis is a n x n (n=number of dyads) matrix with the cost of transforming one sequence into the corresponding sequence in each cell of the matrix.

#Setting up cost matrix.

#example: constant cost matrix

#we are not actually using this cost matrix, but it creates some nice labels that we will use later
costmatrix <- seqsubm(happy.seq, method="CONSTANT", cval=2, with.missing=TRUE,
        miss.cost=.5, time.varying=FALSE, weighted=TRUE,
        transition="both", lag=1, missing.trate=FALSE)

##  [>] setting 0.5 as substitution cost for missing values

##  [>] creating 17x17 substitution-cost matrix using 2 as constant value

costmatrix

##     A-> B-> C-> D-> E-> F-> G-> H-> I-> J-> K-> L-> M-> N-> O-> P-> *->
## A-> 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## B-> 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## C-> 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## D-> 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## E-> 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## F-> 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## G-> 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## H-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## I-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## J-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 2.0 0.5
## K-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 2.0 0.5
## L-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 2.0 0.5
## M-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 2.0 0.5
## N-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 2.0 0.5
## O-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 2.0 0.5
## P-> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 0.0 0.5
## *-> 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.0

#defining the substitution cost matrix (since insertion/deletion costs are set to 1...do not need to define)

#the cost matrix will be (n+1) by (n+1) with n = number of cells in the grid

#the right-most column and the bottom row represent missingness costs (half of the highest cost, which in this case is half of 6)

#we used Manhattan (city-block) distance to calculate costs, but other methods are possible (e.g., Euclidian distance)
#we calculated manhattan distance manually and then inserted the results into the matrix below. for example, the cost of staying in the same cell is zero, the cost of moving from "A" to "B" is one, and the cost of moving from "A" to "N" is four.

costmatrix2 <- matrix(c(0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0, 3.0, 
                   1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0, 4.0, 4.0, 3.0, 4.0, 5.0, 3.0,
                   2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 4.0, 3.0, 2.0, 3.0, 5.0, 4.0, 3.0, 4.0, 3.0,
                   3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 5.0, 4.0, 3.0, 2.0, 6.0, 5.0, 4.0, 3.0, 3.0,
                   1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0, 3.0,
                   2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0, 4.0, 3.0,
                   3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 4.0, 3.0, 2.0, 3.0, 3.0,
                   4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 5.0, 4.0, 3.0, 2.0, 3.0,
                   2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 3.0,
                   3.0, 2.0, 3.0, 4.0, 2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 3.0, 3.0, 
                   4.0, 3.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 3.0,
                   5.0, 4.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 4.0, 3.0, 2.0, 1.0, 3.0,
                   3.0, 4.0, 5.0, 6.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 3.0,
                   4.0, 3.0, 4.0, 5.0, 3.0, 2.0, 3.0, 4.0, 2.0, 1.0, 2.0, 3.0, 1.0, 0.0, 1.0, 2.0, 3.0,
                   5.0, 4.0, 3.0, 4.0, 4.0, 3.0, 2.0, 3.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0,
                   6.0, 5.0, 4.0, 3.0, 5.0, 4.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 0.0, 3.0,
                   3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 0.0), 
                   nrow=17,ncol=17, byrow=TRUE)



#Fixing the row and column names to nice labels from first cost matrix
rownames(costmatrix2) <- rownames(costmatrix)
colnames(costmatrix2) <- colnames(costmatrix)
costmatrix2

##     A-> B-> C-> D-> E-> F-> G-> H-> I-> J-> K-> L-> M-> N-> O-> P-> *->
## A->   0   1   2   3   1   2   3   4   2   3   4   5   3   4   5   6   3
## B->   1   0   1   2   2   1   2   3   3   2   3   4   4   3   4   5   3
## C->   2   1   0   1   3   2   1   2   4   3   2   3   5   4   3   4   3
## D->   3   2   1   0   4   3   2   1   5   4   3   2   6   5   4   3   3
## E->   1   2   3   4   0   1   2   3   1   2   3   4   2   3   4   5   3
## F->   2   1   2   3   1   0   1   2   2   1   2   3   3   2   3   4   3
## G->   3   2   1   2   2   1   0   1   3   2   1   2   4   3   2   3   3
## H->   4   3   2   1   3   2   1   0   4   3   2   1   5   4   3   2   3
## I->   2   3   4   5   1   2   3   4   0   1   2   3   1   2   3   4   3
## J->   3   2   3   4   2   1   2   3   1   0   1   2   2   1   2   3   3
## K->   4   3   2   3   3   2   1   2   2   1   0   1   3   2   1   2   3
## L->   5   4   3   2   4   3   2   1   3   2   1   0   4   3   2   1   3
## M->   3   4   5   6   2   3   4   5   1   2   3   4   0   1   2   3   3
## N->   4   3   4   5   3   2   3   4   2   1   2   3   1   0   1   2   3
## O->   5   4   3   4   4   3   2   3   3   2   1   2   2   1   0   1   3
## P->   6   5   4   3   5   4   3   2   4   3   2   1   3   2   1   0   3
## *->   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   0

#Optimal matching technique for sequence analysis.

#seqdist is a function in TraMineR
#input: sequence object (created above), method (optimal matching), insertion/deletion cost (1), cost matrix, missingness
dist.om <- seqdist(happy.seq, method = "OM", indel = 1.0, sm = costmatrix2, with.missing = TRUE)

##  [>] 98 sequences with 16 distinct events/states

##  [>] including missing value as additional state

## Warning: Some substitution cost are greater that two times the indel cost.
## Such substitution cost will thus never be used.

##  [>] 98 distinct sequences

##  [>] min/max sequence length: 40/40

##  [>] computing distances using OM metric

##  [>] total time: 0.07 secs

4. Cluster Determination.

We next take the distance matrix obtained in the step three to determine an appropriate number of clusters. Several clustering techniques are available, but we use hierarchical cluster analysis. Other possible methods include k-mediods clustering or latent mixture models. After determining the number of clusters that work for the data, we create an object that contains cluster membership for each dyad (which will be used in the final step) and plot the clusters.

In this example, the resulting dendrogram indicated three clusters. We reached this conclusion by examining the length of the vertical lines (longer vertical line indicates greater difference between groups) and the number of dyads within each group (we didn’t want a group with too few dyads). After selecting a three cluster solution, we plotted the sequences of the three clusters for visual comparison.

#Finding clusters.

#we use hieararchical cluster analysis with Ward's method and the corresponding dendrogram to choose an appropriate number of clusters, but other methods to find underlying clusters are possible (e.g., latent mixture modeling)

#agnes is a function in cluster
#insert: distance matrix (created above), TRUE for a dissimilarity matrix, method (in this case, Ward's)
clusterward1 <- agnes(dist.om, diss = TRUE, method = "ward")
plot(clusterward1, which.plot = 2) #looks like three would be an appropriate number of clusters

#Trying cut points.

#3 Types
cl.3 <- cutree(clusterward1, k = 3) #cutting dendrogram (or tree) by the number of determined groups (in this case, 3)
cl.3fac <- factor(cl.3, labels = paste("Type", 1:3)) #turning cut points into a factor variable and labeling them

#seqplot is a function in TraMineR
#insert: sequence object, grouping/cut points (in factor form), type of plot (several options here, but we could with "I"), sorting criteria, legend, border
seqplot(happy.seq, group = cl.3fac, type="I", sortv = "from.start",withlegend = FALSE, border = NA)

5. Examine Group Differences among Clusters.

The final step of grid-sequence analysis examines group differences among the clusters. One can use a variety of methods to examine group differences, and the choice of method will depend on the number of clusters chosen and the research question. For example, if only two clusters are chosen and one wants to examine the clusters as a predictor variable, then one would use the cluster membership variable as a predictor in a logistic regression. In this case, we use analysis of variance (ANOVA) to examine group differences.

We examined males and females separately to assess whether relationship satisfaction and subjective health differed by cluster membership, which represented happiness variability. As you can see below, there were no significant group differences with this simulation data.

#Comparisons of Clusters using ANOVA

#Adding grouping variables to data set
data1$cl.3 <- cl.3

#data management: make data set "wide" by dyad member

data1_wide <- reshape(data=data1, 
                    timevar=c("member"),                    #"time" variable 
                    idvar= c("id"),                         #id variable
                    v.names=c("rel_sat", "health", "cl.3"), #variable measured for each dyad member
                    direction="wide", sep=".")

#Examining differences in relationship satisfaction separately for males and females
rel_sat_f <- aov(data1_wide$rel_sat.1 ~ factor(data1_wide$cl.3.1)) #female relationship satisfaction by female cluster membership
summary(rel_sat_f)

##                           Df Sum Sq Mean Sq F value Pr(>F)
## factor(data1_wide$cl.3.1)  2  0.209  0.1043   0.514    0.6
## Residuals                 95 19.300  0.2032

TukeyHSD(rel_sat_f) #post hoc test if needed

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data1_wide$rel_sat.1 ~ factor(data1_wide$cl.3.1))
## 
## $`factor(data1_wide$cl.3.1)`
##            diff        lwr       upr     p adj
## 2-1  0.08737329 -0.1729131 0.3476597 0.7043908
## 3-1 -0.01788271 -0.2867054 0.2509399 0.9862666
## 3-2 -0.10525600 -0.3740786 0.1635666 0.6213253

rel_sat_m <- aov(data1_wide$rel_sat.2 ~ factor(data1_wide$cl.3.2)) #male relationship satisfaction by male cluster membership
summary(rel_sat_m)

##                           Df Sum Sq Mean Sq F value Pr(>F)
## factor(data1_wide$cl.3.2)  2  0.246  0.1228   0.601   0.55
## Residuals                 95 19.396  0.2042

TukeyHSD(rel_sat_m) #post hoc test if needed

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data1_wide$rel_sat.2 ~ factor(data1_wide$cl.3.2))
## 
## $`factor(data1_wide$cl.3.2)`
##            diff        lwr       upr     p adj
## 2-1 -0.11677023 -0.3703495 0.1368091 0.5186389
## 3-1 -0.06098966 -0.3378801 0.2159008 0.8595781
## 3-2  0.05578057 -0.2211099 0.3326710 0.8810726

#Examining differences in subjective health separately for males and females
health_f <- aov(data1_wide$health.1 ~ factor(data1_wide$cl.3.1)) #female subjective health by female cluster membership
summary(health_f)

##                           Df Sum Sq Mean Sq F value Pr(>F)
## factor(data1_wide$cl.3.1)  2   0.28  0.1405   0.243  0.784
## Residuals                 95  54.82  0.5771

TukeyHSD(health_f) #post hoc test if needed

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data1_wide$health.1 ~ factor(data1_wide$cl.3.1))
## 
## $`factor(data1_wide$cl.3.1)`
##            diff        lwr       upr     p adj
## 2-1 -0.12607877 -0.5647631 0.3126056 0.7732008
## 3-1 -0.04031623 -0.4933875 0.4127550 0.9755634
## 3-2  0.08576254 -0.3673087 0.5388338 0.8942117

health_m <- aov(data1_wide$health.2 ~ factor(data1_wide$cl.3.2)) #male subjective health by male cluster membership
summary(health_m)

##                           Df Sum Sq Mean Sq F value Pr(>F)
## factor(data1_wide$cl.3.2)  2   1.25  0.6231    1.27  0.286
## Residuals                 95  46.61  0.4906

TukeyHSD(health_m) #post hoc test if needed

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data1_wide$health.2 ~ factor(data1_wide$cl.3.2))
## 
## $`factor(data1_wide$cl.3.2)`
##            diff        lwr       upr     p adj
## 2-1 -0.06290247 -0.4559889 0.3301839 0.9231677
## 3-1 -0.27947378 -0.7086960 0.1497485 0.2723564
## 3-2 -0.21657130 -0.6457935 0.2126509 0.4553051

Cautions

Although there are several distinct advantages of grid-sequence analysis, there are several limitations and considerations to the process, which include:

The need for simultaneous data from each member of the dyad.
The length of time series needed (dependent on the process under examination, but could be lengthy).
The change of a continuous variable into an interval variable when creating axes.
The extent of missingness.
The (current) need to use distinguishable dyads.

Conclusion

Theories of interpersonal dynamics and social interaction emphasize the need to study within-dyad dynamics. Grid-sequence analysis is a flexible new approach that allows researchers to capture within-dyad dynamics and to make between-dyad comparisons using repeated-measures dyadic data. We hope that the combination of state space grids and sequence analysis help researchers better study and understand the processes of interpersonal interactions.