This tutorial is Part 1 of five showing how to do survival analysis with observational data (video recordings of participant behavior), using a study of children’s emotion regulation as an example. In this tutorial, we will demonstrate how to format observational data for survival analysis for four different types of survival analysis models. We will be using data from Cole et al. (2011) for the examples.
In these examples, we examine how children’s temperament (negative affectivity) and use of emotion regulation strategies (bids and focused distraction) are related to the timing of their anger expressions during a frustrating wait task. We will use survival analysis in several different ways, specifically to examine:
These tutorials demonstrate the steps to take when time measurements occur in equally-spaced intervals. For the data we are using, children’s behaviors were coded for each 1-second interval of an 8-minute observational task, resulting in 480 time points. Data are in long format with rows indicating each second of the task for each participant.
Set working directory and read in data
setwd("~/Desktop/Data/Survival analysis")
wait36 <- read.csv("mom_child_36wait_composites.csv",header=TRUE)
head(wait36)
## id age time task taskphase hap ang sad anx biddemand bidother ssooth
## 1 1 36 1 wait 0 0 0 0 0 0 0 0
## 2 1 36 2 wait 0 0 0 0 0 0 0 0
## 3 1 36 3 wait 0 0 0 0 0 0 0 0
## 4 1 36 4 wait 0 0 0 0 0 0 0 0
## 5 1 36 5 wait 0 0 0 0 0 0 0 0
## 6 1 36 6 wait 0 0 0 0 0 0 0 0
## afix distractu distractf leavetaking fdemand touch odd oddb bidinfo
## 1 0 0 0 0 1 0 0 0 0
## 2 0 1 0 0 0 0 0 0 0
## 3 0 1 0 0 0 0 0 0 0
## 4 0 1 0 0 0 0 0 0 0
## 5 0 1 0 0 0 0 0 0 0
## 6 0 1 0 0 0 0 0 0 0
## oddpa afixalt d1b d2b fdd fdo fdplay neu calm neg infoseek disruptive
## 1 0 NA 0 0 1 0 0 1 1 0 0 0
## 2 0 NA 1 0 0 0 0 1 1 0 0 0
## 3 0 NA 1 0 0 0 0 1 1 0 0 0
## 4 0 NA 1 0 0 0 0 1 1 0 0 0
## 5 0 NA 1 0 0 0 0 1 1 0 0 0
## 6 0 NA 1 0 0 0 0 1 1 0 0 0
## switch_keys calmodd negodd mprohib content calmbidd negbidd angbidd
## 1 NA 0 0 0 1 0 0 0
## 2 NA 0 0 0 1 0 0 0
## 3 NA 0 0 0 1 0 0 0
## 4 NA 0 0 0 1 0 0 0
## 5 NA 0 0 0 1 0 0 0
## 6 NA 0 0 0 1 0 0 0
## Coder struc focus redir distract plan lang inhibit p_emo n_emo pm M_lang
## 1 AS 3 0 0 0 1 0 1 0 0 2 2
## 2 AS 3 0 0 0 1 0 1 0 0 2 2
## 3 AS 3 0 0 0 1 0 1 0 0 2 2
## 4 AS 3 0 0 0 1 0 1 0 0 2 2
## 5 AS 3 0 0 0 1 0 1 0 0 2 2
## 6 AS 3 0 0 0 1 0 1 0 0 2 2
## quality success on_off fdemand_PR PR ssooth_EP calmbidd_EP infoseek_EP
## 1 3 3 1 1 1 0 0 0
## 2 3 3 1 0 0 0 0 0
## 3 3 3 1 0 0 0 0 0
## 4 3 3 1 0 0 0 0 0
## 5 3 3 1 0 0 0 0 0
## 6 3 3 1 0 0 0 0 0
## distractu_EP distractf_EP EP skill mat
## 1 0 0 0 3 9
## 2 4 0 4 3 9
## 3 4 0 4 3 9
## 4 4 0 4 3 9
## 5 4 0 4 3 9
## 6 4 0 4 3 9
Load packages:
library(plyr)
library(car)
library(psych)
library(reshape)
library(foreign)
Subset to relevant variables:
wait36a <- wait36[,c("id","time","ang", "biddemand", "distractf")]
indiv.stats <- ddply(wait36,"id",summarize, timelength = length(time),
timeNA = sum(is.na(time)))
# fix row where time = NA to time = 480 for participant 45
wait36a[20160,2] <- 480
distractf
, biddemand
, and ang
Recode time-varying predictors distractf and biddemand, and outcome variable anger. Biddemand=1 refers to child-initiated bids, so recode values of 2 and 3 into 0 because we are only interested in child-initiated, non-disruptive bids. Anger intensity was coded but we are intereseted in treating anger as a binary variable (occurred or did not occur). Recode all levels of anger (1 through 3) into 1.
wait36a$biddemand_r <- recode(wait36a$biddemand, "0=0;1=1;2:3=0")
wait36a$ang_r <- recode(wait36a$ang, "0=0;1:3=1")
wait36a$distractf_kid <- recode(wait36a$distractf,"0=0;1=1;2:3=0")
indiv.stats <- ddply(wait36a,"id",summarize, timelength = length(time),
timeNA = sum(is.na(time)),
angNA = sum(is.na(ang_r)),
ang0 = sum(ang_r == 0,na.rm=TRUE),
ang1 = sum(ang_r == 1, na.rm=TRUE))
It is important to examine if any left censoring is present in the data during Step 2. One option to overcome this issue is to list-wise delete left-censored cases (Allison, 1984; Singer & Willett, 1991). However, list-wise deletion is undesirable because of the amount of information that is lost. Another option, which we took, is to define the start of the observation period for all cases as the first second at which anger was not present (Singer & Willett, 2003). This approach is conceptually consistent with the task design, which was designed to elicit angry expressions from a non-anger state.
In our sample, there were 7 left-censored cases for which we trimmed the beginning of their time series to the first time point at which anger did not occur to meet this definition of the start of the observation period.
To do this, we first subset each child where time == 1 and ang_r == 1 to then calculate frequency, minimum duration, maximum duration, and average duration of this first left-censored anger event.
wait36b <- wait36a[which(wait36a$time == 1 & wait36a$ang_r == 1),]
ids <- unique(wait36b$id)
describe(ids)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 7 68 49.3 82 68 57.82 13 121 108 -0.1 -2.13 18.63
wait36c <- wait36a[wait36a$id %in% ids,]
For each left-censored child, determine how long the first anger episode is:
touch.list <- list()
data.split = split(wait36c, f = wait36c$id)
for (i in 1:length(data.split)) {
iData = rle(data.split[[i]]$ang_r)
iData2 = matrix(unlist(iData),ncol=2,nrow=length(iData$lengths))
length = iData2[1,1]
touch.list[[i]] = c(length)
}
wait36d <- data.frame(do.call(rbind,touch.list)); names(wait36d) <- c("length")
table(wait36d)
## wait36d
## 1 3 8 21
## 3 2 1 1
Remove the length of rows for the first anger episode for each child. Also, create a timeOriginal
variable as well as a new time
variable:
touch.list2 <- list()
for(i in 1:length(data.split)){
iData = data.split[[i]]
start = wait36d[i,1]
iData2 = iData[(start+1):length(data.split[[i]]$ang_r), ]
iData2$timeOriginal = iData2$time
iData2$time = iData2$time - start
touch.list2[[i]] = iData2
}
wait36e <- data.frame(do.call(rbind,touch.list2))
Merge these cases back into the main dataframe. First create a dataframe without the left-censored children (so we don’t duplicate their information):
wait36f <- wait36a[which(wait36a$time == 1 & wait36a$ang_r == 0),]
ids2 <- unique(wait36f$id)
wait36g <- wait36a[wait36a$id %in% ids2,]
wait36g$timeOriginal <- wait36g$time
Then add the left censored children back in with the rest:
wait36h <- rbind(wait36g,wait36e)
wait36h <- wait36h[order(wait36h$id, wait36h$time),]
indiv.stats <- ddply(wait36h,"id",summarize,ang_rNA = sum(is.na(ang_r)),
ang_r0 =sum(ang_r==0,na.rm=TRUE),
ang_r1 =sum(ang_r==1,na.rm=TRUE))
length(indiv.stats[which(indiv.stats$ang_rNA==0 & indiv.stats$ang_r1==0),]$id)
## [1] 12
# 12 kids have no missing, but also no instances of anger
waitcomplete <- merge(wait36h,indiv.stats,by="id",all.x=TRUE)
waitcomplete <- waitcomplete[which(waitcomplete$ang_rNA == 0),]
# create function (survmlmds stands for survival multilevel model data structure)
survmlmds <- function(data,task.var){
# Load packages
library(reshape)
library(car)
# Create individual dataframes for each id
data.split = split(data, f=data$id)
# Create list to store output in
data.list.all = list()
# For each person
for(i in 1:length(data.split)) {
# Extract the variable of interest
iData = data.split[[i]][,task.var]
# Define the ID variable
id = data.split[[i]]$id
# Define the time variable
time = data.split[[i]]$time
# Calculate when 0's and 1's occur
iData = recode(iData,"0=0;1=1;NA=3")
iData2 = rle(iData)
iData3 = data.frame(matrix(unlist(iData2),ncol=2,nrow=length(iData2$lengths)))
colnames(iData3) = c("lengths","value")
iData3$row = seq(1:length(iData3$value))
# If there are any NA's in the task var
if(any(iData %in% 3)==TRUE){
value.list <- list()
for(x in 1:length(iData3$value)){
first = 1
second = 2
last = length(iData3$value)
# if row 1
if(iData3[x,3] %in% first == TRUE){
value_new = ifelse(iData3[(x+1),2] > 2, 3, iData3[x,2])
}
# if row 2
if(iData3[x,3] %in% second == TRUE){
value_new = ifelse(iData3[(x-1),2] >2 | iData3[x,2] > 2 | iData3[(x+1),2] > 2, 3, iData3[x,2])
}
# if last row
if(iData3[x,3] %in% last == TRUE){
if(iData3[(x-1),2] %in% 1 == TRUE){
value_new = ifelse(iData3[x,2] > 2, 3, iData3[x,2])
}
if(iData3[(x-1),2] %in% 1 == FALSE){
value_new = ifelse(iData3[(x-2),2] > 2 | iData3[(x-1),2] > 2 | iData3[x,2] > 3, 3, iData3[x,2])
}
}
# if any other row
if(iData3[x,3] %in% c(1,second,last) == FALSE){
if(iData3[x,2] %in% 0 == TRUE){
value_new = ifelse(iData3[(x-1),2] > 2 | iData3[(x+1),2] > 2, 3, iData3[x,2])
}
if(iData3[x,2] %in% 1 == TRUE){
value_new = ifelse(iData3[(x-2),2] > 2 | iData3[(x-1),2] > 2, 3, iData3[x,2])
}
if(iData3[x,2] %in% 3 == TRUE){
value_new = 3
}
}
value.list[[x]] = value_new
}
iData3$value_new = unlist(value.list)
iData_orig = iData
iData = rep(iData3$value_new,iData3$lengths) # save new iData that has the updated NAs in it
iData4 <- rle(iData)
iData5 = data.frame(matrix(unlist(iData4),ncol=2,nrow=length(iData4$lengths)))
colnames(iData5) = c("lengths","value")
iData = ifelse(iData == 3, NA, iData)
iData5$value = ifelse(iData5$value==3,NA,iData5$value)
iData3_orig = iData3
iData3 = iData5
# save new iData3 so that it will match participants who didn't have missing in their time series
}
# Calculate the number of episodes (i.e. switches between 0's and 1's)
if(any(!is.na(iData3$value)) == TRUE){
#episode_length = length(iData3$lengths)
episode_length = length(na.omit(iData3$value))
iData3$rownum <- seq(1:length(iData3$lengths))
iData4 = na.omit(iData3)
iData4$rownum2 = seq(1:length(iData4$lengths))
iData5 = merge(iData3,iData4[,c("rownum","rownum2")],by="rownum",all.x=TRUE)
iData5 = iData5[,c("lengths","value","rownum","rownum2")]
}
if(all(is.na(iData3$value))){
iData4 = iData3
iData5 = iData4
iData5$rownum = 1
iData5$rownum2 = NA
}
# If not all ang_r == 0
if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
# Create an object with the odd row numbers
#odds = seq(from = 1, to = episode_length, by = 2)
odds = which(iData4$rownum2 %% 2 == 1)
# Create an object with the even row numbers
#evens = seq(from = 2, to = episode_length, by = 2)
evens = which(iData4$rownum2 %% 2 == 0)
}
# Calculate period for each person
# Create list
period.list.x = list()
#for(x in 1:episode_length){
for(x in 1:length(iData5$rownum2)){
# if all ang_r == 0
if(all(iData %in% 0) == TRUE){
period = seq(1:length(iData))
}
if(all(iData %in% 0) == FALSE){
# if all ang_r == NA
if(all(is.na(iData5$value)) == TRUE){
period = rep(NA,length(iData))
}
# if all ang_r == 0 or NA
if(all(iData %in% c(0,NA) == TRUE)){
period = rep(NA,iData5[x,1])
}
# If odd row
if(sum(iData,na.rm=TRUE) %in% c(0, NA) == FALSE){
if(iData5[x,4] %in% odds == TRUE){
period = seq(1:(iData5[x,1]))
}
}
# If even row
if(sum(iData, na.rm=TRUE) %in% c(0, NA) == FALSE){
if(iData5[x,4] %in% evens == TRUE){
if(iData5[x,1] %in% 1 == FALSE){
period = iData5[x-1,1] + 1
period = c(iData5[x-1,1]+1,rep(NA,iData5[x,1]-1))
}
if(iData5[x,1] %in% 1 == TRUE){
period = iData5[x-1,1] + 1
}
}
}
# If NA row
if(sum(iData, na.rm=TRUE) %in% c(0, NA) == FALSE){
if((is.na(iData5[x,2])==TRUE)){
period = c(rep(NA,iData5[x,1]))
}
}
}
# Combine into a list
period.list.x[[x]] = period
}
period.df = melt.list(period.list.x)
period = period.df[,1]
# if a series of NAs first and then zeros
if(length(iData3$lengths) %in% 2 == TRUE){
if(is.na(iData3[1,2]) == TRUE){
period = c(seq(1:iData3_orig[1,1]),rep(NA,(length(iData) - iData3_orig[1,1])))
period = c(rep(NA, iData3[1,1]), seq(1:iData3[2,1]))
}
}
# if only 1 episode and this is right censored (i.e. a string of zeros followed by NA's)
if(all(is.na(iData))==TRUE){
if(all(iData3_orig$value_new %in% 3) == TRUE){
if(is.na(iData3[1,2]) == TRUE){
period = c(seq(1:iData3_orig[1,1]), rep(NA, sum(iData3_orig$lengths)-iData3_orig[1,1]))
}
}
}
# Calculate episode for each person
# Create list
episode.list.x = list()
# for(x in 1:episode_length){
for(x in 1:length(iData5$rownum)){
# if all ang_r == NA
if(all(is.na(iData) == TRUE)){
if(all(is.na(period) == TRUE)){
episode_rep = length(iData)
}
if(any(!is.na(period)== TRUE)){
episode_rep = c(sum(!is.na(period)),sum(is.na(period)))
}
}
# if time series starts with NAs and has zeros at the end
if(is.na(iData[1]) == TRUE){
if(iData[length(iData)] %in% 0 == TRUE){
if(iData5$rownum[x] %in% 1 == TRUE){
episode_rep = sum(is.na(iData))
}
if(iData5$rownum[x] %in% 2 == TRUE){
episode_rep = sum(!is.na(iData))
}
}
}
# if all ang_r == 0
if(all(iData %in% 0 == TRUE)){
episode_rep = max(period.list.x[[x]])
}
# # if all ang_r == 0 or ==NA
# if(all(iData %in% c(0,NA) == TRUE)){
# episode_rep = iData5[x,1]
# }
# If odd row
if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
if((iData5[x,4] %in% odds) == TRUE){
episode_rep = max(period.list.x[[x]]) + 1
}
}
# If even row
if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
if(iData5[x,4] %in% evens == TRUE){
if(iData3[x,1] %in% c(0,1,NA) == FALSE){
#episode_rep = 0
#episode_rep = c(1,rep(NA,length(period.list.x[[x]])-2))
#episode_rep = NA
episode_rep = length(period.list.x[[x]]) - 1
}
if(iData3[x,1] %in% 1 == TRUE){
episode_rep = 0
}
}
}
# If NA row
if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
if(is.na(iData5[x,2]) == TRUE){
episode_rep = length(period.list.x[[x]])
}
}
# Combine into a list
episode.list.x[[x]] = episode_rep
}
# Combine episode.list.x into 1 column
episode_rep <- unlist(episode.list.x)
if(sum(iData,na.rm=TRUE) %in% 0 == FALSE){
if(!is.na(iData5[length(iData5$value),2]) == TRUE){
if(iData5[length(iData5$value),2] %in% 0 == TRUE){
episode_rep = c(episode_rep[1:(length(episode_rep)-1)],(episode_rep[length(episode_rep)]-1))
}
}
if(is.na(iData5[length(iData5$value),2]) == TRUE){
if(iData5[length(iData5$value)-1,2] %in% 0 == TRUE)
episode_rep = c(episode_rep[1:(length(episode_rep)-2)],episode_rep[length(episode_rep)-1]-1,episode_rep[length(episode_rep)])
}
}
episodelist.x2 <- list()
for(x in 1:length(episode_rep)){
old = episode_rep[x]
# if all ang_r == 0
if(all(iData %in% 0 == TRUE)){
episode = rep(1,old)
}
if(all(iData %in% 0) == FALSE){
# odd row
if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
if((iData5[x,4] %in% odds) == TRUE){
epnumber = iData5[x,4]
episode = rep(((epnumber/2)+.5),old)
}
}
# even row
if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
if((iData5[x,4] %in% evens)==TRUE){
episode = rep(NA,old)
}
}
# NA row
if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
if((iData5[x,4] %in% NA)==TRUE){
episode = rep(NA,old)
}
}
}
# Censored cases
# if censored such that iData5 is a 2x2 with 0 and then NA
if((length(iData5$lengths %in% 1)) == TRUE){
if(is.na(iData5[1,2] == TRUE)){
if(iData3_orig[x,2] %in% c(0,1) == TRUE){
episode = rep(1,old)
}
if(iData3_orig[x,2] %in% 3 == TRUE){
episode = rep(NA,old)
}
}
}
# if censored such that iData5 is a 2x2 with NA and then 0
if((length(iData5$lengths)) %in% 2 == TRUE){
if(is.na(iData5[1,2] == TRUE)){
if(is.na(iData5[x,2]) == TRUE){
episode = rep(NA,old)
}
if(iData5[x,2] %in% 0 == TRUE){
episode = rep(1,old)
}
}
}
# if(all(iData %in% c(0,NA) == TRUE)){
# episode = rep(NA,old)
# }
episodelist.x2[[x]] = episode
}
episode = unlist(episodelist.x2)
# Create data.frame
iData6 = data.frame(id,time,period,episode)
# Store in list
data.list.all[[i]] = iData6
}
# Put the list elements into a dataframe
data.all = do.call(rbind, data.list.all)
# Return a dataframe
return(data.all)
}
# Use new function
wait36i <- survmlmds(wait36h,"ang_r")
# Combine new data in with existing data
wait36j <- merge(wait36h,wait36i,by=c("id","time"),all=TRUE)
# Sort
wait36j <- wait36j[order(wait36j$id,wait36j$time),]
status (pertains to only episode 1; see Part 2):
wait36j <- ddply(wait36j,c("id"),mutate, status = ifelse(all(ang_r %in% 0 == TRUE),0,1))
wait36j$status <- ifelse(wait36j$episode %in% 1, wait36j$status, NA)
status.time (pertains to only episode 1, see Part 2):
data.split = split(wait36j, f = wait36j$id)
statustime.list <- list()
for(i in 1:length(data.split)){
iData = data.split[[i]]
iData$status2 = recode(iData$status,"0=0;1=1;NA=3")
iData2 = rle(iData$status2)
iData3 = data.frame(matrix(unlist(iData2),ncol=2,nrow=length(iData2$lengths)))
colnames(iData3) = c("lengths","value")
statustime.list2 <- list()
for(x in 1:length(iData3$value)){
if(iData3[x,2] %in% 0 == TRUE){
status_length = iData3[x,1]
status_time = rep(0, status_length)
}
if(iData3[x,2] %in% 1 == TRUE){
status_length = iData3[x,1]
status_time = c(rep(0,status_length-1),1)
}
if(iData3[x,2] %in% 3 == TRUE){
status_length = iData3[x,1]
status_time = rep(NA, status_length)
}
statustime.list2[[x]] = status_time
}
status.time = unlist(statustime.list2)
iData$status_time = status.time
statustime.list[[i]] = iData
}
wait36k <- data.frame(do.call(rbind,statustime.list))
event_time:
wait36l <- ddply(wait36k, c("id","episode"),mutate, event_time = max(period))
wait36l$event_time <- ifelse(is.na(wait36l$episode),NA,wait36l$event_time)
wait36l$start <- wait36l$period -1
wait36l$stop <- wait36l$period
wait36l$ang_r2 <- ifelse(is.na(wait36l$episode),NA,wait36l$ang_r)
wait36m <- rename(wait36l, c("time"="second", "ang_r" = "anger", "biddemand_r"="bids",
"distractf_kid" = "distraction",
"ang_r2" = "ang_event"))
Subset the data to remove unneeded variables:
wait36n <- wait36m[,c("id","second","anger", "bids", "distraction", "timeOriginal", "period",
"episode", "status", "status2","status_time", "event_time",
"start", "stop", "ang_event")]
write.csv(wait36n,"wait36_Anger_Survival Analysis Formatted Data.csv",row.names=FALSE)
Set working directory and read in data
setwd("~/Desktop/Data/Survival analysis")
wait36 <- read.csv("wait36_Anger_Survival Analysis Formatted Data.csv",header=TRUE)
names(wait36)
## [1] "id" "second" "anger" "bids"
## [5] "distraction" "timeOriginal" "period" "episode"
## [9] "status" "status2" "status_time" "event_time"
## [13] "start" "stop" "ang_event"
Merge in temperament data for use as a time-invariant predictor:
# Read in temperament data from SPSS file:
temp.df <- read.spss("~/Desktop/Data/Survival analysis/Temperament Over Time.sav", to.data.frame=TRUE)
# Merge negative affectivity score at 30 months in with wait36
wait36 <-merge(x = wait36, y = temp.df[ , c("id", "namn30")], by = "id", all.x=TRUE)
# Rename ID variable in temp.df so it can be merged with data
names(temp.df)[names(temp.df) == 'ID'] <- 'id'
# Rename the negative affectivity variable for clarity
names(wait36)[names(wait36)=="namn30"] <- "negative_affectivity30"
The original data structure has been re-organized into a data set with one record per ID. Event Time indicates number of seconds until the first anger expression. Negative affectivity is a time-invariant predictor.
wait36a <- wait36[which(wait36$episode == 1 & wait36$period == 1),
c("id","status","event_time","negative_affectivity30")]
head(wait36a)
## id status event_time negative_affectivity30
## 1 1 1 39 2.793750
## 481 2 1 203 3.614583
## 961 3 1 80 4.347222
## 1441 5 1 10 3.745960
## 1921 6 1 14 3.933523
## 2401 8 1 89 3.043750
One difference between the original data structure and the structure for this model is that the data structure has one row of data per participant per second, up until the first anger event (or right censoring), whereas the original data structure has one row of data per participant per second, up until the end of the observation. Note that right-censored cases in the data would have 480 rows of data because this child’s data are right-censored. Otherwise, all non-right-censored participants will have a number of rows that is equal to the time the first anger expression occurred. In the data, the second-by-second Anger Event, Bids, and Distraction variables are the same as the original data structure. However, two new variables are inserted, Start and Stop, that indicate the beginning and ending times of intervals, represented by each row of data. These variables are used in the definition of the outcome variable in the survival
package.
wait36b <- wait36[which(wait36$episode==1),c("id","second","ang_event","bids","distraction","status_time",
"start","stop")]
head(wait36b)
## id second ang_event bids distraction status_time start stop
## 1 1 1 0 0 0 0 0 1
## 2 1 2 0 0 0 0 1 2
## 3 1 3 0 0 0 0 2 3
## 4 1 4 0 0 0 0 3 4
## 5 1 5 0 0 0 0 4 5
## 6 1 6 0 0 0 0 5 6
The data structure for this model requires some restructuring of the data. No transformations are applied to Anger Event. Negative Affectivity is a time-invariant predictor. Two new variables are calculated, Episode and Period (Mills, 2011; Willett & Singer, 1995). Episodes refer to anger expressions, and Episode 1 begins at the first second of the wait task in which anger is not occurring. Subsequent episodes begin once the prior anger event has ended, as individuals cannot be at risk for an event while it is occurring. Each episode ends at the next second during which anger has occurred. Period contains the amount of time within each episode for each row of data (one second per row), restarting at 1 once a new episode begins.
We will use the Episode and Period variables to reformat the data such that each episode has one row of data, with the event_time variable indicating the duration of the episode.
#Summarize to get the maximum value of Period for each episode for each child. The maximum value of Period within each episode indicates the duration of that episode.
df2 <- ddply(wait36, c("id", "episode"), summarize,
period = max(period, na.rm= TRUE))
df2$period <-ifelse(df2$period <0, NA, df2$period)
# Now merge that data frame in with the other variables we are working with.
df3 <- merge(df2, wait36, by=c("id","period", "episode"), all.x = TRUE)
# Remove unneeded rows (only need final rows of each child's episode)
df4 <- df3[which((!is.na(df3$period) &!is.na(df3$episode))),]
#Sort the data by id
df4 <- df4[order(df4$id, df4$episode),]
Now subset to keep only the variables we need for this model:
wait36c <- df4[,c("id","ang_event","episode", "event_time", "negative_affectivity30")]
head(wait36c)
## id ang_event episode event_time negative_affectivity30
## 5 1 1 1 39 2.79375
## 1 1 1 2 15 2.79375
## 6 1 1 3 39 2.79375
## 3 1 1 4 3 2.79375
## 4 1 1 5 346 2.79375
## 8 1 1 6 8 2.79375
This data contains one row per each second of the task, for the entire task (in contrast to the Model 3 data, which contains data for each second up until the first anger episode or time of right-censoring).
wait36d <- wait36[,c("id","second","ang_event","bids", "distraction", "episode","period", "start","stop")]
head(wait36d)
## id second ang_event bids distraction episode period start stop
## 1 1 1 0 0 0 1 1 0 1
## 2 1 2 0 0 0 1 2 1 2
## 3 1 3 0 0 0 1 3 2 3
## 4 1 4 0 0 0 1 4 3 4
## 5 1 5 0 0 0 1 5 4 5
## 6 1 6 0 0 0 1 6 5 6
Then save your files to use in data analysis.