Overview

This tutorial is Part 1 of five showing how to do survival analysis with observational data (video recordings of participant behavior), using a study of children’s emotion regulation as an example. In this tutorial, we will demonstrate how to format observational data for survival analysis for four different types of survival analysis models. We will be using data from Cole et al. (2011) for the examples.

In these examples, we examine how children’s temperament (negative affectivity) and use of emotion regulation strategies (bids and focused distraction) are related to the timing of their anger expressions during a frustrating wait task. We will use survival analysis in several different ways, specifically to examine:

These tutorials demonstrate the steps to take when time measurements occur in equally-spaced intervals. For the data we are using, children’s behaviors were coded for each 1-second interval of an 8-minute observational task, resulting in 480 time points. Data are in long format with rows indicating each second of the task for each participant.

Set-up

Set working directory and read in data

setwd("~/Desktop/Data/Survival analysis")

wait36 <- read.csv("mom_child_36wait_composites.csv",header=TRUE)
head(wait36)
##   id age time task taskphase hap ang sad anx biddemand bidother ssooth
## 1  1  36    1 wait         0   0   0   0   0         0        0      0
## 2  1  36    2 wait         0   0   0   0   0         0        0      0
## 3  1  36    3 wait         0   0   0   0   0         0        0      0
## 4  1  36    4 wait         0   0   0   0   0         0        0      0
## 5  1  36    5 wait         0   0   0   0   0         0        0      0
## 6  1  36    6 wait         0   0   0   0   0         0        0      0
##   afix distractu distractf leavetaking fdemand touch odd oddb bidinfo
## 1    0         0         0           0       1     0   0    0       0
## 2    0         1         0           0       0     0   0    0       0
## 3    0         1         0           0       0     0   0    0       0
## 4    0         1         0           0       0     0   0    0       0
## 5    0         1         0           0       0     0   0    0       0
## 6    0         1         0           0       0     0   0    0       0
##   oddpa afixalt d1b d2b fdd fdo fdplay neu calm neg infoseek disruptive
## 1     0      NA   0   0   1   0      0   1    1   0        0          0
## 2     0      NA   1   0   0   0      0   1    1   0        0          0
## 3     0      NA   1   0   0   0      0   1    1   0        0          0
## 4     0      NA   1   0   0   0      0   1    1   0        0          0
## 5     0      NA   1   0   0   0      0   1    1   0        0          0
## 6     0      NA   1   0   0   0      0   1    1   0        0          0
##   switch_keys calmodd negodd mprohib content calmbidd negbidd angbidd
## 1          NA       0      0       0       1        0       0       0
## 2          NA       0      0       0       1        0       0       0
## 3          NA       0      0       0       1        0       0       0
## 4          NA       0      0       0       1        0       0       0
## 5          NA       0      0       0       1        0       0       0
## 6          NA       0      0       0       1        0       0       0
##   Coder struc focus redir distract plan lang inhibit p_emo n_emo pm M_lang
## 1    AS     3     0     0        0    1    0       1     0     0  2      2
## 2    AS     3     0     0        0    1    0       1     0     0  2      2
## 3    AS     3     0     0        0    1    0       1     0     0  2      2
## 4    AS     3     0     0        0    1    0       1     0     0  2      2
## 5    AS     3     0     0        0    1    0       1     0     0  2      2
## 6    AS     3     0     0        0    1    0       1     0     0  2      2
##   quality success on_off fdemand_PR PR ssooth_EP calmbidd_EP infoseek_EP
## 1       3       3      1          1  1         0           0           0
## 2       3       3      1          0  0         0           0           0
## 3       3       3      1          0  0         0           0           0
## 4       3       3      1          0  0         0           0           0
## 5       3       3      1          0  0         0           0           0
## 6       3       3      1          0  0         0           0           0
##   distractu_EP distractf_EP EP skill mat
## 1            0            0  0     3   9
## 2            4            0  4     3   9
## 3            4            0  4     3   9
## 4            4            0  4     3   9
## 5            4            0  4     3   9
## 6            4            0  4     3   9

Load packages:

library(plyr)
library(car)
library(psych)
library(reshape)
library(foreign)

Subset to relevant variables:

wait36a <- wait36[,c("id","time","ang", "biddemand", "distractf")]

indiv.stats <- ddply(wait36,"id",summarize, timelength = length(time),
                     timeNA = sum(is.na(time)))

# fix row where time = NA to time = 480 for participant 45
wait36a[20160,2] <- 480

Data Preparation

Recode distractf, biddemand, and ang

Recode time-varying predictors distractf and biddemand, and outcome variable anger. Biddemand=1 refers to child-initiated bids, so recode values of 2 and 3 into 0 because we are only interested in child-initiated, non-disruptive bids. Anger intensity was coded but we are intereseted in treating anger as a binary variable (occurred or did not occur). Recode all levels of anger (1 through 3) into 1.

wait36a$biddemand_r <- recode(wait36a$biddemand, "0=0;1=1;2:3=0")
wait36a$ang_r <- recode(wait36a$ang, "0=0;1:3=1")
wait36a$distractf_kid <- recode(wait36a$distractf,"0=0;1=1;2:3=0")

indiv.stats <- ddply(wait36a,"id",summarize, timelength = length(time),
                     timeNA = sum(is.na(time)),
                     angNA = sum(is.na(ang_r)),
                     ang0 = sum(ang_r == 0,na.rm=TRUE),
                     ang1 = sum(ang_r == 1, na.rm=TRUE))

Remove Left Censoring from the Data

It is important to examine if any left censoring is present in the data during Step 2. One option to overcome this issue is to list-wise delete left-censored cases (Allison, 1984; Singer & Willett, 1991). However, list-wise deletion is undesirable because of the amount of information that is lost. Another option, which we took, is to define the start of the observation period for all cases as the first second at which anger was not present (Singer & Willett, 2003). This approach is conceptually consistent with the task design, which was designed to elicit angry expressions from a non-anger state.

In our sample, there were 7 left-censored cases for which we trimmed the beginning of their time series to the first time point at which anger did not occur to meet this definition of the start of the observation period.

To do this, we first subset each child where time == 1 and ang_r == 1 to then calculate frequency, minimum duration, maximum duration, and average duration of this first left-censored anger event.

wait36b <- wait36a[which(wait36a$time == 1 & wait36a$ang_r == 1),]
ids <- unique(wait36b$id)
describe(ids)
##    vars n mean   sd median trimmed   mad min max range skew kurtosis    se
## X1    1 7   68 49.3     82      68 57.82  13 121   108 -0.1    -2.13 18.63
wait36c <- wait36a[wait36a$id %in% ids,]

For each left-censored child, determine how long the first anger episode is:

touch.list <- list()
data.split = split(wait36c, f = wait36c$id)

for (i in 1:length(data.split)) {
  iData =  rle(data.split[[i]]$ang_r)
  iData2 = matrix(unlist(iData),ncol=2,nrow=length(iData$lengths))
  length = iData2[1,1]
  
  touch.list[[i]] = c(length)
}

wait36d <- data.frame(do.call(rbind,touch.list)); names(wait36d) <- c("length")

table(wait36d)
## wait36d
##  1  3  8 21 
##  3  2  1  1

Remove the length of rows for the first anger episode for each child. Also, create a timeOriginal variable as well as a new time variable:

  • timeOriginal will be the original time variable
  • time will be the original time variable minus the length of the first anger episode
touch.list2 <- list()

for(i in 1:length(data.split)){
  iData = data.split[[i]]
  start = wait36d[i,1]
  iData2 = iData[(start+1):length(data.split[[i]]$ang_r), ]
  iData2$timeOriginal = iData2$time 
  iData2$time = iData2$time - start
  
  touch.list2[[i]] = iData2
}

wait36e <- data.frame(do.call(rbind,touch.list2))

Merge these cases back into the main dataframe. First create a dataframe without the left-censored children (so we don’t duplicate their information):

wait36f <- wait36a[which(wait36a$time == 1 & wait36a$ang_r == 0),]
ids2 <- unique(wait36f$id)
wait36g <- wait36a[wait36a$id %in% ids2,]
wait36g$timeOriginal <- wait36g$time

Then add the left censored children back in with the rest:

wait36h <- rbind(wait36g,wait36e)
wait36h <- wait36h[order(wait36h$id, wait36h$time),] 

Format Data to Look at Each Anger Episode

indiv.stats <- ddply(wait36h,"id",summarize,ang_rNA = sum(is.na(ang_r)),
                     ang_r0 =sum(ang_r==0,na.rm=TRUE),
                     ang_r1 =sum(ang_r==1,na.rm=TRUE))

length(indiv.stats[which(indiv.stats$ang_rNA==0 & indiv.stats$ang_r1==0),]$id) 
## [1] 12
# 12 kids have no missing, but also no instances of anger

waitcomplete <- merge(wait36h,indiv.stats,by="id",all.x=TRUE)
waitcomplete <- waitcomplete[which(waitcomplete$ang_rNA == 0),]
    
# create function (survmlmds stands for survival multilevel model data structure)
survmlmds <- function(data,task.var){
  # Load packages
  library(reshape)
  library(car)
  
  # Create individual dataframes for each id
  data.split = split(data, f=data$id)
  
  # Create list to store output in
  data.list.all = list()
  
  # For each person
  for(i in 1:length(data.split)) {
    
    # Extract the variable of interest
    iData = data.split[[i]][,task.var]
    
    # Define the ID variable
    id = data.split[[i]]$id
    
    # Define the time variable
    time = data.split[[i]]$time
    
    # Calculate when 0's and 1's occur
    iData = recode(iData,"0=0;1=1;NA=3")
    iData2 = rle(iData)
    iData3 = data.frame(matrix(unlist(iData2),ncol=2,nrow=length(iData2$lengths)))
    colnames(iData3) = c("lengths","value")
    iData3$row = seq(1:length(iData3$value))
    
    # If there are any NA's in the task var
    if(any(iData %in% 3)==TRUE){
      
      value.list <- list()
      
      for(x in 1:length(iData3$value)){
        first = 1
        second = 2
        last = length(iData3$value)
        
        # if row 1
        if(iData3[x,3] %in% first == TRUE){
          value_new = ifelse(iData3[(x+1),2] > 2, 3, iData3[x,2])
        }
        
        # if row 2
        if(iData3[x,3] %in% second == TRUE){
          value_new = ifelse(iData3[(x-1),2] >2 | iData3[x,2] > 2 | iData3[(x+1),2] > 2, 3, iData3[x,2])
        }
        
        # if last row
        if(iData3[x,3] %in% last == TRUE){
          if(iData3[(x-1),2] %in% 1 == TRUE){
            value_new = ifelse(iData3[x,2] > 2, 3, iData3[x,2]) 
          }
          
          if(iData3[(x-1),2] %in% 1 == FALSE){
            value_new = ifelse(iData3[(x-2),2] > 2 | iData3[(x-1),2] > 2 | iData3[x,2] > 3, 3, iData3[x,2])
          }
          
        }
        
        # if any other row
        if(iData3[x,3] %in% c(1,second,last) == FALSE){
          
          if(iData3[x,2] %in% 0 == TRUE){
            value_new = ifelse(iData3[(x-1),2] > 2 | iData3[(x+1),2] > 2, 3, iData3[x,2])
          }
          
          if(iData3[x,2] %in% 1 == TRUE){
            value_new = ifelse(iData3[(x-2),2] > 2 | iData3[(x-1),2] > 2, 3, iData3[x,2])
          }
          
          if(iData3[x,2] %in% 3 == TRUE){
            value_new = 3
          }
        }
        
        value.list[[x]] = value_new
      }
      
      iData3$value_new = unlist(value.list)
      
      iData_orig = iData
      iData = rep(iData3$value_new,iData3$lengths) # save new iData that has the updated NAs in it
      
      iData4 <- rle(iData)
      iData5 = data.frame(matrix(unlist(iData4),ncol=2,nrow=length(iData4$lengths)))
      colnames(iData5) = c("lengths","value")
      
      iData = ifelse(iData == 3, NA, iData)
      iData5$value = ifelse(iData5$value==3,NA,iData5$value)
      iData3_orig = iData3
      iData3 = iData5 
      # save new iData3 so that it will match participants who didn't have missing in their time series 
    }
    
    # Calculate the number of episodes (i.e. switches between 0's and 1's)
    if(any(!is.na(iData3$value)) == TRUE){
      #episode_length = length(iData3$lengths)
      episode_length = length(na.omit(iData3$value))
      
      iData3$rownum <- seq(1:length(iData3$lengths))
      
      iData4 = na.omit(iData3)
      iData4$rownum2 = seq(1:length(iData4$lengths))
      iData5 = merge(iData3,iData4[,c("rownum","rownum2")],by="rownum",all.x=TRUE)
      iData5 = iData5[,c("lengths","value","rownum","rownum2")]
    }
    
    if(all(is.na(iData3$value))){
      iData4 = iData3
      iData5 = iData4
      iData5$rownum = 1
      iData5$rownum2 = NA
    }
    
    # If not all ang_r == 0
    if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
      
      # Create an object with the odd row numbers
      #odds = seq(from = 1, to = episode_length, by = 2)
      odds = which(iData4$rownum2 %% 2 == 1)
      
      # Create an object with the even row numbers
      #evens = seq(from = 2, to = episode_length, by = 2)
      evens = which(iData4$rownum2 %% 2 == 0)
    }
    
    # Calculate period for each person
    # Create list
    period.list.x = list()
    
    #for(x in 1:episode_length){
    for(x in 1:length(iData5$rownum2)){
      
      # if all ang_r == 0
      if(all(iData %in% 0) == TRUE){
        period = seq(1:length(iData))
      }
      
      if(all(iData %in% 0) == FALSE){
      
      # if all ang_r == NA
      if(all(is.na(iData5$value)) == TRUE){
        period = rep(NA,length(iData))
      }
      
      # if all ang_r == 0 or NA
      if(all(iData %in% c(0,NA) == TRUE)){
        period = rep(NA,iData5[x,1])
      }
      
      # If odd row
      if(sum(iData,na.rm=TRUE) %in% c(0, NA) == FALSE){
        if(iData5[x,4] %in% odds == TRUE){
          period = seq(1:(iData5[x,1]))
        }
      }
      
      # If even row
      if(sum(iData, na.rm=TRUE) %in% c(0, NA) == FALSE){
        if(iData5[x,4] %in% evens == TRUE){
          if(iData5[x,1] %in% 1 == FALSE){
            period = iData5[x-1,1] + 1
            period = c(iData5[x-1,1]+1,rep(NA,iData5[x,1]-1))
          }
          if(iData5[x,1] %in% 1 == TRUE){
            period = iData5[x-1,1] + 1
          }
        }
      }
      
      # If NA row
      if(sum(iData, na.rm=TRUE) %in% c(0, NA) == FALSE){
        if((is.na(iData5[x,2])==TRUE)){
          period = c(rep(NA,iData5[x,1]))
        }
      }
    }
      
      # Combine into a list
      period.list.x[[x]] = period
    }
    
    period.df = melt.list(period.list.x)
    period = period.df[,1]
    
    # if a series of NAs first and then zeros
    if(length(iData3$lengths) %in% 2 == TRUE){
      if(is.na(iData3[1,2]) == TRUE){
        period = c(seq(1:iData3_orig[1,1]),rep(NA,(length(iData) - iData3_orig[1,1])))
        period = c(rep(NA, iData3[1,1]), seq(1:iData3[2,1]))
      }
    }
    
    # if only 1 episode and this is right censored (i.e. a string of zeros followed by NA's)
    if(all(is.na(iData))==TRUE){
    if(all(iData3_orig$value_new %in% 3) == TRUE){
        if(is.na(iData3[1,2]) == TRUE){
          period = c(seq(1:iData3_orig[1,1]), rep(NA, sum(iData3_orig$lengths)-iData3_orig[1,1]))
        }
     }
    }
    
    
    # Calculate episode for each person
    # Create list
    episode.list.x = list()
    
    # for(x in 1:episode_length){
    for(x in 1:length(iData5$rownum)){
      
      # if all ang_r == NA
      if(all(is.na(iData) == TRUE)){
        if(all(is.na(period) == TRUE)){
          episode_rep = length(iData)
        }
        if(any(!is.na(period)== TRUE)){
          episode_rep = c(sum(!is.na(period)),sum(is.na(period)))
        }
      }
      
      # if time series starts with NAs and has zeros at the end
      if(is.na(iData[1]) == TRUE){
        if(iData[length(iData)] %in% 0 == TRUE){
          if(iData5$rownum[x] %in% 1 == TRUE){
         episode_rep = sum(is.na(iData))
          }
          if(iData5$rownum[x] %in% 2 == TRUE){
            episode_rep = sum(!is.na(iData))
          }
        }
      }
      
      # if all ang_r == 0 
      if(all(iData %in% 0 == TRUE)){
        episode_rep = max(period.list.x[[x]])
      }
      
      # # if all ang_r == 0 or ==NA
      # if(all(iData %in% c(0,NA) == TRUE)){
      #   episode_rep = iData5[x,1]
      # }
      
      # If odd row
      if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
        if((iData5[x,4] %in% odds) == TRUE){
          episode_rep = max(period.list.x[[x]]) + 1
        }
      }
      
      # If even row
      if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
        if(iData5[x,4] %in% evens == TRUE){
          if(iData3[x,1] %in% c(0,1,NA) == FALSE){
            #episode_rep = 0
            #episode_rep = c(1,rep(NA,length(period.list.x[[x]])-2))
            #episode_rep = NA 
            episode_rep = length(period.list.x[[x]]) - 1
          }
          if(iData3[x,1] %in% 1 == TRUE){
            episode_rep = 0
          }
        }
      }
      
      # If NA row
      if(sum(iData, na.rm=TRUE) %in% c(0,NA) == FALSE){
        if(is.na(iData5[x,2]) == TRUE){
          episode_rep = length(period.list.x[[x]])
        }
      }
      
      # Combine into a list
      episode.list.x[[x]] = episode_rep
    }
    
    # Combine episode.list.x into 1 column
    episode_rep <- unlist(episode.list.x)
    
    
    if(sum(iData,na.rm=TRUE) %in% 0 == FALSE){
      if(!is.na(iData5[length(iData5$value),2]) == TRUE){
        if(iData5[length(iData5$value),2] %in% 0 == TRUE){
          episode_rep = c(episode_rep[1:(length(episode_rep)-1)],(episode_rep[length(episode_rep)]-1))
        }
      }
      if(is.na(iData5[length(iData5$value),2]) == TRUE){
        if(iData5[length(iData5$value)-1,2] %in% 0 == TRUE)
          episode_rep = c(episode_rep[1:(length(episode_rep)-2)],episode_rep[length(episode_rep)-1]-1,episode_rep[length(episode_rep)])
      }
    }
    
    episodelist.x2 <- list()
    
    for(x in 1:length(episode_rep)){
      old = episode_rep[x]
      
      # if all ang_r == 0
      if(all(iData %in% 0 == TRUE)){
        episode = rep(1,old)
      }
      
      if(all(iData %in% 0) == FALSE){
        
        # odd row
        if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
          if((iData5[x,4] %in% odds) == TRUE){
            epnumber = iData5[x,4]
            episode = rep(((epnumber/2)+.5),old)
          }
        }
        
        # even row
        if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
          if((iData5[x,4] %in% evens)==TRUE){
            episode = rep(NA,old)
          }
        }
        
        # NA row
        if(sum(iData,na.rm=TRUE) %in% c(0,NA) == FALSE){
          if((iData5[x,4] %in% NA)==TRUE){
            episode = rep(NA,old)
          }
        }
      }
      
      # Censored cases
      # if censored such that iData5 is a 2x2 with 0 and then NA
      if((length(iData5$lengths %in% 1)) == TRUE){
        if(is.na(iData5[1,2] == TRUE)){
          if(iData3_orig[x,2] %in% c(0,1) == TRUE){
            episode = rep(1,old)
          }
          if(iData3_orig[x,2] %in% 3 == TRUE){
            episode = rep(NA,old) 
          }
        }
      }
      
      # if censored such that iData5 is a 2x2 with NA and then 0
      if((length(iData5$lengths)) %in% 2 == TRUE){
        if(is.na(iData5[1,2] == TRUE)){
          if(is.na(iData5[x,2]) == TRUE){
            episode = rep(NA,old)
          }
          if(iData5[x,2] %in% 0 == TRUE){
            episode = rep(1,old)
          }
        }
      }
      
      # if(all(iData %in% c(0,NA) == TRUE)){
      #   episode = rep(NA,old)
      # }
      
      episodelist.x2[[x]] = episode
    }
    
    episode = unlist(episodelist.x2)
    
    
    # Create data.frame
    iData6 = data.frame(id,time,period,episode)
    
    # Store in list
    data.list.all[[i]] = iData6
  }
  
  # Put the list elements into a dataframe
  data.all = do.call(rbind, data.list.all)
  
  # Return a dataframe
  return(data.all)
}

# Use new function
  wait36i <- survmlmds(wait36h,"ang_r")
  
# Combine new data in with existing data
  wait36j <- merge(wait36h,wait36i,by=c("id","time"),all=TRUE)
  
# Sort
  wait36j <- wait36j[order(wait36j$id,wait36j$time),]

Create status, status.time, and time2event1 variables for single-episode models (See Parts 2 and 3)

status (pertains to only episode 1; see Part 2):

wait36j <- ddply(wait36j,c("id"),mutate, status = ifelse(all(ang_r %in% 0 == TRUE),0,1)) 
wait36j$status <- ifelse(wait36j$episode %in% 1, wait36j$status, NA)

status.time (pertains to only episode 1, see Part 2):

data.split = split(wait36j, f = wait36j$id)
statustime.list <- list()

for(i in 1:length(data.split)){
  iData = data.split[[i]]

  iData$status2 = recode(iData$status,"0=0;1=1;NA=3")
  iData2 = rle(iData$status2)
  iData3 = data.frame(matrix(unlist(iData2),ncol=2,nrow=length(iData2$lengths)))
  colnames(iData3) = c("lengths","value")
  
  statustime.list2 <- list()
  for(x in 1:length(iData3$value)){
    if(iData3[x,2] %in% 0 == TRUE){
      status_length = iData3[x,1]
      status_time = rep(0, status_length)
    }
    if(iData3[x,2] %in% 1 == TRUE){
      status_length = iData3[x,1]
      status_time = c(rep(0,status_length-1),1)
    }
    if(iData3[x,2] %in% 3 == TRUE){
      status_length = iData3[x,1]
      status_time = rep(NA, status_length)
    }
    statustime.list2[[x]] = status_time
  }
  status.time = unlist(statustime.list2)
  
  iData$status_time = status.time
    
  statustime.list[[i]] = iData
  }

wait36k <- data.frame(do.call(rbind,statustime.list))

event_time:

wait36l <- ddply(wait36k, c("id","episode"),mutate, event_time = max(period))
wait36l$event_time <- ifelse(is.na(wait36l$episode),NA,wait36l$event_time)

Create start, stop, and status.time variables (see Part 2 and Part 5)

wait36l$start <- wait36l$period -1
wait36l$stop <- wait36l$period 

Create new anger variable where anger = NA if period & episode = NA

wait36l$ang_r2 <- ifelse(is.na(wait36l$episode),NA,wait36l$ang_r)

Rename variables

wait36m <- rename(wait36l, c("time"="second", "ang_r" = "anger", "biddemand_r"="bids", 
                             "distractf_kid" = "distraction", 
                              "ang_r2" = "ang_event"))

Subset the data to remove unneeded variables:

wait36n <- wait36m[,c("id","second","anger", "bids", "distraction", "timeOriginal",  "period",
                      "episode", "status", "status2","status_time", "event_time", 
                       "start", "stop", "ang_event")]

Save:

write.csv(wait36n,"wait36_Anger_Survival Analysis Formatted Data.csv",row.names=FALSE)

Subsetting

Set working directory and read in data

setwd("~/Desktop/Data/Survival analysis")

wait36 <- read.csv("wait36_Anger_Survival Analysis Formatted Data.csv",header=TRUE)
names(wait36)
##  [1] "id"           "second"       "anger"        "bids"        
##  [5] "distraction"  "timeOriginal" "period"       "episode"     
##  [9] "status"       "status2"      "status_time"  "event_time"  
## [13] "start"        "stop"         "ang_event"

Merge in temperament data for use as a time-invariant predictor:

# Read in temperament data from SPSS file:
temp.df <- read.spss("~/Desktop/Data/Survival analysis/Temperament Over Time.sav", to.data.frame=TRUE)

# Merge negative affectivity score at 30 months in with wait36
wait36 <-merge(x = wait36, y = temp.df[ , c("id", "namn30")], by = "id", all.x=TRUE)

# Rename ID variable in temp.df so it can be merged with data
names(temp.df)[names(temp.df) == 'ID'] <- 'id'

# Rename the negative affectivity variable for clarity
names(wait36)[names(wait36)=="namn30"] <- "negative_affectivity30"

Subsetting for Single-Episode Model wtih Time-Invariant Predictors (see Part 2)

The original data structure has been re-organized into a data set with one record per ID. Event Time indicates number of seconds until the first anger expression. Negative affectivity is a time-invariant predictor.

wait36a <- wait36[which(wait36$episode == 1 & wait36$period == 1),
                  c("id","status","event_time","negative_affectivity30")]

head(wait36a)
##      id status event_time negative_affectivity30
## 1     1      1         39               2.793750
## 481   2      1        203               3.614583
## 961   3      1         80               4.347222
## 1441  5      1         10               3.745960
## 1921  6      1         14               3.933523
## 2401  8      1         89               3.043750

Subsetting for Single-Episode Model with Time-Varying Predictors (see Part 3)

One difference between the original data structure and the structure for this model is that the data structure has one row of data per participant per second, up until the first anger event (or right censoring), whereas the original data structure has one row of data per participant per second, up until the end of the observation. Note that right-censored cases in the data would have 480 rows of data because this child’s data are right-censored. Otherwise, all non-right-censored participants will have a number of rows that is equal to the time the first anger expression occurred. In the data, the second-by-second Anger Event, Bids, and Distraction variables are the same as the original data structure. However, two new variables are inserted, Start and Stop, that indicate the beginning and ending times of intervals, represented by each row of data. These variables are used in the definition of the outcome variable in the survival package.

wait36b <- wait36[which(wait36$episode==1),c("id","second","ang_event","bids","distraction","status_time",
                                             "start","stop")]

head(wait36b)
##   id second ang_event bids distraction status_time start stop
## 1  1      1         0    0           0           0     0    1
## 2  1      2         0    0           0           0     1    2
## 3  1      3         0    0           0           0     2    3
## 4  1      4         0    0           0           0     3    4
## 5  1      5         0    0           0           0     4    5
## 6  1      6         0    0           0           0     5    6

Subsetting for Recurring-Episode Model with Time-Invariant Predictors (see Part 4)

The data structure for this model requires some restructuring of the data. No transformations are applied to Anger Event. Negative Affectivity is a time-invariant predictor. Two new variables are calculated, Episode and Period (Mills, 2011; Willett & Singer, 1995). Episodes refer to anger expressions, and Episode 1 begins at the first second of the wait task in which anger is not occurring. Subsequent episodes begin once the prior anger event has ended, as individuals cannot be at risk for an event while it is occurring. Each episode ends at the next second during which anger has occurred. Period contains the amount of time within each episode for each row of data (one second per row), restarting at 1 once a new episode begins.

We will use the Episode and Period variables to reformat the data such that each episode has one row of data, with the event_time variable indicating the duration of the episode.

#Summarize to get the maximum value of Period for each episode for each child. The maximum value of Period within each episode indicates the duration of that episode.
df2 <- ddply(wait36, c("id", "episode"), summarize,
                     period = max(period, na.rm= TRUE))

df2$period <-ifelse(df2$period <0, NA, df2$period)

# Now merge that data frame in with the other variables we are working with.
df3 <- merge(df2, wait36, by=c("id","period", "episode"), all.x = TRUE)

# Remove unneeded rows (only need final rows of each child's episode) 
df4 <- df3[which((!is.na(df3$period) &!is.na(df3$episode))),]

#Sort the data by id
df4 <- df4[order(df4$id, df4$episode),]

Now subset to keep only the variables we need for this model:

wait36c <- df4[,c("id","ang_event","episode", "event_time", "negative_affectivity30")]

head(wait36c)
##   id ang_event episode event_time negative_affectivity30
## 5  1         1       1         39                2.79375
## 1  1         1       2         15                2.79375
## 6  1         1       3         39                2.79375
## 3  1         1       4          3                2.79375
## 4  1         1       5        346                2.79375
## 8  1         1       6          8                2.79375

Subsetting for Recurring-Episode Model with Time-Varying Predictors (Model 4, see Part 5)

This data contains one row per each second of the task, for the entire task (in contrast to the Model 3 data, which contains data for each second up until the first anger episode or time of right-censoring).

wait36d <- wait36[,c("id","second","ang_event","bids", "distraction", "episode","period", "start","stop")]

head(wait36d)
##   id second ang_event bids distraction episode period start stop
## 1  1      1         0    0           0       1      1     0    1
## 2  1      2         0    0           0       1      2     1    2
## 3  1      3         0    0           0       1      3     2    3
## 4  1      4         0    0           0       1      4     3    4
## 5  1      5         0    0           0       1      5     4    5
## 6  1      6         0    0           0       1      6     5    6

Save:

Then save your files to use in data analysis.