0.1 Survival Analysis Basics

Our usual example data set does not specifically have an event time configuration. So, we will do a bit of acrobatics to make an example from it.

We will use survival analysis to examine the time until children reach a particular threshold on their WISC verbal scores and whether mother’s graduation status is associated with the time to the score attainment event.

We will run two different kinds of models: (1) Kaplan-Meier, and (2) Cox regression.

0.1.0.1 Load libraries and read in data.

#load libraries
library(GGally)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(psych)
library(reshape)
library(survival)

#set file path where data are stored (on the QuantDev website)
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/wisc3raw.csv"

#read in the .csv file using the url() function
wisc3raw <- read.csv(file=url(filepath),header=TRUE)
head(wisc3raw)
id verb1 verb2 verb4 verb6 perfo1 perfo2 perfo4 perfo6 info1 comp1 simu1 voca1 info6 comp6 simu6 voca6 momed grad constant
1 24.42 26.98 39.61 55.64 19.84 22.97 43.90 44.19 31.287 25.627 22.932 22.215 69.883 44.424 68.045 51.162 9.5 0 1
2 12.44 14.38 21.92 37.81 5.90 13.44 18.29 40.38 13.801 14.787 7.581 15.373 41.871 44.862 33.897 37.741 5.5 0 1
3 32.43 33.51 34.30 50.18 27.64 45.02 46.99 77.72 34.970 34.675 28.052 26.841 60.424 50.260 35.844 55.477 14.0 1 1
4 22.69 28.39 42.16 44.72 33.16 29.68 45.97 61.66 24.795 31.391 8.208 20.197 52.865 42.669 45.802 35.987 14.0 1 1
5 28.23 37.81 41.06 70.95 27.64 44.42 65.48 64.22 25.263 30.263 15.977 35.417 67.368 86.654 72.368 60.417 11.5 0 1
6 16.06 20.12 38.02 39.94 8.45 15.78 26.99 39.08 15.402 23.399 11.453 20.560 46.437 52.956 22.537 47.716 14.0 1 1

0.1.0.2 Subset to variables of interest.

#create list of variables of interest
varnames <- c("id", "verb1", "verb2", "verb4", "verb6", "momed", "grad")

#sub-setting the columns of interest
wiscsub <- wisc3raw[ ,varnames]

#Looking at the sample-level descriptives for the chosen variables
describe(wiscsub)
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 204 102.5000000 59.0338886 102.500 102.5000000 75.612600 1.00 204.00 203.00 0.0000000 -1.2176609 4.1331989
verb1 2 204 19.5850490 5.8076950 19.335 19.4976829 5.411490 3.33 35.15 31.82 0.1301071 -0.0528376 0.4066200
verb2 3 204 25.4153431 6.1063772 25.980 25.4026220 6.567918 5.95 39.85 33.90 -0.0639307 -0.3445494 0.4275319
verb4 4 204 32.6077451 7.3198841 32.820 32.4240244 7.183197 12.60 52.84 40.24 0.2317998 -0.0776524 0.5124944
verb6 5 204 43.7499020 10.6650511 42.545 43.4560976 11.297412 17.35 72.59 55.24 0.2356459 -0.3605210 0.7467029
momed 6 204 10.8112745 2.6982790 11.500 10.9969512 2.965200 5.50 18.00 12.50 -0.3586143 0.0056095 0.1889173
grad 7 204 0.2254902 0.4189328 0.000 0.1585366 0.000000 0.00 1.00 1.00 1.3040954 -0.3007374 0.0293312

0.1.1 Make new variables

0.1.1.1 Make binary variables that represent surpassing the WISC verbal threshold at each time point.

wiscsub$verbhigh1 <- ifelse(wiscsub$verb1 > 35, 1, 0)
wiscsub$verbhigh2 <- ifelse(wiscsub$verb2 > 35, 1, 0)
wiscsub$verbhigh4 <- ifelse(wiscsub$verb4 > 35, 1, 0)
wiscsub$verbhigh6 <- ifelse(wiscsub$verb6 > 35, 1, 0)

0.1.1.2 Make an event time variable.

wiscsub$eventtime <- 6 
  #in this example also serves as default value for censored individuals
  #designating default about when study ended 

wiscsub$eventtime <-ifelse(wiscsub$verbhigh6 == 1, 6, wiscsub$eventtime)
wiscsub$eventtime <-ifelse(wiscsub$verbhigh4 == 1, 4, wiscsub$eventtime)
wiscsub$eventtime <-ifelse(wiscsub$verbhigh2 == 1, 2, wiscsub$eventtime)
wiscsub$eventtime <-ifelse(wiscsub$verbhigh1 == 1, 1, wiscsub$eventtime)
  #designating when event occured, if it occur during study time

0.1.1.3 Add jitter (i.e., noise) to the “eventtime” variable.

Survival models often do not handle ties in the timing of events well. Since ties are prevalent in our data set, we add a bit of noise to the “eventtime” variable so there are fewer ties. This is not how you should deal with ties in your own data! We are only doing this here for the sake of demonstration.

#add jitter
wiscsub$eventtime <- jitter(wiscsub$eventtime, amount = 0.5)

#if "eventtime" is greater than 6, set to 6
wiscsub$eventtime[wiscsub$eventtime > 6] <- 6

0.1.1.4 Make a count variable.

wiscsub$verbhigh_count <- (wiscsub$verbhigh1 + wiscsub$verbhigh2 + wiscsub$verbhigh4 + wiscsub$verbhigh6)

0.1.1.5 Make an event variable.

wiscsub$highverbevent <- ifelse(wiscsub$verbhigh_count >= 1, 1, 0)
  # = 1 if eventhappened
  # = 0 if censored = event never happened

head(wiscsub)
id verb1 verb2 verb4 verb6 momed grad verbhigh1 verbhigh2 verbhigh4 verbhigh6 eventtime verbhigh_count highverbevent
1 24.42 26.98 39.61 55.64 9.5 0 0 0 1 1 4.456214 2 1
2 12.44 14.38 21.92 37.81 5.5 0 0 0 0 1 6.000000 1 1
3 32.43 33.51 34.30 50.18 14.0 1 0 0 0 1 6.000000 1 1
4 22.69 28.39 42.16 44.72 14.0 1 0 0 1 1 3.768331 2 1
5 28.23 37.81 41.06 70.95 11.5 0 0 1 1 1 1.607514 3 1
6 16.06 20.12 38.02 39.94 14.0 1 0 0 1 1 3.950754 2 1

0.1.1.6 Examine variables.

describe(wiscsub)
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 204 102.5000000 59.0338886 102.500000 102.5000000 75.6126000 1.000000 204.00 203.000000 0.0000000 -1.2176609 4.1331989
verb1 2 204 19.5850490 5.8076950 19.335000 19.4976829 5.4114900 3.330000 35.15 31.820000 0.1301071 -0.0528376 0.4066200
verb2 3 204 25.4153431 6.1063772 25.980000 25.4026220 6.5679180 5.950000 39.85 33.900000 -0.0639307 -0.3445494 0.4275319
verb4 4 204 32.6077451 7.3198841 32.820000 32.4240244 7.1831970 12.600000 52.84 40.240000 0.2317998 -0.0776524 0.5124944
verb6 5 204 43.7499020 10.6650511 42.545000 43.4560976 11.2974120 17.350000 72.59 55.240000 0.2356459 -0.3605210 0.7467029
momed 6 204 10.8112745 2.6982790 11.500000 10.9969512 2.9652000 5.500000 18.00 12.500000 -0.3586143 0.0056095 0.1889173
grad 7 204 0.2254902 0.4189328 0.000000 0.1585366 0.0000000 0.000000 1.00 1.000000 1.3040954 -0.3007374 0.0293312
verbhigh1 8 204 0.0049020 0.0700140 0.000000 0.0000000 0.0000000 0.000000 1.00 1.000000 14.0735013 197.0293397 0.0049020
verbhigh2 9 204 0.0588235 0.2358729 0.000000 0.0000000 0.0000000 0.000000 1.00 1.000000 3.7224603 11.9151904 0.0165144
verbhigh4 10 204 0.3725490 0.4846728 0.000000 0.3414634 0.0000000 0.000000 1.00 1.000000 0.5233478 -1.7345378 0.0339339
verbhigh6 11 204 0.7745098 0.4189328 1.000000 0.8414634 0.0000000 0.000000 1.00 1.000000 -1.3040954 -0.3007374 0.0293312
eventtime 12 204 5.0384720 1.2066264 5.728479 5.2300132 0.4025577 1.463386 6.00 4.536613 -1.1318924 0.3356640 0.0844807
verbhigh_count 13 204 1.2107843 0.8651327 1.000000 1.1829268 1.4826000 0.000000 3.00 3.000000 0.1742124 -0.7537878 0.0605714
highverbevent 14 204 0.7745098 0.4189328 1.000000 0.8414634 0.0000000 0.000000 1.00 1.000000 -1.3040954 -0.3007374 0.0293312

0.1.2 Survival Analysis

Using the survival package.

Main functions:

  • Surv() defines a survival object
  • coxph() runs a cox proportional hazards regression
  • survfit() fits a survival curve to a model or formula

Using person-level (single-record per person) data - 2 things have been coded in the data:

  1. if an event occurred or not (highverbevent)
  2. time: when the event occurred or when observation ended (eventtime)

0.1.2.1 First, create a “survival object” Surv().

For right-censored data, only two arguments are needed in the Surv() function: a vector of times and a vector indicating which times are observed and censored.

0.1.2.2 Create the response variable.

mySobject <- Surv(time = wiscsub$eventtime,
                  event = wiscsub$highverbevent)

mySobject
##   [1] 4.456214  6.000000  6.000000  3.768331  1.607514  3.950754  5.514678 
##   [8] 5.612551+ 3.770731  4.207384  3.992383  6.000000+ 6.000000+ 6.000000 
##  [15] 5.771520+ 6.000000  6.000000  6.000000+ 5.602294  4.290411  4.413303 
##  [22] 6.000000  5.863059+ 5.761163+ 5.772660+ 1.913098  3.519280  4.183018 
##  [29] 4.187607  6.000000  4.124705  2.472809  4.154618  3.652265  6.000000+
##  [36] 6.000000+ 3.680174  6.000000  6.000000+ 1.718252  5.818743+ 5.830550 
##  [43] 6.000000  5.592594  5.633395  3.532866  5.666562  1.463387  5.757690 
##  [50] 5.883802+ 3.808264  5.751254  5.741386  5.852333  4.279612  5.736334+
##  [57] 6.000000  2.453186  6.000000+ 1.721923  6.000000  5.932392+ 3.576237 
##  [64] 4.094792  6.000000  6.000000  6.000000+ 5.822032  3.748918  6.000000+
##  [71] 6.000000  6.000000  6.000000+ 6.000000+ 6.000000+ 6.000000  4.425048 
##  [78] 4.360613  6.000000  1.988180  6.000000+ 4.191416  6.000000  6.000000 
##  [85] 6.000000  5.955880  5.913988+ 6.000000  5.697349  5.558278  5.971480+
##  [92] 4.422032  4.372844  6.000000  3.619857  6.000000  4.294982  3.746087 
##  [99] 6.000000  5.732306  6.000000+ 3.913196  3.702228  5.901461+ 6.000000+
## [106] 5.664813+ 6.000000  5.510698  5.747324+ 5.636933  5.655297  5.963338+
## [113] 4.476535  3.970245  1.840785  5.642326  3.650001  4.378837  3.971115 
## [120] 6.000000  6.000000+ 6.000000  2.218591  4.271677  3.656232  4.307223 
## [127] 5.849864  6.000000  3.971738  5.744210+ 3.500200  3.844001  3.562141 
## [134] 3.948425  5.952562  5.869042  5.704426  5.704363  6.000000  1.792095 
## [141] 5.761149  6.000000+ 4.315433  6.000000+ 6.000000  6.000000  1.940024 
## [148] 3.587564  5.934726+ 5.516001  5.997754  5.888190  5.528066  3.512439 
## [155] 5.714072  6.000000+ 6.000000+ 5.583919+ 6.000000+ 3.573541  6.000000 
## [162] 6.000000+ 3.997040  5.828489  5.624771  5.825315+ 6.000000  6.000000+
## [169] 6.000000  3.794147  3.832796  6.000000  5.784048  6.000000  3.619753 
## [176] 6.000000+ 5.842835+ 6.000000  5.711267  4.404968  6.000000  4.474616 
## [183] 5.527636  3.544864  5.724652+ 6.000000  4.029932  5.699233  3.792539 
## [190] 2.250954  5.683022  3.835964  5.831228  5.749436  6.000000  5.840779 
## [197] 5.752446+ 5.951822  4.055302  4.434701  4.024975  5.645427  3.744717 
## [204] 3.704454
# note that the + indicates thos observations that are right censored

0.1.2.3 Kaplan-Meier estimates

The Kaplan-Meier estimate is a nonparametric maximum likelihood estimate (MLE) of the survival function, S(t). This estimate is a step function with jumps at observed event times, t_i.

kmsurvival <- survfit(mySobject ~ 1,
                  conf.int = 0.95, conf.type = "log") 
#conf.type specifies the transformation used for calculating the confidence interval (="log" by default) 
kmsurvival
## Call: survfit(formula = mySobject ~ 1, conf.int = 0.95, conf.type = "log")
## 
##       n  events  median 0.95LCL 0.95UCL 
##  204.00  158.00    5.75    5.65    5.95
summary(kmsurvival)
## Call: survfit(formula = mySobject ~ 1, conf.int = 0.95, conf.type = "log")
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  1.46    204       1    0.995 0.00489        0.986        1.000
##  1.61    203       1    0.990 0.00690        0.977        1.000
##  1.72    202       1    0.985 0.00843        0.969        1.000
##  1.72    201       1    0.980 0.00971        0.962        1.000
##  1.79    200       1    0.975 0.01083        0.955        0.997
##  1.84    199       1    0.971 0.01183        0.948        0.994
##  1.91    198       1    0.966 0.01274        0.941        0.991
##  1.94    197       1    0.961 0.01359        0.935        0.988
##  1.99    196       1    0.956 0.01438        0.928        0.984
##  2.22    195       1    0.951 0.01512        0.922        0.981
##  2.25    194       1    0.946 0.01581        0.916        0.978
##  2.45    193       1    0.941 0.01647        0.909        0.974
##  2.47    192       1    0.936 0.01710        0.903        0.970
##  3.50    191       1    0.931 0.01770        0.897        0.967
##  3.51    190       1    0.926 0.01827        0.891        0.963
##  3.52    189       1    0.922 0.01882        0.885        0.959
##  3.53    188       1    0.917 0.01935        0.880        0.955
##  3.54    187       1    0.912 0.01986        0.874        0.952
##  3.56    186       1    0.907 0.02035        0.868        0.948
##  3.57    185       1    0.902 0.02082        0.862        0.944
##  3.58    184       1    0.897 0.02128        0.856        0.940
##  3.59    183       1    0.892 0.02172        0.851        0.936
##  3.62    182       1    0.887 0.02214        0.845        0.932
##  3.62    181       1    0.882 0.02256        0.839        0.928
##  3.65    180       1    0.877 0.02296        0.834        0.924
##  3.65    179       1    0.873 0.02335        0.828        0.920
##  3.66    178       1    0.868 0.02373        0.822        0.915
##  3.68    177       1    0.863 0.02409        0.817        0.911
##  3.70    176       1    0.858 0.02445        0.811        0.907
##  3.70    175       1    0.853 0.02480        0.806        0.903
##  3.74    174       1    0.848 0.02513        0.800        0.899
##  3.75    173       1    0.843 0.02546        0.795        0.895
##  3.75    172       1    0.838 0.02578        0.789        0.890
##  3.77    171       1    0.833 0.02609        0.784        0.886
##  3.77    170       1    0.828 0.02640        0.778        0.882
##  3.79    169       1    0.824 0.02669        0.773        0.878
##  3.79    168       1    0.819 0.02698        0.767        0.873
##  3.81    167       1    0.814 0.02726        0.762        0.869
##  3.83    166       1    0.809 0.02753        0.757        0.865
##  3.84    165       1    0.804 0.02780        0.751        0.860
##  3.84    164       1    0.799 0.02806        0.746        0.856
##  3.91    163       1    0.794 0.02831        0.741        0.852
##  3.95    162       1    0.789 0.02856        0.735        0.847
##  3.95    161       1    0.784 0.02880        0.730        0.843
##  3.97    160       1    0.779 0.02903        0.725        0.838
##  3.97    159       1    0.775 0.02926        0.719        0.834
##  3.97    158       1    0.770 0.02948        0.714        0.830
##  3.99    157       1    0.765 0.02970        0.709        0.825
##  4.00    156       1    0.760 0.02991        0.703        0.821
##  4.02    155       1    0.755 0.03012        0.698        0.816
##  4.03    154       1    0.750 0.03032        0.693        0.812
##  4.06    153       1    0.745 0.03051        0.688        0.807
##  4.09    152       1    0.740 0.03070        0.682        0.803
##  4.12    151       1    0.735 0.03089        0.677        0.798
##  4.15    150       1    0.730 0.03107        0.672        0.794
##  4.18    149       1    0.725 0.03124        0.667        0.789
##  4.19    148       1    0.721 0.03142        0.662        0.785
##  4.19    147       1    0.716 0.03158        0.656        0.780
##  4.21    146       1    0.711 0.03174        0.651        0.776
##  4.27    145       1    0.706 0.03190        0.646        0.771
##  4.28    144       1    0.701 0.03205        0.641        0.767
##  4.29    143       1    0.696 0.03220        0.636        0.762
##  4.29    142       1    0.691 0.03235        0.631        0.758
##  4.31    141       1    0.686 0.03249        0.625        0.753
##  4.32    140       1    0.681 0.03262        0.620        0.748
##  4.36    139       1    0.676 0.03275        0.615        0.744
##  4.37    138       1    0.672 0.03288        0.610        0.739
##  4.38    137       1    0.667 0.03300        0.605        0.735
##  4.40    136       1    0.662 0.03312        0.600        0.730
##  4.41    135       1    0.657 0.03324        0.595        0.725
##  4.42    134       1    0.652 0.03335        0.590        0.721
##  4.43    133       1    0.647 0.03346        0.585        0.716
##  4.43    132       1    0.642 0.03356        0.580        0.711
##  4.46    131       1    0.637 0.03366        0.575        0.707
##  4.47    130       1    0.632 0.03376        0.570        0.702
##  4.48    129       1    0.627 0.03385        0.564        0.697
##  5.51    128       1    0.623 0.03394        0.559        0.693
##  5.51    127       1    0.618 0.03402        0.554        0.688
##  5.52    126       1    0.613 0.03411        0.549        0.683
##  5.53    125       1    0.608 0.03418        0.544        0.679
##  5.53    124       1    0.603 0.03426        0.539        0.674
##  5.56    123       1    0.598 0.03433        0.534        0.669
##  5.59    121       1    0.593 0.03440        0.529        0.664
##  5.60    120       1    0.588 0.03446        0.524        0.660
##  5.62    118       1    0.583 0.03453        0.519        0.655
##  5.63    117       1    0.578 0.03459        0.514        0.650
##  5.64    116       1    0.573 0.03465        0.509        0.645
##  5.64    115       1    0.568 0.03471        0.504        0.640
##  5.65    114       1    0.563 0.03476        0.499        0.636
##  5.66    113       1    0.558 0.03481        0.494        0.631
##  5.67    111       1    0.553 0.03486        0.489        0.626
##  5.68    110       1    0.548 0.03490        0.484        0.621
##  5.70    109       1    0.543 0.03494        0.479        0.616
##  5.70    108       1    0.538 0.03498        0.474        0.611
##  5.70    107       1    0.533 0.03501        0.469        0.606
##  5.70    106       1    0.528 0.03504        0.464        0.601
##  5.71    105       1    0.523 0.03506        0.459        0.596
##  5.71    104       1    0.518 0.03509        0.454        0.592
##  5.73    102       1    0.513 0.03511        0.449        0.587
##  5.74    100       1    0.508 0.03513        0.443        0.582
##  5.75     97       1    0.503 0.03515        0.438        0.576
##  5.75     96       1    0.497 0.03518        0.433        0.571
##  5.76     94       1    0.492 0.03520        0.428        0.566
##  5.76     93       1    0.487 0.03521        0.422        0.561
##  5.78     89       1    0.481 0.03524        0.417        0.556
##  5.82     87       1    0.476 0.03527        0.411        0.550
##  5.83     85       1    0.470 0.03529        0.406        0.545
##  5.83     84       1    0.465 0.03531        0.400        0.539
##  5.83     83       1    0.459 0.03533        0.395        0.534
##  5.84     82       1    0.453 0.03534        0.389        0.528
##  5.85     80       1    0.448 0.03535        0.384        0.523
##  5.85     79       1    0.442 0.03535        0.378        0.517
##  5.87     77       1    0.436 0.03536        0.372        0.511
##  5.89     75       1    0.430 0.03536        0.366        0.506
##  5.95     70       1    0.424 0.03539        0.360        0.500
##  5.95     69       1    0.418 0.03540        0.354        0.494
##  5.96     68       1    0.412 0.03541        0.348        0.488
##  6.00     65       1    0.406 0.03543        0.342        0.481
##  6.00     64      40    0.152 0.02792        0.106        0.218
0.1.2.3.1 To get specific information out of the Kaplan-Meier model.
summary(kmsurvival)$surv    # returns the Kaplan-Meier estimate at each t_i
summary(kmsurvival)$time    # {t_i}
summary(kmsurvival)$n.risk  # {Y_i}
summary(kmsurvival)$n.event # {d_i}
summary(kmsurvival)$std.err # standard error of the K-M estimate at {t_i}
summary(kmsurvival)$lower   # lower pointwise estimates (alternatively, $upper)
0.1.2.3.2 Plot the Kaplan-Meier Survival Curve.
plot(kmsurvival, 
     xlab="Time", 
     ylab="Survival Probability", 
     main="Kaplan-Meier Survival Curve")

0.1.2.3.3 Plot the cumulative hazard.
plot(kmsurvival, fun="cumhaz", 
     xlab="Time", 
     ylab="Cumulative Hazard")

0.1.2.3.4 To calculate the hazard at each time.
H_hat <- -log(kmsurvival$surv)
H_hat
##   [1] 0.004914015 0.009852296 0.014815086 0.019802627 0.024815169
##   [6] 0.029852963 0.034916265 0.040005335 0.045120435 0.050261835
##  [11] 0.055429805 0.060624622 0.065846566 0.071095922 0.076372979
##  [16] 0.081678031 0.087011377 0.092373320 0.097764169 0.103184236
##  [21] 0.108633841 0.114113307 0.119622963 0.125163143 0.130734188
##  [26] 0.136336444 0.141970261 0.147635999 0.153334020 0.159064695
##  [31] 0.164828399 0.170625517 0.176456437 0.182321557 0.188221279
##  [36] 0.194156014 0.200126181 0.206132205 0.212174520 0.218253566
##  [41] 0.224369793 0.230523659 0.236715629 0.242946179 0.249215792
##  [46] 0.255524961 0.261874188 0.268263987 0.274694877 0.281167391
##  [51] 0.287682072 0.294239473 0.300840157 0.307484700 0.314173688
##  [56] 0.320907720 0.327687407 0.334513372 0.341386251 0.348306694
##  [61] 0.355275364 0.362292936 0.369360103 0.376477571 0.383646061
##  [66] 0.390866309 0.398139068 0.405465108 0.412845215 0.420280194
##  [71] 0.427770866 0.435318071 0.442922671 0.450585543 0.458307589
##  [76] 0.466089730 0.473932907 0.481838087 0.489806257 0.497838428
##  [81] 0.505935638 0.514098949 0.514098949 0.522397752 0.530766002
##  [86] 0.530766002 0.539276691 0.547860435 0.556518498 0.565252178
##  [91] 0.574062807 0.582951755 0.582951755 0.592001590 0.601134074
##  [96] 0.610350729 0.619653122 0.629042862 0.638521606 0.648091057
## [101] 0.657752968 0.657752968 0.667605264 0.667605264 0.677655600
## [106] 0.677655600 0.677655600 0.688018387 0.698489687 0.698489687
## [111] 0.709184976 0.719995892 0.719995892 0.719995892 0.719995892
## [116] 0.731295448 0.731295448 0.742856270 0.742856270 0.754690728
## [121] 0.766666919 0.778788279 0.791058372 0.791058372 0.803637154
## [126] 0.816376180 0.816376180 0.829448261 0.829448261 0.842871282
## [131] 0.842871282 0.842871282 0.842871282 0.842871282 0.857260019
## [136] 0.871858818 0.886673904 0.886673904 0.886673904 0.902178091
## [141] 1.883007344

0.1.2.4 Let’s look at the estimated survivals of two groups.

km2group <- survfit(mySobject ~ wiscsub$grad)
km2group
## Call: survfit(formula = mySobject ~ wiscsub$grad)
## 
##                  n events median 0.95LCL 0.95UCL
## wiscsub$grad=0 158    113   5.85    5.73    6.00
## wiscsub$grad=1  46     45   4.25    3.95    5.71
summary(km2group)
## Call: survfit(formula = mySobject ~ wiscsub$grad)
## 
##                 wiscsub$grad=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  1.46    158       1    0.994 0.00631        0.981        1.000
##  1.61    157       1    0.987 0.00889        0.970        1.000
##  1.72    156       1    0.981 0.01086        0.960        1.000
##  1.79    155       1    0.975 0.01250        0.950        0.999
##  1.84    154       1    0.968 0.01393        0.941        0.996
##  1.91    153       1    0.962 0.01521        0.933        0.992
##  1.99    152       1    0.956 0.01637        0.924        0.988
##  2.25    151       1    0.949 0.01744        0.916        0.984
##  2.45    150       1    0.943 0.01844        0.908        0.980
##  2.47    149       1    0.937 0.01937        0.900        0.975
##  3.53    148       1    0.930 0.02025        0.892        0.971
##  3.59    147       1    0.924 0.02108        0.884        0.966
##  3.62    146       1    0.918 0.02186        0.876        0.962
##  3.65    145       1    0.911 0.02261        0.868        0.957
##  3.65    144       1    0.905 0.02332        0.860        0.952
##  3.66    143       1    0.899 0.02400        0.853        0.947
##  3.68    142       1    0.892 0.02465        0.845        0.942
##  3.74    141       1    0.886 0.02528        0.838        0.937
##  3.75    140       1    0.880 0.02588        0.830        0.932
##  3.77    139       1    0.873 0.02645        0.823        0.927
##  3.79    138       1    0.867 0.02701        0.816        0.922
##  3.81    137       1    0.861 0.02754        0.808        0.916
##  3.83    136       1    0.854 0.02806        0.801        0.911
##  3.84    135       1    0.848 0.02855        0.794        0.906
##  3.84    134       1    0.842 0.02903        0.787        0.901
##  3.91    133       1    0.835 0.02950        0.780        0.895
##  3.97    132       1    0.829 0.02995        0.772        0.890
##  3.97    131       1    0.823 0.03038        0.765        0.885
##  3.99    130       1    0.816 0.03080        0.758        0.879
##  4.00    129       1    0.810 0.03120        0.751        0.874
##  4.02    128       1    0.804 0.03159        0.744        0.868
##  4.09    127       1    0.797 0.03197        0.737        0.863
##  4.12    126       1    0.791 0.03234        0.730        0.857
##  4.15    125       1    0.785 0.03269        0.723        0.852
##  4.19    124       1    0.778 0.03304        0.716        0.846
##  4.21    123       1    0.772 0.03337        0.709        0.840
##  4.27    122       1    0.766 0.03369        0.703        0.835
##  4.28    121       1    0.759 0.03400        0.696        0.829
##  4.29    120       1    0.753 0.03430        0.689        0.823
##  4.29    119       1    0.747 0.03459        0.682        0.818
##  4.36    118       1    0.741 0.03487        0.675        0.812
##  4.37    117       1    0.734 0.03515        0.668        0.806
##  4.38    116       1    0.728 0.03541        0.662        0.801
##  4.40    115       1    0.722 0.03566        0.655        0.795
##  4.41    114       1    0.715 0.03591        0.648        0.789
##  4.46    113       1    0.709 0.03614        0.641        0.783
##  4.48    112       1    0.703 0.03637        0.635        0.778
##  5.51    111       1    0.696 0.03659        0.628        0.772
##  5.51    110       1    0.690 0.03680        0.621        0.766
##  5.52    109       1    0.684 0.03700        0.615        0.760
##  5.53    108       1    0.677 0.03720        0.608        0.754
##  5.53    107       1    0.671 0.03738        0.601        0.748
##  5.56    106       1    0.665 0.03756        0.595        0.742
##  5.59    104       1    0.658 0.03774        0.588        0.736
##  5.60    103       1    0.652 0.03791        0.582        0.730
##  5.62    101       1    0.645 0.03808        0.575        0.724
##  5.63    100       1    0.639 0.03824        0.568        0.718
##  5.64     99       1    0.632 0.03840        0.561        0.712
##  5.64     98       1    0.626 0.03854        0.555        0.706
##  5.65     97       1    0.620 0.03868        0.548        0.700
##  5.66     96       1    0.613 0.03881        0.542        0.694
##  5.67     94       1    0.607 0.03895        0.535        0.688
##  5.70     93       1    0.600 0.03907        0.528        0.682
##  5.70     92       1    0.593 0.03919        0.521        0.675
##  5.70     91       1    0.587 0.03929        0.515        0.669
##  5.70     90       1    0.580 0.03939        0.508        0.663
##  5.71     89       1    0.574 0.03949        0.502        0.657
##  5.73     87       1    0.567 0.03958        0.495        0.650
##  5.74     85       1    0.561 0.03967        0.488        0.644
##  5.75     82       1    0.554 0.03978        0.481        0.638
##  5.75     81       1    0.547 0.03987        0.474        0.631
##  5.76     79       1    0.540 0.03996        0.467        0.624
##  5.82     74       1    0.533 0.04008        0.460        0.617
##  5.83     72       1    0.525 0.04020        0.452        0.610
##  5.83     71       1    0.518 0.04031        0.445        0.603
##  5.83     70       1    0.511 0.04041        0.437        0.596
##  5.84     69       1    0.503 0.04049        0.430        0.589
##  5.85     67       1    0.496 0.04058        0.422        0.582
##  5.85     66       1    0.488 0.04065        0.415        0.575
##  5.87     64       1    0.481 0.04073        0.407        0.567
##  5.95     59       1    0.472 0.04084        0.399        0.560
##  5.95     58       1    0.464 0.04094        0.391        0.552
##  5.96     57       1    0.456 0.04103        0.382        0.544
##  6.00     54      30    0.203 0.03583        0.143        0.287
## 
##                 wiscsub$grad=1 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  1.72     46       1    0.978  0.0215        0.937        1.000
##  1.94     45       1    0.957  0.0301        0.899        1.000
##  2.22     44       1    0.935  0.0364        0.866        1.000
##  3.50     43       1    0.913  0.0415        0.835        0.998
##  3.51     42       1    0.891  0.0459        0.806        0.986
##  3.52     41       1    0.870  0.0497        0.777        0.973
##  3.54     40       1    0.848  0.0530        0.750        0.958
##  3.56     39       1    0.826  0.0559        0.724        0.943
##  3.57     38       1    0.804  0.0585        0.698        0.928
##  3.58     37       1    0.783  0.0608        0.672        0.911
##  3.62     36       1    0.761  0.0629        0.647        0.895
##  3.70     35       1    0.739  0.0647        0.623        0.878
##  3.70     34       1    0.717  0.0664        0.598        0.860
##  3.75     33       1    0.696  0.0678        0.575        0.842
##  3.77     32       1    0.674  0.0691        0.551        0.824
##  3.79     31       1    0.652  0.0702        0.528        0.805
##  3.95     30       1    0.630  0.0712        0.505        0.787
##  3.95     29       1    0.609  0.0720        0.483        0.767
##  3.97     28       1    0.587  0.0726        0.461        0.748
##  4.03     27       1    0.565  0.0731        0.439        0.728
##  4.06     26       1    0.543  0.0734        0.417        0.708
##  4.18     25       1    0.522  0.0737        0.396        0.688
##  4.19     24       1    0.500  0.0737        0.375        0.668
##  4.31     23       1    0.478  0.0737        0.354        0.647
##  4.32     22       1    0.457  0.0734        0.333        0.626
##  4.42     21       1    0.435  0.0731        0.313        0.604
##  4.43     20       1    0.413  0.0726        0.293        0.583
##  4.43     19       1    0.391  0.0720        0.273        0.561
##  4.47     18       1    0.370  0.0712        0.253        0.539
##  5.68     17       1    0.348  0.0702        0.234        0.517
##  5.71     16       1    0.326  0.0691        0.215        0.494
##  5.76     15       1    0.304  0.0678        0.197        0.471
##  5.78     14       1    0.283  0.0664        0.178        0.448
##  5.89     13       1    0.261  0.0647        0.160        0.424
##  6.00     11       1    0.237  0.0631        0.141        0.399
##  6.00     10      10    0.000     NaN           NA           NA
ggsurv(km2group)

0.1.2.5 Let’s test the difference (using a log-rank test).

survdiff(mySobject ~ wiscsub$grad, rho=0) 
## Call:
## survdiff(formula = mySobject ~ wiscsub$grad, rho = 0)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## wiscsub$grad=0 158      113    131.3      2.55      17.8
## wiscsub$grad=1  46       45     26.7     12.56      17.8
## 
##  Chisq= 17.8  on 1 degrees of freedom, p= 2.43e-05
#where rho=0 is the log-rank or Mantel-Haenszel test
#Note: The null hypothesis is that h1(t) = h2(t) for all t
#in this case we reject the null (meaning survival curves are different)

0.1.2.6 Now let’s run Cox’s proportional hazards model.

\(logh(t) = a(t) +b_{1}x_{1} + b_{2}x_{2}\)

This is a semi-parametric model because the baseline hazard h0(t) can take any form, then, the covariates enter the model linearly.

0.1.2.6.1 Five Steps to Survival Analysis with Cox
  • Step 1: Model Selection
  • Step 2: Data Preparation
  • Step 3: Data Description
  • Step 4: Model Building, Estimation, and Assessment of Fit to Data
  • Step 5: Interpret and Present Model Results
0.1.2.6.2 Step 1: Model Selection

It is important to first determine the kind of model that will be used to answer research questions, as other steps (e.g., data preparation) will depend on what type of model will be analyzed. Specifically, researchers need to decide whether models will:

  • Be analyzed in discrete or continuous time
  • Include single or repeating occurrences of the dependent variable
  • Be analyzed with a non-parametric, semi-parametric, or parametric approach
  • Include predictors that are time invariant or time varying (or both)

We are interested in (1) how long it takes children to reach a certain threshold on their WISC verbal score, and (2) whether mothers’ graduation status is associated with children’s verbal scores. The data were collected every one or more years, so we will use a discrete time model. We will use a semi-parametric approach (Cox proportional hazards model), because we do not know the shape of the underlying distribution (which precludes the use of a parametric approach).

0.1.2.6.3 Step 2: Data Preparation

To recap - the data format for Model 1 need the following characteristics:

  • One row per individual
  • A Status variable containing 1s and 0s, which indicate whether or not the dependent variable (i.e., meeting threshold verbal score) occured for each participant within the observation period
  • A variable indicating the time until the dependent variable occured (eventtime)
  • Left-censored cases need to be examined and dealt with in some way–either by removing any left-censored cases, or by re-defining the start of the observation.
0.1.2.6.4 Step 3: Data Description

Now, we need to examine several characteristics of the data before we start modeling. Specifically, we need to examine:

  • How many cases in the data are right-censored.
  • How many ties (cases with exactly the same survival time) are present in the data. This is important if using continuous time models. A large number of ties could be problematic, but this is unlikely with observational data. If a large number of ties are present, the researcher could aggregate time into discrete values and use a discrete-time model to overcome the presence of many ties.
  • The median survival time. Standard descriptive statistics (mean, standard deviation) will not provide accurate information about survival analysis data because of censoring. The median survival time is used instead.
  • A plot of survival times to understand how survival times are distributed in the data.
  • Descriptive statistics for any predictors in the model. We will be looking at child behaviors (focused distraction and bids about the demand of the wait task).

Determine how much of sample has right censoring.

Some kids in these data are right-censored such that the period of observation expired before they met the verbal threshold.

length(wiscsub[ which(wiscsub$highverbevent == 0), ]$id)
## [1] 46

In this case, 46 of the 204 individuals in the sample have right censoring, which is within the acceptable range.

Calculate the median survival time.

summary(wiscsub$eventtime)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.463   4.049   5.728   5.038   6.000   6.000

Approximately half-way through 5th grade, half of the children reached the verbal threshold for the first time.

Plot survival times and ties.

Order by survival time and create an order variable:

wiscsub <- wiscsub[order(-wiscsub$eventtime, wiscsub$highverbevent),]
wiscsub$order <- c(1:length(wiscsub$id))

Create ordered survival time plot:

ggplot(data=wiscsub, aes(x = eventtime, y = order)) +
       geom_rect(xmin = 0, xmax = wiscsub$eventtime, ymin = wiscsub$order, ymax = wiscsub$order+1, colour="lightgray") +
       geom_rect(xmin = wiscsub$eventtime-0.25, xmax = wiscsub$eventtime, ymin = wiscsub$order, ymax = wiscsub$order+1, fill = factor(wiscsub$highverbevent+1)) +
       geom_vline(xintercept = 6,linetype="solid") +
       scale_x_continuous(breaks = seq(1, 6, 1)) +
       xlab("Survival Time (Grade in School)") + ylab("Order (reversed)") +
       ggtitle("WISC Verbal Score Survival Time") +
       theme(axis.text.y=element_blank(),
             axis.ticks.y=element_blank(),
             panel.background=element_blank(),
             axis.text.x=element_text(colour="black"))