This tutorial walks through a few helpful first steps before conducting growth curve analyses (or any analysis, for that matter). Specifically, it demonstrates how to restructure data and how to obtain initial descriptive statistics and plots, which inform later decisions about the analyses. Throughout, we use a data set that tracks weight over time.
The code and example in this tutorial are from Chapter 2 of Grimm, Ram, and Estabrook (2016), with a few additions to the code and commentary; refer to the chapter for further interpretation of and insight into the analyses.
This tutorial provides line-by-line code to:
1. re-structure data (long to wide, and wide to long),
2. create initial longitudinal plots,
3. examine descriptive statistics, and
4. create plots of bivariate relationships.
#set filepath
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/wght_data.csv"
#read in the .csv file using the url() function
wght_long <- read.csv(file=url(filepath),header=TRUE)
#assign names to the columns of the data set
names(wght_long) <- c('id','occ','occ_begin','year','time_in_study','grade','age','gyn_age', 'wght')
#view the first few observations in the data set
head(wght_long)
## id occ occ_begin year time_in_study grade age gyn_age wght
## 1 4 2 2 1994 2.083333 5 10.916667 0.9166667 100
## 2 4 3 3 1996 3.750000 6 12.583333 2.5833333 108
## 3 5 1 1 1992 0.000000 0 6.333333 -6.6666667 49
## 4 5 2 2 1994 2.000000 1 8.333333 -4.6666667 52
## 5 5 3 3 1996 3.750000 2 10.083333 -2.9166667 72
## 6 5 4 4 1998 5.750000 5 12.083333 -0.9166667 100
#loading the packages we will need throughout this tutorial
library(ggplot2)
library(psych)
library(reshape)
Different analyses often call for different data structures. Two frequently used structures are “long” and “wide” data files. Long data files contain one row for each person at each measurement occasion and a single column for each repeated measure, so a complete long file has N (number of persons) × O (number of occasions) rows. In contrast, wide data files contain one row per person, with each measurement occasion stored in a separate column.
It is useful to have both long and wide files of your data before beginning analyses. Below, we begin with re-structuring a long data file into a wide data file, and then reverse this change.
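As a minimal sketch of the two layouts, the hypothetical toy data below (3 persons, each weighed at ages 6, 8, and 10; the values are made up for illustration) has N × O = 3 × 3 = 9 rows in long format and one row per person in wide format. It uses the same base R `reshape()` function as the rest of the tutorial.

```r
# Hypothetical toy data in long format: 3 persons x 3 occasions = 9 rows
toy_long <- data.frame(id   = rep(1:3, each = 3),
                       age  = rep(c(6, 8, 10), times = 3),
                       wght = c(45, 52, 60, 48, 55, 66, 40, 47, 58))
nrow(toy_long) # 9

# Wide format: one row per person, one weight column per age
toy_wide <- reshape(toy_long, idvar = "id", timevar = "age",
                    v.names = "wght", direction = "wide")
toy_wide
```

The new columns are named by joining `v.names` and each time value with a period (wght.6, wght.8, wght.10), which is the same naming pattern that appears in `wght_wide` below.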
#rounding the variable age and creating a new variable with this information
wght_long$age_r <- round(wght_long$age)
#restructuring the data from long to wide
wght_wide <- reshape(wght_long, #data set
v.names='wght', #repeated measures variable
idvar='id', #id variable
timevar='age_r', #time metric/occasion variable
direction='wide') #direction of re-structuring
## Warning in reshapeWide(data, idvar = idvar, timevar =
## timevar, varying = varying, : some constant variables
## (occ,occ_begin,year,time_in_study,grade,age,gyn_age) are really varying
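The warning above means that `reshape()` treated every column other than id, age_r, and wght as constant within a person and kept only the value from that person's first row, even though occ, year, age, and the other listed variables change over time. The tutorial later keeps only the id and weight columns, so those carried-over values are not used. A minimal sketch of an alternative, assuming the same base R `reshape()`, is to drop the time-varying columns before reshaping; the toy data (including its grade column) are hypothetical.

```r
# Hypothetical toy data: grade changes within person, so it is time-varying
toy_long <- data.frame(id    = c(1, 1, 2, 2),
                       age_r = c(8, 10, 8, 10),
                       grade = c(2, 4, 3, 5),
                       wght  = c(52, 60, 55, 66))

# Reshaping every column reproduces the "really varying" warning;
# tryCatch() returns the warning message instead of the reshaped data
tryCatch(reshape(toy_long, idvar = "id", timevar = "age_r",
                 v.names = "wght", direction = "wide"),
         warning = function(w) conditionMessage(w))

# Keeping only the id, time, and repeated-measure columns avoids the warning
toy_wide <- reshape(toy_long[, c("id", "age_r", "wght")], idvar = "id",
                    timevar = "age_r", v.names = "wght", direction = "wide")
toy_wide
```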
#view the first few observations in the data set
head(wght_wide)
## id occ occ_begin year time_in_study grade age gyn_age wght.11
## 1 4 2 2 1994 2.083333 5 10.916667 0.9166667 100
## 3 5 1 1 1992 0.000000 0 6.333333 -6.6666667 NA
## 8 8 1 1 1986 0.000000 4 10.000000 -2.0000000 NA
## 12 10 1 1 1990 0.000000 3 7.916667 -5.0833333 NA
## 16 11 1 1 1990 0.000000 1 7.583333 -6.4166667 NA
## 19 19 1 1 1996 0.000000 0 5.916667 -5.0833333 NA
## wght.13 wght.6 wght.8 wght.10 wght.12 wght.14 wght.16 wght.7 wght.5
## 1 108 NA NA NA NA NA NA NA NA
## 3 NA 49 52 72 100 124 NA NA NA
## 8 NA NA NA 85 133 180 160 NA NA
## 12 NA NA 61 75 105 147 NA NA NA
## 16 NA NA 49 65 70 NA NA NA NA
## 19 NA 40 52 91 120 NA NA NA NA
## wght.9 wght.15 wght.17 wght.19 wght.18
## 1 NA NA NA NA NA
## 3 NA NA NA NA NA
## 8 NA NA NA NA NA
## 12 NA NA NA NA NA
## 16 NA NA NA NA NA
## 19 NA NA NA NA NA
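Because age_r is a rounded age, two occasions for the same person could in principle round to the same year, and the wide file has room for only one weight per person per age. A quick check for duplicated id-by-age_r combinations is sketched below on hypothetical toy rows (the second row for id 4 is invented for illustration).

```r
# Hypothetical toy rows; ages 10.9 and 11.2 both round to 11 for id 4
toy_long <- data.frame(id   = c(4, 4, 5, 5),
                       age  = c(10.9, 11.2, 6.3, 8.3),
                       wght = c(100, 103, 49, 52))
toy_long$age_r <- round(toy_long$age)

# TRUE marks a row whose id/age_r pair already appeared in an earlier row
dup <- duplicated(toy_long[, c("id", "age_r")])
toy_long[dup, ]
```

On the tutorial data, the same check would be `any(duplicated(wght_long[, c("id", "age_r")]))`.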
#creating new data set with only the id and weight variables
wght_wide1 <- wght_wide[ , c('id','wght.5','wght.6','wght.7','wght.8','wght.9','wght.10',
'wght.11','wght.12','wght.13','wght.14','wght.15','wght.16',
'wght.17','wght.18','wght.19')]
#view the first few observations in the data set
head(wght_wide1)
## id wght.5 wght.6 wght.7 wght.8 wght.9 wght.10 wght.11 wght.12 wght.13
## 1 4 NA NA NA NA NA NA 100 NA 108
## 3 5 NA 49 NA 52 NA 72 NA 100 NA
## 8 8 NA NA NA NA NA 85 NA 133 NA
## 12 10 NA NA NA 61 NA 75 NA 105 NA
## 16 11 NA NA NA 49 NA 65 NA 70 NA
## 19 19 NA 40 NA 52 NA 91 NA 120 NA
## wght.14 wght.15 wght.16 wght.17 wght.18 wght.19
## 1 NA NA NA NA NA NA
## 3 124 NA NA NA NA NA
## 8 180 NA 160 NA NA NA
## 12 147 NA NA NA NA NA
## 16 NA NA NA NA NA NA
## 19 NA NA NA NA NA NA
#assign names to the columns of the data set
names(wght_wide1) <- c('id','wght5','wght6','wght7','wght8','wght9','wght10',
'wght11','wght12','wght13','wght14','wght15','wght16',
'wght17','wght18','wght19')
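Typing out every column name is error-prone. A minimal sketch of building the same names programmatically with `paste0()` and `sub()` is below. It uses the weights for ids 5 and 10 from the output above, with the columns deliberately listed out of age order as they are in `wght_wide`; the age range is an assumption for the toy example.

```r
# Weights for ids 5 and 10 from the output above, with columns out of age order
toy_wide <- data.frame(id      = c(5, 10),
                       wght.10 = c(72, 75),
                       wght.6  = c(49, NA),
                       wght.8  = c(52, 61))

# Assumed age range for the toy data; the tutorial data would use 5:19
ages <- 5:10

# Keep the weight columns that exist, in age order, then drop the period from their names
keep <- intersect(paste0("wght.", ages), names(toy_wide))
toy_wide1 <- toy_wide[, c("id", keep)]
names(toy_wide1) <- sub(".", "", names(toy_wide1), fixed = TRUE)
names(toy_wide1)
```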
#restructuring the data from wide to long
wght_long_new <- reshape(data = wght_wide1, #data set
idvar='id', #id variable
varying=c('wght5','wght6','wght7','wght8','wght9','wght10', #repeated measures variables
'wght11','wght12','wght13','wght14','wght15','wght16',
'wght17','wght18','wght19'),
times=c(5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), #time metric/occasion variable
v.names='wght', #name of repeated measures variable (i.e., new column name)
direction='long') #direction of restructuring
#sort rows by id and then by time
wght_long_new <- wght_long_new[order(wght_long_new$id, wght_long_new$time),]
#view the first few observations in the data set
head(wght_long_new)
## id time wght
## 4.5 4 5 NA
## 4.6 4 6 NA
## 4.7 4 7 NA
## 4.8 4 8 NA
## 4.9 4 9 NA
## 4.10 4 10 NA
#creating a new data set that drops rows with a missing weight
wght_long_new1 <- wght_long_new[which(!is.na(wght_long_new$wght)), ]
#view the first few observations in the data set
head(wght_long_new1)
## id time wght
## 4.11 4 11 100
## 4.13 4 13 108
## 5.6 5 6 49
## 5.8 5 8 52
## 5.10 5 10 72
## 5.12 5 12 100
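The long file returned by `reshape()` has one row for every person at every age (N × O rows), with NA wherever a person was not weighed at that age; its row names combine id and time (e.g., 4.11). A minimal sketch on a two-person wide file, built from ids 5 and 10 in the earlier output, shows the row count before and after dropping the NA rows.

```r
# Weights for ids 5 and 10 from the earlier output (id 10 has no weight at age 6)
toy_wide <- data.frame(id = c(5, 10), wght6 = c(49, NA),
                       wght8 = c(52, 61), wght10 = c(72, 75))

# Wide to long: 2 persons x 3 ages = 6 rows, including the NA
toy_long_all <- reshape(toy_wide, idvar = "id",
                        varying = c("wght6", "wght8", "wght10"),
                        times = c(6, 8, 10), v.names = "wght",
                        direction = "long")
nrow(toy_long_all) # 6

# Sort by person and age, then keep only observed weights
toy_long_all <- toy_long_all[order(toy_long_all$id, toy_long_all$time), ]
toy_long_obs <- toy_long_all[!is.na(toy_long_all$wght), ]
nrow(toy_long_obs) # 5
```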
Before beginning analyses, it is often helpful to examine plots of your data. In this case, we want to make sure that growth curve models are appropriate for our data (i.e., do we see any growth/change in our data?).
#creating a subset of the data (ids between 1300 and 1600) so the plots are less cluttered
wght_long1 <- wght_long[which(wght_long$id > 1300 & wght_long$id < 1600), ]
#creating a plot and assigning it to an object
plot_obs <- ggplot(data=wght_long1, #data set
aes(x=age, y=wght, group=id)) + #calling variables
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
scale_x_continuous(breaks = c(5,7,9,11,13,15,17), name = "Chronological Age") + #creating breaks in the x-axis and labeling the x-axis
scale_y_continuous(breaks = c(25,50,75,100,125,150,175,200,225), name = "Weight") #creating breaks in the y-axis and labeling the y-axis
#printing the object (plot)
print(plot_obs)
#creating the same plot with points added and a classic theme
plot_obs <- ggplot(data=wght_long1, #data set
aes(x=age, y=wght, group=id)) + #calling variables
geom_line() + #adding lines to plot
geom_point(size=2) + #adding and adjusting size of points on plot
theme_classic() + #changing style/background
scale_x_continuous(breaks = c(5,7,9,11,13,15,17), name = "Chronological Age") + #creating breaks in the x-axis and labeling the x-axis
scale_y_continuous(breaks = c(25,50,75,100,125,150,175,200,225), name = "Weight") #creating breaks in the y-axis and labeling the y-axis
#print the plot
print(plot_obs)
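Alongside the plots, a rough numeric check of whether weight changes over time is each person's difference between their last and first observed weights. The sketch below uses the first five rows printed by `head(wght_long)` near the top of this tutorial (ids 4 and 5); positive values indicate weight gain across the observed occasions.

```r
# First five rows of wght_long as printed by head() above (id, age, wght)
toy_long <- data.frame(id   = c(4, 4, 5, 5, 5),
                       age  = c(10.916667, 12.583333, 6.333333, 8.333333, 10.083333),
                       wght = c(100, 108, 49, 52, 72))

# Sort by person and age so each person's first and last rows are the earliest and latest occasions
toy_long <- toy_long[order(toy_long$id, toy_long$age), ]

# Change in weight from the first to the last observed occasion, per person
change <- tapply(toy_long$wght, toy_long$id, function(w) w[length(w)] - w[1])
change
```

On the full data, the same two steps applied to `wght_long` would give one change score per person.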
Now that we have visually examined the data, we can get a better feel for it by examining descriptive statistics. We use the wide data file for these analyses.
#creating new data set with only the weight variables
wght_vars <- wght_wide1[ , c('wght5','wght6','wght7','wght8','wght9','wght10','wght11','wght12',
'wght13','wght14','wght15','wght16','wght17','wght18','wght19')]
#view the first few observations in the data set
head(wght_vars)
## wght5 wght6 wght7 wght8 wght9 wght10 wght11 wght12 wght13 wght14 wght15
## 1 NA NA NA NA NA NA 100 NA 108 NA NA
## 3 NA 49 NA 52 NA 72 NA 100 NA 124 NA
## 8 NA NA NA NA NA 85 NA 133 NA 180 NA
## 12 NA NA NA 61 NA 75 NA 105 NA 147 NA
## 16 NA NA NA 49 NA 65 NA 70 NA NA NA
## 19 NA 40 NA 52 NA 91 NA 120 NA NA NA
## wght16 wght17 wght18 wght19
## 1 NA NA NA NA
## 3 NA NA NA NA
## 8 160 NA NA NA
## 12 NA NA NA NA
## 16 NA NA NA NA
## 19 NA NA NA NA
#univariate descriptives
describe(wght_vars)
## vars n mean sd median trimmed mad min max range skew
## wght5 1 171 43.47 9.26 41.0 42.45 5.93 30 90 60 1.61
## wght6 2 837 47.85 9.84 46.0 46.86 8.90 27 110 83 1.48
## wght7 3 856 54.21 12.46 51.0 52.69 8.90 8 127 119 1.35
## wght8 4 1157 62.77 17.57 60.0 60.72 14.83 7 280 273 2.59
## wght9 5 1001 72.78 19.70 69.0 70.50 16.31 37 220 183 1.40
## wght10 6 1320 83.08 23.03 79.0 80.51 20.76 20 200 180 1.16
## wght11 7 1145 96.72 27.31 91.0 93.90 23.72 44 265 221 1.16
## wght12 8 1288 108.90 28.75 104.0 106.12 23.72 1 249 248 1.05
## wght13 9 1014 122.82 33.15 116.0 119.06 25.20 42 313 271 1.26
## wght14 10 1054 130.42 34.12 122.0 125.78 25.20 62 324 262 1.50
## wght15 11 143 130.41 26.88 128.0 127.90 23.72 87 235 148 0.93
## wght16 12 72 135.03 32.76 126.5 130.91 25.95 90 240 150 1.09
## wght17 13 48 137.31 33.42 129.0 132.62 20.76 96 255 159 1.56
## wght18 14 15 136.13 49.94 119.0 127.77 14.83 101 280 179 1.88
## wght19 15 7 153.43 30.55 145.0 153.43 22.24 118 200 82 0.39
## kurtosis se
## wght5 4.29 0.71
## wght6 4.46 0.34
## wght7 3.20 0.43
## wght8 21.02 0.52
## wght9 3.80 0.62
## wght10 1.88 0.63
## wght11 2.27 0.81
## wght12 1.91 0.80
## wght13 2.41 1.04
## wght14 2.98 1.05
## wght15 1.10 2.25
## wght16 0.70 3.86
## wght17 2.28 4.82
## wght18 2.31 12.89
## wght19 -1.67 11.55
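`describe()` comes from the psych package. Its n column also shows how sparse the oldest ages are (1054 observations at age 14, but only 7 at age 19). If psych is unavailable, a minimal base R sketch of the first few columns (n, mean, sd) is below; it uses three weight columns for ids 5, 10, and 19 from the `head(wght_vars)` output above and does not reproduce `describe()`'s other statistics.

```r
# Weights for ids 5, 10, and 19 (rows 3, 12, and 19 of head(wght_vars))
toy_vars <- data.frame(wght6  = c(49, NA, 40),
                       wght8  = c(52, 61, 52),
                       wght10 = c(72, 75, 91))

# One row per variable: number observed, mean, and SD, ignoring missing values
desc <- t(sapply(toy_vars, function(x) c(n    = sum(!is.na(x)),
                                         mean = mean(x, na.rm = TRUE),
                                         sd   = sd(x, na.rm = TRUE))))
round(desc, 2)
```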
#bivariate descriptives
cor(wght_vars, use='pairwise.complete.obs') #correlation matrix
## wght5 wght6 wght7 wght8 wght9 wght10
## wght5 1.0000000 NA 0.7763339 1.0000000 0.8038661 0.7278684
## wght6 NA 1.0000000 0.8557850 0.7369325 0.8083459 0.7618904
## wght7 0.7763339 0.8557850 1.0000000 0.8631670 0.8488304 0.6646086
## wght8 1.0000000 0.7369325 0.8631670 1.0000000 0.9571767 0.7791118
## wght9 0.8038661 0.8083459 0.8488304 0.9571767 1.0000000 0.8940827
## wght10 0.7278684 0.7618904 0.6646086 0.7791118 0.8940827 1.0000000
## wght11 0.6527799 0.7457341 0.7802957 0.8010815 0.8688773 0.9059677
## wght12 0.4330780 0.6539078 0.6907005 0.7840391 0.8057898 0.8675311
## wght13 0.6656333 0.7186277 0.7426071 0.8057780 0.8224555 0.7865057
## wght14 0.7254958 0.5842574 0.6680452 0.6488491 0.8505020 0.7829868
## wght15 NA 0.2979179 0.9981373 0.7044674 0.6291826 0.6701769
## wght16 NA NA NA NA 0.9155911 0.6347579
## wght17 NA NA NA NA NA 0.7905679
## wght18 NA NA NA NA NA NA
## wght19 NA NA NA NA NA NA
## wght11 wght12 wght13 wght14 wght15 wght16
## wght5 0.6527799 0.4330780 0.6656333 0.7254958 NA NA
## wght6 0.7457341 0.6539078 0.7186277 0.5842574 0.2979179 NA
## wght7 0.7802957 0.6907005 0.7426071 0.6680452 0.9981373 NA
## wght8 0.8010815 0.7840391 0.8057780 0.6488491 0.7044674 NA
## wght9 0.8688773 0.8057898 0.8224555 0.8505020 0.6291826 0.9155911
## wght10 0.9059677 0.8675311 0.7865057 0.7829868 0.6701769 0.6347579
## wght11 1.0000000 0.8403308 0.8601377 0.8776408 0.7062868 0.6974460
## wght12 0.8403308 1.0000000 0.9104735 0.8457076 0.7411690 0.8621695
## wght13 0.8601377 0.9104735 1.0000000 0.9643801 0.8059892 0.9258816
## wght14 0.8776408 0.8457076 0.9643801 1.0000000 0.9900719 0.9264146
## wght15 0.7062868 0.7411690 0.8059892 0.9900719 1.0000000 0.9995866
## wght16 0.6974460 0.8621695 0.9258816 0.9264146 0.9995866 1.0000000
## wght17 0.4637109 0.9621373 0.8964683 0.9794416 0.8783625 0.9431854
## wght18 0.9331130 0.3258318 1.0000000 0.9422574 NA 0.9032972
## wght19 NA 1.0000000 1.0000000 NA 0.8464803 NA
## wght17 wght18 wght19
## wght5 NA NA NA
## wght6 NA NA NA
## wght7 NA NA NA
## wght8 NA NA NA
## wght9 NA NA NA
## wght10 0.7905679 NA NA
## wght11 0.4637109 0.9331130 NA
## wght12 0.9621373 0.3258318 1.0000000
## wght13 0.8964683 1.0000000 1.0000000
## wght14 0.9794416 0.9422574 NA
## wght15 0.8783625 NA 0.8464803
## wght16 0.9431854 0.9032972 NA
## wght17 1.0000000 NA 0.9499461
## wght18 NA 1.0000000 NA
## wght19 0.9499461 NA 1.0000000
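With `use='pairwise.complete.obs'`, each correlation is computed only from the people measured at both ages, so different cells rest on very different sample sizes. This likely explains entries of exactly 1 (e.g., wght5 with wght8): data from only two people always give a correlation of ±1, and NA appears when fewer than two people (or no variation) are available for a pair. A minimal sketch of counting the complete pairs behind each cell is below, using weights for ids 5, 10, 11, and 19 from the `head(wght_vars)` output.

```r
# Weights for ids 5, 10, 11, and 19 (from head(wght_vars) above)
toy_vars <- data.frame(wght8  = c(52, 61, 49, 52),
                       wght10 = c(72, 75, 65, 91),
                       wght14 = c(124, 147, NA, NA))

# 1 where a weight was observed, 0 where it is missing
obs <- 1 - is.na(as.matrix(toy_vars))

# Cell [i, j] counts the people observed on both variable i and variable j
n_pairs <- crossprod(obs)
n_pairs

# The wght10-wght14 correlation rests on only 2 people, so it equals 1 (up to rounding)
cor(toy_vars, use = "pairwise.complete.obs")
```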
cov(wght_vars, use='pairwise.complete.obs') #covariance matrix
## wght5 wght6 wght7 wght8 wght9 wght10 wght11
## wght5 85.79195 NA 100.6235 120.0000 160.0060 87.30714 172.7811
## wght6 NA 96.84823 107.9002 142.6230 268.3675 173.69643 270.3874
## wght7 100.62355 107.90018 155.3054 168.1466 214.8442 164.22347 262.8675
## wght8 120.00000 142.62304 168.1466 308.6154 506.4894 321.34312 444.2223
## wght9 160.00597 268.36755 214.8442 506.4894 388.2059 357.30508 475.6419
## wght10 87.30714 173.69643 164.2235 321.3431 357.3051 530.49248 547.4219
## wght11 172.78107 270.38739 262.8675 444.2223 475.6419 547.42195 745.7104
## wght12 71.31579 172.50264 247.2632 357.4025 364.9774 556.08329 502.3402
## wght13 179.46414 272.85654 323.3059 509.5192 573.6170 487.37626 793.5887
## wght14 176.21053 187.14631 207.1805 414.5335 452.4881 617.12588 786.0611
## wght15 NA 32.13636 334.5000 310.2857 237.2240 267.58559 443.1825
## wght16 NA NA NA NA 497.3053 352.76882 295.8667
## wght17 NA NA NA NA NA 261.48918 130.6970
## wght18 NA NA NA NA NA NA 258.6667
## wght19 NA NA NA NA NA NA NA
## wght12 wght13 wght14 wght15 wght16 wght17
## wght5 71.31579 179.4641 176.2105 NA NA NA
## wght6 172.50264 272.8565 187.1463 32.13636 NA NA
## wght7 247.26321 323.3059 207.1805 334.50000 NA NA
## wght8 357.40250 509.5192 414.5335 310.28571 NA NA
## wght9 364.97741 573.6170 452.4881 237.22402 497.3053 NA
## wght10 556.08329 487.3763 617.1259 267.58559 352.7688 261.4892
## wght11 502.34023 793.5887 786.0611 443.18248 295.8667 130.6970
## wght12 826.75810 777.6939 797.8288 390.89655 710.5937 836.5500
## wght13 777.69388 1099.2347 1613.8075 641.77025 1047.9167 970.7571
## wght14 797.82875 1613.8075 1164.3862 61.00000 950.0605 218.6000
## wght15 390.89655 641.7703 61.0000 722.63715 1405.0000 988.0465
## wght16 710.59373 1047.9167 950.0605 1405.00000 1073.4358 118.0000
## wght17 836.55000 970.7571 218.6000 988.04655 118.0000 1116.9428
## wght18 36.08333 1584.0000 1850.5321 NA 2236.9727 NA
## wght19 1435.00000 42.0000 NA 888.00000 NA 809.6000
## wght18 wght19
## wght5 NA NA
## wght6 NA NA
## wght7 NA NA
## wght8 NA NA
## wght9 NA NA
## wght10 NA NA
## wght11 258.66667 NA
## wght12 36.08333 1435.0000
## wght13 1584.00000 42.0000
## wght14 1850.53205 NA
## wght15 NA 888.0000
## wght16 2236.97273 NA
## wght17 NA 809.6000
## wght18 2493.69524 NA
## wght19 NA 933.2857
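The diagonal of this matrix holds each age's variance, so its square root should match the sd column from `describe()` (e.g., sqrt(85.79) ≈ 9.26 for wght5). The off-diagonal entries, however, are again based on different subsets of people, so the matrix as a whole need not be a valid covariance matrix, which is worth keeping in mind before passing it to other analyses. The sketch below, using weights for ids 5, 10, 11, and 19 from `head(wght_vars)` above, shows that rescaling a pairwise covariance matrix with `cov2cor()` can even give a value above 1.

```r
# Weights for ids 5, 10, 11, and 19 (from head(wght_vars) above)
toy_vars <- data.frame(wght8  = c(52, 61, 49, 52),
                       wght10 = c(72, 75, 65, 91),
                       wght14 = c(124, 147, NA, NA))
cv <- cov(toy_vars, use = "pairwise.complete.obs")

# Diagonal: each variable's variance over all of its observed values
all.equal(sqrt(diag(cv)), sapply(toy_vars, sd, na.rm = TRUE))

# The wght8-wght14 covariance uses 2 people but the wght8 variance uses 4,
# so the rescaled value is about 1.22
cov2cor(cv)["wght8", "wght14"]
```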
Finally, we plot the bivariate descriptive statistics we just examined.
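#defining a panel function that draws a histogram of each variable on the diagonal of the scatterplot matrix
panel.hist <- function(x, ...)
{
  usr <- par("usr"); on.exit(par(usr = usr)) #restore the plot coordinates when done
  par(usr = c(usr[1:2], 0, 1.5)) #keep the x-range, set the y-range to 0-1.5
  h <- hist(x, plot = FALSE) #compute histogram bins without drawing
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y) #scale bar heights so the tallest bar is 1
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...) #draw the bars
}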
#scatterplot matrix of all weight measures, with a histogram of each measure (panel.hist) on the diagonal
pairs(~wght5+wght6+wght7+wght8+wght9+
wght10+wght11+wght12+wght13+wght14+
wght15+wght16+wght17+wght18+wght19,
data=wght_wide1, diag.panel=panel.hist)
#scatterplot matrix of a subset of the weight measures (ages 5 to 10), with histograms on the diagonal
pairs(~wght5+wght6+wght7+wght8+wght9+wght10, data=wght_wide1, diag.panel=panel.hist)
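Writing out long formulas such as `~wght5+...+wght19` by hand is tedious. A minimal sketch of building the formula from an age range with `paste0()` and `as.formula()` is below; the age range 5:10 matches the subset plotted last, and the commented `pairs()` call assumes `wght_wide1` and `panel.hist` from above are loaded.

```r
# Build "~ wght5 + wght6 + ... + wght10" from an age range
ages <- 5:10
pairs_formula <- as.formula(paste("~", paste0("wght", ages, collapse = " + ")))
pairs_formula

# With the tutorial objects loaded, this would reproduce the last plot:
# pairs(pairs_formula, data = wght_wide1, diag.panel = panel.hist)
```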
This tutorial presented several key first steps in data analysis: setting up the data in both long and wide formats, plotting the data, and obtaining descriptive statistics.