We have forwarded a proposal for Computational Developmental Science (Ram et al., 2015). The basic idea is that theory can be used to generate some formal propositions (equations). Those propositions can then be used in to simualate/generate data. The implications of the theory can then be explored in the simulated world through in silica experiments (e.g., changing parameter values). Theory can be rejected and revised appropriately.

# 1 Overview

This script is a very basic set-up for simualting growth model data. We specifically generate formal propositions for (1) intraindividual change and (2) interindividual differences in specific aspects of intraindividual change.

We set up our example using a (1) linear model for intraindividual change and (2) interindividual differences in two aspects of change, intercept and linear slope. The script is meant to be adjusted for examination of other theories (e.g., nonlinear intraindividual change).

## 1.1 Preliminary

Libraries used in this script

library(ggplot2)
library(MASS)
library(psych)

# 2 Specify Growth Model

Our first task is to write out the theory of interest - a theory of interindividual differences in intraindividual change.

## 2.1 Specifying mathematical model

The equation for intraindividual change is … (will need to adjust this section with new equation)

$y_{it} = (b_{0i}) + (b_{1i})x_{it} + e_{it}$ where $x_{it} = time_{it}$ and $b_{0i} = g_{00} + u_{0i}$ $b_{1i} = g_{10} + u_{1i}$ with $e_{it} \tilde{} N(0,\sigma^{2}_{e})$ and $[u_{0i},u_{1i}, ...] \tilde{} N(0,\Psi)$

After algebraic substitution we get … $y_{it} = (g_{00} + u_{0i}) + (g_{10} + u_{1i})x_{it} + e_{it}$

This equation can be rearranged to more closely match R code layout of whatever functions will use to estimate. For our purposes here, this form translates in a sufficiently straightforward manner with our coding below.

# 3 Generate Intraindividual Change (prototype)

## 3.1 Set up a computation function

Setting up function in R to hold the mathemtical growth model … (will need to adjust this section to match new equation)

growth.fun <- function(x, u_0=0,u_1=0,u_2=0,u_3=0,u_4=0, error=0) {
y = (g_00 + u_0) + (g_10 + u_1)*x + error   #adjust this line to match equation
return(y)
}

## 3.2 Setting values for simulation

Set prototypical values for g00, g10, g20, g30, ….(will need to adjust these values with theoretically informed values)

#setting fixed effects parameters
g_00 <- 0
g_10 <- 0.2
g_20 <- NA
g_30 <- NA
g_40 <- NA
#...add more as needed ...

Set number of persons and occasions (here it is for 1 person, 10 occasions).

num_persons <- 1
num_occasions <- 10

## 3.3 Generate data for prototype trajectory

Generate empty data:

proto_data <- data.frame(
id = sort(rep(c(1:num_persons),num_occasions)),
time = rep(c(0:(num_occasions-1)),num_persons),
outcome = NA)
head(proto_data)
id time outcome
1 0 NA
1 1 NA
1 2 NA
1 3 NA
1 4 NA
1 5 NA

Fill in ‘outcome’ with prototype trajectory from function (plugging in x = time variable)

proto_data$outcome <- growth.fun(x=proto_data$time)
proto_data
id time outcome
1 0 0.0
1 1 0.2
1 2 0.4
1 3 0.6
1 4 0.8
1 5 1.0
1 6 1.2
1 7 1.4
1 8 1.6
1 9 1.8

## 3.4 Plot prototype trajectory

Plot the prototype …(may need to adjust for aestehtics)

#plot prototype trajectory
ggplot(data = proto_data, aes(x = time, y = outcome, group = id)) +
ggtitle("Simulated Linear Growth") +
#geom_point() +
geom_line(color="red", size = 2) +
xlab("Time") +
ylab("Outcome") + #ylim(0,100) +
scale_x_continuous(breaks=seq(0,9,by=1))

We now have the intraindividual change part in place.

# 4 Generate Interindividual Differences in Intraindividual Change

## 4.1 Setting values for simulation

This section is setting the information for the between-person random effects part of the model. $[u_{0i},u_{1i}, ...] \tilde{} N(0,\Psi)$.

Set the number of persons to simulate: (will need to adjust this to size of desired sample)

num_persons <- 25

Create names for the growth coefficients (gcoeffs):

names_gcoeffs <- c("u_0","u_1","u_2","u_3","u_4")

Note - am creating 5 individual difference coefficients even though our example only uses 2.

Create a vector of means for the individual differences, u variables (typically these will be zero):

#setting means of random effects
u_0 <- 0
u_1 <- 0
u_2 <- 0
u_3 <- 0
u_4 <- 0

Set values for standard deviations for growth coefficients (will need to adjust this with chosen values)

sig_u_0 <- 1.0
sig_u_1 <- 0.3
sig_u_2 <- 1.0
sig_u_3 <- 1.0
sig_u_4 <- 1.0
#...add more as needed ...

Create a symmetric matrix of correlations among the growth coefficients (will need to adjust this with chosen values):

corr_gcoeffs <- matrix(c(1.0,0.2,0.0,0.0,0.0,
0.2,1.0,0.0,0.0,0.0,
0.0,0.0,1.0,0.0,0.0,
0.0,0.0,0.0,1.0,0.0,
0.0,0.0,0.0,0.0,1.0),nrow=5,ncol=5,byrow=TRUE)
corr_gcoeffs
##      [,1] [,2] [,3] [,4] [,5]
## [1,]  1.0  0.2    0    0    0
## [2,]  0.2  1.0    0    0    0
## [3,]  0.0  0.0    1    0    0
## [4,]  0.0  0.0    0    1    0
## [5,]  0.0  0.0    0    0    1

Create a vector of means. Create a vector of standard deviations and make it into a diagonal matrix:

#making mean vector
means_gcoeffs <- c(u_0,u_1,u_2,u_3,u_4)
#making sds vector and diagonal matrix
sds_gcoeffs_vector <- c(sig_u_0,sig_u_1,sig_u_2,sig_u_3,sig_u_4)
sds_gcoeffs <- diag(sds_gcoeffs_vector)
sds_gcoeffs
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1  0.0    0    0    0
## [2,]    0  0.3    0    0    0
## [3,]    0  0.0    1    0    0
## [4,]    0  0.0    0    1    0
## [5,]    0  0.0    0    0    1

Do the matrix multiplication to get a covariance matrix:

cov_gcoeffs <- sds_gcoeffs %*% corr_gcoeffs %*% t(sds_gcoeffs)
cov_gcoeffs
##      [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00 0.06    0    0    0
## [2,] 0.06 0.09    0    0    0
## [3,] 0.00 0.00    1    0    0
## [4,] 0.00 0.00    0    1    0
## [5,] 0.00 0.00    0    0    1

## 4.2 Generate individual-level coefficients

We use the mvrnorm() function in the MASS package) …

Use the mean vector and variance-covariance matrix to create some multivariate normal data:

#setting the random seed (in order to be able to repeat with same generation)
set.seed(1234) #can change this value if want different data run
#simulate
coeffs_simulate <- mvrnorm(n = num_persons, mu = means_gcoeffs, Sigma = cov_gcoeffs, empirical = TRUE)

Note that makes the data exact (if N is large enough)

Some cleanup into a data-like object.

#making and binding an id variable
id <- c(1:nrow(coeffs_simulate))
indiv_data <- cbind(id,coeffs_simulate)
#assigning the variable names
colnames(indiv_data) <- c("id",names_gcoeffs)
head(indiv_data)
##      id        u_0         u_1         u_2         u_3        u_4
## [1,]  1  1.0883565  0.35493517  0.06978623  1.03101628  1.5681671
## [2,]  2  1.1514923  0.02592345  1.35546492 -0.10400543 -1.1427319
## [3,]  3  1.2717192  0.08035000 -1.10056618  0.60137539 -0.6211519
## [4,]  4  0.4712630 -0.19889661  2.31188787  0.07321659  1.6843328
## [5,]  5  0.6817137 -0.20026783 -0.59006198 -0.70953285 -0.2979141
## [6,]  6 -0.1140907 -0.37390083  1.24890947  0.42056273 -1.5163611
#descriptives
describe(indiv_data)
vars n mean sd median trimmed mad min max range skew kurtosis se
id 1 25 13 7.359801 13.0000000 13.0000000 8.8956000 1.0000000 25.0000000 24.000000 0.0000000 -1.3446646 1.47196
u_0 2 25 0 1.000000 0.4533346 0.0649800 0.9414834 -2.3494809 1.2717192 3.621200 -0.5562613 -0.8521228 0.20000
u_1 3 25 0 0.300000 -0.0559674 -0.0046921 0.2577785 -0.5169619 0.6649941 1.181956 0.3168838 -0.8436492 0.06000
u_2 4 25 0 1.000000 -0.0992872 -0.0440353 0.8244843 -2.2144909 2.3118879 4.526379 0.3678988 0.0888570 0.20000
u_3 5 25 0 1.000000 -0.0062015 0.0257368 0.8268599 -2.3353412 2.2650956 4.600437 -0.1814367 0.1138464 0.20000
u_4 6 25 0 1.000000 0.0631530 0.0133786 0.8978654 -1.9403057 1.6843328 3.624639 -0.1622044 -0.8209951 0.20000
pairs.panels(indiv_data[,-1])

We now have the individual differences part in place.

# 5 Generate Intraindividual Change Trajectories for Multiple Person Sample

Now lets generate the simulated intraindivdual change trajectories for all those individuals.

## 5.1 Setting additional simulation parameters

Set number of occasions: (will need to adjust this)

num_occasions <- 10

Setting the (time-varying) residual error standard deviation: (will need to adjust this)

$e_{it} \tilde{} N(0,\sigma^{2}_{e})$.

#set the sd of the error
error_sd <- 0.1

Calculating total number of observations

NobsTotal <- num_persons*num_occasions

Generating an empty data frame of appropraite size:

empty_df <- data.frame(
id = sort(rep(c(1:num_persons),num_occasions)),
time = rep(c(0:(num_occasions-1)),num_persons),
outcome = NA)
head(empty_df)
id time outcome
1 0 NA
1 1 NA
1 2 NA
1 3 NA
1 4 NA
1 5 NA

Merging in the individual differences coefficients:

sim_data <- merge(empty_df,indiv_data,by="id")

Creating a (time-varying) residual error score for each observation:

sim_data$error <- rnorm(nrow(sim_data),mean=0,sd=error_sd) head(sim_data) id time outcome u_0 u_1 u_2 u_3 u_4 error 1 0 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.0080060 1 1 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.0631409 1 2 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.1513288 1 3 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.0636100 1 4 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.0226302 1 5 NA 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.1013690 Notice which variables are time-invariant, and which are time-varying. ## 5.2 Generate longitudinal outcome data Use the originally specified growth function to generate outcome scores in the long data set: sim_data$outcome <- growth.fun(x=sim_data$time, u_0=sim_data$u_0,
u_1=sim_data$u_1, u_2=sim_data$u_2,
u_3=sim_data$u_3, u_4=sim_data$u_4,
error=sim_data\$error)
head(sim_data)
id time outcome u_0 u_1 u_2 u_3 u_4 error
1 0 1.096362 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.0080060
1 1 1.580151 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.0631409
1 2 2.046898 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.1513288
1 3 2.689552 1.088356 0.3549352 0.0697862 1.031016 1.568167 -0.0636100
1 4 3.330727 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.0226302
1 5 3.964401 1.088356 0.3549352 0.0697862 1.031016 1.568167 0.1013690

## 5.3 Plot trajecotry data

Plotting the simulated data (with the prototype as an additional layer) (may need to adjust title and for aesthetics)

ggplot(data = sim_data, aes(x = time, y = outcome, group = id)) +
ggtitle("Simulated Linear Growth") +
geom_point() +
geom_line() +
xlab("Time") +
ylab("Outcome") + #ylim(0,100) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
stat_function(fun=growth.fun, color="red", size = 2)