We have forwarded a proposal for **Computational Developmental Science** (Ram et al., 2015). The basic idea is that theory can be used to generate some formal propositions (equations). Those propositions can then be used in to simualate/generate data. The implications of the theory can then be explored in the simulated world through *in silica* experiments (e.g., changing parameter values). Theory can be rejected and revised appropriately.

This script is a very basic set-up for simualting growth model data. We specifically generate formal propositions for (1) intraindividual change and (2) interindividual differences in specific aspects of intraindividual change.

We set up our example using a (1) linear model for intraindividual change and (2) interindividual differences in two aspects of change, intercept and linear slope. The script is meant to be adjusted for examination of other theories (e.g., nonlinear intraindividual change).

Libraries used in this script

```
library(ggplot2)
library(MASS)
library(psych)
```

Our first task is to write out the theory of interest - a theory of interindividual differences in intraindividual change.

The equation for intraindividual change is â€¦ **(will need to adjust this section with new equation)**

\[y_{it} = (b_{0i}) + (b_{1i})x_{it} + e_{it}\] where \[x_{it} = time_{it}\] and \[b_{0i} = g_{00} + u_{0i}\] \[b_{1i} = g_{10} + u_{1i}\] with \[e_{it} \tilde{} N(0,\sigma^{2}_{e})\] and \[[u_{0i},u_{1i}, ...] \tilde{} N(0,\Psi)\]

After algebraic substitution we get â€¦ \[y_{it} = (g_{00} + u_{0i}) + (g_{10} + u_{1i})x_{it} + e_{it}\]

This equation can be rearranged to more closely match R code layout of whatever functions will use to estimate. For our purposes here, this form translates in a sufficiently straightforward manner with our coding below.

Setting up function in R to hold the mathemtical growth model â€¦ **(will need to adjust this section to match new equation)**

```
growth.fun <- function(x, u_0=0,u_1=0,u_2=0,u_3=0,u_4=0, error=0) {
y = (g_00 + u_0) + (g_10 + u_1)*x + error #adjust this line to match equation
return(y)
}
```

Set prototypical values for g00, g10, g20, g30, â€¦.**(will need to adjust these values with theoretically informed values)**

```
#setting fixed effects parameters
g_00 <- 0
g_10 <- 0.2
g_20 <- NA
g_30 <- NA
g_40 <- NA
#...add more as needed ...
```

Set number of persons and occasions (here it is for 1 person, 10 occasions).

```
num_persons <- 1
num_occasions <- 10
```

Generate empty data:

```
proto_data <- data.frame(
id = sort(rep(c(1:num_persons),num_occasions)),
time = rep(c(0:(num_occasions-1)),num_persons),
outcome = NA)
head(proto_data)
```

id | time | outcome |
---|---|---|

1 | 0 | NA |

1 | 1 | NA |

1 | 2 | NA |

1 | 3 | NA |

1 | 4 | NA |

1 | 5 | NA |

Fill in â€˜outcomeâ€™ with prototype trajectory from function (plugging in x = time variable)

```
proto_data$outcome <- growth.fun(x=proto_data$time)
proto_data
```

id | time | outcome |
---|---|---|

1 | 0 | 0.0 |

1 | 1 | 0.2 |

1 | 2 | 0.4 |

1 | 3 | 0.6 |

1 | 4 | 0.8 |

1 | 5 | 1.0 |

1 | 6 | 1.2 |

1 | 7 | 1.4 |

1 | 8 | 1.6 |

1 | 9 | 1.8 |

Plot the prototype â€¦**(may need to adjust for aestehtics)**

```
#plot prototype trajectory
ggplot(data = proto_data, aes(x = time, y = outcome, group = id)) +
ggtitle("Simulated Linear Growth") +
#geom_point() +
geom_line(color="red", size = 2) +
xlab("Time") +
ylab("Outcome") + #ylim(0,100) +
scale_x_continuous(breaks=seq(0,9,by=1))
```

We now have the *intraindividual change* part in place.

This section is setting the information for the between-person random effects part of the model. \[[u_{0i},u_{1i}, ...] \tilde{} N(0,\Psi)\].

Set the number of persons to simulate: **(will need to adjust this to size of desired sample)**

`num_persons <- 25`

Create names for the growth coefficients (gcoeffs):

`names_gcoeffs <- c("u_0","u_1","u_2","u_3","u_4")`

Note - am creating 5 individual difference coefficients even though our example only uses 2.

Create a vector of means for the individual differences, *u* variables (typically these will be zero):

```
#setting means of random effects
u_0 <- 0
u_1 <- 0
u_2 <- 0
u_3 <- 0
u_4 <- 0
```

Set values for standard deviations for growth coefficients **(will need to adjust this with chosen values)**

```
sig_u_0 <- 1.0
sig_u_1 <- 0.3
sig_u_2 <- 1.0
sig_u_3 <- 1.0
sig_u_4 <- 1.0
#...add more as needed ...
```

Create a *symmetric matrix* of correlations among the growth coefficients **(will need to adjust this with chosen values)**:

```
corr_gcoeffs <- matrix(c(1.0,0.2,0.0,0.0,0.0,
0.2,1.0,0.0,0.0,0.0,
0.0,0.0,1.0,0.0,0.0,
0.0,0.0,0.0,1.0,0.0,
0.0,0.0,0.0,0.0,1.0),nrow=5,ncol=5,byrow=TRUE)
corr_gcoeffs
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0 0.2 0 0 0
## [2,] 0.2 1.0 0 0 0
## [3,] 0.0 0.0 1 0 0
## [4,] 0.0 0.0 0 1 0
## [5,] 0.0 0.0 0 0 1
```

Create a *vector* of means. Create a *vector* of standard deviations and make it into a *diagonal matrix*:

```
#making mean vector
means_gcoeffs <- c(u_0,u_1,u_2,u_3,u_4)
#making sds vector and diagonal matrix
sds_gcoeffs_vector <- c(sig_u_0,sig_u_1,sig_u_2,sig_u_3,sig_u_4)
sds_gcoeffs <- diag(sds_gcoeffs_vector)
sds_gcoeffs
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0.0 0 0 0
## [2,] 0 0.3 0 0 0
## [3,] 0 0.0 1 0 0
## [4,] 0 0.0 0 1 0
## [5,] 0 0.0 0 0 1
```

Do the matrix multiplication to get a covariance matrix:

```
cov_gcoeffs <- sds_gcoeffs %*% corr_gcoeffs %*% t(sds_gcoeffs)
cov_gcoeffs
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00 0.06 0 0 0
## [2,] 0.06 0.09 0 0 0
## [3,] 0.00 0.00 1 0 0
## [4,] 0.00 0.00 0 1 0
## [5,] 0.00 0.00 0 0 1
```

We use the mvrnorm() function in the MASS package) â€¦

Use the mean vector and variance-covariance matrix to *create some multivariate normal data*:

```
#setting the random seed (in order to be able to repeat with same generation)
set.seed(1234) #can change this value if want different data run
#simulate
coeffs_simulate <- mvrnorm(n = num_persons, mu = means_gcoeffs, Sigma = cov_gcoeffs, empirical = TRUE)
```

Note that makes the data exact (if N is large enough)

Some cleanup into a data-like object.

```
#making and binding an id variable
id <- c(1:nrow(coeffs_simulate))
indiv_data <- cbind(id,coeffs_simulate)
#assigning the variable names
colnames(indiv_data) <- c("id",names_gcoeffs)
head(indiv_data)
```

```
## id u_0 u_1 u_2 u_3 u_4
## [1,] 1 1.0883565 0.35493517 0.06978623 1.03101628 1.5681671
## [2,] 2 1.1514923 0.02592345 1.35546492 -0.10400543 -1.1427319
## [3,] 3 1.2717192 0.08035000 -1.10056618 0.60137539 -0.6211519
## [4,] 4 0.4712630 -0.19889661 2.31188787 0.07321659 1.6843328
## [5,] 5 0.6817137 -0.20026783 -0.59006198 -0.70953285 -0.2979141
## [6,] 6 -0.1140907 -0.37390083 1.24890947 0.42056273 -1.5163611
```

```
#descriptives
describe(indiv_data)
```

vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

id | 1 | 25 | 13 | 7.359801 | 13.0000000 | 13.0000000 | 8.8956000 | 1.0000000 | 25.0000000 | 24.000000 | 0.0000000 | -1.3446646 | 1.47196 |

u_0 | 2 | 25 | 0 | 1.000000 | 0.4533346 | 0.0649800 | 0.9414834 | -2.3494809 | 1.2717192 | 3.621200 | -0.5562613 | -0.8521228 | 0.20000 |

u_1 | 3 | 25 | 0 | 0.300000 | -0.0559674 | -0.0046921 | 0.2577785 | -0.5169619 | 0.6649941 | 1.181956 | 0.3168838 | -0.8436492 | 0.06000 |

u_2 | 4 | 25 | 0 | 1.000000 | -0.0992872 | -0.0440353 | 0.8244843 | -2.2144909 | 2.3118879 | 4.526379 | 0.3678988 | 0.0888570 | 0.20000 |

u_3 | 5 | 25 | 0 | 1.000000 | -0.0062015 | 0.0257368 | 0.8268599 | -2.3353412 | 2.2650956 | 4.600437 | -0.1814367 | 0.1138464 | 0.20000 |

u_4 | 6 | 25 | 0 | 1.000000 | 0.0631530 | 0.0133786 | 0.8978654 | -1.9403057 | 1.6843328 | 3.624639 | -0.1622044 | -0.8209951 | 0.20000 |

`pairs.panels(indiv_data[,-1])`

We now have the *individual differences* part in place.

Now lets generate the simulated *intraindivdual change* trajectories for all those individuals.

Set number of occasions: **(will need to adjust this)**

`num_occasions <- 10`

Setting the (time-varying) residual error standard deviation: **(will need to adjust this)**

\[e_{it} \tilde{} N(0,\sigma^{2}_{e})\].

```
#set the sd of the error
error_sd <- 0.1
```

Calculating total number of observations

`NobsTotal <- num_persons*num_occasions`

Generating an empty data frame of appropraite size:

```
empty_df <- data.frame(
id = sort(rep(c(1:num_persons),num_occasions)),
time = rep(c(0:(num_occasions-1)),num_persons),
outcome = NA)
head(empty_df)
```

id | time | outcome |
---|---|---|

1 | 0 | NA |

1 | 1 | NA |

1 | 2 | NA |

1 | 3 | NA |

1 | 4 | NA |

1 | 5 | NA |

Merging in the individual differences coefficients:

`sim_data <- merge(empty_df,indiv_data,by="id")`

Creating a (time-varying) residual error score for each observation:

```
sim_data$error <- rnorm(nrow(sim_data),mean=0,sd=error_sd)
head(sim_data)
```

id | time | outcome | u_0 | u_1 | u_2 | u_3 | u_4 | error |
---|---|---|---|---|---|---|---|---|

1 | 0 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.0080060 |

1 | 1 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.0631409 |

1 | 2 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.1513288 |

1 | 3 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.0636100 |

1 | 4 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.0226302 |

1 | 5 | NA | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.1013690 |

Notice which variables are time-invariant, and which are time-varying.

Use the originally specified growth function to generate outcome scores in the long data set:

```
sim_data$outcome <- growth.fun(x=sim_data$time,
u_0=sim_data$u_0,
u_1=sim_data$u_1,
u_2=sim_data$u_2,
u_3=sim_data$u_3,
u_4=sim_data$u_4,
error=sim_data$error)
head(sim_data)
```

id | time | outcome | u_0 | u_1 | u_2 | u_3 | u_4 | error |
---|---|---|---|---|---|---|---|---|

1 | 0 | 1.096362 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.0080060 |

1 | 1 | 1.580151 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.0631409 |

1 | 2 | 2.046898 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.1513288 |

1 | 3 | 2.689552 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | -0.0636100 |

1 | 4 | 3.330727 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.0226302 |

1 | 5 | 3.964401 | 1.088356 | 0.3549352 | 0.0697862 | 1.031016 | 1.568167 | 0.1013690 |

Plotting the simulated data (with the prototype as an additional layer) **(may need to adjust title and for aesthetics)**

```
ggplot(data = sim_data, aes(x = time, y = outcome, group = id)) +
ggtitle("Simulated Linear Growth") +
geom_point() +
geom_line() +
xlab("Time") +
ylab("Outcome") + #ylim(0,100) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
stat_function(fun=growth.fun, color="red", size = 2)
```