# Overview

This tutorial illustrates how to do power analysis for multilevel models using the “simr” package in R. The examples follow the presentation in Chapter 10, Statistical Power for Intenstive Longitudinal Designs, of Bolger & Laurenceau (2013).

# Outline

This script covers …

A. Power analysis for a simple multilevel model (growth model)

B. Power analysis for a multilevel process model

C. Power analysis for a categorical outcome multilevel model

Some other useful overviews on doing power analysis in R can be found at

### Preliminaries

library(psych) #for describing data
library(lme4) # multilevel models
library(simr)  #for power analysis simualtions
library(ggplot2) # for data visualization
library(plyr)  # for data manipulation

# A. Power Analysis for the Time Course Example

This example makes use of the Intimacy data set from B & L Chapter 4.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/BLdata_time.csv"
#read in the .csv file using the url() function
data_intimacy <- read.csv(file=url(filepath),header=TRUE)

Describe the data.

#describing the data
describe(data_intimacy)
##           vars   n  mean    sd median trimmed   mad min   max range skew
## id           1 800 25.50 14.44  25.50   25.50 18.53   1 50.00 49.00 0.00
## time         2 800  7.50  4.61   7.50    7.50  5.93   0 15.00 15.00 0.00
## time01       3 800  0.50  0.31   0.50    0.50  0.40   0  1.00  1.00 0.00
## intimacy     4 800  3.47  1.63   3.33    3.43  1.61   0  9.41  9.41 0.28
## treatment    5 800  0.50  0.50   0.50    0.50  0.74   0  1.00  1.00 0.00
##           kurtosis   se
## id           -1.21 0.51
## time         -1.21 0.16
## time01       -1.21 0.01
## intimacy     -0.06 0.06
## treatment    -2.00 0.02
#number of persons
length(unique(data_intimacy$id)) ## [1] 50 #number of occasions length(unique(data_intimacy$time))
## [1] 16
#crossing of persons and occasions
xtabs(~id+time, data=data_intimacy)
##     time
## id   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
##   1  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   2  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   3  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   4  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   5  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   6  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   7  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   8  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   9  1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   10 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   11 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   12 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   13 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   14 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   15 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   16 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   17 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   18 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   19 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   20 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   21 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   22 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   23 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   24 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   25 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   26 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   27 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   28 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   29 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   30 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   31 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   32 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   33 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   34 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   35 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   36 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   37 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   38 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   39 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   40 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   41 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   42 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   43 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   44 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   45 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   46 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   47 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   48 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   49 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1
##   50 1 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1

We see that there are 50 persons with 16 occasions each (coded 0 to 15). The data are fully complete.

## Running the multilevel with the available data.

We use the lmer() function in the lme4 library, as the simr power analysis package is built to work with lmer().

Running multilevel using lmer().

#Run linear growth model with AR(1) errors
fit_lgmodel <- lmer(intimacy ~ 1 + time01 + treatment + time01:treatment +
(1 + time01 | id),
data=data_intimacy,
na.action = na.exclude)
#examine model summary
summary(fit_lgmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## intimacy ~ 1 + time01 + treatment + time01:treatment + (1 + time01 |
##     id)
##    Data: data_intimacy
##
## REML criterion at convergence: 2834.4
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.6243 -0.6798 -0.0190  0.6434  2.4463
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  id       (Intercept) 0.6858   0.8281
##           time01      1.8938   1.3762   -0.45
##  Residual             1.6925   1.3010
## Number of obs: 800, groups:  id, 50
##
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)       2.89897    0.20703  14.002
## time01            0.73520    0.34721   2.117
## treatment        -0.05644    0.29279  -0.193
## time01:treatment  0.92143    0.49103   1.877
##
## Correlation of Fixed Effects:
##             (Intr) time01 trtmnt
## time01      -0.599
## treatment   -0.707  0.423
## tm01:trtmnt  0.423 -0.707 -0.599

The results are the very similar to those in the book even through the model does not include the auto-correlation of residuals. lmer() does not provide for alternative error structures.

The object obtaned from lmer(), formally referred to as a merMod object is used in the power analysis.

We can simualte power directly from the fitted model using the powerSim() function. We supply the function with the fitted model object (fit_lgmodel), the specific parameter we would like to examine (“time01”) and test to use (“lr”), how many simulations we would like (nsim=100) and what alpha level to use in the evaluations of significance (alpha=.05).

Typically, power ananlysis simulations use 1000 or more simulations. For time savings here, we only use 100.

We evaluate all three parameters in the model.

Simulating power directly from fitted model for the time effect.

#Power for the effect of interest
SimPower_Direct<-powerSim(fit_lgmodel,test=fixed("time01","lr"),
seed=1234, #set for replication
nsim=100, alpha=.05, progress=TRUE) #options
## Simulating: |                                                             |Simulating: |=                                                            |Simulating: |==                                                           |Simulating: |===                                                          |Simulating: |====                                                         |Simulating: |=====                                                        |Simulating: |======                                                       |Simulating: |=======                                                      |Simulating: |========                                                     |Simulating: |=========                                                    |Simulating: |==========                                                   |Simulating: |===========                                                  |Simulating: |============                                                 |Simulating: |=============                                                |Simulating: |==============                                               |Simulating: |===============                                              |Simulating: |================                                             |Simulating: |=================                                            |Simulating: |==================                                           |Simulating: |===================                                          |Simulating: |====================                                         |Simulating: |=====================                                        |Simulating: |======================                                       |Simulating: |=======================                                      |Simulating: |========================                                     |Simulating: |=========================                                    |Simulating: |==========================                                   |Simulating: |===========================                                  |Simulating: |============================                                 |Simulating: |=============================                                |Simulating: |==============================                               |Simulating: |===============================                              |Simulating: |================================                             |Simulating: |=================================                            |Simulating: |==================================                           |Simulating: |===================================                          |Simulating: |====================================                         |Simulating: |=====================================                        |Simulating: |======================================                       |Simulating: |=======================================                      |Simulating: |========================================                     |Simulating: |=========================================                    |Simulating: |==========================================                   |Simulating: |===========================================                  |Simulating: |============================================                 |Simulating: |=============================================                |Simulating: |==============================================               |Simulating: |===============================================              |Simulating: |================================================             |Simulating: |=================================================            |Simulating: |==================================================           |Simulating: |===================================================          |Simulating: |====================================================         |Simulating: |=====================================================        |Simulating: |======================================================       |Simulating: |=======================================================      |Simulating: |========================================================     |Simulating: |=========================================================    |Simulating: |==========================================================   |Simulating: |===========================================================  |Simulating: |============================================================ |Simulating: |=============================================================|
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examine post-hoc power
SimPower_Direct
## Power for predictor 'time01', (95% confidence interval):
##       62.00% (51.75, 71.52)
##
## Test: Likelihood ratio
##       Effect size for time01 is 0.74
##
## Based on 100 simulations, (100 warnings, 0 errors)
## alpha = 0.05, nrow = 800
##
## Time elapsed: 0 h 0 m 12 s
##
## nb: result might be an observed power calculation

We see that with an effect size of 0.74, power for the time01 predictor is 62%, less than ideal.

Simulating power directly from fitted model for the treatment effect.

#Power for the effect of interest
SimPower_Direct<-powerSim(fit_lgmodel,test=fixed("treatment","lr"),
seed=1234, #set for replication
nsim=100, alpha=.05, progress=FALSE) #options
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examine post-hoc power
SimPower_Direct
## Power for predictor 'treatment', (95% confidence interval):
##        5.00% ( 1.64, 11.28)
##
## Test: Likelihood ratio
##       Effect size for treatment is -0.056
##
## Based on 100 simulations, (100 warnings, 0 errors)
## alpha = 0.05, nrow = 800
##
## Time elapsed: 0 h 0 m 11 s
##
## nb: result might be an observed power calculation

We see that with an effect size of -0.06, power for the treatment predictor is 5%, very low.

Simulating power directly from fitted model for the time01:treatment interaction effect.

#Power for the effect of interest
SimPower_Direct<-powerSim(fit_lgmodel,test=fixed("time01:treatment","lr"),
seed=1234, #set for replication
nsim=100, alpha=.05, progress=FALSE) #options
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examine post-hoc power
SimPower_Direct
## Power for predictor 'time01:treatment', (95% confidence interval):
##       45.00% (35.03, 55.27)
##
## Test: Likelihood ratio
##       Effect size for time01:treatment is 0.92
##
## Based on 100 simulations, (20 warnings, 0 errors)
## alpha = 0.05, nrow = 800
##
## Time elapsed: 0 h 0 m 11 s
##
## nb: result might be an observed power calculation

We see that with an effect size of 0.92, power for the time01:treatment interaction is 45%, lower than ideal.

All three of these evaluations are very similar to what was obtained using Mplus and presented in the book.

Given low power, we would like to evaluate what is the power to detect the interaction at different sample sizes - obtained by either adding more persons or adding more occasions within person.

To evaluate sample sizes that are larger than the original data, we first “extend” the data by creating a new lmer() object. Specifically, we use the extend() function in the simr package.

We do that by extending the data that is inside the lmer() model object to include more participants or more occasions.

Look at size of the data within the lmer() object.

#xtabs(~id+time, data=getData(fit_lgmodel))

Extending the number of persons from 50 to 200.

#extending number of persons
fit_lgmodelpersons <- extend(fit_lgmodel, along="id", n=200)

#checking length of data, number of persons
length(unique(getData(fit_lgmodelpersons)$id)) ## [1] 200 #checking length of data, number of times length(unique(getData(fit_lgmodelpersons)$time))
## [1] 16

Extending the number of occasions from 16 to 60.

#extending number of occasions
fit_lgmodeltimes <- extend(fit_lgmodel, along="time", n=60)

#checking length of data, number of persons
length(unique(getData(fit_lgmodeltimes)$id)) ## [1] 50 #checking length of data, number of times length(unique(getData(fit_lgmodeltimes)$time))
## [1] 60

Using the new model objects with extented data, we can examine what power looks like all the way up to N = 200.

Running power curve analysis for a range of sample sizes - persons (each with T = 16 occasions). Specific sample sizes chosen so that the total number of observations, N**T*, is near 500, 1000, 1500, 2000, 2500, and 3000.

#running power curve
powerCurve_persons <-powerCurve(fit_lgmodelpersons,
fixed("time01:treatment", "lr"),
along = "id",
breaks = c(32,63,94,125,157,188),
nsim=100,alpha=.05, progress=FALSE)
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examining power estiamtes
summary(powerCurve_persons)
##   nrow nlevels successes trials mean     lower     upper
## 1  512      32        28    100 0.28 0.1947936 0.3786670
## 2 1008      63        54    100 0.54 0.4374116 0.6401566
## 3 1504      94        76    100 0.76 0.6642645 0.8397754
## 4 2000     125        87    100 0.87 0.7879593 0.9289270
## 5 2512     157        92    100 0.92 0.8484424 0.9648284
## 6 3008     188        98    100 0.98 0.9296161 0.9975687
#plot(powerCurve_persons)

We see the power estimates as each sample size, and how power increases as number of persons increases.

Using the new model objects with extented data, we can also examine what power looks like all the way up to T = 60.

Running power curve analysis for a range of sample sizes - occasions (increasing number of occasions within person). Specific sample sizes chosen so that the total number of observations, N**T*, is near 500, 1000, 1500, 2000, 2500, and 3000.

#running power curve
powerCurve_times <-powerCurve(fit_lgmodeltimes,
fixed("time01:treatment", "lr"),
within = "id",
breaks = c(10,20,30,40,50,60),
nsim=100,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examining power estiamtes
summary(powerCurve_times)
##   nrow nlevels successes trials mean     lower     upper
## 1  500      10        29    100 0.29 0.2035742 0.3892660
## 2 1000      20        49    100 0.49 0.3886442 0.5919637
## 3 1500      30        51    100 0.51 0.4080363 0.6113558
## 4 2000      40        54    100 0.54 0.4374116 0.6401566
## 5 2500      50        59    100 0.59 0.4871442 0.6873800
## 6 3000      60        60    100 0.60 0.4972092 0.6967052
#plot(powerCurve_times)

We see the power estimates as each sample size, and how power does not increase very much as the number of occasions increases.

Of course, power curves are often useful in visual form. It takes a few steps to get them into a form useful for ggplot().

Prepare the summary outputs for plotting (Data Management).

#preparing persons power curve data
powerdata_persons <- as.data.frame(summary(powerCurve_persons))
powerdata_persons$persons <- powerdata_persons$nlevels
powerdata_persons$times <- 16 powerdata_persons$design <- "persons"

#preparing times power curve data
powerdata_times <- as.data.frame(summary(powerCurve_times))
powerdata_times$persons <-50 powerdata_times$times <- powerdata_times$nlevels powerdata_times$design <- "times"

#merging together
powerdata_all <- rbind(powerdata_persons, powerdata_times)

Plotting power curves

#Plotting curves
ggplot(data = powerdata_all,
aes(y = mean, x=nrow, group=factor(design))) +
#power goal
geom_hline(yintercept=0.8, color="gray40",lty = 2) +
#persons curve
geom_line(color="black") +
geom_errorbar(ymin=powerdata_all$lower,ymax=powerdata_all$upper, width=50, color="black") +
geom_point(aes(shape=factor(powerdata_all$design)),size=3, color="black", show.legend = FALSE) + scale_shape_manual(values=c(15,17)) + scale_x_continuous(limits=c(500,3100), breaks=c(500,1000,1500,2000,2500,3000), name="Number of Total Observations") + scale_y_continuous(limits=c(0,1.0), breaks=c(0,.2,.4,.6,.8,1.0), name="Power") + theme_classic() + #theme(axis.title=element_text(size=14), # axis.text=element_text(size=14), # plot.title=element_text(size=14, hjust=.5)) + ggtitle("Power Curves for time01:treatment interaction") Nice! We can see that power increases with larger sample size (squares), but not so much with additional occasions (triangles). This is the same as the Figure 10.1 in B & L. Readers are referred to the book for more in-depth discussion of the implications, but the general notion is that more level-2 units (here, persons) is better. # B. Power Analysis for the Within-person Process Example This example makes use of the Process data set from B & L Chapter 5. Loading the Process data (N = 66, T = 28). #set filepath for data file filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/BLdata_process.csv" #read in the .csv file using the url() function data_process <- read.csv(file=url(filepath),header=TRUE) Describe the data. #describing the data describe(data_process) ## vars n mean sd median trimmed mad min max range ## id 1 1848 33.50 19.06 33.50 33.50 24.46 1.00 66.00 65.00 ## time 2 1848 13.50 8.08 13.50 13.50 10.38 0.00 27.00 27.00 ## time7c 3 1848 0.00 1.15 0.00 0.00 1.48 -1.93 1.93 3.86 ## intimacy 4 1848 4.82 2.28 4.84 4.83 2.34 0.00 10.00 10.00 ## conflict 5 1848 0.22 0.42 0.00 0.15 0.00 0.00 1.00 1.00 ## confc 6 1848 0.00 0.42 -0.22 -0.07 0.00 -0.22 0.78 1.00 ## confcb 7 1848 0.00 0.16 -0.04 -0.02 0.16 -0.22 0.42 0.64 ## confcw 8 1848 0.00 0.38 -0.11 -0.04 0.16 -0.64 0.96 1.61 ## relqual 9 1848 0.61 0.49 1.00 0.63 0.00 0.00 1.00 1.00 ## skew kurtosis se ## id 0.00 -1.20 0.44 ## time 0.00 -1.21 0.19 ## time7c 0.00 -1.21 0.03 ## intimacy -0.02 -0.50 0.05 ## conflict 1.34 -0.20 0.01 ## confc 1.34 -0.20 0.01 ## confcb 0.75 -0.35 0.00 ## confcw 1.04 0.04 0.01 ## relqual -0.43 -1.81 0.01 #number of persons length(unique(data_process$id))
## [1] 66
#number of occasions
length(unique(data_process$time)) ## [1] 28 #crossing of persons and occasions #xtabs(~id+time, data=data_process) We see that there are 66 persons with 28 occasions each (coded 0 to 27). The data are fully complete. ## Running the multilevel model with the available data. We use the lmer() function in the lme4 library, as the simr power analysis package is built to work with lmer(). Running multilevel using lmer(). #Run multilevel model fit_cpmodel <- lmer(intimacy ~ 1 + relqual + confcb + relqual:confcb + confcw + confcw:relqual + time7c + (1 + confcw | id), data=data_process, na.action = na.exclude) #examine model summary summary(fit_cpmodel) ## Linear mixed model fit by REML ['lmerMod'] ## Formula: ## intimacy ~ 1 + relqual + confcb + relqual:confcb + confcw + confcw:relqual + ## time7c + (1 + confcw | id) ## Data: data_process ## ## REML criterion at convergence: 7817.9 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.0095 -0.6767 -0.0045 0.6714 2.9160 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## id (Intercept) 0.8005 0.8947 ## confcw 2.6993 1.6430 0.27 ## Residual 3.5908 1.8949 ## Number of obs: 1848, groups: id, 66 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 4.53208 0.22129 20.480 ## relqual 0.64735 0.28209 2.295 ## confcb -0.84163 1.10421 -0.762 ## confcw -2.02444 0.37063 -5.462 ## time7c -0.02820 0.03865 -0.730 ## relqual:confcb 2.52633 1.68176 1.502 ## relqual:confcw 1.03382 0.49235 2.100 ## ## Correlation of Fixed Effects: ## (Intr) relqul confcb confcw time7c rlql:cnfcb ## relqual -0.784 ## confcb -0.520 0.408 ## confcw 0.166 -0.131 0.036 ## time7c 0.001 -0.001 -0.001 -0.008 ## relql:cnfcb 0.341 -0.038 -0.657 -0.024 -0.002 ## relql:cnfcw -0.125 0.178 -0.027 -0.753 0.005 0.040 The results are the very similar to those in the book. As before, we can simulate power directly from the fitted model using the powerSim() function. We supply the function with the fitted model object (fit_cpmodel), the specific parameter we would like to examine (“relqual:confcw”) and test to use (“lr”), how many simulations we would like (nsim=100) and what alpha level to use in the evaluations of significance (alpha=.05). Typically, power ananlysis simulations use 1000 or more simulations. For time savings here, we only use 100. This time, we just evaluate the main parameter of interest. Simulating power directly from fitted model for the relqual:confcw interaction effect. #Power for the effect of interest SimPower_Direct<-powerSim(fit_cpmodel,test=fixed("relqual:confcw","lr"), seed=1234, #set for replication nsim=100, alpha=.05, progress=FALSE) #options ## Warning in observedPowerWarning(sim): This appears to be an "observed ## power" calculation #examine post-hoc power SimPower_Direct ## Power for predictor 'relqual:confcw', (95% confidence interval): ## 58.00% (47.71, 67.80) ## ## Test: Likelihood ratio ## Effect size for relqual:confcw is 1.0 ## ## Based on 100 simulations, (20 warnings, 0 errors) ## alpha = 0.05, nrow = 1848 ## ## Time elapsed: 0 h 0 m 20 s ## ## nb: result might be an observed power calculation We see that with an effect size of 1.0, power for the relqual:confcw interaction is 58%, just as was found in the book. Given low power, we would like to evaluate what is the power to detect the interaction at different sample sizes - obtained by either adding more persons or adding more occasions within person. To evaluate sample sizes that are larger than the original data, we first “extend” the data by creating a new lmer() object. Specifically, we use the extend() function in the simr package. We do that by extending the data that is inside the lmer() model object to include more participants or more occasions. Extending the number of persons from 66 to 200. #extending number of persons fit_cpmodelpersons <- extend(fit_cpmodel, along="id", n=200) #checking length of data, number of persons length(unique(getData(fit_cpmodelpersons)$id))
## [1] 200
#checking length of data, number of times
length(unique(getData(fit_cpmodelpersons)$time)) ## [1] 28 Extending the number of occasions from 28 to 70. #extending number of occasions fit_cpmodeltimes <- extend(fit_cpmodel, along="time", n=70) #checking length of data, number of persons length(unique(getData(fit_cpmodeltimes)$id))
## [1] 66
#checking length of data, number of times
length(unique(getData(fit_cpmodeltimes)$time)) ## [1] 70 Using the new model objects with extented data, we can examine what power looks like all the way up to N = 165. Running power curve analysis for a range of sample sizes - persons (each with T = 16 occasions). #running power curve powerCurve_persons <-powerCurve(fit_cpmodelpersons, fixed("relqual:confcw", "lr"), along = "id", breaks = c(35,66,83,95,118,142,165), nsim=100,alpha=.05, progress=FALSE) ## Warning in observedPowerWarning(sim): This appears to be an "observed ## power" calculation #examining power estiamtes summary(powerCurve_persons) ## nrow nlevels successes trials mean lower upper ## 1 980 35 31 100 0.31 0.2212888 0.4103146 ## 2 1848 66 40 100 0.40 0.3032948 0.5027908 ## 3 2324 83 52 100 0.52 0.4177898 0.6209945 ## 4 2660 95 57 100 0.57 0.4671337 0.6686090 ## 5 3304 118 69 100 0.69 0.5896854 0.7787112 ## 6 3976 142 73 100 0.73 0.6319837 0.8139336 ## 7 4620 165 84 100 0.84 0.7532124 0.9056897 #plot(powerCurve_persons) We see the power estimates as each sample size, and how power increases as number of persons increases. Using the new model objects with extented data, we can also examine what power looks like all the way up to T = 60. Running power curve analysis for a range of sample sizes - occasions (increasing number of occasions within person). Specific sample sizes chosen so that the total number of observations, N**T*, is near 500, 1000, 1500, 2000, 2500, and 3000. #running power curve powerCurve_times <-powerCurve(fit_cpmodeltimes, fixed("relqual:confcw", "lr"), within = "id", breaks = c(15,28,35,40,50,60,70), nsim=100,alpha=.05, progress=FALSE) ## Warning in observedPowerWarning(sim): This appears to be an "observed ## power" calculation #examining power estiamtes summary(powerCurve_times) ## nrow nlevels successes trials mean lower upper ## 1 990 15 53 100 0.53 0.4275815 0.6305948 ## 2 1848 28 57 100 0.57 0.4671337 0.6686090 ## 3 2310 35 59 100 0.59 0.4871442 0.6873800 ## 4 2640 40 58 100 0.58 0.4771192 0.6780145 ## 5 3300 50 62 100 0.62 0.5174607 0.7152325 ## 6 3960 60 65 100 0.65 0.5481506 0.7427062 ## 7 4620 70 65 100 0.65 0.5481506 0.7427062 #plot(powerCurve_times) We see the power estimates as each sample size, and how power does not increase very much as the number of occasions increases. Of course, power curves are often useful in visual form. It takes a few steps to get them into a form useful for ggplot(). Prepare the summary outputs for plotting (Data Management). #preparing persons power curve data powerdata_persons <- as.data.frame(summary(powerCurve_persons)) powerdata_persons$persons <- powerdata_persons$nlevels powerdata_persons$times <- 28
powerdata_persons$design <- "persons" #preparing times power curve data powerdata_times <- as.data.frame(summary(powerCurve_times)) powerdata_times$persons <-66
powerdata_times$times <- powerdata_times$nlevels
powerdata_times$design <- "times" #merging together powerdata_all <- rbind(powerdata_persons, powerdata_times) Plotting power curves #Plotting curves ggplot(data = powerdata_all, aes(y = mean, x=nrow, group=factor(design))) + #power goal geom_hline(yintercept=0.8, color="gray40",lty = 2) + #persons curve geom_line(color="black") + geom_errorbar(ymin=powerdata_all$lower,ymax=powerdata_all$upper, width=50, color="black") + geom_point(aes(shape=factor(powerdata_all$design)),size=3, color="black", show.legend = FALSE) +
scale_shape_manual(values=c(15,17)) +
scale_x_continuous(limits=c(950,5000),
breaks=c(990,1848,2320,2640,3300,3960,4620),
name="Number of Total Observations") +
scale_y_continuous(limits=c(0,1.0),
breaks=c(0,.2,.4,.6,.8,1.0),
name="Power") +
theme_classic() +
#theme(axis.title=element_text(size=14),
#      axis.text=element_text(size=14),
#      plot.title=element_text(size=14, hjust=.5)) +
ggtitle("Power Curves for relqual:confcw interaction")

Nice! We can see that power increases with larger sample size (squares), but not so much with additional occasions (triangles). This is the same as the Figure 10.2 in B & L. Readers are referred to the book for more in-depth discussion of the implications, but the general notion is that more level-2 units (here, persons) is better.

# C. Power Analysis for the Categorical Outcomes Example

This example makes use of the Categorical data set from B & L Chapter 6.

#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/BLdata_categorical.csv"
#read in the .csv file using the url() function
data_categorical <- read.csv(file=url(filepath),header=TRUE)

Describe the data.

#describing the data
describe(data_categorical)
##         vars    n  mean    sd median trimmed   mad   min    max  range
## id         1 1345 45.61 30.19  38.00   44.44 37.06  1.00 102.00 101.00
## time       2 1345 14.66  7.76  14.00   14.60 10.38  2.00  28.00  26.00
## time7c     3 1345  0.00  1.15  -0.10   -0.01  1.54 -1.88   1.98   3.85
## pconf      4 1345  0.14  0.35   0.00    0.06  0.00  0.00   1.00   1.00
## lpconf     5 1345  0.16  0.36   0.00    0.07  0.00  0.00   1.00   1.00
## lpconfc    6 1345  0.00  0.36  -0.16   -0.09  0.00 -0.16   0.84   1.00
## amang      7 1345  0.49  1.11   0.00    0.22  0.00  0.00  10.00  10.00
## amangc     8 1345  0.00  1.11  -0.49   -0.27  0.00 -0.49   9.51  10.00
## amangcb    9 1345  0.00  0.49  -0.18   -0.10  0.27 -0.47   1.61   2.08
## amangcw   10 1345  0.00  1.00  -0.17   -0.15  0.27 -2.10   9.40  11.50
##         skew kurtosis   se
## id      0.28    -1.29 0.82
## time    0.06    -1.18 0.21
## time7c  0.06    -1.18 0.03
## pconf   2.02     2.09 0.01
## lpconf  1.88     1.55 0.01
## lpconfc 1.88     1.55 0.01
## amang   3.96    21.05 0.03
## amangc  3.96    21.05 0.03
## amangcb 1.72     2.32 0.01
## amangcw 3.78    22.51 0.03
#number of persons
length(unique(data_categorical$id)) ## [1] 61 #number of occasions length(unique(data_categorical$time))
## [1] 27
#crossing of persons and occasions
head(xtabs(~id+time, data=data_categorical),10) #just top of output
##     time
## id   2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
##   1  1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   3  1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   5  1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0
##   6  0 1 1 1 1 1 1 0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   9  1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   10 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   11 1 1 1 1 1 1 1 0  0  1  1  1  1  1  1  1  0  0  1  1  1  1  1  1  1  1
##   12 1 1 1 1 1 1 1 1  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##   13 0 0 0 0 0 0 1 1  1  1  1  1  1  0  0  1  1  1  1  1  1  1  1  1  1  1
##   14 1 1 1 1 1 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##     time
## id   28
##   1   1
##   3   1
##   5   0
##   6   1
##   9   1
##   10  1
##   11  1
##   12  1
##   13  1
##   14  0

We see that there are 61 persons with 27 occasions (coded 2 to 28) - and that the data are not fully complete.

Examine how many observations per person.

#calculating intraindividual stats (counts)
istats <- ddply(data_categorical, "id", summarize,
icount_pconf = sum(!is.na(pconf)),  #count of observations
icount_amang = sum(!is.na(amang)))
describe(istats)
##              vars  n  mean    sd median trimmed   mad min max range  skew
## id              1 61 46.30 29.86     42   45.24 40.03   1 102   101  0.26
## icount_pconf    2 61 22.05  6.52     25   23.35  2.97   5  27    22 -1.43
## icount_amang    3 61 22.05  6.52     25   23.35  2.97   5  27    22 -1.43
##              kurtosis   se
## id              -1.32 3.82
## icount_pconf     0.88 0.84
## icount_amang     0.88 0.84
xtabs(~icount_pconf, data=istats)
## icount_pconf
##  5  6 11 12 13 14 15 17 19 20 21 22 23 24 25 26 27
##  2  3  1  1  2  1  1  1  2  1  2  2  7  2 10  2 21

We see that on average individual completed 22 occasions.

Running the multilevel model with the available data.

We use the lmer() function in the lme4 library, as the simr power analysis package is built to work with lmer().

Running multilevel using lmer().

#Run multilevel model
fit_catmodel <- glmer(pconf ~ 1 + amangcw + amangcb + lpconfc + time7c +
(1 | id),
family=binomial,
data=data_categorical,
na.action = na.exclude)
#examine model summary
summary(fit_catmodel)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: pconf ~ 1 + amangcw + amangcb + lpconfc + time7c + (1 | id)
##    Data: data_categorical
##
##      AIC      BIC   logLik deviance df.resid
##   1084.8   1116.0   -536.4   1072.8     1339
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -0.7981 -0.4224 -0.3533 -0.2893  4.5619
##
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.2475   0.4975
## Number of obs: 1345, groups:  id, 61
##
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.90161    0.10894 -17.456  < 2e-16 ***
## amangcw      0.21565    0.06738   3.200  0.00137 **
## amangcb     -0.22437    0.22398  -1.002  0.31649
## lpconfc      0.31826    0.20892   1.523  0.12767
## time7c      -0.19222    0.07073  -2.718  0.00657 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##         (Intr) amngcw amngcb lpcnfc
## amangcw -0.108
## amangcb  0.050 -0.103
## lpconfc  0.006 -0.083  0.052
## time7c   0.132 -0.004 -0.027  0.102

The results are the very similar to the Mplus results reported in the book.

As before, we can simulate power directly from the fitted model using the powerSim() function. We supply the function with the fitted model object (fit_catmodel), the specific parameter we would like to examine (“amangcw”) and test to use (“lr”), how many simulations we would like (nsim=100) and what alpha level to use in the evaluations of significance (alpha=.05).

Typically, power ananlysis simulations use 1000 or more simulations. For time savings here, we only use 100.

This time, we just evaluate the main parameter of interest.

Simulating power directly from fitted model for the amangcw effect.

#Power for the effect of interest
SimPower_Direct<-powerSim(fit_catmodel,test=fixed("amangcw","lr"),
seed=1234, #set for replication
nsim=100, alpha=.05, progress=FALSE) #options
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examine post-hoc power
SimPower_Direct
## Power for predictor 'amangcw', (95% confidence interval):
##       79.00% (69.71, 86.51)
##
## Test: Likelihood ratio
##       Effect size for amangcw is 0.22
##
## Based on 100 simulations, (0 warnings, 0 errors)
## alpha = 0.05, nrow = 1345
##
## Time elapsed: 0 h 0 m 55 s
##
## nb: result might be an observed power calculation

We see that with an effect size of 0.22, power for the amangcw effect is 79%. Note that since the data object we are using has the missing data in it, we are already doing the power anaysis with, on average, 22 occcasions (i.e., we do not need to reduce to get that analysis, as B & L do in Chapter 10.5).

As is done in the book, we can also test power for the variance term for the intercept. We use the compare function and specify the model to compare against, which in this case is a model without any random effects. Simulating power directly from fitted model for the random effect.

#Power for the single random effect
SimPower_variance<-powerSim(fit_catmodel,test=random(), #empty becasue only one
seed=1234, #set for replication
nsim=100, alpha=.05, progress=FALSE) #options
## Warning in observedPowerWarning(sim): This appears to be an "observed
## power" calculation
#examine post-hoc power of random effect
SimPower_variance
## Power for a single random effect, (95% confidence interval):
##        0.00% ( 0.00,  3.62)
##
## Test: Exact restricted LRT (package RLRsim)
##
## Based on 100 simulations, (0 warnings, 100 errors)
## alpha = 0.05, nrow = 1345
##
## Time elapsed: 0 h 0 m 33 s
##
## nb: result might be an observed power calculation

Note that we get something a bit different from the book here. We are not yet sure why, but are exciterd about the prospects of also being able to examine power for random effects as this is where more occasions may be relevant.

## Conclusion

This tutorial covered some basics for conducting power analysis for multilevel model in R using the simr package. We fit models using “pilot” data and then extended those data in order to examine power across a range of sample sizes - persons and occasions. There is also facility for construting a model and power analysis from scratch (prior to having data for analysis). We shall look to cover that in a future tutorial.

Thanks for powering through with us!