Overview

Individuals, at any given moment, are a complex configuration of characteristics. Some of these characteristics are relatively labile, and some are relatively stable. A general question is: How, when, and why do the many characteristics of an individual change or covary over time? This question is inherently multivariate (many characteristics), requires time-series data (change over time), and is focused on within-person covariation (single-subject). Previously we examined intraindividual covariation of two or three variables. Extension to multiple (many) variables is relatively straightforward, but requires a “multi-outcome” framework. In this tutorial, we examine how Factor Analysis (a basic data reduction technique) is used to study multivariate intraindividual variability: P-technique Factor Analysis.

Outline

  1. Introduction to the P-technique Factor Analysis Model

  2. An Example using Exploratory P-technique Factor Analysis

  3. An Example using Confirmatory P-technique Factor Analysis

  4. Conclusion

Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(GPArotation) #methods for factor rotation
library(nFactors)  #methods for determining the number of factors
library(lavaan)  #for fitting structural equation models
library(semPlot)  #for automatically making diagrams 

1. Introduction to the P-technique Factor Analysis Model

The basic factor analysis model is written as

\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the pth observed variable, \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the pth latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the pth observed variable and the qth latent common factor.

Typically, we have multiple observed variables and one or more common factors. For instance, in the 6-variable, 2-factor case we would have

\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\]
\[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} + u_{2i}\]
\[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} + u_{3i}\]
\[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} + u_{4i}\]
\[y_{5i} = \lambda_{51} f_{1i} + \lambda_{52} f_{2i} + u_{5i}\]
\[y_{6i} = \lambda_{61} f_{1i} + \lambda_{62} f_{2i} + u_{6i}\]

which can be written in compact matrix form as

\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.
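As a quick check that the matrix form reproduces the six scalar equations, the sketch below builds one entity's score vector in R. The loading values are made up for illustration (a simple structure with variables 1-3 on factor 1 and 4-6 on factor 2), and the factor and unique scores are random draws, not estimates from any data.

```r
# Hypothetical 6 x 2 loading matrix (illustrative values only)
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6),
                 nrow = 6, ncol = 2)
set.seed(1)
F_i <- rnorm(2)                                     # q x 1 common factor scores for entity i
U_i <- rnorm(6, sd = sqrt(1 - rowSums(Lambda^2)))   # p x 1 unique factor scores
Y_i <- Lambda %*% F_i + U_i                         # p x 1 observed scores
# The matrix product reproduces the scalar equation for variable 1
all.equal(Y_i[1], Lambda[1, 1] * F_i[1] + Lambda[1, 2] * F_i[2] + U_i[1])
```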

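Extending the model to multiple persons allows it to be mapped onto the observed covariance matrix, \(\boldsymbol{\Sigma} = \frac{1}{N-1}\boldsymbol{Y}'\boldsymbol{Y}\), where \(\boldsymbol{Y}\) is the N x p matrix of mean-centered scores for N persons (with standardized scores, \(\boldsymbol{\Sigma}\) is the correlation matrix). Assuming the common and unique factors are uncorrelated, the common factor model becomes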

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
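The covariance structure can be computed directly. In this minimal sketch the values of \(\boldsymbol{\Lambda}\) and \(\boldsymbol{\Psi}\) are hypothetical, and \(\boldsymbol{\Theta}\) is chosen (an assumption for illustration) so that every observed variable has unit variance, making \(\boldsymbol{\Sigma}\) a correlation matrix.

```r
# Hypothetical parameter values (not estimates from any data set)
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6),
                 nrow = 6, ncol = 2)                 # p x q loadings
Psi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)                 # q x q factor covariance (variances fixed to 1)
common <- Lambda %*% Psi %*% t(Lambda)               # variance/covariance due to the common factors
Theta <- diag(1 - diag(common))                      # diagonal unique variances giving unit total variance
Sigma <- common + Theta                              # p x p model-implied matrix
round(Sigma, 3)
```

Within a factor the implied correlation is the product of loadings (e.g., 0.8 x 0.7 = 0.56 for variables 1 and 2); across factors it also passes through the factor covariance (0.8 x 0.3 x 0.8 = 0.192 for variables 1 and 4).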

The analysis focuses on modeling \(\boldsymbol{\Sigma}\), a covariance (or correlation) matrix. Traditionally, factor analysis is conceived as an analysis of interindividual differences, that is, of an R-slice (persons x variables x 1 occasion) of the data box. Alternatively, \(\boldsymbol{\Sigma}\) can be computed from a P-slice (occasions x variables x 1 person), so that the factor analysis becomes an analysis of intraindividual differences and within-person covariation.

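In all of the above, we simply replace the person subscript i with an occasion subscript t, e.g., \(\boldsymbol{Y_{t}} = \boldsymbol{\Lambda}\boldsymbol{F_{t}} + \boldsymbol{U_{t}}\).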
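To make the P-slice idea concrete, the sketch below simulates one hypothetical person measured on 6 variables at 100 occasions from a 2-factor model (loadings made up, factors assumed uncorrelated), computes the within-person correlation matrix of that occasions x variables slice, and fits an exploratory factor analysis. Base R's `factanal()` (maximum likelihood, varimax rotation) is used only to keep the sketch dependency-free; it is not necessarily the function used later in this tutorial.

```r
set.seed(123)
n_occ <- 100                                          # number of occasions, t = 1, ..., T
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6),
                 nrow = 6, ncol = 2)                  # hypothetical loadings
F_t <- matrix(rnorm(n_occ * 2), nrow = n_occ)         # T x q factor scores, one row per occasion
U_t <- sapply(1 - rowSums(Lambda^2),                  # T x p unique scores with
              function(v) rnorm(n_occ, sd = sqrt(v))) # variance 1 - communality
Y_t <- F_t %*% t(Lambda) + U_t                        # T x p P-slice for one person
colnames(Y_t) <- paste0("v", 1:6)
R_p <- cor(Y_t)                                       # within-person correlation matrix
efa <- factanal(covmat = R_p, factors = 2, n.obs = n_occ, rotation = "varimax")
print(efa$loadings, cutoff = 0.3)
```

The only change from a conventional R-technique analysis is which slice of the data box feeds `cor()`: rows here are occasions of a single person rather than different persons.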

2. An Example using Exploratory P-technique Factor Analysis

For our examples we use a small simulated 2-person data set modeled after “The Lebo Data” used in the Brose and Ram (2012) step-by-step guide to P-technique.

Prelim - Reading in Repeated Measures Data

The data contain 2 persons, designated by id.

#Reading the data from web location
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/ptechdata.csv"
#read in the .csv file using the url() function
pdat <- read.csv(file=url(filepath),header=TRUE)

Let's split the data for the two persons into separate data frames.

#splitting data
pdat1 <- pdat[pdat$id==1, ] 
pdat2 <- pdat[pdat$id==2, ]
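An equivalent idiom is base R's `split()`, which returns a named list with one data frame per id and scales to any number of persons. The sketch uses a small hypothetical `toy` data frame with the same layout as `pdat` so it runs without downloading anything; with the real data, `split(pdat, pdat$id)` would hold the same rows as `pdat1` and `pdat2`.

```r
# Hypothetical toy data with the same layout as pdat (id column plus variables)
toy <- data.frame(id = rep(1:2, each = 3),
                  v1 = c(0.1, -0.2, 0.3, 1.0, 0.5, -0.4),
                  v2 = c(-1.1, 0.4, 0.2, 0.0, -0.3, 0.8))
# One data frame per id value, named by the id
by_person <- split(toy, toy$id)
names(by_person)           # list names are the id values
sapply(by_person, nrow)    # number of occasions per person
```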

P-technique Factor Analysis for Person 1

Let's have a look at Person 1’s data and descriptives.

#data structure
head(pdat1,10)
##    id    v1    v2    v3    v4    v5    v6    v7    v8    v9
## 1   1 -0.84 -0.27 -0.59 -0.97 -0.72 -1.86  1.11  0.46  0.09
## 2   1 -0.04 -0.33 -0.54  2.57  1.30  0.59  1.42  2.56  0.66
## 3   1  0.34 -0.23  0.99 -1.84 -1.62 -1.55  1.05  0.22  0.60
## 4   1 -1.01 -1.96 -1.17 -2.93 -1.75 -1.89 -0.51 -0.07  1.14
## 5   1  2.02  0.21  0.97  0.13 -0.01 -0.88 -1.22  0.54 -0.28
## 6   1 -0.24 -0.19  0.23  0.80  1.06  1.61  1.31 -1.10  0.26
## 7   1 -0.61 -1.37  0.15 -0.68 -1.50 -0.44 -1.63  1.40  1.79
## 8   1 -1.26 -2.59 -1.10 -0.40 -0.66  0.29 -1.45 -1.06  0.15
## 9   1  0.23  0.25 -0.21  0.23 -0.42  1.01  0.15  1.16 -0.36
## 10  1  1.01 -0.08  1.95  1.14 -0.06 -0.48 -0.62 -1.27 -0.08
#descriptives (without id column)
describe(pdat1[,-1])
##    vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis
## v1    1 100  0.00 0.95   0.02    0.00 0.96 -2.33 2.44  4.77  0.04    -0.41
## v2    2 100 -0.14 0.97  -0.11   -0.12 0.90 -2.59 2.07  4.66 -0.21    -0.21
## v3    3 100  0.04 0.91   0.09    0.08 0.90 -3.11 2.10  5.21 -0.46     0.72
## v4    4 100 -0.13 0.93  -0.06   -0.11 0.93 -2.93 2.57  5.50 -0.15     0.21
## v5    5 100 -0.18 0.85  -0.29   -0.20 0.83 -2.04 1.75  3.79  0.25    -0.31
## v6    6 100 -0.01 0.98   0.06    0.01 0.81 -2.37 2.24  4.61 -0.21    -0.34
## v7    7 100  0.12 0.94   0.12    0.12 1.02 -1.91 2.14  4.05  0.04    -0.68
## v8    8 100 -0.03 1.04   0.00   -0.03 0.93 -2.58 2.56  5.14 -0.05    -0.32
## v9    9 100  0.17 0.96   0.26    0.18 0.93 -2.54 2.47  5.01 -0.20    -0.17
##      se
## v1 0.10
## v2 0.10
## v3 0.09
## v4 0.09
## v5 0.09
## v6 0.10
## v7 0.09
## v8 0.10
## v9 0.10

The data appear to already be in standardized form: for all nine variables the means are near 0 and the standard deviations are near 1.
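The same check can be done programmatically. Below, `is_standardized()` is a hypothetical helper, and its tolerance of 0.2 is an arbitrary assumption for a short series; it is demonstrated on simulated data, before and after `scale()`, which centers each column and divides by its standard deviation.

```r
# Hypothetical helper: are all column means near 0 and all SDs near 1?
is_standardized <- function(df, tol = 0.2) {   # tol is an arbitrary assumption
  m <- colMeans(df)
  s <- apply(df, 2, sd)
  all(abs(m) < tol) && all(abs(s - 1) < tol)
}
set.seed(42)
raw <- data.frame(a = rnorm(100, mean = 5, sd = 2),
                  b = rnorm(100, mean = -3, sd = 0.5))
z <- as.data.frame(scale(raw))   # z-score each column
is_standardized(raw)
is_standardized(z)
```

For Person 1 the call would be `is_standardized(pdat1[,-1])`; judging from the `describe()` output above (means between -0.18 and 0.17, SDs between 0.85 and 1.04), it should pass at this tolerance.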

Let's make a plot of the 9-variate raw time series.

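#preparing data
#adding a time variable (day = occasion number within each person)
pdat1_plot <- data.frame(id = pdat1$id, day = seq_len(nrow(pdat1)), pdat1[,-1])
pdat2_plot <- data.frame(id = pdat2$id, day = seq_len(nrow(pdat2)), pdat2[,-1])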
#Plotting observed scores
ggplot(data=pdat1_plot, aes(x=day)) +
  geom_line(aes(y=v1), color= 1) + 
  geom_line(aes(y=v2), color= 2) + 
  geom_line(aes(y=v3), color= 3) + 
  geom_line(aes(y=v4), color= 4) + 
  geom_line(aes(y=v5), color= 5) + 
  geom_line(aes(y=v6), color= 6) + 
  geom_line(aes(y=v7), color= 7) + 
  geom_line(aes(y=v8), color= 8) + 
  geom_line(aes(y=v9), color= 9) +
  xlab("Day") + ylab("Observed Score") + 
  scale_x_continuous(limits=c(0,100)) +
  scale_y_continuous(limits=c(-3.5,3.5)) + #wide enough to keep v3's minimum of -3.11
  ggtitle("Person 1: Multivariate Time Series")
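The nine separate `geom_line()` calls work but produce no legend. A common ggplot2 alternative is to reshape to long format (one row per day x variable) and map the variable name to color. The sketch below uses a small hypothetical `wide` data frame in place of `pdat1_plot` and base R's `stack()` for the reshape; with the real data, the analogous call would be `stack(pdat1_plot[, paste0("v", 1:9)])` with `day` repeated 9 times.

```r
library(ggplot2)
# Hypothetical stand-in for pdat1_plot: 3 variables over 10 days
set.seed(7)
wide <- data.frame(day = 1:10, v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
# stack() puts all v1 values first, then v2, then v3, so day is repeated 3 times
long <- data.frame(day = rep(wide$day, times = 3),
                   stack(wide[, c("v1", "v2", "v3")]))
names(long) <- c("day", "score", "variable")
p <- ggplot(long, aes(x = day, y = score, color = variable)) +
  geom_line() +
  xlab("Day") + ylab("Observed Score") +
  ggtitle("Multivariate Time Series (long format)")
print(p)
```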