Overview

Individuals, at any given moment, are a complex configuration of characteristics. Some of these characteristics are relatively labile, and some are relatively stable. A general question is: How, when, and why do the many characteristics of an individual change or covary over time? Inherently, this question is multivariate (many characteristics), requires time-series data (change over time), and is focused on within-person covariation (single-subject). Previously we have examined intraindividual covariation of two or three variables. Extension to multiple (many) variables is relatively straightforward, but requires a “multi-outcome” framework. In this tutorial, we engage with how Factor Analysis (a basic data reduction technique) is used to examine multivariate intraindividual variability - P-technique Factor Analysis.

Outline

  1. Introduction to The P-technique Factor Analysis Model

  2. An Example using Exploratory P-technique Factor Analysis

  3. An Example using Confirmatory P-technique Factor Analysis

  4. Conclusion

Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(GPArotation) #methods for factor rotation
library(nFactors)  #methods for determining the number of factors
library(lavaan)  #for fitting structural equation models
library(semPlot)  #for automatically making diagrams 

1. Introduction to the Factor Analysis Model

The basic factor analysis model is written as

\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the pth observed variable, \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the pth latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the pth observed variable and the qth latent common factor.

Typically, we have multiple observed variables and one or more common factors. For instance, in the 6-variable, 2-factor case we would have …

\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\] \[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} + u_{2i}\] \[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} + u_{3i}\] \[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} + u_{4i}\] \[y_{5i} = \lambda_{51} f_{1i} + \lambda_{52} f_{2i} + u_{5i}\] \[y_{6i} = \lambda_{61} f_{1i} + \lambda_{62} f_{2i} + u_{6i}\] which can be written in a compact matrix form as

\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.
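The matrix form can be sketched directly in base R; the loading values here are hypothetical, chosen only to show the structure (they are not estimates from any data):

```r
# hypothetical 6 x 2 loading matrix: v1-v3 indicate f1, v4-v6 indicate f2
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6), nrow = 6, ncol = 2)
set.seed(1)
F_i <- matrix(rnorm(2), ncol = 1)            # q x 1 common factor scores
U_i <- matrix(rnorm(6, sd = 0.5), ncol = 1)  # p x 1 unique factor scores
Y_i <- Lambda %*% F_i + U_i                  # p x 1 observed scores
dim(Y_i)
```

Each row of Y_i reproduces one of the six equations above; e.g., the first element is \(\lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\).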

Extension to multiple persons provides for mapping to the observed covariance (or correlation) matrix, \(\boldsymbol{\Sigma} = \frac{1}{N}\boldsymbol{Y}'\boldsymbol{Y}\) (for centered data), and the common factor model becomes

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
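In base R, the implied covariance structure is just a few matrix products; the parameter values below are hypothetical placeholders:

```r
# hypothetical 6 x 2 loadings, factor covariances, and uniquenesses
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6), nrow = 6, ncol = 2)
Psi   <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2)                  # q x q factor covariance matrix
Theta <- diag(1 - diag(Lambda %*% Psi %*% t(Lambda)))   # uniquenesses chosen so diag(Sigma) = 1
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta           # p x p implied matrix
round(diag(Sigma), 2)
```

With the uniquenesses chosen this way, Sigma has a unit diagonal - i.e., it is in the correlation metric in which the analyses below operate.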

The analysis is focused around modeling of \(\boldsymbol{\Sigma}\), a covariance (or correlation matrix). Traditionally, factor analysis is conceived as analysis of interindividual differences and analysis of an R-slice (persons x variables x 1 occasion) of the data box. Alternatively, the \(\boldsymbol{\Sigma}\), a covariance (or correlation matrix), can be made from a P-slice (occasions x variables x 1 person), such that the factor analysis becomes an analysis of intraindividual differences and within-person covariation.

In all of the above, we simply replace the i subscript with a t.

2. An Example using Exploratory P-technique Factor Analysis

For our examples we use a small simulated 2-person data set modeled after “The Lebo Data” used in the Brose and Ram (2012) step-by-step guide to P-technique.

Prelim - Reading in Repeated Measures Data

There are 2 persons in the data - designated by id.

#Reading the data from web location
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/ptechdata.csv"
#read in the .csv file using the url() function
pdat <- read.csv(file=url(filepath),header=TRUE)

Let’s split the data for the two persons into separate data frames.

#splitting data
pdat1 <- pdat[pdat$id==1, ] 
pdat2 <- pdat[pdat$id==2, ]

P-technique Factor Analysis for Person 1

Let’s have a look at Person 1’s data and descriptives.

#data structure
head(pdat1,10)
##    id    v1    v2    v3    v4    v5    v6    v7    v8    v9
## 1   1 -0.84 -0.27 -0.59 -0.97 -0.72 -1.86  1.11  0.46  0.09
## 2   1 -0.04 -0.33 -0.54  2.57  1.30  0.59  1.42  2.56  0.66
## 3   1  0.34 -0.23  0.99 -1.84 -1.62 -1.55  1.05  0.22  0.60
## 4   1 -1.01 -1.96 -1.17 -2.93 -1.75 -1.89 -0.51 -0.07  1.14
## 5   1  2.02  0.21  0.97  0.13 -0.01 -0.88 -1.22  0.54 -0.28
## 6   1 -0.24 -0.19  0.23  0.80  1.06  1.61  1.31 -1.10  0.26
## 7   1 -0.61 -1.37  0.15 -0.68 -1.50 -0.44 -1.63  1.40  1.79
## 8   1 -1.26 -2.59 -1.10 -0.40 -0.66  0.29 -1.45 -1.06  0.15
## 9   1  0.23  0.25 -0.21  0.23 -0.42  1.01  0.15  1.16 -0.36
## 10  1  1.01 -0.08  1.95  1.14 -0.06 -0.48 -0.62 -1.27 -0.08
#descriptives (without id column)
describe(pdat1[,-1])
##    vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis
## v1    1 100  0.00 0.95   0.02    0.00 0.96 -2.33 2.44  4.77  0.04    -0.41
## v2    2 100 -0.14 0.97  -0.11   -0.12 0.90 -2.59 2.07  4.66 -0.21    -0.21
## v3    3 100  0.04 0.91   0.09    0.08 0.90 -3.11 2.10  5.21 -0.46     0.72
## v4    4 100 -0.13 0.93  -0.06   -0.11 0.93 -2.93 2.57  5.50 -0.15     0.21
## v5    5 100 -0.18 0.85  -0.29   -0.20 0.83 -2.04 1.75  3.79  0.25    -0.31
## v6    6 100 -0.01 0.98   0.06    0.01 0.81 -2.37 2.24  4.61 -0.21    -0.34
## v7    7 100  0.12 0.94   0.12    0.12 1.02 -1.91 2.14  4.05  0.04    -0.68
## v8    8 100 -0.03 1.04   0.00   -0.03 0.93 -2.58 2.56  5.14 -0.05    -0.32
## v9    9 100  0.17 0.96   0.26    0.18 0.93 -2.54 2.47  5.01 -0.20    -0.17
##      se
## v1 0.10
## v2 0.10
## v3 0.09
## v4 0.09
## v5 0.09
## v6 0.10
## v7 0.09
## v8 0.10
## v9 0.10

Data appear to already be in standardized form.
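If they were not, each variable could be standardized using that person’s own mean and SD before analysis. A base-R sketch on a toy data frame (hypothetical values, not the tutorial data):

```r
# toy repeated-measures data standing in for one person's series
set.seed(2)
toy <- data.frame(v1 = rnorm(100, mean = 5, sd = 2),
                  v2 = rnorm(100, mean = -1, sd = 0.5))
toy_z <- as.data.frame(scale(toy))   # center and scale each column
round(colMeans(toy_z), 10)           # column means now 0
round(apply(toy_z, 2, sd), 10)       # column SDs now 1
```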

Let’s make a plot of the 9-variate raw time-series.

#preparing data
#adding time variable
day <- 1:100
pdat1_plot <- cbind(pdat1$id,day,pdat1[,-1])
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1])
#Plotting observed scores
ggplot(data=pdat1_plot, aes(x=day)) +
  geom_line(aes(y=v1), color= 1) + 
  geom_line(aes(y=v2), color= 2) + 
  geom_line(aes(y=v3), color= 3) + 
  geom_line(aes(y=v4), color= 4) + 
  geom_line(aes(y=v5), color= 5) + 
  geom_line(aes(y=v6), color= 6) + 
  geom_line(aes(y=v7), color= 7) + 
  geom_line(aes(y=v8), color= 8) + 
  geom_line(aes(y=v9), color= 9) +
  xlab("Day") + ylab("Observed Score") + 
  scale_x_continuous(limits=c(0,100)) +
  scale_y_continuous(limits=c(-3,3)) +
  ggtitle("Person 1: Multivariate Time Series")

The numerical basis for Factor Analysis is the covariance (or correlation) matrix. Let’s look at visual and numeric versions - keeping in mind our basic research question … Can these data be represented by a smaller number of factors?

#pairs panels
pairs.panels(pdat1[,-1])

#correlation matrix
round(cor(pdat1[,-1]),2)
##       v1    v2    v3   v4   v5    v6    v7    v8    v9
## v1  1.00  0.66  0.71 0.14 0.21  0.05 -0.02  0.07  0.03
## v2  0.66  1.00  0.61 0.13 0.16  0.16  0.03 -0.03  0.02
## v3  0.71  0.61  1.00 0.14 0.13  0.11 -0.12  0.00 -0.03
## v4  0.14  0.13  0.14 1.00 0.49  0.52  0.18  0.14  0.10
## v5  0.21  0.16  0.13 0.49 1.00  0.38  0.24  0.09  0.16
## v6  0.05  0.16  0.11 0.52 0.38  1.00  0.04  0.15 -0.03
## v7 -0.02  0.03 -0.12 0.18 0.24  0.04  1.00  0.40  0.29
## v8  0.07 -0.03  0.00 0.14 0.09  0.15  0.40  1.00  0.26
## v9  0.03  0.02 -0.03 0.10 0.16 -0.03  0.29  0.26  1.00
#correlation matrix plot using the corrplot package
corrplot(cor(pdat1[,-1]), order = "original", tl.col='black', tl.cex=.75) 

We see three “groups” of variables that are positively correlated. This gives us hope.

Traditionally, EFA was an analysis of a correlation matrix. Most programs can now also read the raw data directly. We compute the correlation matrix explicitly, so that we know exactly how it was made (e.g., how the missing data were treated).

#store the correlation matrix of the data into an object
corpdat1 <- cor(pdat1[,-1], use="pairwise.complete.obs")
round(corpdat1,2)
##       v1    v2    v3   v4   v5    v6    v7    v8    v9
## v1  1.00  0.66  0.71 0.14 0.21  0.05 -0.02  0.07  0.03
## v2  0.66  1.00  0.61 0.13 0.16  0.16  0.03 -0.03  0.02
## v3  0.71  0.61  1.00 0.14 0.13  0.11 -0.12  0.00 -0.03
## v4  0.14  0.13  0.14 1.00 0.49  0.52  0.18  0.14  0.10
## v5  0.21  0.16  0.13 0.49 1.00  0.38  0.24  0.09  0.16
## v6  0.05  0.16  0.11 0.52 0.38  1.00  0.04  0.15 -0.03
## v7 -0.02  0.03 -0.12 0.18 0.24  0.04  1.00  0.40  0.29
## v8  0.07 -0.03  0.00 0.14 0.09  0.15  0.40  1.00  0.26
## v9  0.03  0.02 -0.03 0.10 0.16 -0.03  0.29  0.26  1.00

We can inform our choice of the number of factors with several functions: fa.parallel() in the psych package, or nScree() in the nFactors package.

#parallel analysis for number of factors 
fa.parallel(x=corpdat1, fm="minres", fa="fa")
## Warning in fa.parallel(x = corpdat1, fm = "minres", fa = "fa"): It seems as
## if you are using a correlation matrix, but have not specified the number of
## cases. The number of subjects is arbitrarily set to be 100

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
#multiple methods for number of factors
nScree(x=corpdat1,model="factors")
##   noc naf nparallel nkaiser
## 1   3   3         3       3
plot(nScree(x=corpdat1,model="factors"))

The output here is number of components/factors according to optimal coordinates (noc), acceleration factor (naf), parallel analysis (nparallel), and Kaiser rule (nkaiser).

For these data - everything points to choice of 3 factors.
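The Kaiser rule behind nkaiser is easy to check by hand: count the eigenvalues of the correlation matrix that exceed 1. A base-R sketch on simulated data with a known 3-cluster structure (not the tutorial data):

```r
# simulate 100 occasions of 9 variables built from 3 independent factor series
set.seed(3)
f <- matrix(rnorm(100 * 3), ncol = 3)
X <- cbind(f[, 1] + matrix(rnorm(300, sd = 0.6), ncol = 3),  # v1-v3
           f[, 2] + matrix(rnorm(300, sd = 0.6), ncol = 3),  # v4-v6
           f[, 3] + matrix(rnorm(300, sd = 0.6), ncol = 3))  # v7-v9
eigvals <- eigen(cor(X))$values
sum(eigvals > 1)   # Kaiser rule count of factors
```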

Now let’s run the factor analysis. This time we use the fa() function in the psych package, with oblique oblimin rotation (rotate=“oblimin”) and principal axis factor extraction (fm=“pa”).

EFApdat1_3factor <- fa(r = corpdat1, nfactors = 3, 
                       rotate = "oblimin", 
                       fm = "pa")
EFApdat1_3factor
## Factor Analysis using method =  pa
## Call: fa(r = corpdat1, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      PA1   PA2   PA3   h2   u2 com
## v1  0.90 -0.04  0.06 0.79 0.21 1.0
## v2  0.74  0.05  0.00 0.56 0.44 1.0
## v3  0.80  0.02 -0.10 0.66 0.34 1.0
## v4  0.00  0.78  0.04 0.62 0.38 1.0
## v5  0.09  0.54  0.15 0.38 0.62 1.2
## v6 -0.03  0.70 -0.10 0.46 0.54 1.0
## v7 -0.03  0.01  0.73 0.54 0.46 1.0
## v8  0.02  0.04  0.51 0.27 0.73 1.0
## v9  0.03 -0.03  0.45 0.20 0.80 1.0
## 
##                        PA1  PA2  PA3
## SS loadings           2.01 1.41 1.07
## Proportion Var        0.22 0.16 0.12
## Cumulative Var        0.22 0.38 0.50
## Proportion Explained  0.45 0.31 0.24
## Cumulative Proportion 0.45 0.76 1.00
## 
##  With factor correlations of 
##       PA1  PA2   PA3
## PA1  1.00 0.21 -0.02
## PA2  0.21 1.00  0.26
## PA3 -0.02 0.26  1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.55
## The degrees of freedom for the model are 12  and the objective function was  0.15 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 PA1  PA2  PA3
## Correlation of scores with factors             0.94 0.87 0.81
## Multiple R square of scores with factors       0.88 0.76 0.66
## Minimum correlation of possible factor scores  0.75 0.51 0.32

The solution looks pretty good.

Our objective here was data reduction. To make that explicit, we can obtain the estimated factor scores; this requires inputting the raw data matrix instead of the correlation matrix. Additionally, we specify the ‘scores’ argument as below to obtain factor scores by the regression method.

EFApdat1_3factor <- fa(r = pdat1[,-1], nfactors = 3, 
                       rotate = "oblimin", 
                       fm = "pa",
                       scores="regression")
head(EFApdat1_3factor$scores,10)
##             PA1        PA2        PA3
## 1  -0.783420678 -1.1506737  0.5983783
## 2  -0.262502699  2.1004885  1.8211517
## 3   0.433115912 -1.7226209  0.3998862
## 4  -1.419724776 -2.6307726 -0.3797518
## 5   1.476386714 -0.1114002 -0.6023405
## 6   0.007833174  1.3577895  0.5681540
## 7  -0.614527322 -0.8404476 -0.5418789
## 8  -1.578440465 -0.3701222 -1.1402610
## 9   0.122523810  0.5020814  0.1907809
## 10  1.232518910  0.5499773 -0.7593508

NOTE: We do have factor score indeterminacy! So, be careful.
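Those factor score adequacy statistics trace back to the regression (Thurstone) scoring weights, \(\boldsymbol{W} = \boldsymbol{R}^{-1}\boldsymbol{\Lambda}\boldsymbol{\Phi}\). A base-R sketch with hypothetical one-factor loadings (so \(\boldsymbol{\Phi} = 1\)) shows why the multiple R-square of scores with the factor stays below 1:

```r
# hypothetical 1-factor loadings for 3 standardized variables
Lambda <- matrix(c(0.9, 0.8, 0.7), ncol = 1)
Theta  <- diag(1 - as.vector(Lambda)^2)       # uniquenesses
R      <- Lambda %*% t(Lambda) + Theta        # implied correlation matrix
W      <- solve(R) %*% Lambda                 # regression score weights (Phi = 1)
R2     <- as.numeric(t(Lambda) %*% W)         # multiple R^2 of scores with the factor
round(R2, 3)
```

Because the uniquenesses are nonzero, R2 is strictly less than 1 - the estimated scores never pin down the factor exactly, which is the indeterminacy to be careful about.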

Let’s look at the “reduced” data.

pairs.panels(EFApdat1_3factor$scores)

And we can look at the reduced 3-variate latent time-series.

#remaking plot data 
pdat1_plot <- cbind(pdat1$id,day,pdat1[,-1],EFApdat1_3factor$scores)
#Plotting factor scores
ggplot(data=pdat1_plot, aes(x=day)) +
  geom_line(aes(y=PA1), color= 1) + 
  geom_line(aes(y=PA2), color= 2) + 
  geom_line(aes(y=PA3), color= 3) + 
  xlab("Day") + ylab("Factor Score") + 
  scale_x_continuous(limits=c(0,100)) +
  scale_y_continuous(limits=c(-3,3)) + 
  ggtitle("Person 1: Latent Factor Time Series")

But do not forget that there are also unique factors. The common factors capture only the common variance - there may be quite a bit of stuff left over.

Remember that the variance accounted for by the 3 factors is only part of the total variance. Specifically, 0.499181, which we see in the main output (the Cumulative Var row).

EFApdat1_3factor$Vaccounted[3,3]
## [1] 0.499181

All the other parts are in the uniquenesses and misfit of the correlations.

#Unique Factor Variances
round(EFApdat1_3factor$uniquenesses,3)
##    v1    v2    v3    v4    v5    v6    v7    v8    v9 
## 0.209 0.441 0.336 0.379 0.617 0.541 0.458 0.726 0.801
#Residual covariances (misfit)
round(EFApdat1_3factor$residual - diag(EFApdat1_3factor$uniquenesses),3) 
##        v1     v2     v3     v4     v5     v6     v7     v8     v9
## v1  0.000 -0.001 -0.005  0.008  0.035 -0.037 -0.026  0.027 -0.011
## v2 -0.001  0.000  0.008 -0.025 -0.017  0.043  0.048 -0.057  0.006
## v3 -0.005  0.008  0.000  0.012 -0.023  0.002 -0.018  0.035 -0.002
## v4  0.008 -0.025  0.012  0.000  0.014 -0.007 -0.001 -0.015  0.014
## v5  0.035 -0.017 -0.023  0.014  0.000 -0.006  0.027 -0.085  0.040
## v6 -0.037  0.043  0.002 -0.007 -0.006  0.000 -0.026  0.086 -0.047
## v7 -0.026  0.048 -0.018 -0.001  0.027 -0.026  0.000  0.018 -0.033
## v8  0.027 -0.057  0.035 -0.015 -0.085  0.086  0.018  0.000  0.025
## v9 -0.011  0.006 -0.002  0.014  0.040 -0.047 -0.033  0.025  0.000
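The RMSR reported in the fa() output summarizes exactly such a residual matrix: it is the root mean square of the off-diagonal elements. A base-R sketch with a small hypothetical residual matrix (not the one above):

```r
# hypothetical 3 x 3 residual correlation matrix
resid_mat <- matrix(c( 0.00,  0.02, -0.04,
                       0.02,  0.00,  0.03,
                      -0.04,  0.03,  0.00), nrow = 3, byrow = TRUE)
off  <- resid_mat[lower.tri(resid_mat)]   # unique off-diagonal residuals
rmsr <- sqrt(mean(off^2))
round(rmsr, 3)
```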

P-technique Factor Analysis for Person 2

Let’s have a look at Person 2’s data and descriptives.

#data structure
head(pdat2,10)
##     id    v1    v2    v3    v4    v5    v6    v7    v8    v9
## 101  2 -0.90 -1.27 -1.16 -1.24  0.33 -0.25 -0.19 -0.58 -0.14
## 102  2  0.19  0.06  0.55  0.04  0.07 -0.43 -0.55  0.60  0.65
## 103  2 -0.21  0.53  0.32  0.37  0.53  0.81  0.66  0.12  0.57
## 104  2 -1.84 -0.87 -2.02  0.92  0.00  0.39 -0.18  1.04  0.06
## 105  2  0.56  1.28 -0.32 -0.28  1.18  0.36 -0.89 -0.78 -0.51
## 106  2  0.50  0.50 -0.72  0.16  1.17 -0.39 -0.35 -1.52  0.50
## 107  2 -0.82 -1.02 -0.56 -0.15  0.86  1.05 -0.41  0.27  0.92
## 108  2 -0.24 -0.02  0.25  0.12 -0.74 -0.19 -0.47 -1.62  0.05
## 109  2  0.10 -1.63 -0.41  1.41  0.91  0.50  0.54  0.71  2.79
## 110  2  1.03  1.40  1.12 -0.17  0.53  0.28 -1.57 -0.34 -0.04
#descriptives (without id column)
describe(pdat2[,-1])
##    vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis
## v1    1 100  0.09 1.03   0.04    0.11 1.05 -2.31 2.14  4.45 -0.18    -0.41
## v2    2 100  0.10 1.04   0.22    0.14 0.90 -3.35 2.43  5.78 -0.44     0.41
## v3    3 100  0.09 1.09   0.04    0.09 1.05 -2.61 2.59  5.20  0.04    -0.38
## v4    4 100 -0.10 1.00  -0.05   -0.06 0.88 -2.86 2.20  5.06 -0.34     0.33
## v5    5 100  0.00 1.03   0.10    0.03 0.89 -4.43 2.98  7.41 -0.81     3.22
## v6    6 100  0.10 1.07   0.20    0.12 1.15 -2.84 3.33  6.17 -0.09     0.17
## v7    7 100  0.04 0.92   0.12    0.02 0.87 -2.15 2.44  4.59  0.11    -0.03
## v8    8 100  0.26 1.12   0.31    0.29 1.14 -2.87 2.78  5.65 -0.29    -0.25
## v9    9 100  0.03 1.02   0.04    0.01 1.02 -2.86 2.79  5.65  0.07     0.00
##      se
## v1 0.10
## v2 0.10
## v3 0.11
## v4 0.10
## v5 0.10
## v6 0.11
## v7 0.09
## v8 0.11
## v9 0.10

Data appear to already be in standardized form.

Let’s make a plot of the 9-variate raw time-series.

#preparing data
#adding time variable
day <- 1:100
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1])
#Plotting observed scores
ggplot(data=pdat2_plot, aes(x=day)) +
  geom_line(aes(y=v1), color= 1) + 
  geom_line(aes(y=v2), color= 2) + 
  geom_line(aes(y=v3), color= 3) + 
  geom_line(aes(y=v4), color= 4) + 
  geom_line(aes(y=v5), color= 5) + 
  geom_line(aes(y=v6), color= 6) + 
  geom_line(aes(y=v7), color= 7) + 
  geom_line(aes(y=v8), color= 8) + 
  geom_line(aes(y=v9), color= 9) +
  xlab("Day") + ylab("Observed Score") + 
  scale_x_continuous(limits=c(0,100)) +
  scale_y_continuous(limits=c(-3,3)) +
  ggtitle("Person 2: Multivariate Time Series")
## Warning: Removed 1 rows containing missing values (geom_path).

Let’s run the factor analysis on Person 2’s data, pdat2. First we should check whether the number of factors is the same.

#correlation matrix
round(cor(pdat2[,-1]),2)
##       v1    v2    v3   v4   v5   v6    v7    v8    v9
## v1  1.00  0.78  0.79 0.21 0.05 0.24 -0.04 -0.13 -0.13
## v2  0.78  1.00  0.69 0.11 0.14 0.26 -0.01 -0.07 -0.15
## v3  0.79  0.69  1.00 0.19 0.01 0.23  0.01 -0.03 -0.11
## v4  0.21  0.11  0.19 1.00 0.49 0.55  0.39  0.29  0.32
## v5  0.05  0.14  0.01 0.49 1.00 0.47  0.30  0.09  0.28
## v6  0.24  0.26  0.23 0.55 0.47 1.00  0.28  0.12  0.31
## v7 -0.04 -0.01  0.01 0.39 0.30 0.28  1.00  0.56  0.52
## v8 -0.13 -0.07 -0.03 0.29 0.09 0.12  0.56  1.00  0.44
## v9 -0.13 -0.15 -0.11 0.32 0.28 0.31  0.52  0.44  1.00
corrplot(cor(pdat2[,-1]), order = "original", tl.col='black', tl.cex=.75) 

#parallel analysis for number of factors 
fa.parallel(x=pdat2[,-1], fm="minres", fa="fa")

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
#multiple methods for number of factors
nScree(x=pdat2[,-1],model="factors")
##   noc naf nparallel nkaiser
## 1   2   2         2       3
plot(nScree(x=pdat2[,-1],model="factors"))

Hmmm… this person appears to have a 2-factor solution. That may be an interesting interindividual difference. But it may also be problematic if we want to do comparisons in the same factor space.

The information above points towards 2 factors. What are those eigenvalues?

eigen(cor(pdat2[,-1]))$values
## [1] 2.9515136 2.5756903 1.1183328 0.5704920 0.5448644 0.4283432 0.3945002
## [8] 0.2616947 0.1545689

Hmmm… OK, let’s look at the 2-factor and 3-factor solutions.

#2-factor model
EFApdat2_2factor <- fa(r = pdat2[,-1], nfactors = 2, 
                       rotate = "oblimin", 
                       fm = "pa",
                       scores="regression")
EFApdat2_2factor
## Factor Analysis using method =  pa
## Call: fa(r = pdat1[, -1], nfactors = 2, rotate = "oblimin", scores = "regression", 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      PA1   PA2    h2   u2 com
## v1  0.85  0.02 0.724 0.28 1.0
## v2  0.74  0.05 0.564 0.44 1.0
## v3  0.84 -0.05 0.687 0.31 1.0
## v4  0.02  0.70 0.496 0.50 1.0
## v5  0.06  0.63 0.418 0.58 1.0
## v6  0.02  0.52 0.280 0.72 1.0
## v7 -0.18  0.43 0.179 0.82 1.3
## v8 -0.10  0.36 0.122 0.88 1.2
## v9 -0.09  0.28 0.072 0.93 1.2
## 
##                        PA1  PA2
## SS loadings           2.00 1.54
## Proportion Var        0.22 0.17
## Cumulative Var        0.22 0.39
## Proportion Explained  0.57 0.43
## Cumulative Proportion 0.57 1.00
## 
##  With factor correlations of 
##      PA1  PA2
## PA1 1.00 0.24
## PA2 0.24 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.55 with Chi Square of  242.79
## The degrees of freedom for the model are 19  and the objective function was  0.48 
## 
## The root mean square of the residuals (RMSR) is  0.09 
## The df corrected root mean square of the residuals is  0.12 
## 
## The harmonic number of observations is  100 with the empirical chi square  53.55  with prob <  3.9e-05 
## The total number of observations was  100  with Likelihood Chi Square =  44.91  with prob <  0.00071 
## 
## Tucker Lewis Index of factoring reliability =  0.759
## RMSEA index =  0.123  and the 90 % confidence intervals are  0.073 0.162
## BIC =  -42.59
## Fit based upon off diagonal values = 0.9
## Measures of factor score adequacy             
##                                                 PA1  PA2
## Correlation of scores with factors             0.93 0.85
## Multiple R square of scores with factors       0.86 0.72
## Minimum correlation of possible factor scores  0.72 0.44
#3-factor model
EFApdat2_3factor <- fa(r = pdat2[,-1], nfactors = 3, 
                       rotate = "oblimin", 
                       fm = "pa",
                       scores="regression")
EFApdat2_3factor
## Factor Analysis using method =  pa
## Call: fa(r = pdat1[, -1], nfactors = 3, rotate = "oblimin", scores = "regression", 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      PA1   PA2   PA3   h2   u2 com
## v1  0.90 -0.04  0.06 0.79 0.21 1.0
## v2  0.74  0.05  0.00 0.56 0.44 1.0
## v3  0.80  0.02 -0.10 0.66 0.34 1.0
## v4  0.00  0.78  0.04 0.62 0.38 1.0
## v5  0.09  0.54  0.15 0.38 0.62 1.2
## v6 -0.03  0.70 -0.10 0.46 0.54 1.0
## v7 -0.03  0.01  0.73 0.54 0.46 1.0
## v8  0.02  0.04  0.51 0.27 0.73 1.0
## v9  0.03 -0.03  0.45 0.20 0.80 1.0
## 
##                        PA1  PA2  PA3
## SS loadings           2.01 1.41 1.07
## Proportion Var        0.22 0.16 0.12
## Cumulative Var        0.22 0.38 0.50
## Proportion Explained  0.45 0.31 0.24
## Cumulative Proportion 0.45 0.76 1.00
## 
##  With factor correlations of 
##       PA1  PA2   PA3
## PA1  1.00 0.21 -0.02
## PA2  0.21 1.00  0.26
## PA3 -0.02 0.26  1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.55 with Chi Square of  242.79
## The degrees of freedom for the model are 12  and the objective function was  0.15 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic number of observations is  100 with the empirical chi square  7.48  with prob <  0.82 
## The total number of observations was  100  with Likelihood Chi Square =  14.19  with prob <  0.29 
## 
## Tucker Lewis Index of factoring reliability =  0.967
## RMSEA index =  0.051  and the 90 % confidence intervals are  0 0.116
## BIC =  -41.07
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 PA1  PA2  PA3
## Correlation of scores with factors             0.94 0.87 0.81
## Multiple R square of scores with factors       0.88 0.76 0.66
## Minimum correlation of possible factor scores  0.75 0.51 0.32

Let’s go with the 3-factor solution. It looks a bit better all around.

And we can look at the reduced 3-variate latent time-series.

#remaking plot data 
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1],EFApdat2_3factor$scores)
#Plotting factor scores
ggplot(data=pdat2_plot, aes(x=day)) +
  geom_line(aes(y=PA1), color= 1) + 
  geom_line(aes(y=PA2), color= 2) + 
  geom_line(aes(y=PA3), color= 3) + 
  xlab("Day") + ylab("Factor Score") + 
  scale_x_continuous(limits=c(0,100)) +
  scale_y_continuous(limits=c(-3,3)) + 
  ggtitle("Person 2: Latent Factor Time Series")

Inter-individual comparisons -

We can look at the comparison of Person 1’s factor solution with Person 2’s factor solution.

Initially, we could characterize the between-person differences with respect to the number of factors, according to some established criterion (e.g., Kaiser rule: number of eigenvalues > 1).

#Person 1
nScree(x=pdat1[,-1],model="factors")
##   noc naf nparallel nkaiser
## 1   2   2         2       3
#Person 2
nScree(x=pdat2[,-1],model="factors")
##   noc naf nparallel nkaiser
## 1   2   2         2       3

With the same size p x q factor loading matrix, \(\boldsymbol{\Lambda}\), for the two persons (here 9 x 3), we can compare the pattern of loadings between persons using the Tucker Index of Factor Congruence.

This is easily obtained by applying the fa.congruence() function in the psych package to the two factor loading matrices.

fa.congruence(EFApdat1_3factor,EFApdat2_3factor)
##      PA1  PA2  PA3
## PA1 1.00 0.03 0.00
## PA2 0.03 1.00 0.04
## PA3 0.00 0.04 1.00

We see very good alignment across the two persons! From a person-specific perspective, this gives us some evidence that the two persons might be pooled into a common sample.
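For a single pair of loading columns x and y, the Tucker congruence coefficient is \(\phi = \sum{xy} / \sqrt{\sum{x^2}\sum{y^2}}\). A base-R sketch - the first column uses Person 1’s PA1 loadings from above; the comparison column is hypothetical:

```r
# Tucker congruence coefficient for two loading columns
congruence <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
lambda_a <- c(0.90, 0.74, 0.80, 0.00, 0.09, -0.03, -0.03, 0.02, 0.03)  # PA1, person 1
lambda_b <- c(0.85, 0.70, 0.82, 0.05, 0.02, -0.01, -0.06, 0.04, 0.00)  # hypothetical column
round(congruence(lambda_a, lambda_b), 2)
```

Values near 1 indicate the same pattern of loadings; fa.congruence() applies this formula to every pair of columns in the two loading matrices.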

3. An Example using Confirmatory P-technique Factor Analysis

We make use of the same 2-person data used above, but now in a confirmatory framework.

Confirmatory Factor Analysis is usually conducted within a Structural Equation Modeling (SEM) framework. There are a number of SEM packages in R; we currently tend to use the lavaan package.
Lots of good information and instruction can be found on the package website … http://lavaan.ugent.be.

Within an SEM framework, we fit multiple models to each person’s data to determine, through formal tests, which of the hypothesized models fits best.

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.

Then we use nested pairwise invariance tests to formally test whether the two individuals can be represented by the same model, or require person-specific models.
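The workhorse of those nested tests is the chi-square difference test: the difference between the two models’ chi-square statistics is itself chi-square distributed, with df equal to the difference in model df. A base-R sketch (the fit statistics here are hypothetical placeholders, not results from these data):

```r
# hypothetical fit statistics for a restricted and a less restricted nested model
chisq_restricted <- 120.0; df_restricted <- 36
chisq_full       <-  40.0; df_full       <- 33
delta_chisq <- chisq_restricted - chisq_full
delta_df    <- df_restricted - df_full
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)
c(delta_chisq = delta_chisq, delta_df = delta_df, p_value = p_value)
```

A small p-value means the added restrictions significantly worsen fit, so the less restricted (person-specific) model is retained.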

Model Specification

We specify a 3-factor model by defining the latent variables and adding some additional paths. Each factor must be indicated by manifest variables and must have a variance (and possibly covariances and a mean).
The identification constraints are needed and important; we will not go into all of those details here.

We set up a 1-factor model …

#Model with 1 common factor 
Ptech_1factor <- ' #start of model

# latent variable definitions (common factors)
  f1 =~ 1*v1 + 
        NA*v2 +
        NA*v3 +
        NA*v4 + 
        NA*v5 +
        NA*v6 +
        NA*v7 + 
        NA*v8 +
        NA*v9
  
# latent variable variances
  f1 ~~ NA*f1

# latent variable covariances
  
# latent variable means

# manifest variable variances (uniquenesses)
  v1 ~~ v1
  v2 ~~ v2
  v3 ~~ v3
  v4 ~~ v4
  v5 ~~ v5
  v6 ~~ v6
  v7 ~~ v7
  v8 ~~ v8
  v9 ~~ v9

# manifest variable covariances (uniquenesses)

# manifest variable means 

' #end of model

We also set up a 3-factor model …

#Model with 3 common factors 
Ptech_3factor <- ' #start of model

# latent variable definitions (common factors)
  f1 =~ 1*v1 + 
        NA*v2 +
        NA*v3

  f2 =~ 1*v4 + 
        NA*v5 +
        NA*v6

  f3 =~ 1*v7 + 
        NA*v8 +
        NA*v9
  
# latent variable variances
  f1 ~~ NA*f1
  f2 ~~ NA*f2
  f3 ~~ NA*f3

# latent variable covariances
  f1 ~~ f2 + f3
  f2 ~~ f3
  
# latent variable means

# manifest variable variances (uniquenesses)
  v1 ~~ v1
  v2 ~~ v2
  v3 ~~ v3
  v4 ~~ v4
  v5 ~~ v5
  v6 ~~ v6
  v7 ~~ v7
  v8 ~~ v8
  v9 ~~ v9

# manifest variable covariances (uniquenesses)

# manifest variable means 

' #end of model

Model Fitting

Next, we fit the specified models to the data to obtain parameter estimates.

The 1-factor model for Person 1

fit_p1_Ptech_1factor <- lavaan(Ptech_1factor, data=pdat1, mimic = "mplus")
summary(fit_p1_Ptech_1factor, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after  17 iterations
## 
##   Number of observations                           100
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              128.229
##   Degrees of freedom                                36
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              255.119
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.579
##   Tucker-Lewis Index (TLI)                       0.579
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1159.852
##   Loglikelihood unrestricted model (H1)      -1095.737
## 
##   Number of free parameters                         18
##   Akaike (AIC)                                2355.704
##   Bayesian (BIC)                              2402.597
##   Sample-size adjusted Bayesian (BIC)         2345.748
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.160
##   90 Percent Confidence Interval          0.131  0.190
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.150
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.826    0.873
##     v2                0.882    0.114    7.724    0.000    0.729    0.748
##     v3                0.881    0.106    8.344    0.000    0.728    0.801
##     v4                0.223    0.123    1.804    0.071    0.184    0.196
##     v5                0.247    0.113    2.192    0.028    0.204    0.235
##     v6                0.164    0.130    1.261    0.207    0.135    0.139
##     v7               -0.036    0.124   -0.289    0.772   -0.030   -0.031
##     v8                0.052    0.135    0.388    0.698    0.043    0.042
##     v9                0.018    0.127    0.141    0.888    0.015    0.015
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.683    0.137    4.984    0.000    1.000    1.000
##    .v1                0.214    0.067    3.179    0.001    0.214    0.238
##    .v2                0.417    0.076    5.452    0.000    0.417    0.440
##    .v3                0.296    0.062    4.765    0.000    0.296    0.358
##    .v4                0.847    0.121    7.018    0.000    0.847    0.962
##    .v5                0.713    0.102    6.997    0.000    0.713    0.945
##    .v6                0.927    0.132    7.042    0.000    0.927    0.981
##    .v7                0.886    0.125    7.070    0.000    0.886    0.999
##    .v8                1.060    0.150    7.069    0.000    1.060    0.998
##    .v9                0.939    0.133    7.071    0.000    0.939    1.000

The 3-factor model for Person 1

fit_p1_Ptech_3factor <- lavaan(Ptech_3factor, data=pdat1, mimic = "mplus")
summary(fit_p1_Ptech_3factor, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after  28 iterations
## 
##   Number of observations                           100
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               38.495
##   Degrees of freedom                                33
##   P-value (Chi-square)                           0.235
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              255.119
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.975
##   Tucker-Lewis Index (TLI)                       0.973
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1114.984
##   Loglikelihood unrestricted model (H1)      -1095.737
## 
##   Number of free parameters                         21
##   Akaike (AIC)                                2271.969
##   Bayesian (BIC)                              2326.677
##   Sample-size adjusted Bayesian (BIC)         2260.354
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.041
##   90 Percent Confidence Interval          0.000  0.087
##   P-value RMSEA <= 0.05                          0.581
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.065
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.827    0.874
##     v2                0.878    0.114    7.727    0.000    0.726    0.746
##     v3                0.886    0.108    8.219    0.000    0.733    0.807
##   f2 =~                                                                 
##     v4                1.000                               0.766    0.816
##     v5                0.709    0.162    4.389    0.000    0.543    0.625
##     v6                0.777    0.166    4.679    0.000    0.595    0.613
##   f3 =~                                                                 
##     v7                1.000                               0.668    0.710
##     v8                0.867    0.333    2.603    0.009    0.579    0.562
##     v9                0.623    0.241    2.581    0.010    0.416    0.430
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                0.143    0.080    1.787    0.074    0.225    0.225
##     f3               -0.013    0.077   -0.173    0.863   -0.024   -0.024
##   f2 ~~                                                                 
##     f3                0.144    0.078    1.837    0.066    0.281    0.281
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f3                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.684    0.138    4.973    0.000    1.000    1.000
##     f2                0.587    0.161    3.654    0.000    1.000    1.000
##     f3                0.447    0.196    2.285    0.022    1.000    1.000
##    .v1                0.212    0.068    3.107    0.002    0.212    0.237
##    .v2                0.420    0.076    5.498    0.000    0.420    0.443
##    .v3                0.288    0.063    4.566    0.000    0.288    0.349
##    .v4                0.294    0.117    2.506    0.012    0.294    0.334
##    .v5                0.460    0.089    5.192    0.000    0.460    0.609
##    .v6                0.590    0.105    5.608    0.000    0.590    0.625
##    .v7                0.440    0.174    2.530    0.011    0.440    0.496
##    .v8                0.726    0.162    4.479    0.000    0.726    0.684
##    .v9                0.766    0.127    6.046    0.000    0.766    0.815
Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

Let's look at diagrams of the models to check that they match what we intended (this should also be done using the summary information).

Person 1: 1-factor model

semPaths(fit_p1_Ptech_1factor, what="est", intercepts=FALSE,
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Person 1: 3-factor model

semPaths(fit_p1_Ptech_3factor, what="est", intercepts=FALSE,
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Then, we can look more closely at how well each model fits, and interpret the models more generally.

  • Note that there are different degrees of freedom in the models.
  • Note how much better the fit of the 3-factor model is.
  • We can formalize the improvement in fit.

We do model comparison…
Looking at fit statistics

#Compare model fit statistics
round(cbind(p1_f1=inspect(fit_p1_Ptech_1factor, 'fit.measures'), p1_f3=inspect(fit_p1_Ptech_3factor, 'fit.measures')),3)
##                         p1_f1     p1_f3
## npar                   18.000    21.000
## fmin                    0.641     0.192
## chisq                 128.229    38.495
## df                     36.000    33.000
## pvalue                  0.000     0.235
## baseline.chisq        255.119   255.119
## baseline.df            36.000    36.000
## baseline.pvalue         0.000     0.000
## cfi                     0.579     0.975
## tli                     0.579     0.973
## nnfi                    0.579     0.973
## rfi                     0.497     0.835
## nfi                     0.497     0.849
## pnfi                    0.497     0.778
## ifi                     0.579     0.975
## rni                     0.579     0.975
## logl                -1159.852 -1114.984
## unrestricted.logl   -1095.737 -1095.737
## aic                  2355.704  2271.969
## bic                  2402.597  2326.677
## ntotal                100.000   100.000
## bic2                 2345.748  2260.354
## rmsea                   0.160     0.041
## rmsea.ci.lower          0.131     0.000
## rmsea.ci.upper          0.190     0.087
## rmsea.pvalue            0.000     0.581
## rmr                     0.137     0.039
## rmr_nomean              0.137     0.039
## srmr                    0.150     0.065
## srmr_bentler            0.150     0.064
## srmr_bentler_nomean     0.155     0.045
## srmr_bollen             0.150     0.064
## srmr_bollen_nomean      0.155     0.044
## srmr_mplus              0.150     0.065
## srmr_mplus_nomean       0.155     0.045
## cn_05                  40.771   124.134
## cn_01                  46.714   143.294
## gfi                     0.767     0.924
## agfi                    0.650     0.876
## pgfi                    0.511     0.565
## mfi                     0.631     0.973
## ecvi                       NA        NA

Formal model test

#Chi-square difference test for nested models 
anova(fit_p1_Ptech_1factor, fit_p1_Ptech_3factor)
## Chi Square Difference Test
## 
##                      Df    AIC    BIC   Chisq Chisq diff Df diff
## fit_p1_Ptech_3factor 33 2272.0 2326.7  38.495                   
## fit_p1_Ptech_1factor 36 2355.7 2402.6 128.229     89.735       3
##                      Pr(>Chisq)    
## fit_p1_Ptech_3factor               
## fit_p1_Ptech_1factor  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can reject the null hypothesis that the two models fit equally well.

We conclude that the 3-factor model is the preferred model among those evaluated.
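For transparency, the difference statistic reported by `anova()` can be reproduced by hand from the two models' chi-square statistics and degrees of freedom. A minimal base-R sketch, using the rounded values printed in the summaries above (so the last digit may differ slightly from the `anova()` output):

```r
# Chi-square difference test by hand, from the rounded
# statistics printed in the model summaries above
chisq_1f <- 128.229   # 1-factor model, df = 36
chisq_3f <-  38.495   # 3-factor model, df = 33

chisq_diff <- chisq_1f - chisq_3f   # 89.734 (anova() reports 89.735,
                                    # computed from unrounded values)
df_diff    <- 36 - 33               # 3

# p-value for the nested-model difference test
pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
```

The p-value is vanishingly small, matching the `< 2.2e-16` printed by `anova()`.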

The same pattern holds for Person 2. Fitting the 1-factor and 3-factor models for Person 2:

#1-factor model
fit_p2_Ptech_1factor <- lavaan(Ptech_1factor, data=pdat2, mimic = "mplus")
#3-factor model
fit_p2_Ptech_3factor <- lavaan(Ptech_3factor, data=pdat2, mimic = "mplus")
#Chi-square difference test for nested models 
anova(fit_p2_Ptech_1factor, fit_p2_Ptech_3factor)
## Chi Square Difference Test
## 
##                      Df    AIC    BIC   Chisq Chisq diff Df diff
## fit_p2_Ptech_3factor 33 2287.7 2342.4  44.865                   
## fit_p2_Ptech_1factor 36 2437.3 2484.2 200.456     155.59       3
##                      Pr(>Chisq)    
## fit_p2_Ptech_3factor               
## fit_p2_Ptech_1factor  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inter-individual comparisons -

We can compare Person 1’s factor solution with Person 2’s factor solution.

Formally, we treat each person as a “group”, set up two 2-group models, and follow the logic of standard measurement invariance tests. In the Free model the factor loadings are allowed to differ across persons. In the Invariance model the factor loadings are constrained to be equal across persons.

Model Specification

We specify a 3-factor model. This is the same as above.

#Model with 3 common factors 
Ptech_3factor <- ' #start of model

# latent variable definitions (common factors)
  f1 =~ 1*v1 + 
        NA*v2 +
        NA*v3

  f2 =~ 1*v4 + 
        NA*v5 +
        NA*v6

  f3 =~ 1*v7 + 
        NA*v8 +
        NA*v9
  
# latent variable variances
  f1 ~~ NA*f1
  f2 ~~ NA*f2
  f3 ~~ NA*f3

# latent variable covariances
  f1 ~~ f2 + f3
  f2 ~~ f3
  
# latent variable means

# manifest variable variances (uniquenesses)
  v1 ~~ v1
  v2 ~~ v2
  v3 ~~ v3
  v4 ~~ v4
  v5 ~~ v5
  v6 ~~ v6
  v7 ~~ v7
  v8 ~~ v8
  v9 ~~ v9

# manifest variable covariances (uniquenesses)

# manifest variable means 

' #end of model
Model Fitting

Second, we fit the specified model to the data to obtain parameter estimates.

The Free model for multiple groups

fit_Ptech_3factor_Free <- lavaan(Ptech_3factor, data=pdat, mimic = "mplus",
                                 group = "id")
summary(fit_Ptech_3factor_Free, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after  43 iterations
## 
##   Number of observations per group         
##   1                                                100
##   2                                                100
## 
##   Number of missing patterns per group     
##   1                                                  1
##   2                                                  1
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               83.360
##   Degrees of freedom                                66
##   P-value (Chi-square)                           0.073
## 
## Chi-square for each group:
## 
##   1                                             38.495
##   2                                             44.865
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              656.505
##   Degrees of freedom                                72
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.970
##   Tucker-Lewis Index (TLI)                       0.968
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2237.843
##   Loglikelihood unrestricted model (H1)      -2196.163
## 
##   Number of free parameters                         42
##   Akaike (AIC)                                4559.687
##   Bayesian (BIC)                              4698.216
##   Sample-size adjusted Bayesian (BIC)         4565.156
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.051
##   90 Percent Confidence Interval          0.000  0.082
##   P-value RMSEA <= 0.05                          0.454
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.065
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.827    0.874
##     v2                0.878    0.114    7.726    0.000    0.726    0.746
##     v3                0.886    0.108    8.219    0.000    0.733    0.807
##   f2 =~                                                                 
##     v4                1.000                               0.766    0.816
##     v5                0.709    0.162    4.389    0.000    0.543    0.625
##     v6                0.777    0.166    4.679    0.000    0.595    0.613
##   f3 =~                                                                 
##     v7                1.000                               0.668    0.710
##     v8                0.867    0.333    2.603    0.009    0.579    0.562
##     v9                0.623    0.241    2.581    0.010    0.416    0.430
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                0.143    0.080    1.787    0.074    0.225    0.225
##     f3               -0.013    0.077   -0.173    0.863   -0.024   -0.024
##   f2 ~~                                                                 
##     f3                0.144    0.078    1.837    0.066    0.281    0.281
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f3                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.684    0.138    4.973    0.000    1.000    1.000
##     f2                0.587    0.161    3.654    0.000    1.000    1.000
##     f3                0.447    0.196    2.285    0.022    1.000    1.000
##    .v1                0.212    0.068    3.107    0.002    0.212    0.237
##    .v2                0.420    0.076    5.498    0.000    0.420    0.443
##    .v3                0.288    0.063    4.566    0.000    0.288    0.349
##    .v4                0.294    0.117    2.506    0.012    0.294    0.334
##    .v5                0.460    0.089    5.192    0.000    0.460    0.609
##    .v6                0.590    0.105    5.608    0.000    0.590    0.625
##    .v7                0.440    0.174    2.530    0.011    0.440    0.496
##    .v8                0.726    0.162    4.479    0.000    0.726    0.684
##    .v9                0.766    0.127    6.046    0.000    0.766    0.815
## 
## 
## Group 2 [2]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.975    0.950
##     v2                0.880    0.081   10.921    0.000    0.859    0.823
##     v3                0.928    0.083   11.231    0.000    0.905    0.835
##   f2 =~                                                                 
##     v4                1.000                               0.771    0.774
##     v5                0.829    0.159    5.207    0.000    0.639    0.625
##     v6                0.986    0.183    5.383    0.000    0.760    0.713
##   f3 =~                                                                 
##     v7                1.000                               0.744    0.812
##     v8                0.991    0.176    5.618    0.000    0.738    0.647
##     v9                0.917    0.178    5.152    0.000    0.683    0.674
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                0.201    0.093    2.149    0.032    0.267    0.267
##     f3               -0.069    0.086   -0.800    0.424   -0.095   -0.095
##   f2 ~~                                                                 
##     f3                0.312    0.090    3.445    0.001    0.543    0.543
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f3                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.951    0.157    6.052    0.000    1.000    1.000
##     f2                0.594    0.155    3.822    0.000    1.000    1.000
##     f3                0.554    0.139    3.996    0.000    1.000    1.000
##    .v1                0.103    0.054    1.923    0.054    0.103    0.098
##    .v2                0.352    0.064    5.457    0.000    0.352    0.323
##    .v3                0.357    0.067    5.300    0.000    0.357    0.303
##    .v4                0.397    0.104    3.819    0.000    0.397    0.401
##    .v5                0.636    0.111    5.738    0.000    0.636    0.609
##    .v6                0.559    0.120    4.666    0.000    0.559    0.492
##    .v7                0.287    0.091    3.140    0.002    0.287    0.341
##    .v8                0.756    0.133    5.672    0.000    0.756    0.581
##    .v9                0.559    0.111    5.027    0.000    0.559    0.546

The Invariance model for multiple groups

fit_Ptech_3factor_Invariance <- lavaan(Ptech_3factor, data=pdat, mimic = "mplus",
                                 group = "id",
                                 group.equal = c("loadings"))
summary(fit_Ptech_3factor_Invariance, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after  33 iterations
## 
##   Number of observations per group         
##   1                                                100
##   2                                                100
## 
##   Number of missing patterns per group     
##   1                                                  1
##   2                                                  1
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               85.107
##   Degrees of freedom                                72
##   P-value (Chi-square)                           0.139
## 
## Chi-square for each group:
## 
##   1                                             39.609
##   2                                             45.497
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              656.505
##   Degrees of freedom                                72
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.978
##   Tucker-Lewis Index (TLI)                       0.978
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2238.717
##   Loglikelihood unrestricted model (H1)      -2196.163
## 
##   Number of free parameters                         36
##   Akaike (AIC)                                4549.433
##   Bayesian (BIC)                              4668.173
##   Sample-size adjusted Bayesian (BIC)         4554.121
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.043
##   90 Percent Confidence Interval          0.000  0.075
##   P-value RMSEA <= 0.05                          0.612
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.067
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                             Standard
## 
## 
## Group 1 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.819    0.868
##     v2      (.p2.)    0.879    0.066   13.368    0.000    0.720    0.743
##     v3      (.p3.)    0.911    0.066   13.801    0.000    0.747    0.815
##   f2 =~                                                                 
##     v4                1.000                               0.722    0.779
##     v5      (.p5.)    0.774    0.113    6.845    0.000    0.559    0.641
##     v6      (.p6.)    0.884    0.124    7.129    0.000    0.638    0.646
##   f3 =~                                                                 
##     v7                1.000                               0.601    0.646
##     v8      (.p8.)    0.971    0.154    6.317    0.000    0.583    0.568
##     v9      (.p9.)    0.831    0.145    5.731    0.000    0.499    0.505
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                0.136    0.075    1.806    0.071    0.230    0.230
##     f3               -0.009    0.069   -0.126    0.900   -0.018   -0.018
##   f2 ~~                                                                 
##     f3                0.123    0.067    1.831    0.067    0.284    0.284
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f3                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.671    0.120    5.572    0.000    1.000    1.000
##     f2                0.522    0.127    4.092    0.000    1.000    1.000
##     f3                0.361    0.110    3.274    0.001    1.000    1.000
##    .v1                0.219    0.060    3.620    0.000    0.219    0.246
##    .v2                0.421    0.074    5.714    0.000    0.421    0.448
##    .v3                0.282    0.059    4.754    0.000    0.282    0.336
##    .v4                0.338    0.094    3.580    0.000    0.338    0.393
##    .v5                0.447    0.082    5.486    0.000    0.447    0.589
##    .v6                0.568    0.102    5.567    0.000    0.568    0.582
##    .v7                0.505    0.114    4.444    0.000    0.505    0.583
##    .v8                0.713    0.132    5.404    0.000    0.713    0.677
##    .v9                0.728    0.122    5.960    0.000    0.728    0.745
## 
## 
## Group 2 [2]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 =~                                                                 
##     v1                1.000                               0.979    0.952
##     v2      (.p2.)    0.879    0.066   13.368    0.000    0.861    0.823
##     v3      (.p3.)    0.911    0.066   13.801    0.000    0.893    0.829
##   f2 =~                                                                 
##     v4                1.000                               0.809    0.803
##     v5      (.p5.)    0.774    0.113    6.845    0.000    0.626    0.615
##     v6      (.p6.)    0.884    0.124    7.129    0.000    0.715    0.680
##   f3 =~                                                                 
##     v7                1.000                               0.768    0.832
##     v8      (.p8.)    0.971    0.154    6.317    0.000    0.745    0.651
##     v9      (.p9.)    0.831    0.145    5.731    0.000    0.638    0.640
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2                0.207    0.097    2.139    0.032    0.261    0.261
##     f3               -0.066    0.089   -0.747    0.455   -0.088   -0.088
##   f2 ~~                                                                 
##     f3                0.336    0.091    3.674    0.000    0.541    0.541
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .v1                0.000                               0.000    0.000
##    .v2                0.000                               0.000    0.000
##    .v3                0.000                               0.000    0.000
##    .v4                0.000                               0.000    0.000
##    .v5                0.000                               0.000    0.000
##    .v6                0.000                               0.000    0.000
##    .v7                0.000                               0.000    0.000
##    .v8                0.000                               0.000    0.000
##    .v9                0.000                               0.000    0.000
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f3                0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.959    0.154    6.235    0.000    1.000    1.000
##     f2                0.654    0.144    4.532    0.000    1.000    1.000
##     f3                0.590    0.134    4.396    0.000    1.000    1.000
##    .v1                0.098    0.051    1.920    0.055    0.098    0.093
##    .v2                0.354    0.064    5.531    0.000    0.354    0.323
##    .v3                0.361    0.066    5.454    0.000    0.361    0.312
##    .v4                0.361    0.098    3.689    0.000    0.361    0.356
##    .v5                0.645    0.109    5.901    0.000    0.645    0.622
##    .v6                0.595    0.114    5.213    0.000    0.595    0.538
##    .v7                0.262    0.088    2.967    0.003    0.262    0.308
##    .v8                0.754    0.132    5.724    0.000    0.754    0.576
##    .v9                0.588    0.108    5.437    0.000    0.588    0.591
Model Evaluation

Some basic questions to ask after fitting a model -
Did it work?
Does it make sense? Are the parameters viable?

Let's look at diagrams of the models to check that they match what we intended (this should also be done using the summary information).

Free Model

semPaths(fit_Ptech_3factor_Free, what="est", intercepts=FALSE,
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Invariance Model

semPaths(fit_Ptech_3factor_Invariance, what="est", intercepts=FALSE,
         sizeLat = 7, sizeMan = 7, edge.label.cex = .75)

Formal model test

#Chi-square difference test for nested models 
anova(fit_Ptech_3factor_Free, fit_Ptech_3factor_Invariance)
## Chi Square Difference Test
## 
##                              Df    AIC    BIC  Chisq Chisq diff Df diff
## fit_Ptech_3factor_Free       66 4559.7 4698.2 83.360                   
## fit_Ptech_3factor_Invariance 72 4549.4 4668.2 85.107     1.7467       6
##                              Pr(>Chisq)
## fit_Ptech_3factor_Free                 
## fit_Ptech_3factor_Invariance     0.9415

We cannot reject the null hypothesis that the two models fit equally well.

We conclude that the factor loadings are invariant (among the models evaluated) between Person 1 and Person 2.

This gives us a strong basis for pooling these two individuals into a multi-person analytic framework.
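As a sketch of what that pooling might look like: since the loadings appear invariant across the two persons, one could fit the same 3-factor model to the stacked data as a single "group", or keep the 2-group structure and test stronger invariance constraints. This is a hedged sketch, assuming `pdat` stacks both persons' data with an `id` column, as above; the strict model shown is an extension not fit in this tutorial:

```r
# Sketch 1: fit the 3-factor model to the stacked data from
# both persons, ignoring id (a single pooled "group")
fit_Ptech_3factor_pooled <- lavaan(Ptech_3factor, data = pdat, mimic = "mplus")
summary(fit_Ptech_3factor_pooled, standardized = TRUE, fit.measures = TRUE)

# Sketch 2: keep the 2-group structure and extend the equality
# constraints beyond the loadings (e.g., also unique variances)
fit_Ptech_3factor_Strict <- lavaan(Ptech_3factor, data = pdat, mimic = "mplus",
                                   group = "id",
                                   group.equal = c("loadings", "residuals"))
#Chi-square difference test against the loadings-only model
anova(fit_Ptech_3factor_Invariance, fit_Ptech_3factor_Strict)
```

Note that full pooling (Sketch 1) is only defensible to the extent that the invariance tests support it.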

4. Conclusion

We have made a first push into P-technique Factor Analysis models - providing a basis for analysis of multivariate intraindividual variability. Be wary of the assumption of independence of observations that underlies these models - and look toward extensions into Dynamic Factor Analysis models that additionally model the time-dependencies. (Ideally, what was done here should only be done after those dependencies have been removed.) There are still a variety of details to work out and understand as we fit these models in R to many persons' data. But since looping is so easy in R, it should be relatively straightforward to extend to the many-person case. Now we just need the data!
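As a sketch of that looping, assuming a hypothetical stacked data frame `dat` with an `id` column and the same `v1`-`v9` variables as `pdat` above:

```r
# A sketch of extending to the many-person case: fit the 3-factor
# P-technique model to each person's time series separately.
# (`dat` is a hypothetical stacked data frame with an `id` column,
# structured like pdat above.)
ids <- unique(dat$id)

# one P-technique fit per person
fits <- lapply(ids, function(i) {
  lavaan(Ptech_3factor, data = subset(dat, id == i), mimic = "mplus")
})
names(fits) <- ids

# collect selected fit statistics across persons for comparison
fit_table <- sapply(fits, fitMeasures,
                    fit.measures = c("chisq", "df", "cfi", "rmsea", "srmr"))
round(t(fit_table), 3)
```

Person-specific parameter estimates could then be extracted with `parameterEstimates()` and compared across individuals.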

Good luck! And, as always, please be careful.