Individuals, at any given moment, are a complex configuration of characteristics. Some of these characteristics are relatively labile, and some are relatively stable. A general question is: How, when, and why do the many characteristics of an individual change or covary over time? Inherently, this question is multivariate (many characteristics), requires time-series data (change over time), and focuses on within-person covariation (single-subject). Previously we have examined intraindividual covariation of two or three variables. Extension to multiple (many) variables is relatively straightforward, but requires a “multi-outcome” framework. In this tutorial, we engage with how Factor Analysis (a basic data reduction technique) is used to examine multivariate intraindividual variability - P-technique Factor Analysis.
Introduction to The P-technique Factor Analysis Model
An Example using Exploratory P-technique Factor Analysis
An Example using Confirmatory P-technique Factor Analysis
Conclusion
library(psych) #for descriptives and factor analysis functions
library(ggplot2) #for plotting
library(corrplot) #plotting correlation matrices
library(GPArotation) #methods for factor rotation
library(nFactors) #methods for determining the number of factors
library(lavaan) #for fitting structural equation models
library(semPlot) #for automatically making diagrams
The basic factor analysis model is written as
\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the pth observed variable, \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the pth latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the pth observed variable and the qth latent common factor.
Typically, we have multiple observed variables and one or more common factors. For instance, in the 6-variable, 2-factor case we would have …
\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} + u_{1i}\] \[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} + u_{2i}\] \[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} + u_{3i}\] \[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} + u_{4i}\] \[y_{5i} = \lambda_{51} f_{1i} + \lambda_{52} f_{2i} + u_{5i}\] \[y_{6i} = \lambda_{61} f_{1i} + \lambda_{62} f_{2i} + u_{6i}\] which can be written in a compact matrix form as
\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p \times 1\) vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a \(p \times q\) matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q \times 1\) vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a \(p \times 1\) vector of unique factor scores.
Extension to multiple persons provides for mapping to the observed covariance matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\), and the common factor model becomes
\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a \(p \times p\) covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a \(p \times q\) matrix of factor loadings, \(\boldsymbol{\Psi}\) is a \(q \times q\) covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
The analysis is focused on modeling of \(\boldsymbol{\Sigma}\), a covariance (or correlation) matrix. Traditionally, factor analysis is conceived as an analysis of interindividual differences - an analysis of an R-slice (persons x variables x 1 occasion) of the data box. Alternatively, \(\boldsymbol{\Sigma}\) can be made from a P-slice (occasions x variables x 1 person), such that the factor analysis becomes an analysis of intraindividual differences and within-person covariation.
In all of the above, we simply replace the i subscript with a t.
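For a single person measured at occasions \(t = 1, \ldots, T\), the model is thus

\[y_{pt} = \lambda_{pq} f_{qt} + u_{pt}\] or, in matrix form, \(\boldsymbol{Y_{t}} = \boldsymbol{\Lambda}\boldsymbol{F_{t}} + \boldsymbol{U_{t}}\), with \(\boldsymbol{\Sigma}\) now describing covariation across occasions rather than across persons.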
For our examples we use a small simulated 2-person data set modeled after “The Lebo Data” used in the Brose and Ram (2012) step-by-step guide to P-technique.
There are 2 samples in the data, designated by id.
#Reading the data from web location
#set filepath for data file
filepath <- "https://quantdev.ssri.psu.edu/sites/qdev/files/ptechdata.csv"
#read in the .csv file using the url() function
pdat <- read.csv(file=url(filepath),header=TRUE)
Let’s split the data for the two persons into separate data frames.
#splitting data
pdat1 <- pdat[pdat$id==1, ]
pdat2 <- pdat[pdat$id==2, ]
Let’s have a look at Person 1’s data and descriptives.
#data structure
head(pdat1,10)
## id v1 v2 v3 v4 v5 v6 v7 v8 v9
## 1 1 -0.84 -0.27 -0.59 -0.97 -0.72 -1.86 1.11 0.46 0.09
## 2 1 -0.04 -0.33 -0.54 2.57 1.30 0.59 1.42 2.56 0.66
## 3 1 0.34 -0.23 0.99 -1.84 -1.62 -1.55 1.05 0.22 0.60
## 4 1 -1.01 -1.96 -1.17 -2.93 -1.75 -1.89 -0.51 -0.07 1.14
## 5 1 2.02 0.21 0.97 0.13 -0.01 -0.88 -1.22 0.54 -0.28
## 6 1 -0.24 -0.19 0.23 0.80 1.06 1.61 1.31 -1.10 0.26
## 7 1 -0.61 -1.37 0.15 -0.68 -1.50 -0.44 -1.63 1.40 1.79
## 8 1 -1.26 -2.59 -1.10 -0.40 -0.66 0.29 -1.45 -1.06 0.15
## 9 1 0.23 0.25 -0.21 0.23 -0.42 1.01 0.15 1.16 -0.36
## 10 1 1.01 -0.08 1.95 1.14 -0.06 -0.48 -0.62 -1.27 -0.08
#descriptives (without id column)
describe(pdat1[,-1])
## vars n mean sd median trimmed mad min max range skew kurtosis
## v1 1 100 0.00 0.95 0.02 0.00 0.96 -2.33 2.44 4.77 0.04 -0.41
## v2 2 100 -0.14 0.97 -0.11 -0.12 0.90 -2.59 2.07 4.66 -0.21 -0.21
## v3 3 100 0.04 0.91 0.09 0.08 0.90 -3.11 2.10 5.21 -0.46 0.72
## v4 4 100 -0.13 0.93 -0.06 -0.11 0.93 -2.93 2.57 5.50 -0.15 0.21
## v5 5 100 -0.18 0.85 -0.29 -0.20 0.83 -2.04 1.75 3.79 0.25 -0.31
## v6 6 100 -0.01 0.98 0.06 0.01 0.81 -2.37 2.24 4.61 -0.21 -0.34
## v7 7 100 0.12 0.94 0.12 0.12 1.02 -1.91 2.14 4.05 0.04 -0.68
## v8 8 100 -0.03 1.04 0.00 -0.03 0.93 -2.58 2.56 5.14 -0.05 -0.32
## v9 9 100 0.17 0.96 0.26 0.18 0.93 -2.54 2.47 5.01 -0.20 -0.17
## se
## v1 0.10
## v2 0.10
## v3 0.09
## v4 0.09
## v5 0.09
## v6 0.10
## v7 0.09
## v8 0.10
## v9 0.10
Data appear to already be in standardized form.
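As an aside, if the variables had not already been standardized, a within-person standardization could be done with the scale() function. A minimal, self-contained sketch (the toy data frame here is hypothetical, not the tutorial data):

```r
#toy illustration of within-person standardization with scale()
x <- data.frame(v1 = c(2, 4, 6, 8), v2 = c(10, 20, 30, 40))
#scale() centers each column at its mean and divides by its sd
x_std <- as.data.frame(scale(x))
colMeans(x_std)        #each column mean is now 0
apply(x_std, 2, sd)    #each column sd is now 1
```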
Let’s make a plot of the 9-variate raw time-series.
#preparing data
#adding time variable
day <- 1:100
pdat1_plot <- cbind(pdat1$id,day,pdat1[,-1])
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1])
#Plotting observed scores
ggplot(data=pdat1_plot, aes(x=day)) +
geom_line(aes(y=v1), color= 1) +
geom_line(aes(y=v2), color= 2) +
geom_line(aes(y=v3), color= 3) +
geom_line(aes(y=v4), color= 4) +
geom_line(aes(y=v5), color= 5) +
geom_line(aes(y=v6), color= 6) +
geom_line(aes(y=v7), color= 7) +
geom_line(aes(y=v8), color= 8) +
geom_line(aes(y=v9), color= 9) +
xlab("Day") + ylab("Observed Score") +
scale_x_continuous(limits=c(0,100)) +
scale_y_continuous(limits=c(-3,3)) +
ggtitle("Person 1: Multivariate Time Series")
The numerical basis for Factor Analysis is the covariance (or correlation) matrix. Let’s look at visual and numeric versions - keeping in mind our basic research question … Can these data be represented by a smaller number of factors?
#pairs panels
pairs.panels(pdat1[,-1])
#correlation matrix
round(cor(pdat1[,-1]),2)
## v1 v2 v3 v4 v5 v6 v7 v8 v9
## v1 1.00 0.66 0.71 0.14 0.21 0.05 -0.02 0.07 0.03
## v2 0.66 1.00 0.61 0.13 0.16 0.16 0.03 -0.03 0.02
## v3 0.71 0.61 1.00 0.14 0.13 0.11 -0.12 0.00 -0.03
## v4 0.14 0.13 0.14 1.00 0.49 0.52 0.18 0.14 0.10
## v5 0.21 0.16 0.13 0.49 1.00 0.38 0.24 0.09 0.16
## v6 0.05 0.16 0.11 0.52 0.38 1.00 0.04 0.15 -0.03
## v7 -0.02 0.03 -0.12 0.18 0.24 0.04 1.00 0.40 0.29
## v8 0.07 -0.03 0.00 0.14 0.09 0.15 0.40 1.00 0.26
## v9 0.03 0.02 -0.03 0.10 0.16 -0.03 0.29 0.26 1.00
#correlation matrix plot from the corrplot package
corrplot(cor(pdat1[,-1]), order = "original", tl.col='black', tl.cex=.75)
We see three “groups” of variables that are positively correlated. This gives us hope.
Traditionally, EFA was an analysis of a correlation matrix. Most programs can now also read the raw data directly. We make the correlation matrix explicitly, so that we know how that matrix was made (e.g., how the missing data were treated).
#Store the correlation matrix of the data into an object
corpdat1 <- cor(pdat1[,-1], use="pairwise.complete.obs")
round(corpdat1,2)
## v1 v2 v3 v4 v5 v6 v7 v8 v9
## v1 1.00 0.66 0.71 0.14 0.21 0.05 -0.02 0.07 0.03
## v2 0.66 1.00 0.61 0.13 0.16 0.16 0.03 -0.03 0.02
## v3 0.71 0.61 1.00 0.14 0.13 0.11 -0.12 0.00 -0.03
## v4 0.14 0.13 0.14 1.00 0.49 0.52 0.18 0.14 0.10
## v5 0.21 0.16 0.13 0.49 1.00 0.38 0.24 0.09 0.16
## v6 0.05 0.16 0.11 0.52 0.38 1.00 0.04 0.15 -0.03
## v7 -0.02 0.03 -0.12 0.18 0.24 0.04 1.00 0.40 0.29
## v8 0.07 -0.03 0.00 0.14 0.09 0.15 0.40 1.00 0.26
## v9 0.03 0.02 -0.03 0.10 0.16 -0.03 0.29 0.26 1.00
We can inform our choice of the number of factors with a number of functions. We can use fa.parallel() in the psych package, or the nScree() function in the nFactors package.
#parallel analysis for number of factors
fa.parallel(x=corpdat1, fm="minres", fa="fa")
## Warning in fa.parallel(x = corpdat1, fm = "minres", fa = "fa"): It seems as
## if you are using a correlation matrix, but have not specified the number of
## cases. The number of subjects is arbitrarily set to be 100
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
#multiple methods for number of factors
nScree(x=corpdat1,model="factors")
## noc naf nparallel nkaiser
## 1 3 3 3 3
plot(nScree(x=corpdat1,model="factors"))
The output here is number of components/factors according to optimal coordinates (noc), acceleration factor (naf), parallel analysis (nparallel), and Kaiser rule (nkaiser).
For these data, everything points to a choice of 3 factors.
Now let’s run the factor analysis. This time we use the fa() function in the psych package, with oblique oblimin rotation (rotate=“oblimin”) and principal axis factor extraction (fm=“pa”).
EFApdat1_3factor <- fa(r = corpdat1, nfactors = 3,
rotate = "oblimin",
fm = "pa")
EFApdat1_3factor
## Factor Analysis using method = pa
## Call: fa(r = corpdat1, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 h2 u2 com
## v1 0.90 -0.04 0.06 0.79 0.21 1.0
## v2 0.74 0.05 0.00 0.56 0.44 1.0
## v3 0.80 0.02 -0.10 0.66 0.34 1.0
## v4 0.00 0.78 0.04 0.62 0.38 1.0
## v5 0.09 0.54 0.15 0.38 0.62 1.2
## v6 -0.03 0.70 -0.10 0.46 0.54 1.0
## v7 -0.03 0.01 0.73 0.54 0.46 1.0
## v8 0.02 0.04 0.51 0.27 0.73 1.0
## v9 0.03 -0.03 0.45 0.20 0.80 1.0
##
## PA1 PA2 PA3
## SS loadings 2.01 1.41 1.07
## Proportion Var 0.22 0.16 0.12
## Cumulative Var 0.22 0.38 0.50
## Proportion Explained 0.45 0.31 0.24
## Cumulative Proportion 0.45 0.76 1.00
##
## With factor correlations of
## PA1 PA2 PA3
## PA1 1.00 0.21 -0.02
## PA2 0.21 1.00 0.26
## PA3 -0.02 0.26 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.55
## The degrees of freedom for the model are 12 and the objective function was 0.15
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of scores with factors 0.94 0.87 0.81
## Multiple R square of scores with factors 0.88 0.76 0.66
## Minimum correlation of possible factor scores 0.75 0.51 0.32
The solution looks pretty good.
Our objective here was data reduction. To make that explicit, we can obtain the factor scores by inputting the raw data matrix instead of the correlation matrix. Additionally, we can specify the ‘scores’ argument as below to get factor scores by a factor score regression method.
EFApdat1_3factor <- fa(r = pdat1[,-1], nfactors = 3,
rotate = "oblimin",
fm = "pa",
scores="regression")
head(EFApdat1_3factor$scores,10)
## PA1 PA2 PA3
## 1 -0.783420678 -1.1506737 0.5983783
## 2 -0.262502699 2.1004885 1.8211517
## 3 0.433115912 -1.7226209 0.3998862
## 4 -1.419724776 -2.6307726 -0.3797518
## 5 1.476386714 -0.1114002 -0.6023405
## 6 0.007833174 1.3577895 0.5681540
## 7 -0.614527322 -0.8404476 -0.5418789
## 8 -1.578440465 -0.3701222 -1.1402610
## 9 0.122523810 0.5020814 0.1907809
## 10 1.232518910 0.5499773 -0.7593508
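For reference, one common form of the regression (Thurstone) method computes score estimates as

\[\hat{\boldsymbol{F}} = \boldsymbol{Z}\boldsymbol{R}^{-1}\boldsymbol{S}\] where \(\boldsymbol{Z}\) is the matrix of standardized observed scores, \(\boldsymbol{R}\) is the observed correlation matrix, and \(\boldsymbol{S} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\) is the structure matrix (pattern loadings multiplied by the factor correlations).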
NOTE: We do have factor score indeterminacy! So, be careful.
Let’s look at the “reduced” data.
pairs.panels(EFApdat1_3factor$scores)
And we can look at the reduced 3-variate latent time-series.
#remaking plot data
pdat1_plot <- cbind(pdat1$id,day,pdat1[,-1],EFApdat1_3factor$scores)
#Plotting factor scores
ggplot(data=pdat1_plot, aes(x=day)) +
geom_line(aes(y=PA1), color= 1) +
geom_line(aes(y=PA2), color= 2) +
geom_line(aes(y=PA3), color= 3) +
xlab("Day") + ylab("Factor Score") +
scale_x_continuous(limits=c(0,100)) +
scale_y_continuous(limits=c(-3,3)) +
ggtitle("Person 1: Latent Factor Time Series")
But do not forget that there are unique factors too. The common factors only capture common variance. There may be quite a bit of stuff left over.
Remember that the variance accounted for by the 3 factors is only part of the total variance. Specifically, 0.499181, which we also see in the main output:
EFApdat1_3factor$Vaccounted[3,3]
## [1] 0.499181
All the other parts are in the uniquenesses and misfit of the correlations.
#Unique Factor Variances
round(EFApdat1_3factor$uniquenesses,3)
## v1 v2 v3 v4 v5 v6 v7 v8 v9
## 0.209 0.441 0.336 0.379 0.617 0.541 0.458 0.726 0.801
#Residual covariances (misfit)
round(EFApdat1_3factor$residual - diag(EFApdat1_3factor$uniquenesses),3)
## v1 v2 v3 v4 v5 v6 v7 v8 v9
## v1 0.000 -0.001 -0.005 0.008 0.035 -0.037 -0.026 0.027 -0.011
## v2 -0.001 0.000 0.008 -0.025 -0.017 0.043 0.048 -0.057 0.006
## v3 -0.005 0.008 0.000 0.012 -0.023 0.002 -0.018 0.035 -0.002
## v4 0.008 -0.025 0.012 0.000 0.014 -0.007 -0.001 -0.015 0.014
## v5 0.035 -0.017 -0.023 0.014 0.000 -0.006 0.027 -0.085 0.040
## v6 -0.037 0.043 0.002 -0.007 -0.006 0.000 -0.026 0.086 -0.047
## v7 -0.026 0.048 -0.018 -0.001 0.027 -0.026 0.000 0.018 -0.033
## v8 0.027 -0.057 0.035 -0.015 -0.085 0.086 0.018 0.000 0.025
## v9 -0.011 0.006 -0.002 0.014 0.040 -0.047 -0.033 0.025 0.000
Let’s have a look at Person 2’s data and descriptives.
#data structure
head(pdat2,10)
## id v1 v2 v3 v4 v5 v6 v7 v8 v9
## 101 2 -0.90 -1.27 -1.16 -1.24 0.33 -0.25 -0.19 -0.58 -0.14
## 102 2 0.19 0.06 0.55 0.04 0.07 -0.43 -0.55 0.60 0.65
## 103 2 -0.21 0.53 0.32 0.37 0.53 0.81 0.66 0.12 0.57
## 104 2 -1.84 -0.87 -2.02 0.92 0.00 0.39 -0.18 1.04 0.06
## 105 2 0.56 1.28 -0.32 -0.28 1.18 0.36 -0.89 -0.78 -0.51
## 106 2 0.50 0.50 -0.72 0.16 1.17 -0.39 -0.35 -1.52 0.50
## 107 2 -0.82 -1.02 -0.56 -0.15 0.86 1.05 -0.41 0.27 0.92
## 108 2 -0.24 -0.02 0.25 0.12 -0.74 -0.19 -0.47 -1.62 0.05
## 109 2 0.10 -1.63 -0.41 1.41 0.91 0.50 0.54 0.71 2.79
## 110 2 1.03 1.40 1.12 -0.17 0.53 0.28 -1.57 -0.34 -0.04
#descriptives (without id column)
describe(pdat2[,-1])
## vars n mean sd median trimmed mad min max range skew kurtosis
## v1 1 100 0.09 1.03 0.04 0.11 1.05 -2.31 2.14 4.45 -0.18 -0.41
## v2 2 100 0.10 1.04 0.22 0.14 0.90 -3.35 2.43 5.78 -0.44 0.41
## v3 3 100 0.09 1.09 0.04 0.09 1.05 -2.61 2.59 5.20 0.04 -0.38
## v4 4 100 -0.10 1.00 -0.05 -0.06 0.88 -2.86 2.20 5.06 -0.34 0.33
## v5 5 100 0.00 1.03 0.10 0.03 0.89 -4.43 2.98 7.41 -0.81 3.22
## v6 6 100 0.10 1.07 0.20 0.12 1.15 -2.84 3.33 6.17 -0.09 0.17
## v7 7 100 0.04 0.92 0.12 0.02 0.87 -2.15 2.44 4.59 0.11 -0.03
## v8 8 100 0.26 1.12 0.31 0.29 1.14 -2.87 2.78 5.65 -0.29 -0.25
## v9 9 100 0.03 1.02 0.04 0.01 1.02 -2.86 2.79 5.65 0.07 0.00
## se
## v1 0.10
## v2 0.10
## v3 0.11
## v4 0.10
## v5 0.10
## v6 0.11
## v7 0.09
## v8 0.11
## v9 0.10
Data appear to already be in standardized form.
Let’s make a plot of the 9-variate raw time-series.
#preparing data
#adding time variable
day <- 1:100
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1])
#Plotting observed scores
ggplot(data=pdat2_plot, aes(x=day)) +
geom_line(aes(y=v1), color= 1) +
geom_line(aes(y=v2), color= 2) +
geom_line(aes(y=v3), color= 3) +
geom_line(aes(y=v4), color= 4) +
geom_line(aes(y=v5), color= 5) +
geom_line(aes(y=v6), color= 6) +
geom_line(aes(y=v7), color= 7) +
geom_line(aes(y=v8), color= 8) +
geom_line(aes(y=v9), color= 9) +
xlab("Day") + ylab("Observed Score") +
scale_x_continuous(limits=c(0,100)) +
scale_y_continuous(limits=c(-3,3)) +
ggtitle("Person 2: Multivariate Time Series")
## Warning: Removed 1 rows containing missing values (geom_path).
Let’s run the factor analysis on Person 2’s data, pdat2. First we should check whether the number of factors is the same.
#correlation matrix
round(cor(pdat2[,-1]),2)
## v1 v2 v3 v4 v5 v6 v7 v8 v9
## v1 1.00 0.78 0.79 0.21 0.05 0.24 -0.04 -0.13 -0.13
## v2 0.78 1.00 0.69 0.11 0.14 0.26 -0.01 -0.07 -0.15
## v3 0.79 0.69 1.00 0.19 0.01 0.23 0.01 -0.03 -0.11
## v4 0.21 0.11 0.19 1.00 0.49 0.55 0.39 0.29 0.32
## v5 0.05 0.14 0.01 0.49 1.00 0.47 0.30 0.09 0.28
## v6 0.24 0.26 0.23 0.55 0.47 1.00 0.28 0.12 0.31
## v7 -0.04 -0.01 0.01 0.39 0.30 0.28 1.00 0.56 0.52
## v8 -0.13 -0.07 -0.03 0.29 0.09 0.12 0.56 1.00 0.44
## v9 -0.13 -0.15 -0.11 0.32 0.28 0.31 0.52 0.44 1.00
corrplot(cor(pdat2[,-1]), order = "original", tl.col='black', tl.cex=.75)
#parallel analysis for number of factors
fa.parallel(x=pdat2[,-1], fm="minres", fa="fa")
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
#multiple methods for number of factors
nScree(x=pdat2[,-1],model="factors")
## noc naf nparallel nkaiser
## 1 2 2 2 3
plot(nScree(x=pdat2[,-1],model="factors"))
Hmmm… this person appears to have a 2-factor solution. That may be an interesting interindividual difference. But it may also be problematic if we want to make comparisons in the same factor space.
The information above points towards 2 factors. What are those eigenvalues?
eigen(cor(pdat2[,-1]))$values
## [1] 2.9515136 2.5756903 1.1183328 0.5704920 0.5448644 0.4283432 0.3945002
## [8] 0.2616947 0.1545689
Hmmm… OK, let’s look at the 2-factor and 3-factor solutions.
#2-factor model
EFApdat2_2factor <- fa(r = pdat2[,-1], nfactors = 2,
rotate = "oblimin",
fm = "pa",
scores="regression")
EFApdat2_2factor
## Factor Analysis using method = pa
## Call: fa(r = pdat2[, -1], nfactors = 2, rotate = "oblimin", scores = "regression",
## fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## v1 0.85 0.02 0.724 0.28 1.0
## v2 0.74 0.05 0.564 0.44 1.0
## v3 0.84 -0.05 0.687 0.31 1.0
## v4 0.02 0.70 0.496 0.50 1.0
## v5 0.06 0.63 0.418 0.58 1.0
## v6 0.02 0.52 0.280 0.72 1.0
## v7 -0.18 0.43 0.179 0.82 1.3
## v8 -0.10 0.36 0.122 0.88 1.2
## v9 -0.09 0.28 0.072 0.93 1.2
##
## PA1 PA2
## SS loadings 2.00 1.54
## Proportion Var 0.22 0.17
## Cumulative Var 0.22 0.39
## Proportion Explained 0.57 0.43
## Cumulative Proportion 0.57 1.00
##
## With factor correlations of
## PA1 PA2
## PA1 1.00 0.24
## PA2 0.24 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.55 with Chi Square of 242.79
## The degrees of freedom for the model are 19 and the objective function was 0.48
##
## The root mean square of the residuals (RMSR) is 0.09
## The df corrected root mean square of the residuals is 0.12
##
## The harmonic number of observations is 100 with the empirical chi square 53.55 with prob < 3.9e-05
## The total number of observations was 100 with Likelihood Chi Square = 44.91 with prob < 0.00071
##
## Tucker Lewis Index of factoring reliability = 0.759
## RMSEA index = 0.123 and the 90 % confidence intervals are 0.073 0.162
## BIC = -42.59
## Fit based upon off diagonal values = 0.9
## Measures of factor score adequacy
## PA1 PA2
## Correlation of scores with factors 0.93 0.85
## Multiple R square of scores with factors 0.86 0.72
## Minimum correlation of possible factor scores 0.72 0.44
#3-factor model
EFApdat2_3factor <- fa(r = pdat2[,-1], nfactors = 3,
rotate = "oblimin",
fm = "pa",
scores="regression")
EFApdat2_3factor
## Factor Analysis using method = pa
## Call: fa(r = pdat2[, -1], nfactors = 3, rotate = "oblimin", scores = "regression",
## fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 h2 u2 com
## v1 0.90 -0.04 0.06 0.79 0.21 1.0
## v2 0.74 0.05 0.00 0.56 0.44 1.0
## v3 0.80 0.02 -0.10 0.66 0.34 1.0
## v4 0.00 0.78 0.04 0.62 0.38 1.0
## v5 0.09 0.54 0.15 0.38 0.62 1.2
## v6 -0.03 0.70 -0.10 0.46 0.54 1.0
## v7 -0.03 0.01 0.73 0.54 0.46 1.0
## v8 0.02 0.04 0.51 0.27 0.73 1.0
## v9 0.03 -0.03 0.45 0.20 0.80 1.0
##
## PA1 PA2 PA3
## SS loadings 2.01 1.41 1.07
## Proportion Var 0.22 0.16 0.12
## Cumulative Var 0.22 0.38 0.50
## Proportion Explained 0.45 0.31 0.24
## Cumulative Proportion 0.45 0.76 1.00
##
## With factor correlations of
## PA1 PA2 PA3
## PA1 1.00 0.21 -0.02
## PA2 0.21 1.00 0.26
## PA3 -0.02 0.26 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.55 with Chi Square of 242.79
## The degrees of freedom for the model are 12 and the objective function was 0.15
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic number of observations is 100 with the empirical chi square 7.48 with prob < 0.82
## The total number of observations was 100 with Likelihood Chi Square = 14.19 with prob < 0.29
##
## Tucker Lewis Index of factoring reliability = 0.967
## RMSEA index = 0.051 and the 90 % confidence intervals are 0 0.116
## BIC = -41.07
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of scores with factors 0.94 0.87 0.81
## Multiple R square of scores with factors 0.88 0.76 0.66
## Minimum correlation of possible factor scores 0.75 0.51 0.32
Let’s go with the 3-factor solution. It looks a bit better all around.
And we can look at the reduced 3-variate latent time-series.
#remaking plot data
pdat2_plot <- cbind(pdat2$id,day,pdat2[,-1],EFApdat2_3factor$scores)
#Plotting factor scores
ggplot(data=pdat2_plot, aes(x=day)) +
geom_line(aes(y=PA1), color= 1) +
geom_line(aes(y=PA2), color= 2) +
geom_line(aes(y=PA3), color= 3) +
xlab("Day") + ylab("Factor Score") +
scale_x_continuous(limits=c(0,100)) +
scale_y_continuous(limits=c(-3,3)) +
ggtitle("Person 2: Latent Factor Time Series")
We can look at the comparison of Person 1’s factor solution with Person 2’s factor solution.
Initially, we could characterize the between-person differences with respect to the number of factors, according to some established criteria (e.g., Kaiser rule: # of eigenvalues > 1).
#Person 1
nScree(x=pdat1[,-1],model="factors")
## noc naf nparallel nkaiser
## 1 2 2 2 3
#Person 2
nScree(x=pdat2[,-1],model="factors")
## noc naf nparallel nkaiser
## 1 2 2 2 3
With the same size p x q factor loading matrix, \(\boldsymbol{\Lambda}\), for the two samples (here 9 x 3), we can compare the pattern of loadings between the two samples using the Tucker Index of Factor Congruence.
This is easily obtained by applying the fa.congruence() function in the psych package to two factor loading matrices.
fa.congruence(EFApdat1_3factor,EFApdat2_3factor)
## PA1 PA2 PA3
## PA1 1.00 0.03 0.00
## PA2 0.03 1.00 0.04
## PA3 0.00 0.04 1.00
We see very good alignment across the two persons! From a person-specific perspective, this gives us evidence that the two persons might be combined in a single sample.
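For reference, the Tucker index for two loading vectors \(\lambda_{1}\) and \(\lambda_{2}\) is the normalized cross-product \(\phi = \sum \lambda_{1}\lambda_{2} / \sqrt{(\sum\lambda_{1}^2)(\sum\lambda_{2}^2)}\). A minimal sketch with hypothetical loading vectors (not the estimated loadings above):

```r
#Tucker congruence coefficient for two loading vectors
tucker <- function(l1, l2) {
  sum(l1 * l2) / sqrt(sum(l1^2) * sum(l2^2))
}
#hypothetical loadings on one factor from two persons
l_p1 <- c(0.90, 0.74, 0.80)
l_p2 <- c(0.85, 0.78, 0.76)
tucker(l_p1, l_p2)  #values near 1 indicate congruent factors
```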
We make use of the same 2-person data used above, but now in a confirmatory framework.
Confirmatory Factor Analysis is usually conducted within a Structural Equation Modeling (SEM) framework. There are a number of SEM packages in R. We currently tend to use the lavaan package.
Lots of good information and instruction can be found on the package website … http://lavaan.ugent.be.
Within an SEM framework, we fit multiple models to each person’s data to determine, through formal tests, which of the hypothesized models fits best.
\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
Then we use nested pairwise invariance tests to formally test whether the two individuals can be represented by the same model, or require person-specific models.
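One way such an invariance test could be run is with lavaan’s multiple-group machinery. This is a sketch only; it assumes the combined pdat data frame and the 3-factor model syntax (Ptech_3factor, specified in this tutorial) are available in the workspace, and is not executed here.

```r
library(lavaan)
#Sketch: pairwise invariance test across the two persons
#Configural model: same structure, parameters free in each person
fit_configural <- cfa(Ptech_3factor, data = pdat, group = "id")
#Metric model: factor loadings constrained equal across persons
fit_metric <- cfa(Ptech_3factor, data = pdat, group = "id",
                  group.equal = "loadings")
#Chi-square difference test of loading invariance
anova(fit_configural, fit_metric)
```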
We specify each model by defining the latent variables and adding in some additional arrows. Each factor must be indicated by manifest variables, and must have a variance (and perhaps covariances and a mean).
The identification constraints are needed and important. We will not go into all of those details here.
We set up a 1-factor model …
#Model with 1 common factor
Ptech_1factor <- ' #start of model
# latent variable definitions (common factors)
f1 =~ 1*v1 +
NA*v2 +
NA*v3 +
NA*v4 +
NA*v5 +
NA*v6 +
NA*v7 +
NA*v8 +
NA*v9
# latent variable variances
f1 ~~ NA*f1
# latent variable covariances
# latent variable means
# manifest variable variances (uniquenesses)
v1 ~~ v1
v2 ~~ v2
v3 ~~ v3
v4 ~~ v4
v5 ~~ v5
v6 ~~ v6
v7 ~~ v7
v8 ~~ v8
v9 ~~ v9
# manifest variable covariances (uniquenesses)
# manifest variable means
' #end of model
We also set up a 3-factor model …
#Model with 3 common factors
Ptech_3factor <- ' #start of model
# latent variable definitions (common factors)
f1 =~ 1*v1 +
NA*v2 +
NA*v3
f2 =~ 1*v4 +
NA*v5 +
NA*v6
f3 =~ 1*v7 +
NA*v8 +
NA*v9
# latent variable variances
f1 ~~ NA*f1
f2 ~~ NA*f2
f3 ~~ NA*f3
# latent variable covariances
f1 ~~ f2 + f3
f2 ~~ f3
# latent variable means
# manifest variable variances (uniquenesses)
v1 ~~ v1
v2 ~~ v2
v3 ~~ v3
v4 ~~ v4
v5 ~~ v5
v6 ~~ v6
v7 ~~ v7
v8 ~~ v8
v9 ~~ v9
# manifest variable covariances (uniquenesses)
# manifest variable means
' #end of model
Next, we fit the specified models to the data to obtain parameter estimates.
The 1-factor model for Person 1
fit_p1_Ptech_1factor <- lavaan(Ptech_1factor, data=pdat1, mimic = "mplus")
summary(fit_p1_Ptech_1factor, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after 17 iterations
##
## Number of observations 100
##
## Number of missing patterns 1
##
## Estimator ML
## Minimum Function Test Statistic 128.229
## Degrees of freedom 36
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 255.119
## Degrees of freedom 36
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.579
## Tucker-Lewis Index (TLI) 0.579
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1159.852
## Loglikelihood unrestricted model (H1) -1095.737
##
## Number of free parameters 18
## Akaike (AIC) 2355.704
## Bayesian (BIC) 2402.597
## Sample-size adjusted Bayesian (BIC) 2345.748
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.160
## 90 Percent Confidence Interval 0.131 0.190
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.150
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.826 0.873
## v2 0.882 0.114 7.724 0.000 0.729 0.748
## v3 0.881 0.106 8.344 0.000 0.728 0.801
## v4 0.223 0.123 1.804 0.071 0.184 0.196
## v5 0.247 0.113 2.192 0.028 0.204 0.235
## v6 0.164 0.130 1.261 0.207 0.135 0.139
## v7 -0.036 0.124 -0.289 0.772 -0.030 -0.031
## v8 0.052 0.135 0.388 0.698 0.043 0.042
## v9 0.018 0.127 0.141 0.888 0.015 0.015
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.683 0.137 4.984 0.000 1.000 1.000
## .v1 0.214 0.067 3.179 0.001 0.214 0.238
## .v2 0.417 0.076 5.452 0.000 0.417 0.440
## .v3 0.296 0.062 4.765 0.000 0.296 0.358
## .v4 0.847 0.121 7.018 0.000 0.847 0.962
## .v5 0.713 0.102 6.997 0.000 0.713 0.945
## .v6 0.927 0.132 7.042 0.000 0.927 0.981
## .v7 0.886 0.125 7.070 0.000 0.886 0.999
## .v8 1.060 0.150 7.069 0.000 1.060 0.998
## .v9 0.939 0.133 7.071 0.000 0.939 1.000
The 3-factor model for Person 1
fit_p1_Ptech_3factor <- lavaan(Ptech_3factor, data=pdat1, mimic = "mplus")
summary(fit_p1_Ptech_3factor, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after 28 iterations
##
## Number of observations 100
##
## Number of missing patterns 1
##
## Estimator ML
## Minimum Function Test Statistic 38.495
## Degrees of freedom 33
## P-value (Chi-square) 0.235
##
## Model test baseline model:
##
## Minimum Function Test Statistic 255.119
## Degrees of freedom 36
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.975
## Tucker-Lewis Index (TLI) 0.973
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1114.984
## Loglikelihood unrestricted model (H1) -1095.737
##
## Number of free parameters 21
## Akaike (AIC) 2271.969
## Bayesian (BIC) 2326.677
## Sample-size adjusted Bayesian (BIC) 2260.354
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.041
## 90 Percent Confidence Interval 0.000 0.087
## P-value RMSEA <= 0.05 0.581
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.827 0.874
## v2 0.878 0.114 7.727 0.000 0.726 0.746
## v3 0.886 0.108 8.219 0.000 0.733 0.807
## f2 =~
## v4 1.000 0.766 0.816
## v5 0.709 0.162 4.389 0.000 0.543 0.625
## v6 0.777 0.166 4.679 0.000 0.595 0.613
## f3 =~
## v7 1.000 0.668 0.710
## v8 0.867 0.333 2.603 0.009 0.579 0.562
## v9 0.623 0.241 2.581 0.010 0.416 0.430
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 0.143 0.080 1.787 0.074 0.225 0.225
## f3 -0.013 0.077 -0.173 0.863 -0.024 -0.024
## f2 ~~
## f3 0.144 0.078 1.837 0.066 0.281 0.281
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
## f3 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.684 0.138 4.973 0.000 1.000 1.000
## f2 0.587 0.161 3.654 0.000 1.000 1.000
## f3 0.447 0.196 2.285 0.022 1.000 1.000
## .v1 0.212 0.068 3.107 0.002 0.212 0.237
## .v2 0.420 0.076 5.498 0.000 0.420 0.443
## .v3 0.288 0.063 4.566 0.000 0.288 0.349
## .v4 0.294 0.117 2.506 0.012 0.294 0.334
## .v5 0.460 0.089 5.192 0.000 0.460 0.609
## .v6 0.590 0.105 5.608 0.000 0.590 0.625
## .v7 0.440 0.174 2.530 0.011 0.440 0.496
## .v8 0.726 0.162 4.479 0.000 0.726 0.684
## .v9 0.766 0.127 6.046 0.000 0.766 0.815
Some basic questions to ask after fitting a model:
- Did it work?
- Does it make sense? Are the parameters viable?
Let’s look at diagrams of the models to check whether they match what we intended (this should also be done with the summary information).
Person 1: 1-factor model
semPaths(fit_p1_Ptech_1factor, what="est", intercepts=FALSE,
sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
Person 1: 3-factor model
semPaths(fit_p1_Ptech_3factor, what="est", intercepts=FALSE,
sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
Then, we can look more closely at how well the model fits, and interpret the model more generally.
We do model comparison.
Looking at fit statistics
#Compare model fit statistics
round(cbind(p1_f1 = inspect(fit_p1_Ptech_1factor, 'fit.measures'),
            p1_f3 = inspect(fit_p1_Ptech_3factor, 'fit.measures')), 3)
## p1_f1 p1_f3
## npar 18.000 21.000
## fmin 0.641 0.192
## chisq 128.229 38.495
## df 36.000 33.000
## pvalue 0.000 0.235
## baseline.chisq 255.119 255.119
## baseline.df 36.000 36.000
## baseline.pvalue 0.000 0.000
## cfi 0.579 0.975
## tli 0.579 0.973
## nnfi 0.579 0.973
## rfi 0.497 0.835
## nfi 0.497 0.849
## pnfi 0.497 0.778
## ifi 0.579 0.975
## rni 0.579 0.975
## logl -1159.852 -1114.984
## unrestricted.logl -1095.737 -1095.737
## aic 2355.704 2271.969
## bic 2402.597 2326.677
## ntotal 100.000 100.000
## bic2 2345.748 2260.354
## rmsea 0.160 0.041
## rmsea.ci.lower 0.131 0.000
## rmsea.ci.upper 0.190 0.087
## rmsea.pvalue 0.000 0.581
## rmr 0.137 0.039
## rmr_nomean 0.137 0.039
## srmr 0.150 0.065
## srmr_bentler 0.150 0.064
## srmr_bentler_nomean 0.155 0.045
## srmr_bollen 0.150 0.064
## srmr_bollen_nomean 0.155 0.044
## srmr_mplus 0.150 0.065
## srmr_mplus_nomean 0.155 0.045
## cn_05 40.771 124.134
## cn_01 46.714 143.294
## gfi 0.767 0.924
## agfi 0.650 0.876
## pgfi 0.511 0.565
## mfi 0.631 0.973
## ecvi NA NA
Formal model test
#Chi-square difference test for nested models
anova(fit_p1_Ptech_1factor, fit_p1_Ptech_3factor)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff
## fit_p1_Ptech_3factor 33 2272.0 2326.7 38.495
## fit_p1_Ptech_1factor 36 2355.7 2402.6 128.229 89.735 3
## Pr(>Chisq)
## fit_p1_Ptech_3factor
## fit_p1_Ptech_1factor < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can reject the null hypothesis that the two models fit equally well.
We conclude that the 3-factor model is the preferred model, among those evaluated.
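As a check on our understanding (not part of the original analysis), the chi-square difference test that anova() reports can be reproduced by hand from the fit statistics printed above:

```r
#Chi-square difference test computed by hand from the reported fit statistics
chisq_diff <- 128.229 - 38.495   #difference in model chi-square values
df_diff    <- 36 - 33            #difference in degrees of freedom
p_value    <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
chisq_diff                       #89.734, matching the anova() output (up to rounding)
p_value                          #far below .001, so we reject equal fit
```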
The same holds for Person 2. We fit the 1-factor and 3-factor models to Person 2's data:
#1-factor model
fit_p2_Ptech_1factor <- lavaan(Ptech_1factor, data=pdat2, mimic = "mplus")
#3-factor model
fit_p2_Ptech_3factor <- lavaan(Ptech_3factor, data=pdat2, mimic = "mplus")
#Chi-square difference test for nested models
anova(fit_p2_Ptech_1factor, fit_p2_Ptech_3factor)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff
## fit_p2_Ptech_3factor 33 2287.7 2342.4 44.865
## fit_p2_Ptech_1factor 36 2437.3 2484.2 200.456 155.59 3
## Pr(>Chisq)
## fit_p2_Ptech_3factor
## fit_p2_Ptech_1factor < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can also compare Person 1's factor solution with Person 2's factor solution.
Formally, we set up two 2-group models and follow the logic of standard measurement invariance tests: in the Free model the factor loadings are allowed to differ across persons, and in the Invariance model the factor loadings are constrained to be equal.
First, we specify the 3-factor model. This is the same specification as above.
#Model with 3 common factors
Ptech_3factor <- ' #start of model
# latent variable definitions (common factors)
f1 =~ 1*v1 +
NA*v2 +
NA*v3
f2 =~ 1*v4 +
NA*v5 +
NA*v6
f3 =~ 1*v7 +
NA*v8 +
NA*v9
# latent variable variances
f1 ~~ NA*f1
f2 ~~ NA*f2
f3 ~~ NA*f3
# latent variable covariances
f1 ~~ f2 + f3
f2 ~~ f3
# latent variable means
# manifest variable variances (uniquenesses)
v1 ~~ v1
v2 ~~ v2
v3 ~~ v3
v4 ~~ v4
v5 ~~ v5
v6 ~~ v6
v7 ~~ v7
v8 ~~ v8
v9 ~~ v9
# manifest variable covariances (uniquenesses)
# manifest variable means
' #end of model
Second, we fit the specified model to the data to obtain parameter estimates.
The Free model for multiple groups
fit_Ptech_3factor_Free <- lavaan(Ptech_3factor, data=pdat, mimic = "mplus",
group = "id")
summary(fit_Ptech_3factor_Free, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after 43 iterations
##
## Number of observations per group
## 1 100
## 2 100
##
## Number of missing patterns per group
## 1 1
## 2 1
##
## Estimator ML
## Minimum Function Test Statistic 83.360
## Degrees of freedom 66
## P-value (Chi-square) 0.073
##
## Chi-square for each group:
##
## 1 38.495
## 2 44.865
##
## Model test baseline model:
##
## Minimum Function Test Statistic 656.505
## Degrees of freedom 72
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.970
## Tucker-Lewis Index (TLI) 0.968
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2237.843
## Loglikelihood unrestricted model (H1) -2196.163
##
## Number of free parameters 42
## Akaike (AIC) 4559.687
## Bayesian (BIC) 4698.216
## Sample-size adjusted Bayesian (BIC) 4565.156
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.051
## 90 Percent Confidence Interval 0.000 0.082
## P-value RMSEA <= 0.05 0.454
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.065
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.827 0.874
## v2 0.878 0.114 7.726 0.000 0.726 0.746
## v3 0.886 0.108 8.219 0.000 0.733 0.807
## f2 =~
## v4 1.000 0.766 0.816
## v5 0.709 0.162 4.389 0.000 0.543 0.625
## v6 0.777 0.166 4.679 0.000 0.595 0.613
## f3 =~
## v7 1.000 0.668 0.710
## v8 0.867 0.333 2.603 0.009 0.579 0.562
## v9 0.623 0.241 2.581 0.010 0.416 0.430
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 0.143 0.080 1.787 0.074 0.225 0.225
## f3 -0.013 0.077 -0.173 0.863 -0.024 -0.024
## f2 ~~
## f3 0.144 0.078 1.837 0.066 0.281 0.281
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
## f3 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.684 0.138 4.973 0.000 1.000 1.000
## f2 0.587 0.161 3.654 0.000 1.000 1.000
## f3 0.447 0.196 2.285 0.022 1.000 1.000
## .v1 0.212 0.068 3.107 0.002 0.212 0.237
## .v2 0.420 0.076 5.498 0.000 0.420 0.443
## .v3 0.288 0.063 4.566 0.000 0.288 0.349
## .v4 0.294 0.117 2.506 0.012 0.294 0.334
## .v5 0.460 0.089 5.192 0.000 0.460 0.609
## .v6 0.590 0.105 5.608 0.000 0.590 0.625
## .v7 0.440 0.174 2.530 0.011 0.440 0.496
## .v8 0.726 0.162 4.479 0.000 0.726 0.684
## .v9 0.766 0.127 6.046 0.000 0.766 0.815
##
##
## Group 2 [2]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.975 0.950
## v2 0.880 0.081 10.921 0.000 0.859 0.823
## v3 0.928 0.083 11.231 0.000 0.905 0.835
## f2 =~
## v4 1.000 0.771 0.774
## v5 0.829 0.159 5.207 0.000 0.639 0.625
## v6 0.986 0.183 5.383 0.000 0.760 0.713
## f3 =~
## v7 1.000 0.744 0.812
## v8 0.991 0.176 5.618 0.000 0.738 0.647
## v9 0.917 0.178 5.152 0.000 0.683 0.674
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 0.201 0.093 2.149 0.032 0.267 0.267
## f3 -0.069 0.086 -0.800 0.424 -0.095 -0.095
## f2 ~~
## f3 0.312 0.090 3.445 0.001 0.543 0.543
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
## f3 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.951 0.157 6.052 0.000 1.000 1.000
## f2 0.594 0.155 3.822 0.000 1.000 1.000
## f3 0.554 0.139 3.996 0.000 1.000 1.000
## .v1 0.103 0.054 1.923 0.054 0.103 0.098
## .v2 0.352 0.064 5.457 0.000 0.352 0.323
## .v3 0.357 0.067 5.300 0.000 0.357 0.303
## .v4 0.397 0.104 3.819 0.000 0.397 0.401
## .v5 0.636 0.111 5.738 0.000 0.636 0.609
## .v6 0.559 0.120 4.666 0.000 0.559 0.492
## .v7 0.287 0.091 3.140 0.002 0.287 0.341
## .v8 0.756 0.133 5.672 0.000 0.756 0.581
## .v9 0.559 0.111 5.027 0.000 0.559 0.546
The Invariance model for multiple groups
fit_Ptech_3factor_Invariance <- lavaan(Ptech_3factor, data=pdat, mimic = "mplus",
group = "id",
group.equal = c("loadings"))
summary(fit_Ptech_3factor_Invariance, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after 33 iterations
##
## Number of observations per group
## 1 100
## 2 100
##
## Number of missing patterns per group
## 1 1
## 2 1
##
## Estimator ML
## Minimum Function Test Statistic 85.107
## Degrees of freedom 72
## P-value (Chi-square) 0.139
##
## Chi-square for each group:
##
## 1 39.609
## 2 45.497
##
## Model test baseline model:
##
## Minimum Function Test Statistic 656.505
## Degrees of freedom 72
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.978
## Tucker-Lewis Index (TLI) 0.978
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2238.717
## Loglikelihood unrestricted model (H1) -2196.163
##
## Number of free parameters 36
## Akaike (AIC) 4549.433
## Bayesian (BIC) 4668.173
## Sample-size adjusted Bayesian (BIC) 4554.121
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.043
## 90 Percent Confidence Interval 0.000 0.075
## P-value RMSEA <= 0.05 0.612
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.067
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Standard
##
##
## Group 1 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.819 0.868
## v2 (.p2.) 0.879 0.066 13.368 0.000 0.720 0.743
## v3 (.p3.) 0.911 0.066 13.801 0.000 0.747 0.815
## f2 =~
## v4 1.000 0.722 0.779
## v5 (.p5.) 0.774 0.113 6.845 0.000 0.559 0.641
## v6 (.p6.) 0.884 0.124 7.129 0.000 0.638 0.646
## f3 =~
## v7 1.000 0.601 0.646
## v8 (.p8.) 0.971 0.154 6.317 0.000 0.583 0.568
## v9 (.p9.) 0.831 0.145 5.731 0.000 0.499 0.505
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 0.136 0.075 1.806 0.071 0.230 0.230
## f3 -0.009 0.069 -0.126 0.900 -0.018 -0.018
## f2 ~~
## f3 0.123 0.067 1.831 0.067 0.284 0.284
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
## f3 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.671 0.120 5.572 0.000 1.000 1.000
## f2 0.522 0.127 4.092 0.000 1.000 1.000
## f3 0.361 0.110 3.274 0.001 1.000 1.000
## .v1 0.219 0.060 3.620 0.000 0.219 0.246
## .v2 0.421 0.074 5.714 0.000 0.421 0.448
## .v3 0.282 0.059 4.754 0.000 0.282 0.336
## .v4 0.338 0.094 3.580 0.000 0.338 0.393
## .v5 0.447 0.082 5.486 0.000 0.447 0.589
## .v6 0.568 0.102 5.567 0.000 0.568 0.582
## .v7 0.505 0.114 4.444 0.000 0.505 0.583
## .v8 0.713 0.132 5.404 0.000 0.713 0.677
## .v9 0.728 0.122 5.960 0.000 0.728 0.745
##
##
## Group 2 [2]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 =~
## v1 1.000 0.979 0.952
## v2 (.p2.) 0.879 0.066 13.368 0.000 0.861 0.823
## v3 (.p3.) 0.911 0.066 13.801 0.000 0.893 0.829
## f2 =~
## v4 1.000 0.809 0.803
## v5 (.p5.) 0.774 0.113 6.845 0.000 0.626 0.615
## v6 (.p6.) 0.884 0.124 7.129 0.000 0.715 0.680
## f3 =~
## v7 1.000 0.768 0.832
## v8 (.p8.) 0.971 0.154 6.317 0.000 0.745 0.651
## v9 (.p9.) 0.831 0.145 5.731 0.000 0.638 0.640
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 ~~
## f2 0.207 0.097 2.139 0.032 0.261 0.261
## f3 -0.066 0.089 -0.747 0.455 -0.088 -0.088
## f2 ~~
## f3 0.336 0.091 3.674 0.000 0.541 0.541
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .v1 0.000 0.000 0.000
## .v2 0.000 0.000 0.000
## .v3 0.000 0.000 0.000
## .v4 0.000 0.000 0.000
## .v5 0.000 0.000 0.000
## .v6 0.000 0.000 0.000
## .v7 0.000 0.000 0.000
## .v8 0.000 0.000 0.000
## .v9 0.000 0.000 0.000
## f1 0.000 0.000 0.000
## f2 0.000 0.000 0.000
## f3 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f1 0.959 0.154 6.235 0.000 1.000 1.000
## f2 0.654 0.144 4.532 0.000 1.000 1.000
## f3 0.590 0.134 4.396 0.000 1.000 1.000
## .v1 0.098 0.051 1.920 0.055 0.098 0.093
## .v2 0.354 0.064 5.531 0.000 0.354 0.323
## .v3 0.361 0.066 5.454 0.000 0.361 0.312
## .v4 0.361 0.098 3.689 0.000 0.361 0.356
## .v5 0.645 0.109 5.901 0.000 0.645 0.622
## .v6 0.595 0.114 5.213 0.000 0.595 0.538
## .v7 0.262 0.088 2.967 0.003 0.262 0.308
## .v8 0.754 0.132 5.724 0.000 0.754 0.576
## .v9 0.588 0.108 5.437 0.000 0.588 0.591
Some basic questions to ask after fitting a model:
Did it work?
Does it make sense? Are the parameters viable?
Let's look at diagrams of the models to check that they match what we intended (this should also be done with the summary information).
Free Model
semPaths(fit_Ptech_3factor_Free, what="est", intercepts=FALSE,
sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
Invariance Model
semPaths(fit_Ptech_3factor_Invariance, what="est", intercepts=FALSE,
sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
Formal model test
#Chi-square difference test for nested models
anova(fit_Ptech_3factor_Free, fit_Ptech_3factor_Invariance)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff
## fit_Ptech_3factor_Free 66 4559.7 4698.2 83.360
## fit_Ptech_3factor_Invariance 72 4549.4 4668.2 85.107 1.7467 6
## Pr(>Chisq)
## fit_Ptech_3factor_Free
## fit_Ptech_3factor_Invariance 0.9415
We cannot reject the null hypothesis that the two models fit equally well.
We conclude that the factor loadings are invariant across Person 1 and Person 2 (among the models evaluated).
This gives us a strong basis for pooling these two individuals into a multi-person analytic framework.
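For intuition, the invariance test that anova() reported can likewise be reproduced by hand from the printed fit statistics (a sketch, using the rounded values shown above):

```r
#Hand computation of the loadings-invariance test reported by anova()
chisq_diff <- 85.107 - 83.360    #1.747
df_diff    <- 72 - 66            #6 loadings constrained equal across persons
p_value    <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
round(p_value, 4)                #about 0.94: no evidence against equal loadings
```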
We have made a first push into P-technique Factor Analysis models - providing a basis for analysis of multivariate intraindividual variability. Be wary of the assumption of independent observations that underlies these models - and look for extensions into Dynamic Factor Analysis models that additionally model the time dependencies. (Ideally, what was done here should only be done after those dependencies have been removed.) There are still a variety of details to work out and understand as we fit these models in R to many persons' data. But, since looping is so easy in R, it should be relatively straightforward to extend to the many-person case. Now we just need the data!
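As a sketch of that many-person extension (hypothetical - it assumes a long data frame `pdat` with an `id` column and the `Ptech_3factor` model string defined above), one could loop the single-person fit over persons:

```r
#Hypothetical sketch: fit the same P-technique model separately to each person,
#then collect a fit statistic per person. Assumes library(lavaan) is loaded,
#pdat has an 'id' column, and Ptech_3factor is the model string from above.
fits <- lapply(split(pdat, pdat$id), function(person_data) {
  lavaan(Ptech_3factor, data = person_data, mimic = "mplus")
})

#Example: extract each person's RMSEA to screen for poorly fitting persons
sapply(fits, function(fit) inspect(fit, "fit.measures")["rmsea"])
```

The same pattern generalizes to extracting loadings or factor covariances per person for later pooling.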
Good luck! And, as always, please be careful.