1 Overview:

Item response theory (IRT) is a paradigm for investigating the relationship between an individual’s response to a single test item and their performance on an overall measure of the ability or trait that item was intended to measure. Many models exist in the IRT field for evaulating how well an item captures an underlying latent trait, but some of the most popular IRT models are logistic IRT models. This tutorial will introduce you to 1-parameter, 2-parameter, and 3-parameter logistic IRT models.

For each model you will create:

Summary plots
Estimate latent ability scores
Test the fit of the model

2 1-parameter logistic model (1PL)

At the core of all the IRT models presented in this tutorial is the item response function (IRF). The IRF estimates the probability of getting an item “correct” (i.e., \(X=1\)) as a function of item characteristics and the individual’s latent trait/ability level (\(\theta\)). These item response functions are defined by a logistic curve (i.e., an “S”-shape from 0-1).

The 1PL (also called the Rasch model) IRT model describes test items in terms of only one parameter, item difficulty, \(b\). Item difficulty is simply how hard an item is (how high does the latent trait ability level need to be in order to have a 50% chance of getting the item right?). \(b\) is estimated for each item of the test.

The item response function for the 1PL model is given below:

\[P(X=1|\theta,b)=\frac{e^{(\theta-b)}}{1+e^{(\theta-b)}}\] ##Step 0: Read in the data We will use (simulated) results from N=500 individual taking a 10-item test (V1-V10). Items are coded 1 for correct and 0 for incorrect responses. When we get descriptives of the data, we see that the items differ in terms of the proportion of people who answered correctly, so we expect that we have some differences in item difficulty here.

library(ltm)
library(psych)
irtdat<-read.table("ouirt.dat",header=F)
head(irtdat)

V2	V3	V4	V5	V6	V7	V8	V9
1	1	1	1	1	1	1	1
1	0	0	1	0	1	0	0
0	0	0	1	1	1	0	1
0	0	0	0	0	0	0	0
0	0	0	0	0	1	0	0
0	1	0	0	0	0	0	0

describe(irtdat)

	vars	n	mean	sd	trimmed	max	range	skew	kurtosis	se
V1	1	500	0.150	0.3574290	0.0625	1	1	1.9545139	1.823784	0.0159847
V2	2	500	0.268	0.4433612	0.2100	1	1	1.0444577	-0.910918	0.0198277
V3	3	500	0.318	0.4661659	0.2725	1	1	0.7792763	-1.395507	0.0208476
V4	4	500	0.296	0.4569481	0.2450	1	1	0.8910946	-1.208355	0.0204353
V5	5	500	0.438	0.4966380	0.4225	1	1	0.2491795	-1.941781	0.0222103
V6	6	500	0.314	0.4645812	0.2675	1	1	0.7991198	-1.364124	0.0207767
V7	7	500	0.412	0.4926880	0.3900	1	1	0.3565096	-1.876642	0.0220337
V8	8	500	0.334	0.4721120	0.2925	1	1	0.7018165	-1.510463	0.0211135
V9	9	500	0.318	0.4661659	0.2725	1	1	0.7792763	-1.395507	0.0208476
V10	10	500	0.070	0.2554025	0.0000	1	1	3.3604990	9.311589	0.0114219

2.1 Step 1: Fit the 1PL model

We fit the 1PL model to our 500 responses to our 10-item test. That is, we estimate item difficulty, \(b\) based on how people answered the items.

PL1.rasch<-rasch(irtdat)
summary(PL1.rasch)

## 
## Call:
## rasch(data = irtdat)
## 
## Model Summary:
##    log.Lik      AIC      BIC
##  -2587.186 5196.372 5242.733
## 
## Coefficients:
##             value std.err  z.vals
## Dffclt.V1  1.6581  0.1303 12.7219
## Dffclt.V2  0.9818  0.1031  9.5233
## Dffclt.V3  0.7499  0.0968  7.7431
## Dffclt.V4  0.8495  0.0993  8.5522
## Dffclt.V5  0.2482  0.0891  2.7851
## Dffclt.V6  0.7677  0.0973  7.8930
## Dffclt.V7  0.3528  0.0900  3.9186
## Dffclt.V8  0.6794  0.0953  7.1311
## Dffclt.V9  0.7499  0.0968  7.7429
## Dffclt.V10 2.3978  0.1764 13.5892
## Dscrmn     1.3782  0.0741 18.5941
## 
## Integration:
## method: Gauss-Hermite
## quadrature points: 21 
## 
## Optimization:
## Convergence: 0 
## max(|grad|): 0.0021 
## quasi-Newton: BFGS

Here, we see that all of the difficulty estimates have significant z-values. For example, the difficulty estimate for Item 1 is b=1.66, z=12.7. A z-value of greater than 1.65 indicates that the difficulty parameter is significantly greater than zero (item difficulty cannot be negative) at the alpha=0.05 level. Higher difficulty estimates indicate that the item requires a higher level of the latent trait to have a 50% probability of getting the item “correct.”

2.2 Step 2: Plot the item characteristic curves of all 10 items

Item characteristic curves are the logistic curves which result from the fitted Rasch models (e.g., estimated item difficulty, b, plugged into the item response function). Latent trait/ability is plotted on the x-axis (higher values represent hight ability). Probability of a “correct” answer (\(X=1\)) to an item is plotted on the y-axis.

plot(PL1.rasch,type=c("ICC"))

From this plot, we see that item 10 is the most difficult item (it’s curve is farthest to the right), and item 5 is the easiest (it’s curve is farthest to the left). The same conclusions can be drawn by checking the difficulty estimates above.

2.3 Step 3: Plot the item information curves for all 10 items, then the whole test

Item information curves show how much “information” about the latent trait ability an item gives. Mathematically, these are the 1st derivatives of the ICCs. Item information curves peak at the difficulty value (point where the item has the highest discrimination), with less information at ability levels farther from the difficulty estimate.

Practially speaking, we can see how a very difficult item will provide very little information about persons with low ability (because the item is already too hard), and very easy items will provide little information about persons with high ability levels.

plot(PL1.rasch,type=c("IIC"))

Similar to the ICCs, we see that item 10 provides the most information about high ability levels (the peak of its IIC is farthest to the right) and item 5 provides the most information about lower ability levels (the peak of its IIC is farthest to the left). We have seen that all ICCs and IICs for the items have the same shape in the 1PL model (i.e., all items are equally good at providing information about the latent trait). In the 2PL and 3PL models, we will see that this does not have to be the case.

Next, we plot the information curve for the whole test. This is simply the sum of the individual IICs above. Ideally, we want a test which provides fairly good covereage of a wide range of latent ability levels. Otherwise, the test is only good at identifying a limited range of ability levels.

We see that this test provides the most information about slightly-higher-than-average ability levels (the peak is around ability level \(\theta=.5\)), and less information about very high and very low ability levels.

plot(PL1.rasch,type=c("IIC"),items=c(0))

2.4 Step 4: Test the fit of the 1PL model

We run the item.fit function to test whether individual items fit the 1PL model.

item.fit(PL1.rasch,simulate.p.value=T)

## 
## Item-Fit Statistics and P-values
## 
## Call:
## rasch(data = irtdat)
## 
## Alternative: Items do not fit the model
## Ability Categories: 10
## Monte Carlo samples: 100 
## 
##         X^2 Pr(>X^2)
## V1  12.1830   0.5941
## V2  15.1135    0.505
## V3  11.9995   0.7327
## V4  21.0226   0.1782
## V5  15.3784   0.5545
## V6  14.7103   0.5842
## V7   4.0974        1
## V8  23.3690   0.1683
## V9  26.4597   0.0495
## V10 25.7195   0.0693

We see from this that items 8, 9, and 10 perhaps do not fit the 1PL model so well (small p-values). This is an indication that we should consider a different IRT model.

2.5 Step 5: Estimate ability scores & plot

Next, we can take the results of our 1PL model and estimate the latent ability scores of the participants.

We estimate the ability scores with the factor.scores() function in the ltm package.

theta.rasch<-ltm::factor.scores(PL1.rasch)
summary(theta.rasch$score.dat$z1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0376  0.1770  0.4631  0.5542  0.9988  2.2795

We can also plot the density curve of the estimated ability scores:

plot(theta.rasch)

We see that the mean of ability scores is around 0, and the standard deviation about 1 (these are by definition-estimated ability scores are standardized). The curve is approximately normal-shaped.

2.6 Step 6: Test for unidimensionality

unidimTest(PL1.rasch,irtdat)

## 
## Unidimensionality Check using Modified Parallel Analysis
## 
## Call:
## rasch(data = irtdat)
## 
## Matrix of tertachoric correlations
##         V1     V2     V3     V4     V5     V6     V7     V8     V9    V10
## V1  1.0000 0.3476 0.3357 0.3959 0.4449 0.4422 0.5215 0.3278 0.4158 0.3403
## V2  0.3476 1.0000 0.3124 0.3945 0.3909 0.3664 0.2408 0.3352 0.4293 0.4101
## V3  0.3357 0.3124 1.0000 0.3933 0.2970 0.3734 0.3011 0.4434 0.4040 0.4058
## V4  0.3959 0.3945 0.3933 1.0000 0.3853 0.5101 0.3035 0.4368 0.5135 0.6024
## V5  0.4449 0.3909 0.2970 0.3853 1.0000 0.3370 0.3526 0.4138 0.5149 0.3979
## V6  0.4422 0.3664 0.3734 0.5101 0.3370 1.0000 0.3137 0.4146 0.4801 0.4795
## V7  0.5215 0.2408 0.3011 0.3035 0.3526 0.3137 1.0000 0.3311 0.2054 0.1714
## V8  0.3278 0.3352 0.4434 0.4368 0.4138 0.4146 0.3311 1.0000 0.5069 0.4533
## V9  0.4158 0.4293 0.4040 0.5135 0.5149 0.4801 0.2054 0.5069 1.0000 0.6099
## V10 0.3403 0.4101 0.4058 0.6024 0.3979 0.4795 0.1714 0.4533 0.6099 1.0000
## 
## Alternative hypothesis: the second eigenvalue of the observed data is substantially larger 
##          than the second eigenvalue of data under the assumed IRT model
## 
## Second eigenvalue in the observed data: 0.6863
## Average of second eigenvalues in Monte Carlo samples: 0.4899
## Monte Carlo samples: 100
## p-value: 0.0297

The test is borderline significant at alpha=0.01 (p=0.0198), so unidimensionality (the idea that we’re measuring a single trait \(\theta\) here) is rejected. Since the 1PL model did not fit very well, we should consider fitting the data to alternative IRT models. The poor fit is perhaps not surprising, given that we know the data were simulated using a 2PL model, and we are not accounting for item differences in discriminability.

3 2-parameter logistic (2PL) IRT model

The item response function for the 2-parameter logistic IRT model is: \[P(X=1|\theta,a,b)=\frac{e^{a(\theta-b)}}{1+e^{a(\theta-b)}}\] The IRF describes the probability that an individual with latent ability level \(\theta\) endorses an item (\(X=1\)) with two item characteristc paramters:
1. \(b\) is the item difficulty (same as 1PL model). Item difficulty is reflected in the position of the item characteristic curve along the x-axis (latent trait ablity).
2. \(a\) is the item discriminability. Discriminability is how well an item is able to discriminate between persons with different ability levels. Item discriminability is reflected in the steepness of the slope of the item characteristic curves.

3.1 Step 1: Fit the 2PL model

We fit the 2PL model with ltm.

PL2.rasch<-ltm(irtdat~z1)
summary(PL2.rasch)

## 
## Call:
## ltm(formula = irtdat ~ z1)
## 
## Model Summary:
##   log.Lik      AIC      BIC
##  -2575.88 5191.759 5276.052
## 
## Coefficients:
##             value std.err z.vals
## Dffclt.V1  1.6318  0.1920 8.4979
## Dffclt.V2  1.1022  0.1550 7.1098
## Dffclt.V3  0.8210  0.1270 6.4627
## Dffclt.V4  0.7778  0.1015 7.6660
## Dffclt.V5  0.2550  0.0925 2.7571
## Dffclt.V6  0.7372  0.1046 7.0473
## Dffclt.V7  0.4766  0.1339 3.5593
## Dffclt.V8  0.6493  0.0994 6.5356
## Dffclt.V9  0.6485  0.0889 7.2943
## Dffclt.V10 1.9257  0.2027 9.5001
## Dscrmn.V1  1.4183  0.2315 6.1265
## Dscrmn.V2  1.1445  0.1741 6.5733
## Dscrmn.V3  1.1910  0.1736 6.8621
## Dscrmn.V4  1.6444  0.2279 7.2157
## Dscrmn.V5  1.3398  0.1851 7.2367
## Dscrmn.V6  1.4973  0.2070 7.2335
## Dscrmn.V7  0.8748  0.1430 6.1185
## Dscrmn.V8  1.5133  0.2070 7.3116
## Dscrmn.V9  1.8716  0.2587 7.2357
## Dscrmn.V10 2.1060  0.4226 4.9834
## 
## Integration:
## method: Gauss-Hermite
## quadrature points: 21 
## 
## Optimization:
## Convergence: 0 
## max(|grad|): 0.00056 
## quasi-Newton: BFGS

All of the item characteristic estimates (difficultly and discrimination) have significant z-values (i.e. greater than 1.65, against the null hypothesis that parameters = 0). For example, for Item 1, the estimated difficulty is \(b=1.63 (z=8.50)\) and the estimated discriminability is \(a=1.42 (z=6.13)\). Higher difficulty values indicate that the item is harder (i.e., higher latent ability to answer correctly); higher discriminability estimates indicate that the item has better ability to tell the difference between different levels of latent ability. These will both be made clearer in the ICC plots.

3.2 Step 2: Plot the item characteristic curves of all 10 items

plot(PL2.rasch,type=c("ICC"))

Unlike the ICCs for the 1PL model, the ICCs for the 2PL model do not all have the same shape. Item curves which are more “spread out” indicate lower discriminability (i.e., that individuals of a range of ability levels have some probability of getting the item correct). Compare this to an item with high discriminability (steep slope): for this item, we have a better estimate of the individual’s latent ability based on whether they got the question right or wrong.

A note about difficulty: Because of the differing slopes, the rank-order of item difficulty changes across different latent ability levels. We can see that item 10 is generally still the most difficult item (i.e. lowest probability of getting correct for most latent trait values, up until about \(\theta=2\)). Items 5 and 6 are roughly the easiest, switching in the rank order somewhere around \(\theta=-.25\).

3.3 Step 3: Plot the item information curves for all 10 items, then the whole test

plot(PL2.rasch,type=c("IIC"))

The item IICs demonstrate that some items provide more information about latent ability for different ability levels. The higher the item discriminability estimate, the more information an item provides about ability levels around the point where there is a 50% chance of getting the item right (i.e. the steepest point in the ICC slope) For example, item 10 (red) clearly provides the most information at high ability levels, around \(\theta=2\), but almost no information about low ability levels (< 0) because the item is already too hard for those participants. In contrast, item 7 (yellow), which has low discriminability, doesn’t give very much information overall, but covers a wide range of ability levels.

Next, we plot the item information curve for the whole test. This is the sum of all the item IICs above.

plot(PL2.rasch,type=c("IIC"),items=c(0))

The IIC for the whole test shows that the test provides the most information for slightly-higher-than average ability levels (about \(\theta=1\)), but does not provide much information about extremely high or low ability levels.

3.4 Step 4: Test the fit of the 2PL model

Next, we test how well the 2PL model fits the data.

item.fit(PL2.rasch,simulate.p.value=T)

## 
## Item-Fit Statistics and P-values
## 
## Call:
## ltm(formula = irtdat ~ z1)
## 
## Alternative: Items do not fit the model
## Ability Categories: 10
## Monte Carlo samples: 100 
## 
##         X^2 Pr(>X^2)
## V1  12.7361   0.6436
## V2  17.3099   0.2871
## V3  20.5614   0.2178
## V4  16.1306   0.7723
## V5  20.6393   0.3762
## V6  22.8042   0.3267
## V7  32.6421   0.1188
## V8  28.8842   0.1287
## V9  26.4138   0.3762
## V10 11.9416   0.8317

All of the items fit the 2PL model (p<0.05). This is not surprising, given that the data were simulated with a 2PL model.

3.5 Step 5: Estimate ability scores & plot

Estimate the individual latent ability scores using the factor.scores() function in the ltm library.

theta.rasch<-ltm::factor.scores(PL2.rasch)
summary(theta.rasch$score.dat$z1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0169  0.1424  0.4934  0.5702  1.0052  2.2571

Plot the density curve of the estimated ability scores

plot(theta.rasch)

We see that the estimated ability scores are roughly normally distributed, with mean 0 and SD 1. This is by definition of the 2PL model (i.e., ability estimates are standardized).

3.6 Step 6: Test for unidimensionality

We test for unidimensionality (i.e. is there a single trait \(\theta\) being measured here?)

unidimTest(PL2.rasch,irtdat)

## 
## Unidimensionality Check using Modified Parallel Analysis
## 
## Call:
## ltm(formula = irtdat ~ z1)
## 
## Matrix of tertachoric correlations
##         V1     V2     V3     V4     V5     V6     V7     V8     V9    V10
## V1  1.0000 0.3476 0.3357 0.3959 0.4449 0.4422 0.5215 0.3278 0.4158 0.3403
## V2  0.3476 1.0000 0.3124 0.3945 0.3909 0.3664 0.2408 0.3352 0.4293 0.4101
## V3  0.3357 0.3124 1.0000 0.3933 0.2970 0.3734 0.3011 0.4434 0.4040 0.4058
## V4  0.3959 0.3945 0.3933 1.0000 0.3853 0.5101 0.3035 0.4368 0.5135 0.6024
## V5  0.4449 0.3909 0.2970 0.3853 1.0000 0.3370 0.3526 0.4138 0.5149 0.3979
## V6  0.4422 0.3664 0.3734 0.5101 0.3370 1.0000 0.3137 0.4146 0.4801 0.4795
## V7  0.5215 0.2408 0.3011 0.3035 0.3526 0.3137 1.0000 0.3311 0.2054 0.1714
## V8  0.3278 0.3352 0.4434 0.4368 0.4138 0.4146 0.3311 1.0000 0.5069 0.4533
## V9  0.4158 0.4293 0.4040 0.5135 0.5149 0.4801 0.2054 0.5069 1.0000 0.6099
## V10 0.3403 0.4101 0.4058 0.6024 0.3979 0.4795 0.1714 0.4533 0.6099 1.0000
## 
## Alternative hypothesis: the second eigenvalue of the observed data is substantially larger 
##          than the second eigenvalue of data under the assumed IRT model
## 
## Second eigenvalue in the observed data: 0.6863
## Average of second eigenvalues in Monte Carlo samples: 0.544
## Monte Carlo samples: 100
## p-value: 0.0396

The test is not significant (p=0.0396), hence unidimensionality is not rejected. This is not surprising, given that the data were simulated with a 2PL model.

4 3-Parameter logistic IRT model

The item response function for the 3-parameter logistic IRT model is: \[P(X=1|\theta,a,b,c)=c+(1-c)~\frac{e^{a(\theta-b)}}{1+e^{a(\theta-b)}}\] The IRF describes the probability that an individual with latent ability level \(\theta\) endorses an item with three item characteristc paramters:
1. \(b\) is the item difficulty. This is reflected in the position of the item characteristic curve along the x-axis (i.e. latent trait)
2. \(a\) is the item discrimination. This is reflected in the steepness of the slope of the ICC.
3. \(c\) is a parameter for guessing. Under this model, individuals with zero ability have a nonzero chance of endorsing any item, just by guessing randomly. The guessing parameter is reflected in the y-intercept (i.e. probability) of the ICC.

4.1 Step 1: Fit the 3PL model

We fit the 3PL model with the tpm function from the ltm package.

PL3.rasch<-tpm(irtdat)

## Warning in tpm(irtdat): Hessian matrix at convergence is not positive definite; unstable solution.

summary(PL3.rasch)

## 
## Call:
## tpm(data = irtdat)
## 
## Model Summary:
##   log.Lik      AIC      BIC
##  -2573.76 5207.519 5333.958
## 
## Coefficients:
##             value std.err z.vals
## Gussng.V1  0.0000  0.0009 0.0276
## Gussng.V2  0.1064  0.0571 1.8636
## Gussng.V3  0.0000  0.0019 0.0252
## Gussng.V4  0.0793  0.0429 1.8490
## Gussng.V5  0.0000  0.0024 0.0158
## Gussng.V6  0.0001  0.0026 0.0223
## Gussng.V7  0.0002  0.0058 0.0284
## Gussng.V8  0.0000  0.0003 0.0113
## Gussng.V9  0.0017     NaN    NaN
## Gussng.V10 0.0000  0.0001 0.0185
## Dffclt.V1  1.6360  0.1914 8.5456
## Dffclt.V2  1.2145  0.1387 8.7569
## Dffclt.V3  0.8177  0.1250 6.5434
## Dffclt.V4  0.8900  0.1104 8.0642
## Dffclt.V5  0.2580  0.0921 2.8002
## Dffclt.V6  0.7466  0.1055 7.0747
## Dffclt.V7  0.4772  0.1337 3.5684
## Dffclt.V8  0.6516  0.0983 6.6270
## Dffclt.V9  0.6569  0.0833 7.8816
## Dffclt.V10 1.9191  0.2001 9.5917
## Dscrmn.V1  1.4148  0.2298 6.1576
## Dscrmn.V2  2.0070  0.9136 2.1967
## Dscrmn.V3  1.2085  0.1753 6.8951
## Dscrmn.V4  2.5299  0.8111 3.1190
## Dscrmn.V5  1.3529  0.1866 7.2489
## Dscrmn.V6  1.4799  0.2061 7.1818
## Dscrmn.V7  0.8820  0.1438 6.1332
## Dscrmn.V8  1.5261  0.2069 7.3755
## Dscrmn.V9  1.8865  0.2261 8.3425
## Dscrmn.V10 2.1150  0.4193 5.0442
## 
## Integration:
## method: Gauss-Hermite
## quadrature points: 21 
## 
## Optimization:
## Optimizer: optim (BFGS)
## Convergence: 0 
## max(|grad|): 0.015

All of the item difficulty and item discriminability parameters have significant z-values. Note that most of the guessing parameters have significant z-values (the data was simulated with a 2PL model, so no guessing was systematically built into the data) Items 2 and 4 have significant guessing parameters (z>1.65), probabilities of getting the item correcty by guessing are pretty low (10.6% and 7.9%, respectively). These significant estimates could be due to chance, or some noise in the simulated data.

4.2 Step 2: Plot the item characteristic curves of all 10 items

plot(PL3.rasch,type=c("ICC"))

The slopes of the ICCs look very similar to those of the 2PL model. We can see that items 2 and 4 have y-intercepts greater than zero-so even at very low ability levels, there is some chance of getting these items correct (via guessing).

4.3 Step 3: Plot the item information curves for all 10 items, then the whole test

plot(PL3.rasch,type=c("IIC"))

The IICs have changed markedly from the 2PL model. Item 4 now provides the most information about moderate ability levels.

We plot the IIC for the entire test, the sum of the item IICS:

plot(PL3.rasch,type=c("IIC"),items=c(0))

The whole-test IIC looks similar to the IICs for the 1PL and 2PL models, providing the most information about moderate ability levels, and less about extreme ability levels.

4.4 Step 4: Estimate ability scores & plot

We estimate the ability scores with the factor.scores() function in the ltm package.

theta.rasch<-ltm::factor.scores(PL3.rasch)
summary(theta.rasch$score.dat$z1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.9516  0.1794  0.5325  0.5895  1.0111  2.2281

We then plot the density curve of the estimated ability scores

plot(theta.rasch)

Again we see an approximately normal distribution with mean approximately 0 and standard deviation approximately 1.

4.5 Step 5: Test for unidimensionality

Last, we test for the unidimensionality of the 3PL model (i.e. is there a single trait \(\theta\) being measured here?)

unidimTest(PL3.rasch,irtdat)

## 
## Unidimensionality Check using Modified Parallel Analysis
## 
## Call:
## tpm(data = irtdat)
## 
## Matrix of tertachoric correlations
##         V1     V2     V3     V4     V5     V6     V7     V8     V9    V10
## V1  1.0000 0.3476 0.3357 0.3959 0.4449 0.4422 0.5215 0.3278 0.4158 0.3403
## V2  0.3476 1.0000 0.3124 0.3945 0.3909 0.3664 0.2408 0.3352 0.4293 0.4101
## V3  0.3357 0.3124 1.0000 0.3933 0.2970 0.3734 0.3011 0.4434 0.4040 0.4058
## V4  0.3959 0.3945 0.3933 1.0000 0.3853 0.5101 0.3035 0.4368 0.5135 0.6024
## V5  0.4449 0.3909 0.2970 0.3853 1.0000 0.3370 0.3526 0.4138 0.5149 0.3979
## V6  0.4422 0.3664 0.3734 0.5101 0.3370 1.0000 0.3137 0.4146 0.4801 0.4795
## V7  0.5215 0.2408 0.3011 0.3035 0.3526 0.3137 1.0000 0.3311 0.2054 0.1714
## V8  0.3278 0.3352 0.4434 0.4368 0.4138 0.4146 0.3311 1.0000 0.5069 0.4533
## V9  0.4158 0.4293 0.4040 0.5135 0.5149 0.4801 0.2054 0.5069 1.0000 0.6099
## V10 0.3403 0.4101 0.4058 0.6024 0.3979 0.4795 0.1714 0.4533 0.6099 1.0000
## 
## Alternative hypothesis: the second eigenvalue of the observed data is substantially larger 
##          than the second eigenvalue of data under the assumed IRT model
## 
## Second eigenvalue in the observed data: 0.6863
## Average of second eigenvalues in Monte Carlo samples: 0.5536
## Monte Carlo samples: 100
## p-value: 0.1089

Unidimensionality is not rejected (p=0.0693). Although the data were simulated under a 2PL model the (small) changes made by the (few) significant guessing parameters were not enough to reject unidimensionality.

5 Summary:

This tutorial has been a brief introduction to logistic IRT models. We have learned how to fit models which describe items in terms of difficulty, discriminability, and correct for guessing. We learned to interpret model parameters graphically in item characteristic curves and item information curves, and learned how to estimate the latent ability scores of test respondents.

Logistic IRT Models

Julie Wood

November 12, 2017