Overview

The pcFactorStan package for R provides convenience functions and pre-programmed Stan models for the analysis of paired comparison data. Its purpose is to make fitting these models with Stan straightforward and the models themselves easy to understand. pcFactorStan relies on the rstan package, which should be installed first; see the rstan documentation for installation instructions.

One situation where a factor model might be useful is when the same players compete in tournaments of more than one game. For example, the computer player AlphaZero (Silver et al. 2018) has been trained to play chess, shogi, and Go. We can take the tournament match outcome data for each of these games and find rankings among the players. We may also suspect that there is a latent board game skill that accounts for some proportion of the variance in the per-game rankings. This proportion can be recovered by the factor model.

Our goal may be to fit a factor model, but it is necessary to build up to it step by step. There are essentially three models: ‘unidim’, ‘correlation’, and ‘factor’. ‘unidim’ analyzes a single item. ‘correlation’ is suitable for two or more items. Once you have vetted your items with the ‘unidim’ and ‘correlation’ models, you can try the ‘factor’ model. There is also a special model, ‘unidim_adapt’. All models except ‘unidim_adapt’ require scaling constants; to find appropriate scaling constants, we fit ‘unidim_adapt’ to each item separately.

Brief tutorial

Physical activity flow propensity

The R code below first loads rstan and pcFactorStan.

library(rstan)
library(pcFactorStan)

Next we take a peek at the data.

head(phyActFlowPropensity)
#> pa1 pa2 skill predict novelty creative complex goal1 feedback1 chatter waiting body control present spont stakes evaluated reward
#> mountain biking tennis 1 -1 -2 1 1 1 1 -2 1 1 1 1 1 1 2 0
#> mountain biking tennis 1 2 -1 -1 -1 0 2 1 2 0 1 0 0 1 2 -1
#> ice skating running -2 1 -1 -2 -1 1 1 -2 -2 -1 0 0 -1 -1 -1 0
#> climbing rowing -2 2 -2 -2 -2 0 -1 -1 -1 -1 -1 -1 1 0 0 0
#> card game gardening 0 0 0 0 2 0 0 0 -2 2 1 0 0 2 -2 2
#> dance table tennis 0 -2 -1 -1 0 -1 -1 -1 0 0 0 0 0 0 0 1

These data consist of paired comparisons of 87 physical activities on 16 flow-related facets. Participants submitted two activities using free-form input. These activities were substituted into item templates. For example, Item predict consisted of the prompt, “How predictable is the action?” with response options:

  • A1 is much more predictable than A2.
  • A1 is somewhat more predictable than A2.
  • Both offer roughly equal predictability.
  • A2 is somewhat more predictable than A1.
  • A2 is much more predictable than A1.

If the participant selected ‘golf’ and ‘running’ for activities then ‘golf’ was substituted into A1 and ‘running’ into A2. Duly prepared, the item was presented and the participant was asked to select the most plausible statement.

A somewhat more response is scored 1 or -1 and a much more response is scored 2 or -2. A tie (i.e. roughly equal) is scored as zero. We will need to analyze each item separately before we analyze them together. Therefore, we will start with Item skill.
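
Before modeling, it can help to glance at how these response codes are distributed for a single item. Here is a minimal sketch; the counts it prints depend on the data and are not reproduced here.

table(phyActFlowPropensity$skill)  # tally of the -2, -1, 0, 1, 2 codes for Item skill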

Data must be fed into Stan in a partially digested form. The next block of code demonstrates how a suitable data list may be constructed using the prepData() function. This function automatically determines the number of threshold parameters based on the range observed in your data. One thing it does not do is pick a varCorrection factor. The varCorrection determines the degree of adaptation in the model. Usually some choice between 2.0 and 4.0 will obtain optimal results.

dl <- prepData(phyActFlowPropensity[,c(paste0('pa',1:2), 'skill')])
dl$varCorrection <- 2.0

Next we fit the model using the pcStan() function, which is a wrapper for stan() from rstan. Here we accept rstan's default of four chains. As is customary with Stan, the first half of each chain is used to adapt the sampler's step size and mass matrix (i.e. warm up) and is excluded from inference.

fit1 <- pcStan("unidim_adapt", data=dl)

A variety of diagnostics are available to check whether the sampler ran into trouble.

check_hmc_diagnostics(fit1)
#> 
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#> 
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.

Everything looks good, but there are a few more things to check. We want \(\widehat R\) < 1.015 and effective sample size greater than 100 times the number of chains (Vehtari et al., 2019).

allPars <- summary(fit1, probs=c())$summary 
print(min(allPars[,'n_eff']))
#> [1] 788.7
print(max(allPars[,'Rhat']))
#> [1] 1.007
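
If you prefer an explicit check, here is a minimal sketch of the criterion just described, assuming the default four chains:

nChains <- 4
stopifnot(max(allPars[,'Rhat']) < 1.015,
          min(allPars[,'n_eff']) > 100 * nChains)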

Again, everything looks good. If the target values were not reached then we would sample the model again with more iterations. Time for a plot,

library(ggplot2)

theta <- summary(fit1, pars=c("theta"), probs=c())$summary[,'mean']

ggplot(data.frame(x=theta, activity=dl$nameInfo$pa, y=0.47)) +
  geom_point(aes(x=x),y=0) +
  geom_text(aes(label=activity, x=x, y=y),
            angle=85, hjust=0, size=2,
            position = position_jitter(width = 0, height = 0.4)) + ylim(0,1) +
  theme(legend.position="none",
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())

Intuitively, this seems like a fairly reasonable ranking for skill. As pretty as the plot is, the main reason that we fit this model was to find a scaling constant to produce a standard deviation close to 1.0,

s50 <- summary(fit1, pars=c("scale"), probs=c(.5))$summary[,'50%']
print(s50)
#> [1] 0.6597

We use the median instead of the mean because scale is not likely to have a symmetric marginal posterior distribution. We obtained 0.6597, but that value is just for one item. We have to perform the same procedure for every item. Wow, that would be really tedious … if we did not have a function to do it for us! Fortunately, calibrateItems takes care of it and produces a table of the pertinent data,

result <- calibrateItems(phyActFlowPropensity, iter=1000L)
print(result)
#>      item iter divergent treedepth low_bfmi   n_eff  Rhat  scale thetaVar
#>     skill 2250         0         0        0  592.81 1.003 0.6421   0.8627
#>   predict 1000         0         0        0  469.65 1.006 0.6133   0.8496
#>   novelty 1500         0         0        0  605.73 1.006 0.4980   0.7926
#>  creative 1000         0         0        0  404.80 1.009 0.4916   0.7892
#>   complex 1000         0         0        0  416.68 1.007 0.5742   0.8312
#>     goal1 1000         0         0        2   59.24 1.074 0.0243   0.2897
#> feedback1 1500         0         0        1   40.97 1.036 0.1213   0.4950
#>   chatter 1000         0         0        0 1083.81 1.002 0.2369   0.6188
#>   waiting 1000         0         0        0  446.07 1.003 0.5265   0.8075
#>      body 1000         0         0        0  754.20 1.003 0.3651   0.7147
#>   control 1000         0         0        0  912.37 1.006 0.3097   0.6766
#>   present 1000         0         0        0 1112.43 1.001 0.2230   0.6064
#>     spont 1000         0         0        0  972.22 1.005 0.2550   0.6341
#>    stakes 1000         0         0        0 1018.71 1.002 0.2700   0.6463
#> evaluated 1000         0         0        0  629.95 1.003 0.4638   0.7741
#>    reward 1000         0         0        0  820.65 1.005 0.2002   0.5850

Items goal1 and feedback1 ran into trouble. A nonzero count of divergent transitions or low_bfmi means that these items contained too little signal to estimate. We could try again with varCorrection=1.0, but we are going to exclude them instead. The model succeeded on the rest of the items. I requested iter=1000L to demonstrate how calibrateItems will resample the model until the n_eff is large enough and the Rhat small enough. Item skill (among others) needed more than 1000 samples to converge.
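
Should you want to identify such items programmatically, the calibration table can be filtered directly. A minimal sketch, assuming result is a plain data frame with the column names printed above:

result$item[result$divergent > 0 | result$low_bfmi > 0]  # goal1 and feedback1 here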

Next we will fit the correlation model. We exclude the Cholesky factor of the correlation matrix rawThetaCorChol because the regular correlation matrix is also output.

pafp <- phyActFlowPropensity
excl <- match(c('goal1','feedback1'), colnames(pafp))
pafp <- pafp[,-excl]
dl <- prepData(pafp)
dl$scale <- result[match(dl$nameInfo$item, result$item), 'scale']
fit2 <- pcStan("correlation", data=dl, include=FALSE, pars=c('rawTheta', 'rawThetaCorChol')) 
check_hmc_diagnostics(fit2) 
#> 
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#> 
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.

allPars <- summary(fit2, probs=0.5)$summary 
print(min(allPars[,'n_eff']))
#> [1] NaN
print(max(allPars[,'Rhat']))
#> [1] NaN

The HMC diagnostics look good, but … oh dear! Something is wrong with the n_eff and \(\widehat R\). Let us look more carefully,

head(allPars[order(allPars[,'sd']),]) 
#>               mean   se_mean        sd 50%  n_eff  Rhat
#> thetaCor[1,1]    1       NaN 0.000e+00   1    NaN   NaN
#> thetaCor[2,2]    1 1.038e-18 6.266e-17   1 3644.5 0.999
#> thetaCor[3,3]    1 1.291e-18 6.777e-17   1 2756.8 0.999
#> thetaCor[4,4]    1 3.581e-18 6.802e-17   1  360.8 0.999
#> thetaCor[5,5]    1 1.186e-18 6.905e-17   1 3387.4 0.999
#> thetaCor[7,7]    1 3.304e-18 7.695e-17   1  542.4 0.999

Ah ha! It looks like all the entries of the correlation matrix are reported, including the entries that are not stochastic but are fixed to constant values. We need to filter those out to get sensible results.

allPars <- allPars[allPars[,'sd'] > 1e-6,]  
print(min(allPars[,'n_eff']))
#> [1] 851.2
print(max(allPars[,'Rhat']))
#> [1] 1.005

Ah, much better. Now we can inspect the correlation matrix. There are many ways to visualize a correlation matrix. One of my favorite ways is to plot it using the qgraph package,

covItemNames <- dl$nameInfo$item
tc <- summary(fit2, pars=c("thetaCor"), probs=c(.5))$summary[,'50%']
tcor <- matrix(tc, length(covItemNames), length(covItemNames))
dimnames(tcor) <- list(covItemNames, covItemNames)

library(qgraph)
#> Registered S3 methods overwritten by 'huge':
#>   method    from   
#>   plot.sim  BDgraph
#>   print.sim BDgraph
qgraph(tcor, layout = "spring", graph = "cor", labels=colnames(tcor),
       legend.cex = 0.3,
       cut = 0.3, maximum = 1, minimum = 0, esize = 20,
       vsize = 7, repulsion = 0.8, negDashed=TRUE, theme="colorblind")

Based on this plot and theoretical considerations, I decided to exclude spont, control, evaluated, and waiting from the factor model. A detailed rationale for why these items, and not others, are excluded will be presented in a forthcoming article. For now, let us focus on the mechanics of data analysis. Here are item response curves,

df <- responseCurve(dl, fit2, 
  item=setdiff(dl$nameInfo$item, c('spont','control','evaluated','waiting')),
  responseNames=c("much more","somewhat more", 'equal',
                  "somewhat less", "much less"))
ggplot(df) +
  geom_line(aes(x=worthDiff,y=prob,color=response,linetype=response,
                group=responseSample), size=.2, alpha=.2) +
  xlab("difference in latent worths") + ylab("probability") +
  ylim(0,1) + facet_wrap(~item) +
    guides(color=guide_legend(override.aes=list(alpha = 1, size=1)))

These response curves are a function of the thresholds, scale, and alpha parameters. A detailed description of the item response model can be found in the man page for responseCurve. A large alpha (>1) can mean that the item discriminates among objects well. However, it can also mean that the model predicts all responses will be equal. If most observed responses are indeed equal then this can result in good model fit, but another interpretation is that the item is useless.

alpha <- summary(fit2, pars=c("alpha"), probs=c(.5))$summary
rownames(alpha) <- covItemNames
print(alpha[alpha[,'sd']>.25,,drop=FALSE])
#>          mean se_mean     sd   50% n_eff   Rhat
#> chatter 7.179  0.0293 1.3912 7.080  2255 0.9996
#> waiting 3.737  0.0101 0.5266 3.693  2715 1.0005

I already decided to exclude waiting by inspection of the correlation matrix, but it looks like Item chatter should be excluded as well.

We will enter the alpha parameters into the factor model as non-stochastic data. Trying to estimate alpha in the factor model causes bias, at least in the models that I have tried. The factor model is prone to increase both alpha and the magnitude of factor proportions at the expense of threshold accuracy. Treating alpha as non-stochastic reduces variability in the factor model, but not by much. Simulations indicate that the posterior distribution remains well calibrated.

I will fit model ‘factor_ll’ instead of ‘factor’ so that I can use the loo package to look for outliers. We also need to take care that the data pafp matches, one-to-one, the data seen by Stan so that we can map back from the model to the data. Hence, we update pafp using the usual data cleaning sequence of filterGraph and normalizeData and pass the result to prepCleanData.

pafp <- pafp[,c(paste0('pa',1:2),
             setdiff(covItemNames, c('spont','control','evaluated','waiting','chatter')))]
pafp <- normalizeData(filterGraph(pafp))
dl <- prepCleanData(pafp)
dl$scale <- result[match(dl$nameInfo$item, result$item), 'scale']
dl$alpha <- alpha[match(dl$nameInfo$item, rownames(alpha)), 'mean']
fit3 <- pcStan("factor_ll", data=dl, include=FALSE, 
               pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))
#> Warning in throw_sampler_warnings(nfits): Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#bulk-ess

To check the fit diagnostics, we have to take care to examine only the parameters of interest. The factor model outputs many parameters that should not be interpreted. For example, we do not care about log_lik because this vector contains per-observation likelihoods for loo. We also ignore theta because it is a function of the other parameters.

check_hmc_diagnostics(fit3)
#> 
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#> 
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.

interest <- c("threshold", "factorLoadings",  "factorProp", "factor",
 "unique", "uniqueTheta", "lp__")

allPars <- summary(fit3, pars=interest)$summary
print(min(allPars[,'n_eff']))
#> [1] 183.8
print(max(allPars[,'Rhat']))
#> [1] 1.03

Usually a low effective sample size would suggest that the Markov chains need to run for more iterations. However, I tried that and it did not help. At 4000 or even 6000 iterations, n_eff still did not improve. Which parameters are difficult to sample?

print(allPars[head(order(allPars[,'n_eff'])),])
#>                      mean  se_mean      sd       2.5%         25%
#> unique[6]          0.9665 0.012920  0.1752  6.700e-01      0.8429
#> lp__          -13007.9520 1.223559 22.1257 -1.305e+04 -13022.4071
#> unique[4]          0.5179 0.006833  0.1498  2.441e-01      0.4155
#> unique[5]          0.1861 0.005125  0.1278  7.971e-03      0.0811
#> unique[9]          0.8190 0.010494  0.2637  3.166e-01      0.6427
#> factorProp[4]      0.7786 0.004206  0.1059  5.390e-01      0.7155
#>                       50%         75%       97.5% n_eff  Rhat
#> unique[6]          0.9471      1.0746      1.3431 183.8 1.025
#> lp__          -13007.4973 -12993.1453 -12965.7230 327.0 1.030
#> unique[4]          0.5112      0.6104      0.8362 480.4 1.018
#> unique[5]          0.1678      0.2696      0.4707 622.0 1.005
#> unique[9]          0.8070      0.9855      1.3560 631.3 1.006
#> factorProp[4]      0.7936      0.8538      0.9483 633.9 1.013

Interesting! The most troublesome parameter is unique[6]. Let us dig more,

print(dl$nameInfo$item[6])
#> [1] "body"
print(summary(fit3, pars=c('factorProp[6]', 'unique[6]', 'factorLoadings[6]'))$summary)
#>                     mean  se_mean     sd      2.5%     25%    50%   75%
#> factorProp[6]     0.1334 0.002533 0.1012 -0.000216 0.05127 0.1181 0.197
#> unique[6]         0.9665 0.012920 0.1752  0.670015 0.84290 0.9471 1.075
#> factorLoadings[6] 0.8080 0.009441 0.4115 -0.032369 0.52606 0.8203 1.088
#>                    97.5%  n_eff  Rhat
#> factorProp[6]     0.3642 1595.1 1.006
#> unique[6]         1.3431  183.8 1.025
#> factorLoadings[6] 1.6041 1899.8 1.002

The effective sample size is large enough for factorProp[6] even though this quantity is partially derived from unique[6]. Here we find that the factor proportion for Item body is not statistically different from zero at the \(\alpha=.05\) level. Rather than trying to force this model to converge, Item body can be discarded.

pafp <- pafp[,c(paste0('pa',1:2),
             setdiff(covItemNames, c('spont','control','evaluated','waiting','chatter','body')))]
pafp <- normalizeData(filterGraph(pafp))
dl <- prepCleanData(pafp)
dl$scale <- result[match(dl$nameInfo$item, result$item), 'scale']
dl$alpha <- alpha[match(dl$nameInfo$item, rownames(alpha)), 'mean']

We rerun the model,

fit3 <- pcStan("factor_ll", data=dl, include=FALSE, iter=4000,
               pars=c('rawUnique', 'rawUniqueTheta', 'rawFactor', 'rawLoadings'))

Let us check the diagnostics,

check_hmc_diagnostics(fit3) 
#> 
#> Divergences:
#> 0 of 8000 iterations ended with a divergence.
#> 
#> Tree depth:
#> 0 of 8000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.

allPars <- summary(fit3, pars=interest, probs=0.5)$summary 
print(min(allPars[,'n_eff']))
#> [1] 1262
print(max(allPars[,'Rhat']))
#> [1] 1.011

Looks good! Let us see which data are the most unexpected by the model. We create a loo object and pass that to outlierTable.

options(mc.cores=1)  # otherwise loo consumes too much RAM 
l1 <- toLoo(fit3)
kThreshold <- 0.3
ot <- outlierTable(dl, l1, kThreshold)
print(ot)
#> pa1                     pa2                  item    pick      k
#> running                 snow skiing          predict    2 0.4037
#> soccer                  artistic gymnastics  skill      2 0.3921
#> climbing                pilates              predict   -2 0.3918
#> meditation              stretching           predict    1 0.3904
#> tennis                  water skiing         novelty   -2 0.3815
#> martial arts            boxing               stakes     2 0.3735
#> curling                 quidditch (sport)    reward     0 0.3714
#> tennis                  water skiing         present   -2 0.3574
#> volleyball              aerobic exercise     reward     0 0.3553
#> snow skiing             water skiing         skill     -1 0.3534
#> tennis                  water skiing         creative   2 0.3530
#> horseback riding        obstacle course      skill     -2 0.3520
#> table tennis            cue sports           creative   2 0.3513
#> obstacle course         aerobic exercise     creative   2 0.3480
#> running on a treadmill  skateboarding        predict   -2 0.3456
#> tennis                  water skiing         predict   -2 0.3453
#> cricket                 Australian football  predict   -2 0.3440
#> lacrosse                skateboarding        complex   -2 0.3428
#> running                 artistic gymnastics  predict    2 0.3416
#> running                 curling              novelty   -1 0.3301
#> hiking                  ice skating          present    0 0.3278
#> disc golf               roller derby         complex   -1 0.3277
#> basketball              skateboarding        creative   2 0.3271
#> mountain biking         racquetball          predict    2 0.3263
#> climbing                pilates              creative   2 0.3248
#> racquetball             basketball           skill      1 0.3244
#> lacrosse                skateboarding        reward    -1 0.3236
#> tennis                  racquetball          novelty    0 0.3235
#> baseball                bowling              complex   -2 0.3230
#> calisthenics            sex                  creative  -2 0.3197
#> table tennis            cue sports           predict    2 0.3177
#> cricket                 netball              reward     0 0.3174
#> volleyball              skipping rope        predict   -1 0.3163
#> hiking                  shopping             skill      0 0.3157
#> running on a treadmill  archery              predict   -2 0.3139
#> badminton               cricket              reward     2 0.3134
#> cycling                 ultimate frisbee     skill      2 0.3129
#> basketball              ultimate frisbee     reward    -2 0.3120
#> snow skiing             water skiing         predict    1 0.3108
#> hiking                  racquetball          novelty    0 0.3104
#> football                dodgeball            complex   -1 0.3075
#> bowling                 rowing               novelty    1 0.3045
#> bowling                 rowing               stakes     0 0.3027
#> table tennis            cue sports           novelty   -1 0.3006
#> running                 rowing               skill      2 0.3002
#> soccer                  artistic gymnastics  creative   2 0.3000

Observations with \(k>0.5\) can be regarded as outliers. Every sampling run may identify outliers in a slightly different order. We use a threshold of 0.3 instead of 0.5 to ensure that at least a few lines are shown.

xx <- which(ot[,'pa1'] == 'tennis' & ot[,'pa2'] == 'water skiing' & ot[,'item'] == 'predict' & ot[,'pick'] == -2)

We will take a closer look at row 16. What does a pick of -2 mean? Pick numbers are converted to response categories by adding the number of thresholds plus one. There are two thresholds (much and somewhat), so 3 + -2 = 1. Looking back at our item response curve plot, the legend gives the response category order from top (1) to bottom (5). The first response category is much more. Putting it all together, we obtain the endorsement that tennis is much more predictable than water skiing. Specifically, what about that assertion is unexpected? We can examine how other participants have responded,

pafp[pafp$pa1 == ot[xx,'pa1'] & pafp$pa2 == ot[xx,'pa2'],
     c('pa1','pa2', as.character(ot[xx,'item']))]
#>        pa1          pa2 predict
#> 142 tennis water skiing      -2

Hm, this is the only participant that compared tennis and water skiing. Let us look a little deeper to understand why this response was unexpected.

loc <- sapply(ot[xx,c('pa1','pa2','item')], unfactor) 
exam <- summary(fit3, pars=paste0("theta[",loc[paste0('pa',1:2)],
                          ",", loc['item'],"]"))$summary
#>                mean se_mean     sd   2.5%     25%     50%     75%   97.5% n_eff   Rhat
#> theta[2,2]  -0.7830  0.0039 0.2780 -1.329 -0.9707 -0.7812 -0.5899 -0.2455  5211 1.0005
#> theta[21,2] -0.2433  0.0066 0.7006 -1.636 -0.7076 -0.2425  0.2294  1.1282 11250 0.9998

Here we find that, based on comparisons with other activities, water skiing was estimated about 0.5397 units more predictable than tennis. However, note the wide posterior quantiles! What are the sample sizes associated with these activities?

sum(c(pafp$pa1 == ot[xx,'pa1'], pafp$pa2 == ot[xx,'pa1']))
#> [1] 72
sum(c(pafp$pa1 == ot[xx,'pa2'], pafp$pa2 == ot[xx,'pa2']))
#> [1] 2

Ah, no wonder, predictability’s 95% uncertainty interval for water skiing is from -1.6357 to 1.1282. So there is practically no information. We could continue our investigation by looking at which responses justified these predict estimates. However, let us move on and plot the marginal posterior distributions of the factor proportions,

pi <- parInterval(fit3, 'factorProp', dl$nameInfo$item, label='item') 
pi <- pi[order(abs(pi$M)),]
pi$item <- factor(pi$item, levels=pi$item)

ggplot(pi) +
  geom_vline(xintercept=0, color="green") +
  geom_jitter(data=parDistributionFor(fit3, pi),
              aes(value, item), height = 0.35, alpha=.05) +
  geom_segment(aes(y=item, yend=item, x=L, xend=U),
               color="yellow", alpha=.5) +
  geom_point(aes(x=M, y=item), color="red", size=1) +
  theme(axis.title.y=element_blank())

Finally, we can plot the factor scores. By default, activities with small sample sizes are retained by filterGraph if they connect other activities because they contribute information to the model. However, when we look at the per-activity factor scores, we can limit ourselves to activities with a sample size of at least 11.

pa11 <- levels(filterGraph(pafp, minDifferent=11L)$pa1) 
pick <- paste0('factor[',match(pa11, dl$nameInfo$pa),']')
pi <- parInterval(fit3, pick, pa11, label='activity')
pi <- pi[order(pi$M),]
pi$activity <- factor(pi$activity, levels=pi$activity)

ggplot(pi) +
  geom_vline(xintercept=0, color="green") +
  geom_jitter(data=parDistributionFor(fit3, pi, samples=250),
              aes(value, activity), height = 0.35, alpha=.05) +
  geom_segment(aes(y=activity, yend=activity, x=L, xend=U),
               color="yellow", alpha=.5) +
  geom_point(aes(x=M, y=activity), color="red", size=1) +
  theme(axis.title.y=element_blank()) 

And there you have it. If you have not done so already, go find a dojo and commence study of martial arts!

Technical notes

Given that my background is more in software than math, I am not a fan of the Greek letters used with such enthusiasm by mathematicians. When I name variables, I favor the expressive over the succinct.

If you read through the Stan models included with this package, you will find some variables prefixed with raw. These are special variables internal to the model. In particular, you should not try to evaluate the \(\widehat R\) or effective sample size of raw parameters. These parameters are best excluded from the sampling output.
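
The fits in the tutorial above already follow this advice; the pattern is rstan's include=FALSE/pars idiom, for example:

fit2 <- pcStan("correlation", data=dl, include=FALSE,
               pars=c('rawTheta', 'rawThetaCorChol'))  # raw parameters are not retained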

Unidim Adapt

parameter  prior            purpose
threshold  normal(0,2)      item response thresholds
theta      normal(0,sigma)  latent score
sigma      lognormal(1,1)   latent score scale
scale      N/A              latent score scaling constant

The ‘unidim_adapt’ model has a varCorrection constant that is used to calibrate the scale. For all other models, per-item scale must be passed in as data. scale has no substantive interpretation, but it is used to partition signal between object variance and item discrimination. While object variance has no substantive interpretation, item discrimination is interpretable.
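
Concretely, that is what the tutorial above did when it attached the per-item medians returned by calibrateItems to the data list before sampling:

dl$scale <- result[match(dl$nameInfo$item, result$item), 'scale']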

Unidim

parameter  prior             purpose
threshold  normal(0,2)       item response thresholds
alpha      exponential(0.1)  item discrimination
theta      normal(0,1)       latent score

Correlation

parameter  prior             purpose
threshold  normal(0,2)       item response thresholds
alpha      exponential(0.1)  item discrimination
thetaCor   lkj(2)            correlations between items
theta      see below         latent score

Thresholds for all items are combined into a single vector. The prior for theta is multivariate normal with correlations thetaCor and scale 1.0. Exclude rawTheta and rawThetaCorChol from sampling reports.
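
In notation, the prior just described amounts to \(\theta \sim \mathcal{N}(0, R)\), where \(R\) is the correlation matrix thetaCor and each component of theta has unit scale.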

Factor

parameter       prior        purpose
threshold       normal(0,2)  item response thresholds
unique          normal(1,1)  scale of unique scores
uniqueTheta     normal(0,1)  unique scores
factorLoadings  normal(0,1)  signed scale of factor scores
factor          normal(0,1)  factor scores
factorProp      N/A          signed factor variance proportion
sigma           N/A          relative item scale

Thresholds for all items are combined into a single vector. factorProp is computed using Equation 3 of Gelman et al. (in press) and has no prior of its own. factorLoadings is in standard deviation units but can be negative. Similarly, factorProp is a signed proportion bounded between -1 and 1. Exclude rawUnique, rawUniqueTheta, rawFactor, and rawLoadings from sampling.

References

Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (in press). R-squared for Bayesian regression models. The American Statistician. DOI: 10.1080/00031305.2018.1549100

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., … & Lillicrap, T. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2019). Rank-normalization, folding, and localization: An improved \(\widehat R\) for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.

R Session Info

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 19.04
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] qgraph_1.6.3       pcFactorStan_1.0.1 Rcpp_1.0.1        
#> [4] rstan_2.19.1       StanHeaders_2.18.9 reshape2_1.4.3    
#> [7] ggplot2_3.2.0      knitr_1.23        
#> 
#> loaded via a namespace (and not attached):
#>  [1] splines_3.6.1       gtools_3.8.1        Formula_1.2-3      
#>  [4] assertthat_0.2.1    BDgraph_2.59        highr_0.8          
#>  [7] stats4_3.6.1        latticeExtra_0.6-28 d3Network_0.5.2.1  
#> [10] yaml_2.2.0          pbivnorm_0.6.0      pillar_1.4.1       
#> [13] backports_1.1.4     lattice_0.20-38     glue_1.3.1         
#> [16] digest_0.6.19       RColorBrewer_1.1-2  checkmate_1.9.3    
#> [19] ggm_2.3             colorspace_1.4-1    htmltools_0.3.6    
#> [22] Matrix_1.2-17       plyr_1.8.4          psych_1.8.12       
#> [25] pkgconfig_2.0.2     purrr_0.3.2         corpcor_1.6.9      
#> [28] mvtnorm_1.0-11      scales_1.0.0        processx_3.3.1     
#> [31] whisker_0.3-2       glasso_1.10         jpeg_0.1-8         
#> [34] fdrtool_1.2.15      huge_1.3.2          tibble_2.1.3       
#> [37] htmlTable_1.13.1    withr_2.1.2         pbapply_1.4-0      
#> [40] nnet_7.3-12         lazyeval_0.2.2      cli_1.1.0          
#> [43] mnormt_1.5-5        survival_2.44-1.1   magrittr_1.5       
#> [46] crayon_1.3.4        evaluate_0.14       ps_1.3.0           
#> [49] MASS_7.3-51.3       nlme_3.1-140        foreign_0.8-71     
#> [52] pkgbuild_1.0.3      tools_3.6.1         loo_2.1.0          
#> [55] data.table_1.12.2   prettyunits_1.0.2   matrixStats_0.54.0 
#> [58] stringr_1.4.0       munsell_0.5.0       cluster_2.1.0      
#> [61] callr_3.2.0         compiler_3.6.1      rlang_0.3.4        
#> [64] grid_3.6.1          rstudioapi_0.10     rjson_0.2.20       
#> [67] htmlwidgets_1.3     igraph_1.2.4.1      lavaan_0.6-3       
#> [70] base64enc_0.1-3     labeling_0.3        rmarkdown_1.13     
#> [73] gtable_0.3.0        codetools_0.2-16    abind_1.4-5        
#> [76] inline_0.3.15       R6_2.4.0            gridExtra_2.3      
#> [79] dplyr_0.8.1         Hmisc_4.2-0         stringi_1.4.3      
#> [82] parallel_3.6.1      rpart_4.1-13        acepack_1.4.1      
#> [85] png_0.1-7           tidyselect_0.2.5    xfun_0.7