1 Introduction

This vignette explains how to perform scale linking with the PROsetta package. By way of illustration, we replicate the linking of the Center for Epidemiologic Studies Depression Scale (CES-D) to the PROMIS Depression metric as described in Choi, Schalet, Cook, and Cella (2014).


2 Load datasets

The first step is to load the input dataset, which comprises three tables, with loadData(). The PROMIS Depression – CES-D linking data are included in the PROsetta package directory under the data-raw folder.

fp = system.file("data-raw", package = "PROsetta")
d = loadData(
  response  = "dat_DeCESD_v2.csv",
  itemmap   = "imap_DeCESD.csv",
  anchor    = "anchor_DeCESD.csv",
  input_dir = fp)
  • response: Contains item response data from both instruments. You can supply a .csv filename or a data frame. In this example, we supply a .csv filename dat_DeCESD_v2.csv.
  • itemmap: Specifies which items belong to which instruments. Can be a .csv filename or a data frame.
  • anchor: Contains item parameters for anchor items (e.g., PROMIS Depression). Can be a .csv filename or a data frame.
  • input_dir: (Optional) The path of the directory containing the input .csv files.


2.1 Response data

The response data contains individual item responses on both instruments (i.e., 28 PROMIS Depression items followed by 20 CES-D items). The data table should include the following columns.

  • prosettaid: The person ID of the respondents (N = 747). This column does not have to be named prosettaid but should not conflict with other data tables (item map and anchor).
  • Other columns should include the item response fields with their unique item IDs as column names. The item names should match the item_id column in both the item map and anchor files.
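As a quick sanity check, the correspondence between the response columns and the item map can be verified directly. The following is a minimal sketch using base R; it assumes the bundled .csv files and the column names described in this vignette:

```r
# Sketch: verify that every item in the item map appears as a response column
# (assumes the bundled CSVs and the column names described above)
fp   <- system.file("data-raw", package = "PROsetta")
resp <- read.csv(file.path(fp, "dat_DeCESD_v2.csv"))
imap <- read.csv(file.path(fp, "imap_DeCESD.csv"))
all(imap$item_id %in% names(resp))  # TRUE if the two files are consistent
```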

Run the following code, for example, to open the response data in edit mode.

file.edit(system.file("data-raw", "dat_DeCESD_v2.csv", package = "PROsetta"))


2.2 Item map data

The item map data requires the following columns.

  • item_id: Contains the unique ID of the items. The name of this column does not have to be item_id but should be consistent with the item ID column in the anchor table. The IDs in this column should match the column names in the response data.
  • instrument: Numerals (1 or 2) indicating to which of the two instruments the items belong (e.g., 1 = PROMIS Depression; 2 = CES-D)
  • item_order: The sequential position of the items in the combined table (e.g., 1, 2, 3, …, 28, …, 48)
  • item_name: Secondary labels for the items
  • ncat: The number of response categories by item
  • min_score: The minimum score by item (0 or 1)
  • reverse: Indicating whether each item has been reverse scored (0 = not reversed; 1 = reversed)
  • scores: A string containing comma-separated values for all possible scores of each item (e.g., “1,2,3,4,5”)
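Because itemmap also accepts a data frame, a table with this layout can be built directly in R. The sketch below uses made-up values for two items solely to illustrate the expected shape; the real values come from imap_DeCESD.csv:

```r
# Illustrative only: a two-row item map with the columns listed above
# (item IDs are real, but all other values here are placeholders)
itemmap <- data.frame(
  item_id    = c("EDDEP04", "CESD1"),  # must match response-data column names
  instrument = c(1, 2),                # 1 = PROMIS Depression; 2 = CES-D
  item_order = c(1, 29),
  item_name  = c("EDDEP04", "CESD1"),
  ncat       = c(5, 4),
  min_score  = c(1, 1),
  reverse    = c(0, 0),
  scores     = c("1,2,3,4,5", "1,2,3,4"),
  stringsAsFactors = FALSE
)
```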

Run the following code to open the item map data in edit mode.

file.edit(system.file("data-raw", "imap_DeCESD.csv", package = "PROsetta"))


2.3 Anchor data

The anchor data contains the item parameters for the anchor scale (e.g., PROMIS Depression) and requires the following columns.

  • item_order: The sequential position of the items in the anchor scale (e.g., 1, 2, 3, …, 28)
  • item_id: The unique ID of the anchor items. The name of this column does not have to be item_id but should be consistent with the item ID column in the item map table. The IDs in this column should match the column names in the response data.
  • a: The slope parameter value for each anchor item
  • cb1, cb2, …: The category boundary parameter values for each anchor item
  • ncat: The number of response categories for each anchor item
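As with the other tables, anchor accepts either a .csv filename or a data frame, so the bundled table can be inspected directly before loading. This sketch assumes the column layout listed above:

```r
# Sketch: peek at the first rows of the anchor parameter table
anchor <- read.csv(system.file("data-raw", "anchor_DeCESD.csv",
                               package = "PROsetta"))
head(anchor)  # expect item_order, item_id, a, cb1, cb2, ..., ncat
```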

Run the following code to open the anchor data in edit mode.

file.edit(system.file("data-raw", "anchor_DeCESD.csv", package = "PROsetta"))


3 Descriptive analysis

3.1 Basic descriptive statistics

The frequency distribution of each item in the response data is obtained by runFrequency().

freq_table = runFrequency(d)
head(freq_table)
##           1   2   3  4  5
## EDDEP04 526 112  66 29 14
## EDDEP05 488 118  91 37 12
## EDDEP06 502 119  85 30 10
## EDDEP07 420 155 107 49 16
## EDDEP09 492 132  89 25  9
## EDDEP14 445 150 101 37 14


The frequency distribution of the summed scores for the combined scale can be plotted as a histogram with plot(). The required argument is a PROsetta_data object created with loadData(). The optional scale argument specifies for which scale the summed score should be created. Setting scale = 'combined' plots the summed score distribution for the combined scale.

plot(d, scale = 'combined', title = "Combined scale")

The user can also generate the summed score distribution for the first or second scale by specifying scale = 1 or scale = 2.

plot(d, scale = 1, title = "Scale 1") # not run
plot(d, scale = 2, title = "Scale 2") # not run
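For reference, the summed scores behind these histograms can be reproduced with base R alone. The sketch below assumes prosettaid is the only non-item column in the response table:

```r
# Sketch: compute combined-scale summed scores by hand and plot them
resp       <- read.csv(system.file("data-raw", "dat_DeCESD_v2.csv",
                                   package = "PROsetta"))
item_cols  <- setdiff(names(resp), "prosettaid")
sum_scores <- rowSums(resp[, item_cols], na.rm = TRUE)
hist(sum_scores, main = "Combined scale", xlab = "Summed score")
```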


Basic descriptive statistics are obtained for each item by runDescriptive().

desc_table = runDescriptive(d)
head(desc_table)
##           n mean   sd median trimmed mad min max range skew kurtosis   se
## EDDEP04 747 1.52 0.94      1    1.30   0   1   5     4 1.91     3.01 0.03
## EDDEP05 746 1.62 0.99      1    1.42   0   1   5     4 1.54     1.50 0.04
## EDDEP06 746 1.56 0.94      1    1.37   0   1   5     4 1.66     2.01 0.03
## EDDEP07 747 1.78 1.05      1    1.59   0   1   5     4 1.23     0.59 0.04
## EDDEP09 747 1.56 0.91      1    1.38   0   1   5     4 1.62     1.98 0.03
## EDDEP14 747 1.69 1.00      1    1.51   0   1   5     4 1.38     1.12 0.04


3.2 Classical reliability analysis

Classical reliability statistics can be obtained by runClassical(). By default, the analysis is performed for the combined scale.

classical_table = runClassical(d)
summary(classical_table$alpha$combined)
## 
## Reliability analysis   
##  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.98      0.98    0.99      0.53  54 9e-04  1.7 0.69     0.54

The user can set scalewise = TRUE to request an analysis for each scale separately in addition to the combined scale.

classical_table = runClassical(d, scalewise = TRUE)
classical_table$alpha$combined # alpha values for combined scale
classical_table$alpha$`1`      # alpha values for each scale, created when scalewise = TRUE
classical_table$alpha$`2`      # alpha values for each scale, created when scalewise = TRUE

Specifying omega = TRUE returns the McDonald’s \(\omega\) coefficients as well.

classical_table = runClassical(d, scalewise = TRUE, omega = TRUE)
classical_table$omega$combined # omega values for combined scale
classical_table$omega$`1`      # omega values for each scale, created when scalewise = TRUE
classical_table$omega$`2`      # omega values for each scale, created when scalewise = TRUE

Additional arguments can be supplied to runClassical() to be passed on to psych::omega().

classical_table = runClassical(d, scalewise = TRUE, omega = TRUE, nfactors = 5) # not run


3.3 Dimensionality analysis

Dimensionality analysis is performed with CFA by runCFA(). Setting scalewise = TRUE performs the dimensionality analysis for each scale separately in addition to the combined scale.

out_cfa = runCFA(d, scalewise = TRUE)

runCFA() calls lavaan::cfa() internally and can pass additional arguments on to it.

out_cfa = runCFA(d, scalewise = TRUE, std.lv = TRUE) # not run


The CFA result for the combined scale is stored in the combined slot, and if scalewise = TRUE, the analysis for each scale is also stored in each numbered slot.

out_cfa$combined
## lavaan 0.6-7 ended normally after 23 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        220
##                                                       
##                                                   Used       Total
##   Number of observations                           731         747
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              4227.611    4700.780
##   Degrees of freedom                              1080        1080
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.046
##   Shift parameter                                          657.455
##        simple second-order correction
out_cfa$`1`
## lavaan 0.6-7 ended normally after 16 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        140
##                                                       
##                                                   Used       Total
##   Number of observations                           738         747
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                               863.527    1434.277
##   Degrees of freedom                               350         350
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.678
##   Shift parameter                                          160.257
##        simple second-order correction
out_cfa$`2`
## lavaan 0.6-7 ended normally after 20 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                         80
##                                                       
##                                                   Used       Total
##   Number of observations                           740         747
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              1106.148    1431.797
##   Degrees of freedom                               170         170
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.812
##   Shift parameter                                           69.205
##        simple second-order correction


CFA fit indices can be obtained by using summary() from the lavaan package. For the combined scale:

lavaan::summary(out_cfa$combined, fit.measures = TRUE, standardized = TRUE, estimates = FALSE)
## lavaan 0.6-7 ended normally after 23 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        220
##                                                       
##                                                   Used       Total
##   Number of observations                           731         747
##                                                                   
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              4227.611    4700.780
##   Degrees of freedom                              1080        1080
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.046
##   Shift parameter                                          657.455
##        simple second-order correction                             
## 
## Model Test Baseline Model:
## 
##   Test statistic                            793198.133   91564.632
##   Degrees of freedom                              1128        1128
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  8.758
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.996       0.960
##   Tucker-Lewis Index (TLI)                       0.996       0.958
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.063       0.068
##   90 Percent confidence interval - lower         0.061       0.066
##   90 Percent confidence interval - upper         0.065       0.070
##   P-value RMSEA <= 0.05                          0.000       0.000
##                                                                   
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.051       0.051

and also for each scale separately:

lavaan::summary(out_cfa$`1`     , fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
lavaan::summary(out_cfa$`2`     , fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run


3.4 Item parameter calibration

runCalibration() performs IRT calibration without anchoring. It calls mirt::mirt() internally, and additional arguments can be passed on to mirt::mirt(), e.g., to increase the number of EM cycles to 1000, as follows:

out_calib = runCalibration(d, technical = list(NCYCLES = 1000))


In case of nonconvergence, runCalibration() explicitly raises an error and does not return its results:

out_calib = runCalibration(d, technical = list(NCYCLES = 10))
## Error in runCalibration(d, technical = list(NCYCLES = 10)): calibration did not converge: increase iteration limit by adjusting the 'technical' argument, e.g., technical = list(NCYCLES = 510)
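Because nonconvergence surfaces as an ordinary R error, it can also be handled programmatically, e.g., by retrying with a larger iteration limit. This is a sketch; the specific limits here are arbitrary:

```r
# Sketch: catch a nonconvergence error and retry with more EM cycles
out_calib <- tryCatch(
  runCalibration(d, technical = list(NCYCLES = 10)),
  error = function(e) {
    message("Retrying with a higher limit: ", conditionMessage(e))
    runCalibration(d, technical = list(NCYCLES = 1000))
  }
)
```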


Alternatively, specify fixedpar = TRUE to perform fixed-parameter calibration using the anchor data.

out_calib = runCalibration(d, fixedpar = TRUE)


The output object from runCalibration() can be used to generate additional output with functions from the mirt package.

Use coef() to extract item parameters:

mirt::coef(out_calib, IRTpars = TRUE, simplify = TRUE)
## $items
##             a     b1    b2    b3    b4
## EDDEP04 4.261  0.401 0.976 1.696 2.444
## EDDEP05 3.932  0.305 0.913 1.593 2.412
## EDDEP06 4.145  0.350 0.915 1.678 2.471
## EDDEP07 2.802  0.148 0.772 1.603 2.538
## EDDEP09 3.657  0.312 0.982 1.782 2.571
## EDDEP14 2.333  0.186 0.947 1.729 2.633
## EDDEP17 3.274 -0.498 0.406 1.413 2.375
## EDDEP19 3.241  0.460 1.034 1.834 2.515
## EDDEP21 2.736  0.072 0.810 1.803 2.673
## EDDEP22 3.970  0.204 0.795 1.649 2.295
## EDDEP23 2.564 -0.038 0.693 1.653 2.584
## EDDEP26 3.093 -0.358 0.412 1.404 2.224
## EDDEP27 2.920  0.204 0.891 1.655 2.528
## EDDEP28 2.588 -0.079 0.633 1.477 2.328
## EDDEP29 4.343 -0.117 0.598 1.428 2.272
## EDDEP30 2.613 -0.023 0.868 1.864 2.826
## EDDEP31 3.183 -0.261 0.397 1.305 2.134
## EDDEP35 3.106  0.044 0.722 1.639 2.471
## EDDEP36 3.483 -0.536 0.348 1.347 2.355
## EDDEP39 3.131  0.918 1.481 2.164 2.856
## EDDEP41 4.454  0.558 1.074 1.779 2.530
## EDDEP42 2.364  0.210 0.987 1.906 2.934
## EDDEP44 2.549  0.194 1.012 2.013 3.126
## EDDEP45 2.834  0.141 0.907 1.846 2.875
## EDDEP46 2.381 -0.458 0.478 1.546 2.632
## EDDEP48 3.185  0.198 0.782 1.526 2.324
## EDDEP50 2.018 -0.050 0.926 2.000 2.966
## EDDEP54 2.685 -0.299 0.423 1.358 2.308
## CESD1   2.074  0.876 1.921 3.064    NA
## CESD2   1.262  1.387 2.670 3.721    NA
## CESD3   3.512  0.833 1.316 1.949    NA
## CESD4   1.118  0.649 1.379 2.081    NA
## CESD5   1.605  0.429 1.526 2.724    NA
## CESD6   3.635  0.493 1.176 1.729    NA
## CESD7   1.828  0.287 1.368 2.134    NA
## CESD8   1.342 -0.067 0.823 1.620    NA
## CESD9   3.003  0.748 1.374 1.855    NA
## CESD10  2.060  1.172 2.043 3.268    NA
## CESD11  1.077 -0.463 0.947 2.160    NA
## CESD12  2.229  0.169 0.945 1.737    NA
## CESD13  1.288  0.342 1.696 2.915    NA
## CESD14  2.176  0.491 1.291 1.864    NA
## CESD15  1.397  0.965 2.321 3.604    NA
## CESD16  2.133  0.272 0.922 1.808    NA
## CESD17  1.719  1.607 2.317 3.470    NA
## CESD18  2.812  0.261 1.248 1.984    NA
## CESD19  1.834  0.784 1.875 2.639    NA
## CESD20  1.491 -0.140 1.256 2.297    NA
## 
## $means
##    F1 
## -0.06 
## 
## $cov
##      F1
## F1 0.95

and also other commonly used functions:

mirt::itemfit(out_calib, empirical.plot = 1)
mirt::itemplot(out_calib, item = 1, type = "info")
mirt::itemfit(out_calib, "S_X2", na.rm = TRUE)


Scale information functions can be plotted with plotInfo(). The two required arguments are an output object from runCalibration() and a PROsetta_data object from loadData(). Additional arguments specify the labels, colors, and line types for each scale and for the combined scale. The last value in each of scale_label, color, and lty corresponds to the combined scale.

plotInfo(
  out_calib, d,
  scale_label = c("PROMIS Depression", "CES-D", "Combined"),
  color = c("blue", "red", "black"),
  lty = c(1, 2, 3))