# 1 Introduction

This vignette explains how to perform scale linking with the PROsetta package. By way of illustration, we replicate the linking of the Center for Epidemiologic Studies Depression Scale (CES-D) to the PROMIS Depression metric as described in Choi, Schalet, Cook, and Cella (2014).

# 2 Loading the input data

The first step is to load the input dataset, which comprises three tables, with loadData(). The PROMIS Depression – CES-D linking data are included in the PROsetta package directory under the folder labeled data-raw.

fp = system.file("data-raw", package = "PROsetta")
d  = loadData(
  response  = "dat_DeCESD_v2.csv",
  itemmap   = "imap_DeCESD.csv",
  anchor    = "anchor_DeCESD.csv",
  input_dir = fp)
• response: Contains item response data from both instruments. You can supply a .csv filename or a data frame. In this example, we supply a .csv filename dat_DeCESD_v2.csv.
• itemmap: Specifies which items belong to which instruments. Can be a .csv filename or a data frame.
• anchor: Contains item parameters for anchor items (e.g., PROMIS Depression). Can be a .csv filename or a data frame.
• input_dir: (Optional) The path of the directory in which to look for the input .csv files.
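Because the response, itemmap, and anchor arguments also accept data frames, the same dataset can be loaded by reading the files first. A minimal sketch (the intermediate object names are illustrative):

```r
# Read the three input tables ourselves, then pass data frames to loadData()
fp      = system.file("data-raw", package = "PROsetta")
resp_df = read.csv(file.path(fp, "dat_DeCESD_v2.csv"))
imap_df = read.csv(file.path(fp, "imap_DeCESD.csv"))
anch_df = read.csv(file.path(fp, "anchor_DeCESD.csv"))

d = loadData(
  response = resp_df,
  itemmap  = imap_df,
  anchor   = anch_df)  # input_dir is not needed when data frames are supplied
```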

## 2.1 Response data

The response data contains individual item responses on both instruments (i.e., 28 PROMIS Depression items followed by 20 CES-D items). The data table should include the following columns.

• prosettaid: The person ID of the respondents (N = 747). This column does not have to be named prosettaid, but its name should not conflict with column names in the other data tables (item map and anchor).
• Other columns should include the item response fields with their unique item IDs as column names. The item names should match the item_id column in both the item map and anchor files.
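Because the item IDs must line up across tables, it can be worth verifying the match before loading. A small base-R sketch, where imap_df and resp_df are toy stand-ins for the real input tables:

```r
# Hypothetical tables standing in for the real item map and response data
imap_df = data.frame(item_id = c("EDDEP04", "EDDEP05", "CESD1"))
resp_df = data.frame(prosettaid = 1:3,
                     EDDEP04 = c(1, 2, 1),
                     EDDEP05 = c(1, 1, 3),
                     CESD1   = c(2, 1, 1))

# Every item_id in the item map should appear as a column in the response data
missing_items = setdiff(imap_df$item_id, colnames(resp_df))
length(missing_items) == 0  # TRUE when the two tables agree
```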

Run the following code, for example, to open the response data in edit mode.

file.edit(system.file("data-raw", "dat_DeCESD_v2.csv", package = "PROsetta"))

## 2.2 Item map data

The item map data requires the following columns.

• item_id: Contains the unique ID of the items. The name of this column does not have to be item_id but should be consistent with the item ID column in the anchor table. The IDs in this column should match the column names in the response data.
• instrument: Numerals (1 or 2) indicating to which of the two instruments the items belong (e.g., 1 = PROMIS Depression; 2 = CES-D)
• item_order: The sequential position of the items in the combined table (e.g., 1, 2, 3, …, 28, …, 48)
• item_name: Secondary labels for the items
• ncat: The number of response categories by item
• min_score: The minimum score by item (0 or 1)
• reverse: Indicating whether each item has been reverse scored (0 = not reversed; 1 = reversed)
• scores: A string containing comma-separated values for all possible scores of each item (e.g., “1,2,3,4,5”)
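The ncat and scores columns are redundant by design: ncat should equal the number of comma-separated values in scores. A quick consistency check, sketched with toy item-map rows:

```r
# Toy item-map rows; real data would come from imap_DeCESD.csv
imap_df = data.frame(
  item_id = c("EDDEP04", "CESD1"),
  ncat    = c(5, 4),
  scores  = c("1,2,3,4,5", "0,1,2,3"),
  stringsAsFactors = FALSE)

# Count the scores listed for each item and compare against ncat
n_scores = lengths(strsplit(imap_df$scores, ","))
all(n_scores == imap_df$ncat)  # TRUE if the columns are consistent
```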

Run the following code to open the item map data in edit mode.

file.edit(system.file("data-raw", "imap_DeCESD.csv", package = "PROsetta"))

## 2.3 Anchor data

The anchor data contains the item parameters for the anchor scale (e.g., PROMIS Depression) and requires the following columns.

• item_order: The sequential position of the items in the anchor scale (e.g., 1, 2, 3, …, 28)
• item_id: The unique ID of the anchor items. The name of this column does not have to be item_id but should be consistent with the item ID column in the item map table. The IDs in this column should match the corresponding column names in the response data.
• a: The slope parameter value for each anchor item
• cb1, cb2, …: The category boundary parameter values for each anchor item
• ncat: The number of response categories for each anchor item
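For graded items, the category boundary parameters should be strictly increasing within each row (cb1 &lt; cb2 &lt; …). A small sketch for sanity-checking an anchor table; the anch_df data frame is illustrative, with values taken from the calibration output shown later in this vignette:

```r
# Toy anchor rows (a and cb values for two PROMIS Depression items)
anch_df = data.frame(
  item_id = c("EDDEP04", "EDDEP05"),
  a       = c(4.261, 3.932),
  cb1     = c(0.401, 0.305),
  cb2     = c(0.976, 0.913),
  cb3     = c(1.696, 1.593),
  cb4     = c(2.444, 2.412))

# Boundaries must increase left to right within each item
cb = as.matrix(anch_df[, c("cb1", "cb2", "cb3", "cb4")])
all(apply(cb, 1, function(x) all(diff(x) > 0)))  # TRUE when ordered
```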

Run the following code to open the anchor data in edit mode.

file.edit(system.file("data-raw", "anchor_DeCESD.csv", package = "PROsetta"))

# 3 Descriptive analysis

## 3.1 Basic descriptive statistics

The frequency distribution of each item in the response data is obtained by runFrequency().

freq_table = runFrequency(d)
head(freq_table)
##           1   2   3  4  5
## EDDEP04 526 112  66 29 14
## EDDEP05 488 118  91 37 12
## EDDEP06 502 119  85 30 10
## EDDEP07 420 155 107 49 16
## EDDEP09 492 132  89 25  9
## EDDEP14 445 150 101 37 14

The frequency distribution of the summed scores for the combined scale can be plotted as a histogram with plot(). The required argument is a PROsetta_data object created with loadData(). The optional scale argument specifies for which scale the summed score should be created. Setting scale = 'combined' plots the summed score distribution for the combined scale.

plot(d, scale = 'combined', title = "Combined scale")

The user can also generate the summed score distribution for the first or second scale by specifying scale = 1 or scale = 2.

plot(d, scale = 1, title = "Scale 1") # not run
plot(d, scale = 2, title = "Scale 2") # not run

Basic descriptive statistics are obtained for each item by runDescriptive().

desc_table = runDescriptive(d)
head(desc_table)
##           n mean   sd median trimmed mad min max range skew kurtosis   se
## EDDEP04 747 1.52 0.94      1    1.30   0   1   5     4 1.91     3.01 0.03
## EDDEP05 746 1.62 0.99      1    1.42   0   1   5     4 1.54     1.50 0.04
## EDDEP06 746 1.56 0.94      1    1.37   0   1   5     4 1.66     2.01 0.03
## EDDEP07 747 1.78 1.05      1    1.59   0   1   5     4 1.23     0.59 0.04
## EDDEP09 747 1.56 0.91      1    1.38   0   1   5     4 1.62     1.98 0.03
## EDDEP14 747 1.69 1.00      1    1.51   0   1   5     4 1.38     1.12 0.04

## 3.2 Classical reliability analysis

Classical reliability statistics can be obtained by runClassical(). By default, the analysis is performed for the combined scale.

classical_table = runClassical(d)
summary(classical_table$alpha$combined)
##
## Reliability analysis
##  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.98      0.98    0.99      0.53  54 9e-04  1.7 0.69     0.54

The user can set scalewise = TRUE to request an analysis for each scale separately in addition to the combined scale.

classical_table = runClassical(d, scalewise = TRUE)
classical_table$alpha$combined # alpha values for the combined scale
classical_table$alpha$`1`      # alpha values for scale 1, created when scalewise = TRUE
classical_table$alpha$`2`      # alpha values for scale 2, created when scalewise = TRUE

Specifying omega = TRUE additionally returns McDonald’s $\omega$ coefficients.

classical_table = runClassical(d, scalewise = TRUE, omega = TRUE)
classical_table$omega$combined # omega values for the combined scale
classical_table$omega$`1`      # omega values for scale 1, created when scalewise = TRUE
classical_table$omega$`2`      # omega values for scale 2, created when scalewise = TRUE

Additional arguments can be supplied to runClassical() to pass onto psych::omega().

classical_table = runClassical(d, scalewise = TRUE, omega = TRUE, nfactors = 5) # not run

## 3.3 Dimensionality analysis

Dimensionality analysis is performed with CFA by runCFA(). Setting scalewise = TRUE performs the dimensionality analysis for each scale separately in addition to the combined scale.

out_cfa = runCFA(d, scalewise = TRUE)

runCFA() calls lavaan::cfa() internally and can pass additional arguments on to it.

out_cfa = runCFA(d, scalewise = TRUE, std.lv = TRUE) # not run

The CFA result for the combined scale is stored in the combined slot, and if scalewise = TRUE, the analysis for each scale is also stored in each numbered slot.

out_cfa$combined
## lavaan 0.6-7 ended normally after 23 iterations
##
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        220
##
##                                                   Used       Total
##   Number of observations                           731         747
##
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              4227.611    4700.780
##   Degrees of freedom                              1080        1080
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.046
##   Shift parameter                                          657.455
##        simple second-order correction
out_cfa$`1`
## lavaan 0.6-7 ended normally after 16 iterations
##
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        140
##
##                                                   Used       Total
##   Number of observations                           738         747
##
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                               863.527    1434.277
##   Degrees of freedom                               350         350
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.678
##   Shift parameter                                          160.257
##        simple second-order correction
out_cfa$`2`
## lavaan 0.6-7 ended normally after 20 iterations
##
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                         80
##
##                                                   Used       Total
##   Number of observations                           740         747
##
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              1106.148    1431.797
##   Degrees of freedom                               170         170
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  0.812
##   Shift parameter                                           69.205
##        simple second-order correction

CFA fit indices can be obtained with summary() from the lavaan package. For the combined scale:

lavaan::summary(out_cfa$combined, fit.measures = TRUE, standardized = TRUE, estimates = FALSE)
## lavaan 0.6-7 ended normally after 23 iterations
##
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                        220
##
##                                                   Used       Total
##   Number of observations                           731         747
##
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              4227.611    4700.780
##   Degrees of freedom                              1080        1080
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.046
##   Shift parameter                                          657.455
##        simple second-order correction
##
## Model Test Baseline Model:
##
##   Test statistic                            793198.133   91564.632
##   Degrees of freedom                              1128        1128
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  8.758
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.996       0.960
##   Tucker-Lewis Index (TLI)                       0.996       0.958
##
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.063       0.068
##   90 Percent confidence interval - lower         0.061       0.066
##   90 Percent confidence interval - upper         0.065       0.070
##   P-value RMSEA <= 0.05                          0.000       0.000
##
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.051       0.051

and also for each scale separately:

lavaan::summary(out_cfa$`1`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
lavaan::summary(out_cfa$`2`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run

## 3.4 Item parameter calibration

runCalibration() performs IRT calibration without anchoring. It calls mirt::mirt() internally and can pass additional arguments on to it, e.g., to increase the number of EM cycles to 1000, as follows:

out_calib = runCalibration(d, technical = list(NCYCLES = 1000))

In case of nonconvergence, runCalibration() explicitly raises an error and does not return its results:

out_calib = runCalibration(d, technical = list(NCYCLES = 10))
## Error in runCalibration(d, technical = list(NCYCLES = 10)): calibration did not converge: increase iteration limit by adjusting the 'technical' argument, e.g., technical = list(NCYCLES = 510)

Also, specify fixedpar = TRUE to perform fixed parameter calibration using the anchor data.

out_calib = runCalibration(d, fixedpar = TRUE)

The output object from runCalibration() can be used to generate additional output with functions from the mirt package.

Use coef() to extract item parameters:

mirt::coef(out_calib, IRTpars = TRUE, simplify = TRUE)
## $items
##             a     b1    b2    b3    b4
## EDDEP04 4.261  0.401 0.976 1.696 2.444
## EDDEP05 3.932  0.305 0.913 1.593 2.412
## EDDEP06 4.145  0.350 0.915 1.678 2.471
## EDDEP07 2.802  0.148 0.772 1.603 2.538
## EDDEP09 3.657  0.312 0.982 1.782 2.571
## EDDEP14 2.333  0.186 0.947 1.729 2.633
## EDDEP17 3.274 -0.498 0.406 1.413 2.375
## EDDEP19 3.241  0.460 1.034 1.834 2.515
## EDDEP21 2.736  0.072 0.810 1.803 2.673
## EDDEP22 3.970  0.204 0.795 1.649 2.295
## EDDEP23 2.564 -0.038 0.693 1.653 2.584
## EDDEP26 3.093 -0.358 0.412 1.404 2.224
## EDDEP27 2.920  0.204 0.891 1.655 2.528
## EDDEP28 2.588 -0.079 0.633 1.477 2.328
## EDDEP29 4.343 -0.117 0.598 1.428 2.272
## EDDEP30 2.613 -0.023 0.868 1.864 2.826
## EDDEP31 3.183 -0.261 0.397 1.305 2.134
## EDDEP35 3.106  0.044 0.722 1.639 2.471
## EDDEP36 3.483 -0.536 0.348 1.347 2.355
## EDDEP39 3.131  0.918 1.481 2.164 2.856
## EDDEP41 4.454  0.558 1.074 1.779 2.530
## EDDEP42 2.364  0.210 0.987 1.906 2.934
## EDDEP44 2.549  0.194 1.012 2.013 3.126
## EDDEP45 2.834  0.141 0.907 1.846 2.875
## EDDEP46 2.381 -0.458 0.478 1.546 2.632
## EDDEP48 3.185  0.198 0.782 1.526 2.324
## EDDEP50 2.018 -0.050 0.926 2.000 2.966
## EDDEP54 2.685 -0.299 0.423 1.358 2.308
## CESD1   2.074  0.876 1.921 3.064    NA
## CESD2   1.262  1.387 2.670 3.721    NA
## CESD3   3.512  0.833 1.316 1.949    NA
## CESD4   1.118  0.649 1.379 2.081    NA
## CESD5   1.605  0.429 1.526 2.724    NA
## CESD6   3.635  0.493 1.176 1.729    NA
## CESD7   1.828  0.287 1.368 2.134    NA
## CESD8   1.342 -0.067 0.823 1.620    NA
## CESD9   3.003  0.748 1.374 1.855    NA
## CESD10  2.060  1.172 2.043 3.268    NA
## CESD11  1.077 -0.463 0.947 2.160    NA
## CESD12  2.229  0.169 0.945 1.737    NA
## CESD13  1.288  0.342 1.696 2.915    NA
## CESD14  2.176  0.491 1.291 1.864    NA
## CESD15  1.397  0.965 2.321 3.604    NA
## CESD16  2.133  0.272 0.922 1.808    NA
## CESD17  1.719  1.607 2.317 3.470    NA
## CESD18  2.812  0.261 1.248 1.984    NA
## CESD19  1.834  0.784 1.875 2.639    NA
## CESD20  1.491 -0.140 1.256 2.297    NA
##
## $means
##    F1
## -0.06
##
## $cov
##      F1
## F1 0.95

and also other commonly used functions:

mirt::itemfit(out_calib, empirical.plot = 1)
mirt::itemplot(out_calib, item = 1, type = "info")
mirt::itemfit(out_calib, "S_X2", na.rm = TRUE)

Scale information functions can be plotted with plotInfo(). The two required arguments are an output object from runCalibration() and a PROsetta_data object from loadData(). The additional arguments specify the labels, colors, and line types for each scale and for the combined scale. The last value in each of scale_label, color, and lty corresponds to the combined scale.

plotInfo(
  out_calib, d,
  scale_label = c("PROMIS Depression", "CES-D", "Combined"),
  color = c("blue", "red", "black"),
  lty = c(1, 2, 3))