# 1. Summary

The compicc package is intended for practitioners in a wide range of fields, most notably psychology, medicine, and sports science. It allows users to compare the reliability of two measurement systems, or of one system at two different time points. Specifically, the functions calculate a 100(1-$$\alpha$$)% confidence interval (CI) for the difference between two intraclass correlation coefficients (ICCs). These methods were first proposed by Ramasundarahettige et al. (2009). For example, one could compare the reliability of two different medical practitioners’ measurements of patients’ shoulder mobility in degrees of rotation (de Vet et al., 2011).

There are two functions in compicc, dep_ci() and indep_ci(). The dep_ci() function calculates the confidence interval for the difference of ICCs in the dependent case, when the two samples consist of the same subjects. The indep_ci() function calculates the confidence interval for the difference of ICCs in the independent case, when the two samples consist of different subjects.

In addition, the package contains two sets of two dataframes (four total dataframes) to be used as demonstrations of the functions’ capabilities. These dataframes are titled dep_df1 and dep_df2, as well as indep_df1 and indep_df2. In Section 3, this document will call the package’s functions with these sets of dataframes and interpret the results for instructional purposes.

To start, the package is loaded with the code:

library(compicc)

# 2. Included Elements

The compicc package includes two different functions: dep_ci() and indep_ci(). Determining which function to use depends on whether the same set of subjects or a different set of subjects were tested in each dataset being compared.

## 2.1. Confidence Interval for Dependent Data: dep_ci()

Dependent data refers to the scenario in which the two dataframes consist of the same set of subjects. For example, observations in row 1 of the first dataset are taken from the same subject as observations in row 1 of the second dataset. This must hold true for every row of data, so the observations between datasets are “matched.”

The dep_ci() function is called with three arguments: data1, data2 and conf_level. The arguments data1 and data2 refer to the two different datasets. The argument conf_level refers to the confidence level of the confidence interval. This value defaults to 0.95 when not defined by the user, representing a 95% confidence interval.

The function returns three values:

• data1ICC: ICC of data1
• data2ICC: ICC of data2
• confidenceIntervalDifference: dataframe with the lower bound and upper bound of the confidence interval for the difference of the ICC of data1 and data2
  • confidenceIntervalDifference$lowerBound: lower bound of confidence interval
  • confidenceIntervalDifference$upperBound: upper bound of confidence interval

The confidence interval represents the interval for the difference ICC(data1) - ICC(data2).

## 2.2. Confidence Interval for Independent Data: indep_ci()

Independent data refers to the scenario in which the two dataframes consist of entirely different sets of subjects. This means that no subject has scores in both the first dataframe and the second dataframe.

The indep_ci() function is called with three arguments: data1, data2 and conf_level. The arguments data1 and data2 refer to the two different datasets. The argument conf_level refers to the confidence level of the confidence interval. This value defaults to 0.95 when not defined by the user, representing a 95% confidence interval.

The function returns three values:

• data1ICC: ICC of data1
• data2ICC: ICC of data2
• confidenceIntervalDifference: dataframe with the lower bound and upper bound of the confidence interval for the difference between the ICC of data1 and data2
  • confidenceIntervalDifference$lowerBound: lower bound of confidence interval
  • confidenceIntervalDifference$upperBound: upper bound of confidence interval

The confidence interval represents the interval for the difference ICC(data1) - ICC(data2).
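As a short illustration of the interface described in Sections 2.1 and 2.2, the sketch below requests a 99% interval and reads each returned element by name. It uses only the function, argument, dataset, and element names given in this document, and it runs only when compicc is installed; the 99% level is an arbitrary choice for the example.

```r
# Sketch: runs only when the compicc package is installed.
has_compicc <- requireNamespace("compicc", quietly = TRUE)

if (has_compicc) {
  library(compicc)

  # 99% interval for ICC(dep_df1) - ICC(dep_df2), using the bundled data.
  res <- dep_ci(dep_df1, dep_df2, conf_level = 0.99)

  # Point estimates of the two ICCs.
  print(res$data1ICC)
  print(res$data2ICC)

  # Bounds of the interval for the difference.
  ci <- res$confidenceIntervalDifference
  print(c(lower = ci$lowerBound, upper = ci$upperBound))
}
```

The same pattern applies to indep_ci(), which takes the same three arguments and returns elements with the same names.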

## 2.3. Included Data

The compicc package contains four datasets so that the user may work through examples of the functions within the package. dep_df1 and dep_df2 are the dataframes for the dependent case, and indep_df1 and indep_df2 are the dataframes for the independent case.

First, consider the two dependent dataframes, dep_df1 and dep_df2. Both consist of simulated data from four trials of measurements for 100 subjects. The data represent a hypothetical score assigned to the subjects, where the overall mean score is zero. One such example is a sensor measuring the subjects at two time points, where the user wants to quantify how the sensor’s reliability has changed over time. Each dataframe contains the same 100 subjects (paired observations), meaning the dataframes are dependent. The dataframes are in wide format: each row holds the measurements of one subject, with one column per trial. This format is required for all functions in the compicc package to run. Displayed below are the first few rows of dep_df1 to show the proper formatting of the datasets.

##      Trial 1    Trial 2    Trial 3    Trial 4
## 1  2.7908151  2.9914997  3.1671994  3.7316656
## 2 -0.3696103 -0.9891107 -1.7069014 -1.3998243
## 3  0.1631003 -1.0879259 -1.6040934 -1.1937224
## 4  0.2611442  0.6936286  1.1886768  2.3882749
## 5 -1.1405719 -1.5913244 -1.4706451 -0.8625495
## 6 -0.1340875 -0.5238328  0.2003782 -0.6403733

In contrast, indep_df1 and indep_df2 consist of simulated data from four trials of measurements for 100 and 80 subjects, respectively. In this case, the 100 subjects tested in the first dataframe are different from the 80 subjects tested in the second dataframe (i.e., the two samples are independent, or non-overlapping). For example, each dataframe might contain measurements from one rater, and the user is interested in comparing the reliability of the rater’s scores across dataframes. As in the dependent case, both dataframes are in wide format, with rows representing the subjects and columns representing the trials.

Note: The number of subjects is equal in both dataframes in the dependent case. This is required, since every subject of each dataframe must be found in the other dataframe. This is not the case in the independent dataframes: since the two independent dataframes have completely different sets of subjects, they are allowed to have a different number of subjects/rows.
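Measurements are often recorded in long format (one row per subject and trial) and would need to be reshaped into the wide format described above before use. The following is a minimal sketch in base R; the column names subject, trial, and score and all values are hypothetical and are not part of compicc.

```r
# Hypothetical long-format data: one row per subject-trial measurement.
long <- data.frame(
  subject = rep(1:3, each = 4),
  trial   = rep(1:4, times = 3),
  score   = c( 2.8,  3.0,  3.2,  3.7,
              -0.4, -1.0, -1.7, -1.4,
               0.2, -1.1, -1.6, -1.2)
)

# Base R reshape(): one row per subject, one column per trial.
wide <- reshape(long, idvar = "subject", timevar = "trial",
                direction = "wide")

# Keep only the trial columns, mirroring the layout of dep_df1.
wide <- wide[, setdiff(names(wide), "subject")]
names(wide) <- paste("Trial", 1:4)
rownames(wide) <- NULL

print(wide)
```

For the dependent case, both dataframes should be reshaped with the subjects in the same order so that the rows remain matched.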

# 3. Using the Functions with the Included Data

The included datasets provide readily accessible data for demonstrating the functions’ outputs and their interpretation. Both the dependent and the independent case are used as examples.

## 3.1. Example with Dependent Data

This section provides an example of the usage of the dep_ci() function with the package’s provided datasets.

The arguments of the dep_ci() function are:

• data1: the first dataframe (must be in wide format)
• data2: the second dataframe (must be in wide format)
• conf_level: confidence level of the confidence interval (defaults to 0.95 when not specified)

For example, the following code computes a 95% confidence interval for the difference between the ICC of dep_df1 and dep_df2, storing the output in a variable called result:

result <- dep_ci(dep_df1, dep_df2)

The function yields the following output:

• ICC of dep_df1 = result$data1ICC = 0.795187
• ICC of dep_df2 = result$data2ICC = 0.6377282
• Confidence interval for the difference between dep_df1 ICC and dep_df2 ICC = result$confidenceIntervalDifference
  • Lower bound of confidence interval = result$confidenceIntervalDifference$lowerBound = 0.0848324
  • Upper bound of confidence interval = result$confidenceIntervalDifference$upperBound = 0.2329263

In this case, we are 95% confident that the true difference between the ICC of dep_df1 and the ICC of dep_df2 lies in the interval (0.0848324, 0.2329263). Since the interval lies entirely above zero, we have enough evidence to conclude that the true ICCs of the sensor at the initial and final times are not equal. This means the reliability of the measurements of the sensor has changed over time.

## 3.2. Example with Independent Data

This section provides an example of the usage of the indep_ci() function with the package’s provided datasets.

The arguments of the indep_ci() function are:

• data1: the first dataframe (must be in wide format)
• data2: the second dataframe (must be in wide format)
• conf_level: confidence level of the confidence interval (defaults to 0.95 when not specified)

For example, the following code computes the 90% confidence interval for the difference between the ICC of indep_df1 and indep_df2, storing the output in a variable called result2:

result2 <- indep_ci(indep_df1, indep_df2, conf_level = 0.9)

The function yields the following output:

• ICC of indep_df1 = result2$data1ICC = 0.6624369
• ICC of indep_df2 = result2$data2ICC = 0.6913065
• Confidence interval for the difference between indep_df1 ICC and indep_df2 ICC = result2$confidenceIntervalDifference
  • Lower bound of confidence interval = result2$confidenceIntervalDifference$lowerBound = -0.1268077
  • Upper bound of confidence interval = result2$confidenceIntervalDifference$upperBound = 0.0702803

In this case, we are 90% confident that the true difference between the ICC of indep_df1 and the ICC of indep_df2 lies in the interval (-0.1268077, 0.0702803). Since the interval includes zero, there is not enough evidence to conclude that the true ICCs of the two dataframes differ.
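The decision rule used in both examples (does the interval contain zero?) can be written as a small check. The helper name ci_excludes_zero() is hypothetical and not part of compicc; the bounds are the values reported in Sections 3.1 and 3.2.

```r
# Hypothetical helper: TRUE when the interval (lower, upper) lies entirely
# on one side of zero, i.e. the data suggest the two ICCs differ.
ci_excludes_zero <- function(lower, upper) {
  stopifnot(lower <= upper)
  lower > 0 || upper < 0
}

# Dependent example, 95% interval from Section 3.1.
dep_differs <- ci_excludes_zero(0.0848324, 0.2329263)

# Independent example, 90% interval from Section 3.2.
indep_differs <- ci_excludes_zero(-0.1268077, 0.0702803)

print(c(dependent = dep_differs, independent = indep_differs))
```

The first check returns TRUE and the second FALSE, matching the interpretations given above.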

# 4. Frequent Errors

To demonstrate the possible errors encountered when using the functions in the compicc package, examples of dataframes that lead to certain errors are described below.

## 4.1. Unequal Number of Trials

To compute the ICC of a dataframe, every subject must go through the same number of trials during testing. In other words, each row in the dataframe must have the same number of columns. Similarly, each dataframe being compared must have the same number of trials/columns. This means if dataframe 1 consists of four trials per subject, dataframe 2 must consist of exactly four trials per subject. An example of dataframes violating this condition is:

d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, 36, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

d2_trial1 <- c(33, 33, 35)
d2_trial2 <- c(43, 41, 42)
d2_trial3 <- c(36, 36, 38)
d2_trial4 <- c(29, 30, 29)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3, d2_trial4)

indep_ci(data1, data2)
## Error in indep_ci(data1, data2): number of columns in data1 must equal that of data2

The dataframe data1 consists of three trials per subject, but data2 holds four trials per subject. The error message informs the user that the number of trials/columns must be equal across dataframes. This error applies to both the dep_ci() and indep_ci() functions.

## 4.2. Unequal Number of Observations (Dependent case only)

In order for the two dataframes to be dependent, the subjects/rows must be matched across dataframes. This means row 1 of dataframe1 represents the same subject as row 1 of dataframe2, and so on for each additional row of data. Therefore, if there are an unequal number of subjects/rows between the two dataframes, the function dep_ci() will return an error message. The dataframes cannot be dependent when they have an unequal number of rows of data. Below is an example:

d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, 36, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

d2_trial1 <- c(33, 33, 35, 32)
d2_trial2 <- c(43, 41, 42, 43)
d2_trial3 <- c(36, 36, 38, 38)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3)

dep_ci(data1, data2)
## Error in dep_ci(data1, data2): number of rows in data1 must equal that of data2

The error message tells the user that the number of rows must be equal across dataframes. When receiving this message, the user should adjust the dataframes to make sure that the subjects tested in each dataframe match each other and are placed in the same row in both dataframes.

Note: This is not an issue for the independent case. The indep_ci() function accepts dataframes with unequal numbers of observations, since the subjects should not match across dataframes.
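The shape conditions from Sections 4.1 and 4.2 can be checked before calling either function. The sketch below uses a hypothetical helper, check_shapes(), which is not part of compicc and does not reproduce the package’s own error handling; it only reports which condition is violated.

```r
# Hypothetical helper: returns a character vector describing shape problems
# (empty when the two dataframes are compatible).
check_shapes <- function(data1, data2, dependent = TRUE) {
  problems <- character(0)
  if (ncol(data1) != ncol(data2)) {
    problems <- c(problems, "different number of trials (columns)")
  }
  # Row counts only need to match when the subjects are paired.
  if (dependent && nrow(data1) != nrow(data2)) {
    problems <- c(problems, "different number of subjects (rows)")
  }
  problems
}

# Same data as the example above: 3 subjects versus 4 subjects, 3 trials each.
data1 <- data.frame(d1_trial1 = c(34, 33, 36),
                    d1_trial2 = c(41, 38, 40),
                    d1_trial3 = c(37, 36, 37))
data2 <- data.frame(d2_trial1 = c(33, 33, 35, 32),
                    d2_trial2 = c(43, 41, 42, 43),
                    d2_trial3 = c(36, 36, 38, 38))

dep_problems   <- check_shapes(data1, data2, dependent = TRUE)
indep_problems <- check_shapes(data1, data2, dependent = FALSE)
print(dep_problems)
print(indep_problems)
```

Here the dependent check reports the row mismatch, while the independent check reports nothing because unequal row counts are allowed in that case.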

## 4.3. Missing Values

The functions that dep_ci() and indep_ci() rely on internally do not work with dataframes that contain missing values. Therefore, if the user calls either dep_ci() or indep_ci() with a dataframe that has one or more NA or NaN values, the function stops and returns an error message. An example of this is shown below:

d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, NA, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

d2_trial1 <- c(33, 33, 35)
d2_trial2 <- c(43, 41, 42)
d2_trial3 <- c(36, 36, 38)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3)

dep_ci(data1, data2)
## Error in dep_ci(data1, data2): cannot have NA values in dataframe

The error message states that there cannot be a missing value in either dataframe. The NA value in data1 (d1_trial3) must either be replaced with an imputed value, or subject 2’s results must be discarded in order to use either function.
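One way to apply the discard option is sketched below in base R. In the dependent case a subject has to be removed from both dataframes so that the rows stay matched; this is an illustration under that assumption, not a recommendation on how missing data should be handled in a given study.

```r
data1 <- data.frame(d1_trial1 = c(34, 33, 36),
                    d1_trial2 = c(41, 38, 40),
                    d1_trial3 = c(37, NA, 37))
data2 <- data.frame(d2_trial1 = c(33, 33, 35),
                    d2_trial2 = c(43, 41, 42),
                    d2_trial3 = c(36, 36, 38))

# Locate missing entries (is.na() flags both NA and NaN).
missing_pos <- which(is.na(data1), arr.ind = TRUE)
print(missing_pos)

# Dependent case: keep only subjects that are complete in BOTH dataframes.
keep <- complete.cases(data1) & complete.cases(data2)
data1_complete <- data1[keep, , drop = FALSE]
data2_complete <- data2[keep, , drop = FALSE]

print(data1_complete)
print(data2_complete)
```

In the independent case the rows are not matched, so each dataframe could instead be filtered on its own, for example with na.omit().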

# 5. Description of Calculations

As mentioned above, the functions in the compicc package implement the calculations derived by Ramasundarahettige et al. (2009).

The approach to estimating the difference between two ICCs begins with the simple case of one ICC. In this case, the formula of the confidence interval is derived from the central limit theorem and Slutsky’s Theorem:

$$L, U = \widehat{\rho} \pm z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho})}$$

where $$L$$ and $$U$$ are the lower and upper bounds of the confidence interval, $$\widehat{\rho}$$ is the point estimate of the ICC, and $$z_{\alpha/2}$$ is the upper $$\alpha/2$$ quantile of the standard normal distribution.
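As a numerical illustration of the single-ICC interval, the sketch below evaluates the formula in base R. The point estimate and variance are hypothetical values chosen for the example, not output from compicc.

```r
rho_hat <- 0.80     # hypothetical ICC point estimate
var_hat <- 0.0016   # hypothetical estimated variance of rho_hat
alpha   <- 0.05     # 1 - confidence level

# Upper alpha/2 quantile of the standard normal distribution.
z <- qnorm(1 - alpha / 2)

# Point estimate plus or minus z times the standard error.
single_ci <- rho_hat + c(lower = -1, upper = 1) * z * sqrt(var_hat)
print(single_ci)
```

With these inputs the standard error is 0.04, so the interval is roughly 0.80 ± 0.078.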

Extending this to the difference of two ICCs, the formula becomes:

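Lower bound: $$L = \widehat{\rho_1}-\widehat{\rho_2}-z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho}_1)+\widehat{var}(\widehat{\rho}_2)}$$

Upper bound: $$U = \widehat{\rho_1}-\widehat{\rho_2}+z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho}_1)+\widehat{var}(\widehat{\rho}_2)}$$

Following Ramasundarahettige et al. (2009), the variance terms are replaced by estimates recovered from the confidence limits of the individual ICCs, and a correlation term is added when the two samples are dependent. This yields the following equations: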

For the independent case, the confidence interval is calculated by:

Lower bound: $$L = \widehat{\rho_1}-\widehat{\rho_2}-\sqrt{(\widehat{\rho_1}-l_1)^2+(u_2-\widehat{\rho_2})^2}$$

Upper bound: $$U = \widehat{\rho_1}-\widehat{\rho_2}+\sqrt{(u_1-\widehat{\rho_1})^2+(\widehat{\rho_2}-l_2)^2}$$
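The independent-case bounds can be transcribed directly into R. The sketch below is a plain transcription of the two formulas above, not the package’s source code; the function name mover_indep() and the input values are hypothetical.

```r
# Transcription of the independent-case formulas.
# rho1, rho2: ICC point estimates; (l1, u1), (l2, u2): their individual CIs.
mover_indep <- function(rho1, l1, u1, rho2, l2, u2) {
  d <- rho1 - rho2
  c(lower = d - sqrt((rho1 - l1)^2 + (u2 - rho2)^2),
    upper = d + sqrt((u1 - rho1)^2 + (rho2 - l2)^2))
}

# Hypothetical estimates and individual confidence limits.
indep_ci_sketch <- mover_indep(rho1 = 0.66, l1 = 0.58, u1 = 0.74,
                               rho2 = 0.69, l2 = 0.60, u2 = 0.77)
print(indep_ci_sketch)
```

Note that the lower bound pairs the lower limit of the first ICC with the upper limit of the second, and the upper bound does the reverse, so the interval need not be symmetric around the difference of the point estimates.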

For the dependent case, the confidence interval is calculated by:

Lower bound: $$L = \widehat{\rho_1}-\widehat{\rho_2}-\sqrt{(\widehat{\rho_1}-l_1)^2-2\,\widehat{corr}(\widehat{\rho}_1,\widehat{\rho}_2)(\widehat{\rho_1}-l_1)(u_2-\widehat{\rho_2})+(u_2-\widehat{\rho_2})^2}$$

Upper bound: $$U = \widehat{\rho_1}-\widehat{\rho_2}+\sqrt{(u_1-\widehat{\rho_1})^2-2\,\widehat{corr}(\widehat{\rho}_1,\widehat{\rho}_2)(u_1-\widehat{\rho_1})(\widehat{\rho_2}-l_2)+(\widehat{\rho_2}-l_2)^2}$$

Where:

$$\widehat{corr}(\widehat{\rho}_1,\widehat{\rho}_2) = \widehat{\rho}_{12}^2\,\frac{\sqrt{k_1 k_2 (k_1-1)(k_2-1)}}{(1+(k_1-1)\widehat{\rho}_{1})(1+(k_2-1)\widehat{\rho}_{2})}$$

Here, $$\widehat{\rho_1}$$ and $$\widehat{\rho_2}$$ are the point estimates of the two dataframes’ ICCs; $$l_1$$ and $$u_1$$ are the lower and upper bounds of the CI of the ICC of dataframe 1; $$l_2$$ and $$u_2$$ are the lower and upper bounds of the CI of the ICC of dataframe 2; and $$k_1$$ and $$k_2$$ are the numbers of trials in dataframe 1 and dataframe 2.
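The dependent-case formulas can be transcribed in the same way. The sketch below is a direct transcription of the equations in this section, not the package’s source code; the function names corr_hat() and mover_dep() are hypothetical, and all input values (including rho12, the quantity written $$\widehat{\rho}_{12}$$ above) are made up for illustration.

```r
# Estimated correlation between the two ICC estimates, as in the formula above.
corr_hat <- function(rho12, rho1, rho2, k1, k2) {
  rho12^2 * sqrt(k1 * k2 * (k1 - 1) * (k2 - 1)) /
    ((1 + (k1 - 1) * rho1) * (1 + (k2 - 1) * rho2))
}

# Dependent-case bounds for rho1 - rho2, given the individual CIs
# (l1, u1), (l2, u2) and the estimated correlation.
mover_dep <- function(rho1, l1, u1, rho2, l2, u2, corr) {
  d <- rho1 - rho2
  lower <- d - sqrt((rho1 - l1)^2 -
                    2 * corr * (rho1 - l1) * (u2 - rho2) +
                    (u2 - rho2)^2)
  upper <- d + sqrt((u1 - rho1)^2 -
                    2 * corr * (u1 - rho1) * (rho2 - l2) +
                    (rho2 - l2)^2)
  c(lower = lower, upper = upper)
}

# Hypothetical inputs: four trials per dataframe.
r_hat <- corr_hat(rho12 = 0.5, rho1 = 0.80, rho2 = 0.64, k1 = 4, k2 = 4)
dep_ci_sketch <- mover_dep(rho1 = 0.80, l1 = 0.73, u1 = 0.85,
                           rho2 = 0.64, l2 = 0.55, u2 = 0.72,
                           corr = r_hat)
print(r_hat)
print(dep_ci_sketch)
```

Setting corr = 0 removes the cross terms and reduces these bounds to the independent-case formulas; a positive correlation shrinks the terms under the square roots and therefore narrows the interval.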

For further reading into the calculations used in this package, refer to Ramasundarahettige et al. (2009).

# 6. References

de Vet, H. C., Terwee, C. B., Mokkink, L. B., & Knol, D. L. (2011). Measurement in medicine: A practical guide. New York, NY: Cambridge University Press.

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr

Ramasundarahettige, C. F., Donner, A., & Zou, G. Y. (2009). Confidence Interval Construction for a Difference Between Two Dependent Intraclass Correlation Coefficients. Statistics in Medicine, 28(7), 1041–1053. https://doi.org/10.1002/sim.3523