**Please note that Iota1 is outdated. Please use the new version, Iota2. Central definitions of Iota have changed in the new version.**

**This vignette will not be updated for future developments.**

Reliability is a central characteristic of any assessment instrument and describes the extent to which the instrument produces error-free data (Schreier, 2012). In terms of content analysis, Krippendorff (2019) suggests replicability, also referred to as inter-coder reliability, as a fundamental reliability concept. This describes the degree to which “a process can be reproduced by different analysts, working under varying conditions, at different locations, or using different but functionally equivalent measuring instruments” (Krippendorff, 2019).

The package *iotarelr* provides an environment for estimating
the degree of inter-coder reliability based on the Iota Reliability
Concept developed by Berding et al. (2022). The concept provides one of
the first measures for characterizing the degree of reliability for a
complete scale and for *every single category*. Most of the older
measures are limited to information on the complete scale.

The suggested concept is applicable to any kind of content analysis that uses a coding scheme with nominal or ordinal data, regardless of the kind of coders (human or artificial intelligence), the number of coders, and the number of categories. The following introduction shows how to use *Iota1*.

At the beginning, data generated by at least two coders is needed. Let us assume that four coders analyzed 20 textual fragments with a coding scheme consisting of three categories A, B, and C.

```
library(iotarelr)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Loading required package: ggalluvial
#> Warning: package 'ggalluvial' was built under R version 4.1.3
coder_1 <- c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C")
coder_2 <- c("A","A","C","B","B","A","A","B","C","A","A","C","A","C","A","B","A","A","A","C")
coder_3 <- c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C")
coder_4 <- c("A","C","C","C","B","A","A","B","C","A","B","A","A","B","A","C","B","A","A","C")
coded_data <- cbind(coder_1, coder_2, coder_3, coder_4)
```

In this example, the characteristics are saved as characters. The package also supports categories stored as integers. The only important aspect is that the rows must contain the coding units (e.g., the textual fragments) and that the columns represent the different coders. In the next step, the estimation of iota and its elements starts via the function `compute_iota1()`.

```
results <- compute_iota1(data = coded_data)
```

```
results$alpha
#>              A        B         C
#> [1,] 0.5647321 0.140625 0.2020089
```

The component `$alpha` stores the values for the *chance-corrected* alpha reliabilities. These values describe the extent to which the true characteristic of a coding unit is discovered. They range from 0 to 1. 0 indicates the absence of reliability; that is, the assignment of the true category equals a random selection between the categories. 1 indicates that the true value is always recovered.

In the current example, the chance-corrected alpha value for category A is relatively high. This means that, with a high probability, the coding scheme leads coders to assign category A to a coding unit whose true characteristic is A. In contrast, the values for the other two categories are very low. This indicates that a coding unit with the true characteristic B or C is often assigned to the other categories. That is, the coding scheme does not ensure that coders discover the true category if the true category is B or C.

To provide a more detailed insight into the coding scheme, the beta values account for the errors which occur in the case that the true category is not discovered. That is, the beta values describe the extent to which a category is influenced by errors made in other categories. For example, if the true category of a coding unit is A and a coder does not assign A to that coding unit, the data for categories B and C are biased: the data representing categories B and C comprise a coding unit that should not be part of the data.

The *chance-corrected* values for the beta reliabilities are stored in `$beta` and range between 0 and 1. 0 indicates that the beta reliability equals the beta reliability in the case of complete guessing. 1 indicates the absence of any beta errors.

```
results$beta
#>             A         B         C
#> [1,] 0.746875 0.6835938 0.6203125
```

In the current example, the chance-corrected beta reliabilities are relatively high. This means that the different categories are not strongly influenced by errors in the other categories.

Iota values summarize the different types of errors for each category by averaging the chance-corrected reliabilities. They are stored in `$iota`.

```
results$iota
#>              A         B         C
#> [1,] 0.6558036 0.4121094 0.4111607
```

Iota can range between 0 and 1. 0 indicates that the quality of the coding of a category equals random guessing; codings of this category are not reliable. 1 indicates a perfect reliability of a category. That is, the true value of a coding unit is recovered if the true category is the category under investigation, and errors made when coding units of other categories do not influence the data of the category under investigation. In the current example, category A is quite reliable while the other categories are not.
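
Arithmetically, each category's iota is simply the mean of its two chance-corrected reliabilities. This can be checked in plain R with the `$alpha` and `$beta` values reported above (no package functions involved):

```r
# Chance-corrected alpha and beta reliabilities as reported above.
alpha <- c(A = 0.5647321, B = 0.1406250, C = 0.2020089)
beta  <- c(A = 0.7468750, B = 0.6835938, C = 0.6203125)

# Iota per category: the mean of the two chance-corrected reliabilities.
iota <- (alpha + beta) / 2
round(iota, 4)
#>      A      B      C
#> 0.6558 0.4121 0.4112
```

These values match the output of `results$iota` up to rounding.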

The assignment-error matrix combines the alpha and beta values and provides the most detailed description of a coding scheme. It is based on the raw estimates without any chance-correction.

```
results$assignment_error_matrix
#>           A         B         C
#> A 0.4285714 0.4545455 0.5454545
#> B 0.4000000 0.8461538 0.6000000
#> C 0.4444444 0.5555556 0.7857143
```

The AEM has to be read row by row: the rows represent the true category of a coding unit and the columns represent the assigned categories. The values on the diagonal represent the alpha-errors of the categories, that is, the probability of *not* assigning the true category to a coding unit. The other cells describe how the erroneous codings are distributed across the remaining categories. That is, they inform about the probability of choosing a category under the condition that the true category is *not* recovered.

In the example, the alpha-error of category A is about .43, meaning that a coding unit of category A is coded as another category in about 43 % of the cases. In other words: the probability of assigning the true category to a coding unit of category A is about 57 %. Thus, the second and third cell in the first row mean: if the true category of a coding unit belonging to category A is *not* recovered, about 45 % of the codings are assigned to category B and 55 % to category C. Thus, category C suffers more from errors made with coding units truly belonging to category A than category B.

The alpha-error of category B is about .85. Thus, in about 85 % of the cases the coding units truly belonging to category B are assigned to another category. In other words: the probability of recovering the true category is about 15 % if the true category of the coding unit is B. In the case that this error occurs, 40 % of the cases are assigned to category A and 60 % to category C. Thus, the data of category C suffers more from errors made on coding units truly belonging to category B than the data of category A.

The alpha-error of category C is about 79 %. Thus, in about 79 % of the cases a coding unit truly belonging to category C is assigned to category A or B. If this error occurs, 44 % of the coding units are treated as category A and 56 % as category B. In consequence, category B suffers more from errors made with coding units of category C than category A. In other words: the data for category B is more strongly biased by errors of category C than the data of category A.
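
To make the row-wise reading concrete, the matrix from the output above can be rebuilt as a plain R matrix (the object name `aem` is chosen here for illustration). The recovery probabilities are one minus the diagonal, and the off-diagonal cells of each row, being conditional on an error occurring, sum to one:

```r
# Assignment-error matrix as reported above:
# rows = true category, columns = assigned category.
aem <- matrix(c(0.4285714, 0.4545455, 0.5454545,
                0.4000000, 0.8461538, 0.6000000,
                0.4444444, 0.5555556, 0.7857143),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# The diagonal holds the alpha-errors, so the probability of
# recovering the true category is one minus the diagonal.
round(1 - diag(aem), 2)
#>    A    B    C
#> 0.57 0.15 0.21

# The off-diagonal cells of each row are conditional probabilities
# and therefore sum to (approximately) one.
rowSums(aem) - diag(aem)
```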

The measures described above provide detailed insights into the reliability of every single category, which is a new feature for content analysis and very helpful for constructing a coding scheme or for evaluating data of empirical studies. In many applications, however, the values have to be summarized into values representing the quality of the complete coding scheme. In the Iota Concept, this is done by averaging the iota values of the categories. The value is accessible via `$average_iota`.

```
results$average_iota
#> [1] 0.4930246
```

In the current example, average iota is about .49. At the moment, only a rule of thumb for ordinal data exists. According to Berding et al. (2022), an average iota of at least .474 is necessary for an acceptable level of reliability on the scale level. For a “good” reliability, average iota should be at least .601.
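
The scale-level value can be reproduced by hand from the category-level iota values reported above and compared against these cut-offs (plain R, no package functions):

```r
# Category-level iota values as reported above.
iota <- c(A = 0.6558036, B = 0.4121094, C = 0.4111607)

# Average iota summarizes the complete coding scheme.
average_iota <- mean(iota)
round(average_iota, 7)
#> [1] 0.4930246

# Rule-of-thumb cut-offs for ordinal data (Berding et al., 2022):
average_iota >= 0.474  # at least acceptable reliability
#> [1] TRUE
average_iota >= 0.601  # "good" reliability
#> [1] FALSE
```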

- Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education. https://doi.org/10.3389/feduc.2022.818365
- Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.). SAGE.
- Schreier, M. (2012). Qualitative Content Analysis in Practice. SAGE.