Please note that Iota1 is outdated. Please use the new version Iota2. Central definitions of Iota have changed in the new version. Please refer to Get started
This vignette is not updated to future developments.
Reliability is a central characteristic of any assessment instrument, and describes the extent to which the instrument produces error-free data (Schreier, 2012). In terms of content analysis, Krippendorff (2019) suggests replicability as a fundamental reliability concept, which is also referred to as inter-coder reliability. This describes the degree to which “a process can be reproduced by different analysts, working under varying conditions, at different locations, or using different but functionally equivalent measuring instruments” (Krippendorff, 2019).
The package iotarelr provides an environment for estimating the degree of inter-coder reliability based on the Iota Reliability Concept developed by Berding et al. (2022). The concept provides one of the first measures for characterizing the degree of reliability for a complete scale and for every single category. Most of the older measures are limited to information on the complete scale.
The suggested concept is applicable to any kind of content analysis that uses a coding scheme with nominal or ordinal data regardless the kind of coders (human or artificial intelligence), the number of coders, and the number of categories. The following introduction shows how to use Iota1.
At the beginning, data generated by at least two coders is needed. Let us assume that four coders analyzed 20 textual fragments with a coding scheme consisting of three categories A, B, and C.
library(iotarelr) #> Lade nötiges Paket: ggplot2 #> Warning: Paket 'ggplot2' wurde unter R Version 4.1.3 erstellt #> Lade nötiges Paket: ggalluvial #> Warning: Paket 'ggalluvial' wurde unter R Version 4.1.3 erstellt <-c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C") coder_1<-c("A","A","C","B","B","A","A","B","C","A","A","C","A","C","A","B","A","A","A","C") coder_2<-c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C") coder_3<-c("A","C","C","C","B","A","A","B","C","A","B","A","A","B","A","C","B","A","A","C") coder_4 <-cbind(coder_1,coder_2,coder_3,coder_4)coded_data
In this example, the characteristics are saved as characters. The
package also supports that the categories are stored as integers. The
only important aspect is that the rows must contain the coding units
(e.g., the textual fragments), and that the columns represent the
different coders. In the next step, the estimation of iota and its
elements starts via the function
$alpha results#> A B C #> [1,] 0.5647321 0.140625 0.2020089
$alpha saves the values for the
chance-corrected alpha reliabilities. These values describe the
extend in which the true characteristic of a coding unit is discovered.
It ranges from 0 to 1. 0 indicates the absence of reliability. That is,
the assignment of the true category equals a random selection between
the categories. 1 indicates that the true value is always recovered.
In the current example, the chance-corrected alpha value for category A is relatively high. This means that the coding scheme leads coders to assign characteristic A to a coding unit if the true characteristic of the coding unit is A with a high probability. In contrast, the values for the other two categories are very low. This indicates that a coding unit with the true characteristic B or C is often assigned to the other categories. That is, the coding scheme does not ensure that coders discover the true category if the true category is B or C.
To provide a more detailed insight of the coding scheme, the beta values account for the errors which occur in the case that the true category is not discovered. That is, the beta values describe the extent to which a category is influenced by errors made in other categories. For example, if the true category of a coding unit is A and a coder does not assign A to that coding unit, the data for category B and C are biased. The data representing categories B and C comprises a coding unit that should not be part of the data.
The chance-corrected values for the beta reliabilities are
$beta and range between 0 and 1. 0 indicates that
the beta reliability equals the beta reliability in the case of complete
guessing. 1 indicate the absence of any beta errors.
$beta results#> A B C #> [1,] 0.746875 0.6835938 0.6203125
In the current example, the chance-corrected beta reliabilities are relatively high. This means that the different categories are not strongly influenced by errors in the other categories.
Iota values summarize the different types of errors for each category
by averaging the chance-corrected reliabilities. They are stored in
$iota results#> A B C #> [1,] 0.6558036 0.4121094 0.4111607
Iota can range between 0 and 1. 0 indicates that the quality of the coding of a category equals random guessing. Codings of this category are not reliable. 1 indicates a perfect reliability of a category. That is, the true value of a coding unit is recovered if the true category is the category under investigation and errors made by coding coding units of other categories do note influence the data of the category under investigation. In the current example category A is quite reliable while the other categories are not.
The assignment-error matrix combines the alpha and beta values and provides the most detailed description of a coding scheme. It is based on the raw estimates without any chance-correction.
$assignment_error_matrix results#> A B C #> A 0.4285714 0.4545455 0.5454545 #> B 0.4000000 0.8461538 0.6000000 #> C 0.4444444 0.5555556 0.7857143
The AEM has to be read row by row because the rows represent the true category of a coding unit and the columns represent the assigned categories. The values on the diagonal represent the alpha-error of the categories. That is the probability not to assign the true category to a coding unit. The other cells describe the probability to assign the category to the other categories. That is, they inform about the probability to choose a category under the condition that the true category is not recovered.
In the example, the alpha-error of category A is about .43 meaning that a coding unit of category A is coded as another category in about 43 %. Or in other words: The probability to assign the true category to a coding unit of category A is about 57%. Thus, the second and third cell in the first row mean: If the true category of a coding unit belonging to category A is not recovered, about 45% of the codings are assigned as category B and 55% of the codings are assigned to category C. Thus, category C suffers more from errors made with coding units truly belonging to category A than category B.
The alpha-error of category B is about .85. Thus in about 85% of cases the coding units truly belonging to category B are assigned to another category. In other words: The probability to recover the true category is about 15 % if the true category of the coding unit is B. In the case that this error occurs, 40% of the cases are assigned as category A and 60 % are assigned to category C. Thus, the data of category C suffers more from errors made on coding units belonging truly to category B than category A.
The alpha-error of category C is about 79%. Thus, in about 79% of the cases a coding unit truly belonging to category C is assigned to category A or B. If this error occurs 44 % of the codings units are treated as category A and 56% are treated as category B. In consequence, category B suffers more form errors made with codings units of category C than category A. In other words: The data for category B is more strongly biased by errors of category C than the data of category A.
The measures described above provide detailed insights into the
reliability of every single category which is a new feature for content
analysis and very helpful for constructing a coding scheme or for
evaluating data of empirical studies. In many applications however, the
values have to be summarized to values representing the quality of the
complete coding scheme. In the Iota Concept this is done by averaging
the iota values for each category. The value is accessible by
$average_iota results#>  0.4930246
In the current example average iota is about .49. At the moment only a rule of thumb for ordinal data exist. According to Berding et al. (2022) an average iota of at least .474 is necessary for an acceptable level of reliability on the scale level. For a “good” reliability average iota should be at least .601.