**wconf is a package that allows users to create weighted
confusion matrices and accuracy scores**

Used to improve the model selection process, the package includes several weighting schemes which can be parameterized, as well as the option for custom weight configurations. Furthermore, users can decide whether they wish to positively or negatively affect the accuracy score as a result of applying weights to the confusion matrix. “wconf” integrates with the “caret” package, but it can also work standalone when provided data in matrix form.

Confusion matrices are used to visualize the performance of classification models in tabular format. A confusion matrix takes the form of an “n x n” matrix depicting:

the reference category, in columns;

the predicted category, in rows;

the number of observations corresponding to each combination of “reference - predicted” category couples, as cells of the matrix.

Visually, the simplest binary classification confusion matrix takes on the form:

\[ A = \begin{bmatrix}TP & FP \\FN & TN\\ \end{bmatrix} \] where:

\(TP\) - True Positives - the number of observations that were “positive” and were correctly predicted as being “positive”

\(TN\) - True Negatives - the number of originally “negative” observations that were correctly predicted by the model as being “negative”.

\(FP\) - False Positives - also called “Type 1 Error” - represents observations that are in fact “negative”, but were incorrectly classified by the model as being “positive”.

\(FN\) - False Negatives - also called “Type 2 Error” - represents observations that are in fact “positive”, but were incorrectly classified by the model as being “negative”.

The traditional accuracy metric is computed by adding the true positives and true negatives and dividing the sum by the total number of observations, \(N = TP + FP + FN + TN\).

\[ A = \frac{TP + TN} {N} \]
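As a minimal base-R sketch of the formula above, the snippet below builds a binary confusion matrix with `table()` (predicted categories in rows, reference categories in columns) and computes the traditional accuracy. The `reference` and `predicted` vectors are made-up illustration data and are not part of the package.

```r
# Hypothetical labels, invented for illustration only
reference <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"),
                    levels = c("pos", "neg"))
predicted <- factor(c("pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"),
                    levels = c("pos", "neg"))

# Rows hold the predicted category, columns hold the reference category
cm <- table(data = predicted, reference = reference)

TP <- cm["pos", "pos"]  # predicted positive, actually positive
TN <- cm["neg", "neg"]  # predicted negative, actually negative
N  <- sum(cm)           # total number of observations

accuracy <- (TP + TN) / N
print(cm)
print(accuracy)
```

With these eight invented observations, three true positives and three true negatives give an accuracy of 6 / 8 = 0.75.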

A weighted confusion matrix is obtained by assigning a weight to every cell of the confusion matrix based on its distance from the correctly predicted category. This is important for multi-category classification problems (where there are three or more categories), where the distance from the correctly predicted category matters.

The weighted confusion matrix, for the simple binary classification, takes the form:

\[ A = \begin{bmatrix}w1*TP & w2*FP \\w2*FN & w1*TN\\ \end{bmatrix} \]

In the case of the weighted confusion matrix, a weighted accuracy score can be calculated by summing up all of the elements of the matrix and dividing the resulting amount by the number of observations.

\[ A = \frac{w1*TP + w2*FP + w2*FN + w1*TN} {N} \]
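The weighted accuracy formula can be sketched in base R as an element-by-element product of a weight matrix and a confusion matrix. The counts and the weights `w1 = 1` and `w2 = 0.25` below are illustrative assumptions, not package defaults.

```r
# Hypothetical binary confusion matrix: predicted in rows, reference in columns
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("pos", "neg"),
                             reference = c("pos", "neg")))

w1 <- 1     # assumed weight for correct predictions (diagonal cells)
w2 <- 0.25  # assumed weight for misclassifications (off-diagonal cells)
w <- matrix(c(w1, w2,
              w2, w1),
            nrow = 2, byrow = TRUE)

# Element-by-element product, then sum of all cells divided by N
weighted_cm <- w * cm
weighted_accuracy <- sum(weighted_cm) / sum(cm)

print(weighted_cm)
print(weighted_accuracy)
```

Here the weighted accuracy is (40 + 2.5 + 1.25 + 45) / 100 = 0.8875. Setting `w2 = 0` reduces the expression to the traditional accuracy, (40 + 45) / 100 = 0.85.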

For more details on the method, see the paper:

Monahov, A. (2023). wconf: The weighted confusion matrix and accuracy scores package for R

The “weightmatrix” function compiles a weight matrix according to one of several weighting schemas and allows users to visualize the impact of the weight matrix on each element of the confusion matrix.

In R, simply call the function:

`weightmatrix(n, weight.type = "arithmetic", weight.penalty = FALSE, standard.deviation = 2, geometric.multiplier = 2, interval.high=1, interval.low = -1, custom.weights = NA, plot.weights = FALSE)`

The function takes as input:

*n* – the number of classes contained in the confusion
matrix.

*weight.type* – the weighting schema to be used. Can be one
of: “arithmetic” - a decreasing arithmetic progression weighting scheme,
“geometric” - a decreasing geometric progression weighting scheme,
“normal” - weights drawn from the right tail of a normal distribution,
“interval” - weights contained on a user-defined interval, “custom” -
custom weight vector defined by the user.

*weight.penalty* – determines whether the weights associated
with non-diagonal elements generated by the “normal”, “arithmetic” and
“geometric” weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.

*standard.deviation* – standard deviation of the normal
distribution, if the normal distribution weighting schema is used.

*geometric.multiplier* – the multiplier used to construct the
geometric progression series, if the geometric progression weighting
scheme is used.

*interval.high* – the upper bound of the weight interval, if
the interval weighting scheme is used.

*interval.low* – the lower bound of the weight interval, if
the interval weighting scheme is used.

*custom.weights* – the vector of custom weights to be applied,
if the custom weighting scheme was selected. The length of the vector should be equal
to “n”, but can be larger, with excess values being ignored.

*plot.weights* – optional setting to enable plotting of the weight
vector, corresponding to the first column of the weight matrix.

The function outputs a matrix:

*w* – the n x n weight matrix.

The “wconfusionmatrix” function calculates the weighted confusion matrix by multiplying, element-by-element, a weight matrix with a supplied confusion matrix object.

In R, simply call the function:

`wconfusionmatrix(m, weight.type = "arithmetic", weight.penalty = FALSE, standard.deviation = 2, geometric.multiplier = 2, interval.high=1, interval.low = -1, custom.weights = NA, print.weighted.accuracy = FALSE)`

The function takes as input:

*m* – the caret confusion matrix object or simple matrix.

*weight.type* – the weighting schema to be used. Can be one
of: “arithmetic” - a decreasing arithmetic progression weighting scheme,
“geometric” - a decreasing geometric progression weighting scheme,
“normal” - weights drawn from the right tail of a normal distribution,
“interval” - weights contained on a user-defined interval, “custom” -
custom weight vector defined by the user.

*weight.penalty* – determines whether the weights associated
with non-diagonal elements generated by the “normal”, “arithmetic” and
“geometric” weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.

*standard.deviation* – standard deviation of the normal
distribution, if the normal distribution weighting schema is used.

*geometric.multiplier* – the multiplier used to construct the
geometric progression series, if the geometric progression weighting
scheme is used.

*interval.high* – the upper bound of the weight interval, if
the interval weighting scheme is used.

*interval.low* – the lower bound of the weight interval, if
the interval weighting scheme is used.

*custom.weights* – the vector of custom weights to be applied,
if the custom weighting scheme was selected. The length of the vector should be equal
to “n”, but can be larger, with excess values being ignored.

*print.weighted.accuracy* – optional setting to print the
weighted accuracy metric, which represents the sum of all weighted
confusion matrix cells divided by the total number of observations.

The function outputs a matrix:

*w_m* – the n x n weighted confusion matrix.

The “balancedaccuracy” function calculates classification accuracy scores using the sine-based formulas proposed by Starovoitov and Golub (2020). The advantage of this method is that it takes into account the class distribution of errors, which improves on the standard balanced accuracy function. This feature makes the method useful when confronted with imbalanced data.

In R, simply call the function:

`balancedaccuracy(m, print.scores = TRUE)`

The function takes as input:

*m* – the caret confusion matrix object or simple matrix.

*print.scores* – used to display the accuracy scores when set
to TRUE.

The function outputs a list of objects:

*ACCmetrics* – accuracy metrics.

This section provides a real-world usage example of the wconf package on the Iris dataset included in R.

To load the wconf package, run the command:

`library(wconf)`

We will attempt the more difficult task of predicting petal length from sepal width. In addition, for this task, we are only given categorical information about the length of the petals, specifically that they are:

“Short (length between: 1-3 cm)”

“Medium (length between: 3-5 cm)”

“Long (length between: 5-7 cm)”.

Numeric data is available for the sepal width.

Using caret, we train a multinomial logistic regression model to fit the numeric sepal width onto our categorical petal length data. We run 10-fold cross-validation, repeated 3 times to avoid overfitting and find optimal regression coefficient values for various data configurations.

Finally, we extract the confusion matrix. We wish to weight the confusion matrix to represent a preference for observations fitted closer to the correct value. We would like to assign some degree of positive value to observations that are incorrectly classified, but are close to the correct category. Since our categories are equally spaced, we can use an arithmetic weighting scheme.

Let’s first visualize what this weighting schema would look like:

```
# View the weight matrix and plot for a 3-category classification problem, using the arithmetic sequence option.
weightmatrix(3, weight.type = "arithmetic", plot.weights = TRUE)
```

```
#>      [,1] [,2] [,3]
#> [1,]  1.0  0.5  0.0
#> [2,]  0.5  1.0  0.5
#> [3,]  0.0  0.5  1.0
```
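For intuition, the arithmetic weight matrix printed above can be reproduced in base R by letting the weight fall linearly with the distance between the predicted and the reference category. This is only a sketch that matches the 3-category output shown here; the helper `arithmetic_weights` is hypothetical and is not claimed to be how “weightmatrix” is implemented.

```r
# Hypothetical helper: weights decrease linearly from 1 on the diagonal
# to 0 at the largest possible distance (n - 1) from the diagonal.
arithmetic_weights <- function(n) {
  stopifnot(n >= 2)
  idx <- seq_len(n)
  distance <- abs(outer(idx, idx, "-"))  # |row index - column index|
  1 - distance / (n - 1)
}

w <- arithmetic_weights(3)
print(w)
```

Each step away from the diagonal lowers the weight by 1 / (n - 1), which is 0.5 in the 3-category case.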

To obtain the weighted confusion matrix, we run the “wconfusionmatrix” command and provide it with the confusion matrix object generated by caret, a weighting scheme and, optionally, parameters to suit our objectives. The “wconfusionmatrix” function automatically determines the dimensions of the weight matrix, so the user need only specify the parameters associated with their weighting scheme of choice.

The following block of code produces the weighted confusion matrix, to our specifications.

```
# Load libraries and perform transformations
library(caret)
#> Warning: package 'caret' was built under R version 4.2.3
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Loading required package: lattice
#> Warning: package 'lattice' was built under R version 4.2.3
data(iris)
# Bin the numeric petal length into three categories: [1,3), [3,5), [5,7)
iris$Petal.Length.Cat = cut(iris$Petal.Length, breaks=c(1, 3, 5, 7), right = FALSE)

# Train multinomial logistic regression model using caret
set.seed(1)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
model <- train(Petal.Length.Cat ~ Sepal.Width, data=iris, method="multinom", trace = FALSE, trControl=control)

# Extract original data, predicted values and place them in a table
y = iris$Petal.Length.Cat
yhat = predict(model)
preds = table(data=yhat, reference=y)

# Construct the confusion matrix
confmat = confusionMatrix(preds)

# Compute the weighted confusion matrix and display the weighted accuracy score
wconfusionmatrix(confmat, weight.type = "arithmetic", print.weighted.accuracy = TRUE)
#> Weighted accuracy = 0.7233333
#>
#>        [1,3) [3,5) [5,7)
#>  [1,3)    38   2.5     0
#>  [3,5)     1  37.0     9
#>  [5,7)     0   6.0    15
```
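The reported weighted accuracy can be checked by hand: it is the sum of the cells of the weighted confusion matrix divided by the number of observations. The sketch below copies the cell values from the output above and assumes the 150 rows of the built-in `iris` dataset as the observation count.

```r
# Weighted confusion matrix, copied from the printed output
weighted_cm <- matrix(c(38,  2.5,  0,
                         1, 37.0,  9,
                         0,  6.0, 15),
                      nrow = 3, byrow = TRUE)

n_obs <- nrow(iris)  # number of observations in the built-in iris dataset

weighted_accuracy <- sum(weighted_cm) / n_obs
print(weighted_accuracy)
```

The cells sum to 108.5, and 108.5 / 150 gives the 0.7233333 printed by the function.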

Let us now undertake an analysis of the classification performance of a model on imbalanced data. To do so, we will make use of the “balancedaccuracy” function.

Consider the following example of loans classified into different categories of Loan-To-Value (LTV) - an indicator which tells a bank if a loan has enough collateral to cover against the clients’ default. Lower values of the indicator denote safer loans.

A bank’s risk department has come up with a model that classifies loans into one of four categories, depending on the LTV band of the loan. The results are presented below:

The classification categories can be interpreted in the following
manner:

cat. 1 - loans with LTVs between 40% and 60%

cat. 2 - loans with LTVs between 60% and 80%

cat. 3 - loans with LTVs between 80% and 100%

cat. 4 - loans with LTVs between 100% and 120%

Let’s look at the confusion matrix to get an idea of how well the model performs.

For category 1 (safest loans with an LTV ratio of 40%-60%), the model correctly predicts all 50 loans that were issued.

For category 2 loans, only 1 out of the 107 loans issued with an LTV ratio of 60%-80% was correctly predicted.

The performance on category 3 is also bad, as only 22 out of the 242 loans issued within this bucket were predicted correctly.

For category 4 loans (highest risk, with LTVs above 100%), only 4 out of the 37 loans belonging to this class were predicted correctly.

Overall, our conclusion is that this model is very bad at predicting 3 out of 4 loan categories (categories 2 - 4). We would therefore want to assign it a low score.

Let’s calculate the accuracy metrics of this model using the “balancedaccuracy” function.

```
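# Confusion matrix of the loan model (same values as in the printed output)
mtx = t(matrix(
  c(50, 0, 118, 5,
    0, 1, 45, 27,
    0, 84, 22, 1,
    0, 22, 57, 4),
  nrow = 4))
balancedaccuracy(mtx)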
#> Confusion matrix:
#> [,1] [,2] [,3] [,4]
#> [1,] 50 0 118 5
#> [2,] 0 1 45 27
#> [3,] 0 84 22 1
#> [4,] 0 22 57 4
#>
#> Class accuracy metrics:
#> SinAcc - Starovoitov-Golub Sine-Accuracy Metrics for Imbalanced Classification Data
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.01043053
#> BalAcc - Balanced Accuracy Function
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.1081081
#> ACC - Standard Accuracy Function
#> [1] 0.1766055
#>
#> Overall accuracy metrics:
#> SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.1766055
#>
#> $SinACC
#> [1] 0.2557172
#>
#> $SinACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.01043053
#>
#> $BalACC
#> [1] 0.3020907
#>
#> $BalACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.1081081
#>
#> $ACC
#> [1] 0.1766055
```
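The BalACC and ACC figures in this output can be cross-checked in base R. The sketch below assumes the usual definitions (per-category share of correctly predicted observations, computed over the reference columns, and the overall share of correct predictions); these agree with the printed values, but this is not the package's source code. SinACC is not reproduced because its formula is not given here.

```r
# Loan confusion matrix: predicted in rows, reference in columns
mtx <- t(matrix(c(50,  0, 118,  5,
                   0,  1,  45, 27,
                   0, 84,  22,  1,
                   0, 22,  57,  4),
                nrow = 4))

# Per-category accuracy: correct predictions / observations in that category
bal_acc_class <- diag(mtx) / colSums(mtx)

# Balanced accuracy: unweighted mean of the per-category accuracies
bal_acc <- mean(bal_acc_class)

# Standard accuracy: all correct predictions / all observations
acc <- sum(diag(mtx)) / sum(mtx)

print(bal_acc_class)
print(bal_acc)
print(acc)
```

The per-category values are 50/50, 1/107, 22/242 and 4/37. Their mean is 0.3020907, while 77 correct predictions out of 436 observations give the standard accuracy of 0.1766055.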

Let’s analyze the scores:

SinACC - the Starovoitov-Golub Sine-Accuracy Function

BalACC - the Balanced Accuracy Function

ACC - the standard Accuracy Function

`SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.1766055`

For the SinACC and BalACC functions, we can also extract the per-category accuracy metrics, which show us how well each category was predicted.

```
Class accuracy metrics:
SinAcc
     [,1]        [,2]       [,3]       [,4]
[1,]    1 6.63064e-05 0.01237203 0.01043053
BalAcc
     [,1]        [,2]       [,3]      [,4]
[1,]    1 0.009345794 0.09090909 0.1081081
```

We notice that, as all observations belonging to the first category were correctly predicted as being in the first category, both the SinACC and BalACC functions give it a score of 1 (or 100% correctly predicted).

For the other categories, SinACC penalizes the number of incorrect predictions more than BalACC. As a consequence, the SinACC and BalACC per-category scores will only be close to each other when the number of correctly predicted cases significantly exceeds the number of incorrectly predicted cases.

To exemplify this, consider the following case where, for the last class, the number of correctly predicted observations has been set to more than double the number of incorrectly predicted observations (70 versus 33). As such, mtx[4,4] = 70.

```
mtx = t(matrix(
  c(50, 0, 118, 5,
    0, 1, 45, 27,
    0, 84, 22, 1,
    0, 22, 57, 70),
  nrow = 4))
balancedaccuracy(mtx)
#> Confusion matrix:
#> [,1] [,2] [,3] [,4]
#> [1,] 50 0 118 5
#> [2,] 0 1 45 27
#> [3,] 0 84 22 1
#> [4,] 0 22 57 70
#>
#> Class accuracy metrics:
#> SinAcc - Starovoitov-Golub Sine-Accuracy Metrics for Imbalanced Classification Data
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.6346096
#> BalAcc - Balanced Accuracy Function
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.6796117
#> ACC - Standard Accuracy Function
#> [1] 0.2848606
#>
#> Overall accuracy metrics:
#> SinACC = 0.411762 BalACC = 0.4449666 ACC = 0.2848606
#>
#> $SinACC
#> [1] 0.411762
#>
#> $SinACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.6346096
#>
#> $BalACC
#> [1] 0.4449666
#>
#> $BalACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.6796117
#>
#> $ACC
#> [1] 0.2848606
```

In this case:

```
SinACC = 0.411762 BalACC = 0.4449666 ACC = 0.2848606

SinAcc
     [,1]        [,2]       [,3]      [,4]
[1,]    1 6.63064e-05 0.01237203 0.6346096
BalAcc
     [,1]        [,2]       [,3]      [,4]
[1,]    1 0.009345794 0.09090909 0.6796117
```

The accuracy metrics for the 4th category for SinACC and BalACC are relatively close to each other:

`SinACC[,4] = 0.6346096 BalACC[,4] = 0.6796117`

Notice, however, that both the SinACC and BalACC scores are invariant to the distance of the predicted value from the correct category. If there is value in assigning some positive weight to predictions classified in the vicinity of the correct category or, conversely, applying a supplementary penalty to predictions situated far away from the correct category, then you should consider first applying weights to the confusion matrix using the function “wconfusionmatrix”, and then using the “balancedaccuracy” function on the weighted matrix.
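The two-step workflow described in this paragraph might look as follows. This is a hedged sketch: it uses only the “wconfusionmatrix” and “balancedaccuracy” signatures documented earlier, applies them to the initial loan matrix, and chooses the arithmetic scheme purely for illustration. The `requireNamespace` guard is an addition so that the snippet is skipped when the wconf package is not installed.

```r
# Initial loan confusion matrix: predicted in rows, reference in columns
mtx <- t(matrix(c(50,  0, 118,  5,
                   0,  1,  45, 27,
                   0, 84,  22,  1,
                   0, 22,  57,  4),
                nrow = 4))

if (requireNamespace("wconf", quietly = TRUE)) {
  # Step 1: weight the confusion matrix (arithmetic scheme chosen as an example)
  weighted_mtx <- wconf::wconfusionmatrix(mtx, weight.type = "arithmetic")

  # Step 2: compute the accuracy metrics on the weighted matrix
  scores <- wconf::balancedaccuracy(weighted_mtx, print.scores = TRUE)
} else {
  message("Package 'wconf' is not installed; skipping the example.")
}
```

Other weighting schemes, or `weight.penalty = TRUE`, can be substituted in step 1 depending on whether near misses should be rewarded or distant misses penalized.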

Finally, let’s consider the case when there is a disproportionately large number of observations classified correctly in one of the categories. We assume a confusion matrix identical to the initial one, except that mtx[1,1] is changed to 5000.

When running the accuracy metrics, we obtain the following results.

```
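# Initial loan confusion matrix with mtx[1,1] changed to 5000
mtx = t(matrix(
  c(5000, 0, 118, 5,
    0, 1, 45, 27,
    0, 84, 22, 1,
    0, 22, 57, 4),
  nrow = 4))
balancedaccuracy(mtx)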
#> Confusion matrix:
#> [,1] [,2] [,3] [,4]
#> [1,] 5000 0 118 5
#> [2,] 0 1 45 27
#> [3,] 0 84 22 1
#> [4,] 0 22 57 4
#>
#> Class accuracy metrics:
#> SinAcc - Starovoitov-Golub Sine-Accuracy Metrics for Imbalanced Classification Data
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.01043053
#> BalAcc - Balanced Accuracy Function
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.1081081
#> ACC - Standard Accuracy Function
#> [1] 0.9333457
#>
#> Overall accuracy metrics:
#> SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.9333457
#>
#> $SinACC
#> [1] 0.2557172
#>
#> $SinACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 6.63064e-05 0.01237203 0.01043053
#>
#> $BalACC
#> [1] 0.3020907
#>
#> $BalACC_class
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0.009345794 0.09090909 0.1081081
#>
#> $ACC
#> [1] 0.9333457
```

The standard accuracy score improves dramatically, given that it only considers the total number of correctly classified observations. Both the SinACC and BalACC scores, however, are unaffected. This is because, just as in the initial case, the first category continues to be estimated correctly in 100% of the predictions that the model generates for loans in this category.

`SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.9333457`

The SinACC score remains more conservative than BalACC, and the difference between the two is the same as in the initial case.
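The contrast between the standard and the balanced accuracy can be illustrated in base R. In this sketch, `accuracy` and `balanced_accuracy` are hypothetical helper functions built on the usual definitions, not functions exported by wconf. Inflating mtx[1,1] changes the overall share of correct predictions, but leaves the share of correct predictions within category 1 at 100%.

```r
# Initial loan confusion matrix and the variant with mtx[1,1] = 5000
mtx_initial <- t(matrix(c(50,  0, 118,  5,
                           0,  1,  45, 27,
                           0, 84,  22,  1,
                           0, 22,  57,  4),
                        nrow = 4))
mtx_inflated <- mtx_initial
mtx_inflated[1, 1] <- 5000

# Hypothetical helpers based on the usual definitions
accuracy <- function(m) sum(diag(m)) / sum(m)
balanced_accuracy <- function(m) mean(diag(m) / colSums(m))

acc_values <- c(initial = accuracy(mtx_initial),
                inflated = accuracy(mtx_inflated))
bal_values <- c(initial = balanced_accuracy(mtx_initial),
                inflated = balanced_accuracy(mtx_inflated))

print(acc_values)
print(bal_values)
```

The standard accuracy moves from 0.1766055 to 0.9333457, whereas the balanced accuracy stays at 0.3020907 in both cases, because 50/50 and 5000/5000 are both equal to 1.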