BayesMallows: An R Package for Probabilistic Preference Learning with the Mallows Rank Model

Øystein Sørensen

2018-11-30

The BayesMallows package implements methods for Bayesian preference learning with the Mallows rank model, as originally described in Vitelli et al. (2018), and further developed in Asfaw et al. (2016) and Crispino et al. (2018). This vignette describes the usage of the package, starting from the complete data cases, through top-\(k\) rankings, pairwise comparisons, and finally clustering. We refer to the above mentioned papers, as well as the review Liu et al. (2018) for a thorough description of the methods. The necessary methods for data preprocessing, tuning of algorithms, and assessment of the posterior distributions will be described along the way.

library(BayesMallows)

We also need some packages for plotting and data wrangling.

library(dplyr)
library(ggplot2)
library(tidyr)

Overview of Package

Functions

Here is an overview of the most used functions. You can read their documentation with ?function_name, or search for an example in this vignette.

Main functions in the BayesMallows package.
Function Name Description
compute_mallows Compute the posterior distribution of the Bayesian Mallows model. This is the main function of the package. Returns an object of class BayesMallows.
compute_mallows_mixtures Compute multiple Mallows models with different number of mixture components. This is a convenience function for determining the number of mixtures to use.
sample_mallows Sample from the Mallows model.
plot.BayesMallows Quick plots of the posterior densities of parameters of the Mallows model.
assess_convergence Study the convergence of the Markov chain, in order to determine burnin and other algorithm parameters.
plot_elbow Create an elbow plot for comparing models with different number of clusters.
plot_top_k Plot the top-\(k\) rankings. Particularly relevant when the data is in the form of pairwise comparisons.
assign_cluster Compute the cluster assignment of assessors.
compute_consensus Com pute the CP or MAP consensus ranking of the latent ranks.
compute_posterior_intervals Compute Bayesian posterior intervals for the parameters.
generate_initial_ranking Generate an initial ranking, for the case of missing data or pairwise comparisons.
generate_transitive_closure Generate the transitive closure for a set of pairwise comparisons.
estimate_partition_function Estimate the partition function of the Mallows model using either importance sampling or an asymptotic approximation.

Datasets

Here is an overview of the example datasets in BayesMallows. You can read their documentation with ?dataset_name, or search for an example in this vignette.

Example datasets in the BayesMallows package.
Dataset Name Description
beach_preferences Stated pairwise preferences between random subsets of 15 images of beaches, by 60 assessors.
sushi_rankings Complete rankings of 10 types of sushi by 5000 assessors.
potato_visual Complete rankings of 20 potatoes by weight, based on visual inspection, by 12 assessors.
potato_weighing Complete rankings of 20 potatoes by weight, where the assessors were allowed to weigh the potatoes in their hands, by 12 assessors.
potato_true_ranking Vector of true weight rankings for the 20 potatoes in the example datasets potato_visual and potato_weighing.

Mallows’ Rank Model

We here give an informal review of the Mallows model (Mallows (1957)). The distribution of a ranking \(r \in \mathcal{P}_{n}\) of \(n\) items is modeled as

\[ P(r | \alpha, \rho) = Z_{n}(\alpha)^{-1} \exp\left\{-\frac{\alpha}{n} d(r, \rho)\right\} 1_{\mathcal{P}_{n}}(r), \]

where \(\mathcal{P}_{n}\) is the set of all permutations of \(1, \dots, n\), \(\alpha\) is a scale parameter, \(\rho \in \mathcal{P}_{n}\) is a latent consensus ranking, \(d(\cdot, \cdot)\) is a distance measure, \(1_{S}(\cdot)\) is the indicator function for the set \(S\), and \(Z_{n}(\alpha)\) is the partition function, or normalizing constant.

Given \(N\) observed rankings, \(R_{1}, \dots, R_{N}\), the likelihood of the model is

\[ P(R_{1}, \dots, R_{N} | \alpha, \rho) = Z_{n}(\alpha)^{-N} \exp\left\{-\frac{\alpha}{n} \sum_{j=1}^{N}d(R_{j}, \rho)\right\} \prod_{j=1}^{N} \left\{1_{\mathcal{P}_{n}}(R_{j}) \right\}. \]

The rankings argument to compute_mallows is assumed to be a matrix of the form \((R_{1}, R_{2}, \dots, R_{N})^{T}\), i.e., each row contains a ranking and each column is an item.

Prior Distributions

For \(\alpha\) we use an exponential prior, with density \(\pi(\alpha | \lambda) = \lambda \exp(-\lambda \alpha).\) The rate parameter \(\lambda\) can be set by the user with the lambda argument to compute_mallows. For \(\rho\) we assume a uniform prior distribution on \(\mathcal{P}_{n}\), with density \(\pi(\rho) = 1_{\mathcal{P}_{n}}(\rho) /n!.\)

Metropolis-Hastings Algorithm

We use a Metropolis-Hastings algorithm for computing the posterior distributions of \(\alpha\) and \(\rho\). We propose \(\alpha\) from a lognormal distribution \(\log \mathcal{N}(\log(\alpha), \sigma_{\alpha}^{2})\). We propose \(\rho\) with a leap-and-shift algorithm, described in detail in Vitelli et al. (2018). The standard deviation \(\sigma_{\alpha}^{2}\) and the leap size can be set by the user with the arguments alpha_prop_sd and leap_size to compute_mallows.

Partial Rankings

If each assessor \(j\) has ranked a subset of the items \(\mathcal{A}_{j}\), we use data augmentation to fill in the missing ranks. We define the augmented data vectors \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\), and use a uniform prior for each assessor with support \(\mathcal(P)_{n} \setminus R_{j}\), i.e., the set of rankings not already chosen. The Metropolis-Hastings algorithm now alternates between sampling \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\) given the current \(\alpha\) and \(\rho\), and sampling \(\alpha\) and \(\rho\) given the current \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).

Pairwise Comparisons

When the assessors have stated a set of pairwise comparisons, rather than rankings, we use the same data augmentation ideas as for partial rankings, but the proposal distribution is slightly more complicated in order to ensure that the proposed ranking is compliant with the ordering implied by the pairwise comparisons. In addition, the transitive closure of the stated ordering has to be computed, in order to find all implied orderings. From a user perspective, no new algorithm parameters need to be considered.

Mixtures of Mallows Models

When the assessor pool is heterogeneous, one might assume that there exist several latent consensus rankings, \(\rho_{1}, \dots, \rho_{C}\), one for each cluster of assessors. Letting \(z_{1}, \dots, z_{N} \in\{1, \dots, C\}\) assign each assessor to each cluster, the likelihood of the observed rankings is

\[P(R_{1}, \dots, R_{N} | \{\alpha_{c}, \rho_{c}\}_{c=1,\dots,C}, z_{1},\dots,z_{N}) = \prod_{j=1}^{N}\frac{1_{\mathcal{P}_{n}}(\rho)}{Z_{n}(\alpha_{z_{j}})}\exp\left\{ -\frac{\alpha_{z_{j}}}{n} d(R_{j}, \rho_{z_{j}})\right\}. \] For the scale parameters \(\alpha_{1}, \dots, \alpha_{C}\) we assume the exponential prior as before, all with the same rate parameter \(\lambda\). We assume that the cluster labels are a priori distributed according to \(P(z_{1}, \dots, z_{N} | \tau_{1}, \dots, \tau_{C}) = \prod_{j=1}^{N} \tau_{z_{j}}\), where \(\tau_{c}\) is the a priori probability that an assessor belongs to cluster \(c\). For \(\tau_{1}, \dots, \tau_{C}\) we assume the Dirichlet prior \(\pi(1, \dots, C) = \Gamma(\psi C)\Gamma(\psi)^{-C}\prod_{c=1}^{C}\tau_{c}^{\psi - 1}\), where \(\Gamma(\cdot)\) is the gamma function. The user can control the value of \(\psi\) with the psi argument to compute_mallows.

BayesMallows User Guide

Having described the model, we now show some case studies demonstrating the use of the BayesMallows package. To reduce the computing time, we have typically taken a fairly small number of posterior samples, and rather focused on demonstrating the different methods. In general, we recommend taking a larger number of samples than what we do here, in order to obtain good approximations of the posterior distributions.

Completely Ranked Data

BayesMallows comes with example data described in Liu et al. (2018). A total of 12 assessors were asked to rank 20 potatoes based on their weight. In the first round, the assessors were only allowed to study the potatoes visually, while in the second round, the assessors were also allowed to hold the potatoes in their hands in order to compare them. The data sets are named potato_visual and potato_weighing, respectively. The true ordering of the potatoes’ weights are stored in the vector potato_true_ranking.

The potato_visual dataset is shown below. The column names P1, …, P20 represent potatoes, and the row names A1, …, A12 represent assessors. The potato_weighing dataset has a similar structure.

Example dataset potato_visual.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
A1 10 18 19 15 6 16 4 20 3 5 12 1 2 9 17 8 7 14 13 11
A2 10 18 19 17 11 15 6 20 4 3 13 1 2 7 16 8 5 12 9 14
A3 12 15 18 16 13 11 7 20 6 3 8 2 1 4 19 5 9 14 10 17
A4 9 17 19 16 10 15 5 20 3 4 8 1 2 7 18 11 6 13 14 12
A5 12 17 19 15 7 16 2 20 3 9 13 1 4 5 18 11 6 8 10 14
A6 10 15 19 16 8 18 6 20 3 7 11 1 2 4 17 9 5 13 12 14
A7 9 16 19 17 10 15 5 20 3 8 11 1 2 6 18 7 4 14 12 13
A8 14 18 20 19 11 15 6 17 4 3 10 1 2 7 16 8 5 12 9 13
A9 8 16 18 19 12 13 6 20 5 3 7 1 4 2 17 10 9 15 14 11
A10 7 17 19 18 9 15 5 20 3 10 11 1 2 6 16 8 4 13 12 14
A11 12 16 19 15 13 18 7 20 3 5 11 1 2 6 17 10 4 14 8 9
A12 14 15 19 16 12 18 8 20 3 4 9 1 2 7 17 6 5 13 10 11

Algorithm Tuning

The compute_mallows function is the workhorse of BayesMallows. It runs the Metropolis-Hastings algorithm and returns the posterior distribution of the scale parameter \(\alpha\) and the latent ranks \(\rho\) of the Mallows model. To see all its arguments, please run ?compute_mallows in the console.

We start by using all the default values of the parameters, so we only need to supply the matrix of ranked items. We use the potato_visual data printed above.

The argument returned is a list object of class BayesMallows, which contains a whole lot of information about the MCMC run.

The function assess_convergence produces plots for visual convergence assessment. We start by studying \(\alpha\), which is the default. The plot is shown below, and looks good enough, at least to begin with.

Next, we study the convergence of \(\rho\). To avoid too complicated plots, we pick 5 items to plot. Again, you can read more about this function by running ?assess_convergence in the console.

Based on these plots, it looks like the algorithm starts to converge after around 1000 iterations. Discarding the first 1000 iterations as burn-in hence seems like a safe choice. We create a new list element for model_fit called burnin, in which this choice is saved. Having done this, we do not have to provide a burnin argument to subsequent analyses of the posterior distributions.

Posterior Distributions

Once we are confident that the algorithm parameters are reasonable, we can study the posterior distributions of the model parameters using the generic function plot.BayesMallows.

Scale Parameter \(\alpha\)

With a burnin of 1000, the original model_fit object from the previous subsection has 1000 MCMC samples. The default parameter of plot.BayesMallows is \(alpha\), so we can study the posterior distribution with the simple statement below.

We can also get the posterior credible intervals for \(\alpha\):

parameter mean median conf_level hpdi central_interval
alpha 10.95 10.908 95 % [9.759,12.241] [9.718,12.229]

Latent Ranks \(\rho\)

Obtaining posterior samples from \(\rho\) is in general harder than for \(\alpha\). Some items tend to be very sticky. We start by plotting the model_fit object from above. We now have to tell plot.BayesMallows that we want a plot of parameter = "rho" and all the items. This gives us posterior the posterior density of all the items.

We can also find the posterior intervals of the latent ranks:

item parameter mean median conf_level hpdi central_interval
P1 rho 10 10 95 % [9,11] [9,11]
P2 rho 17 17 95 % [16,18] [16,18]
P3 rho 19 19 95 % [19] [19]
P4 rho 16 16 95 % [16,18] [16,18]
P5 rho 10 10 95 % [9,12] [9,12]
P6 rho 15 15 95 % [15] [15]
P7 rho 6 6 95 % [5,7] [5,7]
P8 rho 20 20 95 % [20] [20]
P9 rho 3 3 95 % [3] [3]
P10 rho 4 4 95 % [4] [4]
P11 rho 11 11 95 % [9,11] [9,11]
P12 rho 1 1 95 % [1] [1]
P13 rho 2 2 95 % [2] [2]
P14 rho 7 7 95 % [6,7] [6,7]
P15 rho 18 18 95 % [17,18] [17,18]
P16 rho 8 8 95 % [8] [8]
P17 rho 5 5 95 % [5,6] [5,6]
P18 rho 14 14 95 % [13,14] [13,14]
P19 rho 12 12 95 % [10],[12] [10,12]
P20 rho 13 13 95 % [13,14] [13,14]

Jumping over \(\alpha\)

Updating \(\alpha\) in each step may not be necessary. With the alpha_jump argument, we can tell the MCMC algorithm to update \(\alpha\) only every alpha_jumpth iteration.

We should assess convergence again:

Thinning

Saving a large number of iterations of \(\rho\) gets quite expensive, so compute_mallows has a rho_thinning parameter. It specifies that only each rho_thinningth iteration of \(\rho\) should be saved to memory. We double the number of iterations, while setting rho_thinning = 2. This gives us the same number of posterior samples.

Please be careful with thinning. In this small data example it is definitely wasteful! Running the same number of iterations without thinning always gives a better approximation of the posterior distribution. Thinning might be useful when you need to run a large number of iterations to explore the space of latent ranks, and the latent ranks from all iterations do not fit in memory. (See, e.g., Gelman et al. (2004) for a discussion of thinning).

Varying the Distance Metric

We can try to use the Kendall distance instead of the footrule distance.

Validation of Input

It is also worth pointing out that compute_mallows checks if the input data are indeed ranks. Let us try this by manipulating the first row of potato_visual, giving rank 1 to the first two items:

Top-\(k\) Rankings

Encoding of Missing Ranks

Now imagine that the assessors in the potato experiment were asked to rank only the top-five heaviest potatoes. We generate these data by retaining only ranks 5 or higher in potato_visual, setting the rest to NA.

In Vitelli et al. (2018) it is shown that the unranked items do not effect the MAP estimates of the ranked items in this top-k setting. In this case, there are 8 potatoes which have been ranked, and so the unranked potatoes should have uniform posterior distributions between 9 and 20. However, arriving at these uniform posteriors require a large number of MCMC iterations, so we instead remove these items:

We are now left with this 12 by 8 matrix:

Example dataset potato_top.
P7 P9 P10 P12 P13 P14 P16 P17
A1 4 3 5 1 2 NA NA NA
A2 NA 4 3 1 2 NA NA 5
A3 NA NA 3 2 1 4 5 NA
A4 5 3 4 1 2 NA NA NA
A5 2 3 NA 1 4 5 NA NA
A6 NA 3 NA 1 2 4 NA 5
A7 5 3 NA 1 2 NA NA 4
A8 NA 4 3 1 2 NA NA 5
A9 NA 5 3 1 4 2 NA NA
A10 5 3 NA 1 2 NA NA 4
A11 NA 3 5 1 2 NA NA 4
A12 NA 3 4 1 2 NA NA 5

Metropolis-Hastings Algorithm with Missing Ranks

The compute_mallows function automatically recognizes the NA values as missing ranks, and augments the data, as described in Section 4.1 of Vitelli et al. (2018). Let us try:

Looking at the returned object, we see that any_missing is TRUE, so compute_mallows has correctly detected that there are missing values. We set save_aug = TRUE so we can get a diagnostic plot for the convergence of the data augmentation. After you are confident that the algorithm converges, you might want to use the default save_aug = FALSE, because saving the data takes memory.

model_fit also has assessor-wise acceptance rates for the augmentation.

assessor acceptance_rate
1 0.56
2 0.56
3 0.90
4 0.58
5 0.59
6 0.59
7 0.58
8 0.54
9 0.56
10 0.56
11 0.51
12 0.55

Algorithm Tuning

The tuning of \(\alpha\) and \(\rho\) can be done exactly as described above for the full potato data. Let us now study the convergence of the data augmentation.

Looking back, we see that assessor 1 does not have values for potatoes 14, 16, and 17. Hence, we expect these to jump between ranks 6, 7, and 8, since the 5 ranked items are fixed to values 1,…,5. Below we have plotted potato 14, which fluctuates between 6 and 8. You can confirm that these same happens to 16 and 17.

Assessor 1 has ranks for potatoes 7, 9, 10, 12, and 13, which are fixed. You can confirm that they are fixed by running the following command.

Posterior Distributions

We can study the posterior distributions of \(\alpha\) and \(\rho\) just like we did above.

Other Distance Measures

Like for the complete ranks, we can vary the distance measure used in the Mallows model. Here are command for running with Cayley, Kendall, and Spearman distance.

Ranks Missing at Random

If the ranks are missing at random, we cannot remove the unranked items as we did for top-\(k\) rankings above. Let us assume that 10 % of the data in potato_visual have disappeared due to a disk failure. We generate these in the code chunk below:

The data now look like the following:

Example dataset potato_missing.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
A1 10 NA 19 NA 6 16 4 20 3 5 12 1 2 9 NA 8 7 14 13 11
A2 10 NA 19 17 11 NA 6 20 4 3 13 1 2 7 16 8 5 12 9 14
A3 12 15 18 16 13 11 7 20 6 3 8 2 1 4 19 5 9 14 10 17
A4 9 17 19 16 10 15 5 20 3 4 8 1 2 7 18 11 6 13 NA 12
A5 12 17 19 NA 7 16 2 20 3 9 13 1 4 5 18 11 6 8 10 14
A6 NA 15 19 16 8 18 6 20 3 7 11 1 2 4 17 9 5 13 12 14
A7 9 16 19 17 10 15 5 20 3 8 11 NA 2 6 18 7 4 14 12 NA
A8 14 18 20 19 11 15 6 17 4 3 10 1 2 NA 16 8 5 12 9 13
A9 8 NA 18 NA 12 13 6 20 5 3 NA 1 4 NA 17 10 9 15 14 11
A10 7 17 19 18 NA 15 NA 20 3 10 11 1 2 6 16 8 NA 13 12 14
A11 NA 16 19 15 NA 18 7 20 3 NA 11 NA 2 6 17 10 4 NA 8 9
A12 14 15 19 16 12 18 8 20 3 4 9 1 2 7 17 6 5 NA 10 NA

Algorithm Tuning

We supply potato_missing to compute_mallows as before:

We check the convergence with the following commands. The plots (not shown) would show good convergence before 1,000 iterations.

Again, we can look at the acceptance rates for the augmented data. Now let us compare it to the number of missing ranks per assessor.

assessor acceptance_rate n_missing
1 0.56 3
2 0.68 2
3 1.00 0
4 1.00 1
5 1.00 1
6 1.00 1
7 0.53 2
8 1.00 1
9 0.16 4
10 0.41 3
11 0.08 5
12 0.98 2

We see that for assessors who have either 0 or 1 missing ranks, their augmentations are always accepted. In the case of 0 missing ranks, there is in fact no augmentation going on at all, but we use the convention that the assessor’s own complete data is accepted.

Posterior Distributions

The posterior distributions can be studied as shown above.

Pairwise Preferences

Handling of pairwise preferences in the Mallows rank model is described in Section 4.2 of Vitelli et al. (2018).

Introduction

Let us start by considering a toy example with two assessors and five items. Assessor 1 has stated a set of preferences \[ \mathcal{B}_{1} = \left\{A_{1} \prec A_{2}, A_{2} \prec A_{5}, A_{4} \prec A_{5} \right\} \] and assessor 2 has the set of preferences \[ \mathcal{B}_{2} = \left\{ A_{1} \prec A_{2}, A_{2} \prec A_{3}, A_{3} \prec A_{4} \right\}. \]

Data Model

Each time an assessor is asked to compare two objects, a measurement is made. Therefore, in order to keep the data tidy (Wickham (2014)), we define a dataframe in which each row corresponds to a pairwise comparison. The columns (variables) are the assessor, the bottom item, and the top item.

In the code snippet below, we define such a dataframe for the toy example presented above:

Dataset pair_comp.
assessor bottom_item top_item
1 1 2
1 2 5
1 4 5
2 1 2
2 2 3
2 3 4

Transitive Closure

Next, we need to find the transitive closure for the set of pairwise comparisons given by each user. BayesMallows comes with a function generate_transitive_closure to do just this.

As we can see, pair_comp_tc has an additional row containing the relation \(A_{4} \prec A_{5}\) for assessor 1. For assessor 2, \[\text{tc}(\mathcal{B}_{2}) = \mathcal{B}_{2} \cup \left\{ A_{1} \prec A_{3}, A_{1} \prec A_{4}, A_{2} \prec A_{4}\right\},\] so three new rows have been added.

Dataframe pair_comp_tc.
assessor bottom_item top_item
1 1 2
1 1 5
1 2 5
1 4 5
2 1 2
2 1 3
2 2 3
2 1 4
2 2 4
2 3 4

The dataframe returned by generate_transitive_closure inherits from tibble, but has subclass BayesMallowsTC. The compute_mallows function uses this information to ensure that the object provided has been through the generate_transitive_closure function. If it has not, compute_mallows will do it for us, but this may lead to additional computing time when running several diagnostic runs and trying out different parameters, since the transitive closure will be recomputed each time.

Initial Ranking

We can also generate an initial ranking, consistent with the pairwise comparisons. Again, compute_mallows will do it for us, but we may save time by computing it once and for all before we starting running the algorithms.

Rather than digging deeper into this toy example, we go on with a real application.

Beach Preferences

The beach preference dataset is described in Section 6.2 of Vitelli et al. (2018), and is available in the dataframe beach_preferences in BayesMallows. In short, \(60\) assessors were each asked to perform a random set of pairwise comparisons between pictures of \(15\) beaches. The first few rows are listed below.

Example dataset beach_preferences
assessor bottom_item top_item
1 2 15
1 5 3
1 13 3
1 4 7
1 5 15
1 12 6

Initial Ranking

Next, we generate an initial ranking.

We can also take a look at the first 6 rows in it.

First 6 rows in beach_init_rank.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
7 15 3 14 13 2 10 12 6 4 1 5 8 11 9
5 15 14 6 2 13 12 11 1 7 4 10 9 8 3
3 9 7 15 14 2 6 13 1 8 12 11 10 5 4
5 15 2 14 11 8 9 10 1 4 3 13 7 12 6
9 15 7 5 6 1 14 13 4 3 2 12 11 8 10
9 13 5 12 15 3 7 11 1 6 4 14 8 10 2

Let us add column names to the initial ranking. This will make the plots more informative. But be aware that if the names are too long, some plots will be filled with text. We therefore name them B1, B2, …, B15.

Algorithm Tuning

We can now check the convergence, using the same tools as before.

The trace plots for \(\alpha\) and \(\rho\) show good convergence.

Convergence of Rtilde

When using assess_convergence, we can set parameter = "Rtilde" and choose assessors and items. We now illustrate this by going a bit deeper into some assessors. Referring to the mathematical notation in the top of the vignette, Rtilde corresponds to the augmented rankings \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).

Assessor 1

Let us look at Beach 2 for assessor 1.

assessor bottom_item top_item
1 2 6
1 2 15

It is implied by the preferences of assessor 1 that \(\{B_{2} \prec B_{6}\}\) and \(\{B_{2} \prec B_{15}\}\).

This seems correct.

Next, no ordering is implied between beach 2 and 4 for assessor 1.

The traces of item 2 and 4 do indeed cross.

No ordering is implied between beach 5 and beach 8 either:

The traces of beaches 5 and 8 do indeed cross.

Assessor 2

We go on with assessor 2. Let us take a look at, say, beach 10.

assessor bottom_item top_item
2 10 5
2 7 10

Beach 10 is preferred to beach 7, but disfavored to beach 5. The trace plot does indeed show this:

No ordering is implied between beach 1 and beach 15 for assessor 2.

Their traces do indeed cross.

We delete the test run before going on.

Posterior Distributions

Based on the convergence diagnostics, and being fairly conservative, we set burnin = 1000, and take an additional 5,000 samples. We still save the augmented data, because we are going to call the function plot_top_k below.

The posterior densities of \(\alpha\) and \(\rho\) can be studied as shown above.

CP Consensus

We can rank the beaches according to the cumulative probability (CP) consensus. This functionality is provided by compute_consensus, which returns a dataframe.

ranking item cumprob
1 B9 0.87
2 B6 1.00
3 B3 0.57
4 B11 0.99
5 B15 1.00
6 B10 0.94
7 B1 1.00
8 B5 0.76
9 B7 0.63
10 B13 1.00
11 B8 0.66
12 B4 0.60
13 B12 0.87
14 B14 0.95
15 B2 1.00

MAP Consensus

The maximum a posterior (MAP) consensus ranking is the a posterior most likely value of \(\rho\).

probability item map_ranking
0.12 B9 1
0.12 B6 2
0.12 B3 3
0.12 B11 4
0.12 B15 5
0.12 B10 6
0.12 B1 7
0.12 B5 8
0.12 B13 9
0.12 B7 10
0.12 B8 11
0.12 B12 12
0.12 B4 13
0.12 B14 14
0.12 B2 15

Posterior Intervals

We can compute posterior intervals for several parameters. Here we do it for \(\rho\).

item parameter mean median conf_level hpdi central_interval
B1 rho 7 7 95 % [6,7] [6,7]
B2 rho 15 15 95 % [14,15] [14,15]
B3 rho 3 3 95 % [3,4] [3,4]
B4 rho 12 12 95 % [11,13] [11,13]
B5 rho 8 8 95 % [8,10] [8,10]
B6 rho 2 2 95 % [1,2] [1,2]
B7 rho 9 9 95 % [8,10] [8,10]
B8 rho 11 11 95 % [11,13] [11,13]
B9 rho 1 1 95 % [1,2] [1,2]
B10 rho 6 6 95 % [6,7] [6,7]
B11 rho 4 4 95 % [3,4] [3,4]
B12 rho 13 13 95 % [12,14] [12,15]
B13 rho 9 10 95 % [8,10] [8,10]
B14 rho 14 14 95 % [13,15] [13,15]
B15 rho 5 5 95 % [5] [4,5]

Posterior Probability of Being Ranked Top-\(k\)

We can also find the posterior probability for each beach of being ranked top-\(k\), both in \(\rho\) and among the assessors.

We can do a top-\(k\) plot with plot_top_k. By default, k = 3. It may be necessary to do some experimentation with the rel_widths argument to get a good looking plot.

Clustering of Assessors

In many situations, it is interesting to divide the assessors into clusters, in which each the assessors in each cluster have similar preferences. In Vitelli et al. (2018), Section 4.3, it is shown how the Bayesian Mallows model can be extended to do exactly this, with a mixture approach. BayesMallows supports this through the optional argument n_clusters to compute_mallows.

Introduction

To illustrate how to perform clustering, let us create some clusters in the potato_visual dataset. This dataset has 12 assessors. We leave the first 6 assessors as is, but for the last 6, we revert the rankings.

Dataset potato_manipulated.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
A1 10 18 19 15 6 16 4 20 3 5 12 1 2 9 17 8 7 14 13 11
A2 10 18 19 17 11 15 6 20 4 3 13 1 2 7 16 8 5 12 9 14
A3 12 15 18 16 13 11 7 20 6 3 8 2 1 4 19 5 9 14 10 17
A4 9 17 19 16 10 15 5 20 3 4 8 1 2 7 18 11 6 13 14 12
A5 12 17 19 15 7 16 2 20 3 9 13 1 4 5 18 11 6 8 10 14
A6 10 15 19 16 8 18 6 20 3 7 11 1 2 4 17 9 5 13 12 14
A7 12 5 2 4 11 6 16 1 18 13 10 20 19 15 3 14 17 7 9 8
A8 7 3 1 2 10 6 15 4 17 18 11 20 19 14 5 13 16 9 12 8
A9 13 5 3 2 9 8 15 1 16 18 14 20 17 19 4 11 12 6 7 10
A10 14 4 2 3 12 6 16 1 18 11 10 20 19 15 5 13 17 8 9 7
A11 9 5 2 6 8 3 14 1 18 16 10 20 19 15 4 11 17 7 13 12
A12 7 6 2 5 9 3 13 1 18 17 12 20 19 14 4 15 16 8 11 10

Let us try first without clustering.

Looking at the convergence of \(\alpha\), we see that this model does not settle on a consensus ranking.

Let us instead introduce clustering:

Algorithm Tuning

Convergence of \(\alpha\)

We can assess convergence in the usual way. Let us start with alpha. The assess_convergence method now shows one trace per cluster.

Convergence of \(\rho\)

We plot the potato that is most often ranked heaviest (P8), and the potato that is most often ranked lightest (P12).