BayesMallows: An R Package for Probabilistic Preference Learning with the Mallows Rank Model

Øystein Sørensen

2019-02-22

The BayesMallows package implements methods for Bayesian preference learning with the Mallows rank model, as originally described in Vitelli et al. (2018), and further developed in Asfaw et al. (2016) and Crispino et al. (2018). This vignette describes the usage of the package, starting with the complete data case, moving on to top-\(k\) rankings and pairwise comparisons, and finally covering clustering. We refer to the papers mentioned above, as well as the review by Liu et al. (2019), for a thorough description of the methods. The necessary tools for data preprocessing, algorithm tuning, and assessment of the posterior distributions are described along the way.

Overview of Package

Functions

Here is an overview of the most commonly used functions. You can read their documentation and see examples with ?function_name. A minimal usage sketch follows the table.

Main functions in the BayesMallows package:

compute_mallows: Compute the posterior distribution of the Bayesian Mallows model. This is the main function of the package. Returns an object of class BayesMallows.
compute_mallows_mixtures: Compute multiple Mallows models with different numbers of mixture components. This is a convenience function for determining the number of mixtures to use.
sample_mallows: Sample rankings from the Mallows model.
plot.BayesMallows: Quick plots of the posterior densities of the parameters of the Mallows model.
assess_convergence: Study the convergence of the Markov chain, in order to determine burnin and other algorithm parameters.
plot_elbow: Create an elbow plot for comparing models with different numbers of clusters.
plot_top_k: Plot the top-\(k\) rankings. Particularly relevant when the data come in the form of pairwise comparisons.
assign_cluster: Compute the cluster assignment of assessors.
compute_consensus: Compute the CP or MAP consensus ranking of the latent ranks.
compute_posterior_intervals: Compute Bayesian posterior intervals for the parameters.
generate_initial_ranking: Generate an initial ranking for the case of missing data or pairwise comparisons.
generate_transitive_closure: Generate the transitive closure of a set of pairwise comparisons.
estimate_partition_function: Estimate the partition function of the Mallows model using either importance sampling or an asymptotic approximation.
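
To make the workflow concrete, here is a minimal sketch of a typical analysis of complete rankings. The choice of dataset and the burnin value of 500 are illustrative assumptions; burnin should always be chosen after inspecting the output of assess_convergence.

```r
library(BayesMallows)

# Fit the Bayesian Mallows model to the complete rankings in potato_visual
model_fit <- compute_mallows(rankings = potato_visual)

# Study trace plots to decide on a burnin; 500 below is an assumed value
assess_convergence(model_fit, parameter = "alpha")

# Posterior density and posterior intervals for alpha, discarding burnin
plot(model_fit, burnin = 500, parameter = "alpha")
compute_posterior_intervals(model_fit, burnin = 500, parameter = "alpha")
```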

Datasets

Here is an overview of the example datasets in BayesMallows. You can read their documentation with ?dataset_name, or search for an example in this vignette.

Example datasets in the BayesMallows package:

beach_preferences: Stated pairwise preferences between random subsets of 15 images of beaches, by 60 assessors.
sushi_rankings: Complete rankings of 10 types of sushi by 5000 assessors.
potato_visual: Complete rankings of 20 potatoes by weight, based on visual inspection, by 12 assessors.
potato_weighing: Complete rankings of 20 potatoes by weight, where the assessors were allowed to weigh the potatoes in their hands, by 12 assessors.
potato_true_ranking: Vector of true weight rankings for the 20 potatoes in the example datasets potato_visual and potato_weighing.

Mallows’ Rank Model

Here we give an informal review of the Mallows model (Mallows 1957). The distribution of a ranking \(r \in \mathcal{P}_{n}\) of \(n\) items is modeled as

\[ P(r | \alpha, \rho) = Z_{n}(\alpha)^{-1} \exp\left\{-\frac{\alpha}{n} d(r, \rho)\right\} 1_{\mathcal{P}_{n}}(r), \]

where \(\mathcal{P}_{n}\) is the set of all permutations of \(1, \dots, n\), \(\alpha\) is a scale parameter, \(\rho \in \mathcal{P}_{n}\) is a latent consensus ranking, \(d(\cdot, \cdot)\) is a distance measure, \(1_{S}(\cdot)\) is the indicator function for the set \(S\), and \(Z_{n}(\alpha)\) is the partition function, or normalizing constant.

Given \(N\) observed rankings, \(R_{1}, \dots, R_{N}\), the likelihood of the model is

\[ P(R_{1}, \dots, R_{N} | \alpha, \rho) = Z_{n}(\alpha)^{-N} \exp\left\{-\frac{\alpha}{n} \sum_{j=1}^{N}d(R_{j}, \rho)\right\} \prod_{j=1}^{N} \left\{1_{\mathcal{P}_{n}}(R_{j}) \right\}. \]

The rankings argument to compute_mallows is assumed to be a matrix of the form \((R_{1}, R_{2}, \dots, R_{N})^{T}\), i.e., each row contains a ranking and each column is an item.
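
As an illustration, potato_visual already has this format, and synthetic data with a known consensus can be drawn with sample_mallows. The rho0, alpha0, and n_samples arguments below are as documented for sample_mallows; the parameter values themselves are arbitrary.

```r
library(BayesMallows)

# Each row is one assessor's ranking, each column corresponds to an item
dim(potato_visual)  # 12 assessors, 20 items

# Draw 100 rankings of 10 items from a Mallows model with consensus
# ranking 1:10 and scale parameter 5 (arbitrary illustrative values)
synthetic <- sample_mallows(rho0 = 1:10, alpha0 = 5, n_samples = 100)
model_fit <- compute_mallows(rankings = synthetic)
```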

Prior Distributions

For \(\alpha\) we use an exponential prior, with density \(\pi(\alpha | \lambda) = \lambda \exp(-\lambda \alpha).\) The rate parameter \(\lambda\) can be set by the user with the lambda argument to compute_mallows. For \(\rho\) we assume a uniform prior distribution on \(\mathcal{P}_{n}\), with density \(\pi(\rho) = 1_{\mathcal{P}_{n}}(\rho) /n!.\)
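
As a minimal sketch, the rate parameter is passed directly to compute_mallows; the value below is arbitrary, and smaller values of lambda give a flatter prior on \(\alpha\).

```r
# Exponential prior on alpha with an arbitrary, weakly informative rate
model_fit <- compute_mallows(rankings = potato_visual, lambda = 0.1)
```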

Metropolis-Hastings Algorithm

We use a Metropolis-Hastings algorithm for computing the posterior distributions of \(\alpha\) and \(\rho\). We propose \(\alpha\) from a lognormal distribution \(\log \mathcal{N}(\log(\alpha), \sigma_{\alpha}^{2})\), and we propose \(\rho\) with a leap-and-shift algorithm, described in detail in Vitelli et al. (2018). The standard deviation \(\sigma_{\alpha}\) and the leap size can be set by the user with the arguments alpha_prop_sd and leap_size to compute_mallows.
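
Below is a sketch of how these tuning parameters might be varied; the values of alpha_prop_sd, leap_size, and nmc are arbitrary, and the resulting trace plots indicate whether the proposals mix well.

```r
# Trial run with assumed tuning values, to be judged by trace plots
test_run <- compute_mallows(rankings = potato_visual,
                            alpha_prop_sd = 0.5,
                            leap_size = 3,
                            nmc = 5000)

# Trace plot of alpha, reflecting the lognormal proposal
assess_convergence(test_run, parameter = "alpha")

# Trace plots of the latent ranks of the first five items,
# reflecting the leap-and-shift proposal
assess_convergence(test_run, parameter = "rho", items = 1:5)
```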

Partial Rankings

If each assessor \(j\) has ranked only a subset \(\mathcal{A}_{j}\) of the items, we use data augmentation to fill in the missing ranks. We define the augmented data vectors \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\), and assume a uniform prior for each assessor's augmented ranks, with support on the rankings in \(\mathcal{P}_{n}\) that are consistent with the ranks assessor \(j\) has already assigned to the items in \(\mathcal{A}_{j}\). The Metropolis-Hastings algorithm then alternates between sampling \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\) given the current \(\alpha\) and \(\rho\), and sampling \(\alpha\) and \(\rho\) given the current \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).
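
In compute_mallows, partial rankings are encoded as NA entries in the rankings matrix. The sketch below, which truncates potato_visual to top-10 rankings, is an illustrative construction; the "Rtilde" option to assess_convergence is used here to monitor the augmented rankings.

```r
# Construct top-10 rankings by setting all ranks larger than 10 to NA
partial_rankings <- potato_visual
partial_rankings[partial_rankings > 10] <- NA

# Missing ranks are filled in by data augmentation
model_fit <- compute_mallows(rankings = partial_rankings)

# Monitor convergence of the augmented rankings for some
# assessors and items
assess_convergence(model_fit, parameter = "Rtilde",
                   items = 1:3, assessors = 1:3)
```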

Pairwise Comparisons

When the assessors have stated a set of pairwise comparisons, rather than rankings, we use the same data augmentation ideas as for partial rankings, but the proposal distribution is slightly more complicated in order to ensure that the proposed ranking is compliant with the ordering implied by the pairwise comparisons. In addition, the transitive closure of the stated ordering has to be computed, in order to find all implied orderings. From a user perspective, no new algorithm parameters need to be considered.
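
A sketch with the beach_preferences data follows. Each row of beach_preferences states that one item (top_item) is preferred to another (bottom_item) by a given assessor; generate_transitive_closure finds all implied orderings, and generate_initial_ranking produces rankings consistent with them.

```r
# Compute all orderings implied by the stated pairwise preferences
beach_tc <- generate_transitive_closure(beach_preferences)

# Generate initial rankings consistent with the transitive closure
beach_init <- generate_initial_ranking(beach_tc)

# Fit the model; proposed augmented rankings stay compliant with
# each assessor's transitive closure
model_fit <- compute_mallows(rankings = beach_init,
                             preferences = beach_tc)
```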

Mixtures of Mallows Models

When the assessor pool is heterogeneous, one might assume that there exist several latent consensus rankings, \(\rho_{1}, \dots, \rho_{C}\), one for each cluster of assessors. Letting \(z_{1}, \dots, z_{N} \in \{1, \dots, C\}\) assign each assessor to one of the \(C\) clusters, the likelihood of the observed rankings is

\[P(R_{1}, \dots, R_{N} | \{\alpha_{c}, \rho_{c}\}_{c=1,\dots,C}, z_{1},\dots,z_{N}) = \prod_{j=1}^{N}\frac{1_{\mathcal{P}_{n}}(R_{j})}{Z_{n}(\alpha_{z_{j}})}\exp\left\{ -\frac{\alpha_{z_{j}}}{n} d(R_{j}, \rho_{z_{j}})\right\}. \]

For the scale parameters \(\alpha_{1}, \dots, \alpha_{C}\) we assume the exponential prior as before, all with the same rate parameter \(\lambda\). We assume that the cluster labels are a priori distributed according to \(P(z_{1}, \dots, z_{N} | \tau_{1}, \dots, \tau_{C}) = \prod_{j=1}^{N} \tau_{z_{j}}\), where \(\tau_{c}\) is the a priori probability that an assessor belongs to cluster \(c\). For \(\tau_{1}, \dots, \tau_{C}\) we assume the Dirichlet prior \(\pi(\tau_{1}, \dots, \tau_{C}) = \Gamma(\psi C)\Gamma(\psi)^{-C}\prod_{c=1}^{C}\tau_{c}^{\psi - 1}\), where \(\Gamma(\cdot)\) is the gamma function. The user can control the value of \(\psi\) with the psi argument to compute_mallows.
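
A sketch of choosing the number of mixture components and summarizing the chosen model is given below; the grid of cluster numbers, nmc, the burnin of 1000, and the chosen value n_clusters = 3 are all illustrative assumptions.

```r
# Fit models with 1 to 5 mixture components; include_wcd = TRUE keeps
# the within-cluster distances needed by plot_elbow
mixture_models <- compute_mallows_mixtures(n_clusters = 1:5,
                                           rankings = sushi_rankings,
                                           nmc = 5000,
                                           include_wcd = TRUE)
plot_elbow(mixture_models, burnin = 1000)

# Refit with the chosen number of clusters, saving cluster assignments
model_fit <- compute_mallows(rankings = sushi_rankings,
                             n_clusters = 3,
                             save_clus = TRUE,
                             nmc = 5000)
assign_cluster(model_fit, burnin = 1000, soft = FALSE)
```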

References

Asfaw, D., V. Vitelli, O. Sorensen, E. Arjas, and A. Frigessi. 2016. “Time-Varying Rankings with the Bayesian Mallows Model.” Stat 6 (1): 14–30. https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.132.

Crispino, M., E. Arjas, V. Vitelli, N. Barrett, and A. Frigessi. 2018. “A Bayesian Mallows Approach to Non-Transitive Pair Comparison Data: How Human Are Sounds?” Accepted for publication in Annals of Applied Statistics.

Liu, Qinghua, Marta Crispino, Ida Scheel, Valeria Vitelli, and Arnoldo Frigessi. 2019. “Model-Based Learning from Preference Data.” Annual Review of Statistics and Its Application 6 (1). https://doi.org/10.1146/annurev-statistics-031017-100213.

Mallows, C. L. 1957. “Non-Null Ranking Models. I.” Biometrika 44 (1/2): 114–30.

Vitelli, V., O. Sorensen, M. Crispino, E. Arjas, and A. Frigessi. 2018. “Probabilistic Preference Learning with the Mallows Rank Model.” Journal of Machine Learning Research 18 (1): 1–49. http://jmlr.org/papers/v18/15-481.html.