Introduction to RprobitB

Lennart Oelschläger



RprobitB is an R package that can be used to fit latent class mixed multinomial probit (LCMMNP) models to simulated or empirical data. RprobitB is licensed under the GNU General Public License v3.0.

Do you found a bug or request a feature? Please tell us!

The package name RprobitB is a portmanteau, combining R (the programming language), probit (the model name) and B (for Bayes, the estimation method).

RprobitB is able to

At this point, you may ask yourself:

  1. How to install RprobitB?
  2. How to use RprobitB?
  3. How is the LCMMNP model defined?
  4. How does the latent class updating scheme work?
  5. How to specify a LCMMNP model in RprobitB?

Installing RprobitB

To install the latest version of RprobitB, run install.packages("RprobitB") in your R console.

Using RprobitB

To use RprobitB, follow these steps:

  1. Specify the model, see below for details.
  2. Run RprobitB::rpb(<list of model specifications>).
  3. You get on-screen information and model results in an output folder.

LCMMNP model

Assume that we observe the choices of \(N\) decision makers which decide between \(J\) alternatives at each of \(T\) choice occasions.1 Specific to each decision maker, alternative and choice occasion, we furthermore observe \(P_f+P_r\) choice attributes that we use to explain the choices. The first \(P_f\) attributes are connected to fixed coefficients, the other \(P_r\) attributes to random coefficients following a joint distribution mixed across decision makers. Person \(n\)’s utility \(\tilde{U}_{ntj}\) for alternative \(j\) at choice occasion \(t\) is modelled as \[\begin{equation} \tilde{U}_{ntj} = \tilde{W}_{ntj}'\alpha + \tilde{X}_{ntj}'\beta_n + \tilde{\epsilon}_{ntj} \end{equation}\] for \(n=1,\dots,N\), \(t=1,\dots,T\) and \(j=1,\dots,J\), where

As is well known, any utility model needs to be normalized with respect to level and scale in order to be identified. Therefore, we consider the transformed model \[\begin{equation} U_{ntj} = W_{ntj}'\alpha + X_{ntj}'\beta_n + \epsilon_{ntj}, \end{equation}\] \(n=1,\dots,N\), \(t=1,\dots,T\) and \(j=1,\dots,J-1\), where (choosing \(J\) as the reference alternative) \(U_{ntj}=\tilde{U}_{ntj} - \tilde{U}_{ntJ}\), \(W_{ntj}=\tilde{W}_{ntj}-\tilde{W}_{ntJ}\), \(X_{ntj}=\tilde{X}_{ntj}-\tilde{X}_{ntJ}\) and \(\epsilon_{ntj}=\tilde{\epsilon}_{ntj}-\tilde{\epsilon}_{ntJ}\), where \((\epsilon_{nt:}) = (\epsilon_{nt1},...,\epsilon_{nt(J-1)})' \sim \text{MVN}_{J-1} (0,\Sigma)\) and \(\Sigma\) denotes a covariance matrix with the top-left element restricted to one. While taking utility differences in order to normalize the model with respect to level is a standard procedure, alternatives to fixing an error term variance in order to normalize with respect to scale exist, for example fixing an element of \(\alpha\).

Let \(y_{nt}=j\) denote the event that decision maker \(n\) chooses alternative \(j\) at choice occasion \(t\). Assuming utility maximizing behaviour of the decision makers, the decisions are linked to the utilities via \[\begin{equation} y_{nt} = \sum_{j=1}^{J-1} j\cdot 1 \left (U_{ntj}=\max_i U_{nti}>0 \right) + J \cdot 1\left (U_{ntj}<0 ~\text{for all}~j\right), \end{equation}\] where \(1(A)\) equals \(1\) if condition \(A\) is true and \(0\) else.

We approximate the mixing distribution \(g_{P_r}\) for the random coefficients \(\beta=(\beta_n)_{n}\) by a mixture of \(P_r\)-variate normal densities \(\phi_{P_r}\) with mean vectors \(b=(b_c)_{c}\) and covariance matrices \(\Omega=(\Omega_c)_{c}\) using \(C\) components, i.e. \[\begin{equation} \beta_n\mid b,\Omega \sim \sum_{c=1}^{C} s_c \phi_{P_r} (\cdot \mid b_c,\Omega_c), \end{equation}\] where \((s_c)_{c}\) are weights satisfying \(0 < s_c\leq 1\) for \(c=1,\dots,C\) and \(\sum_c s_c=1\). One interpretation of the latent class model is obtained by introducing variables \(z=(z_n)_n\) allocating each decision maker \(n\) to class \(c\) with probability \(s_c\), i.e. \[\begin{equation} \text{Prob}(z_n=c)=s_c \quad \text{and} \quad \beta_n \mid z,b,\Omega \sim \phi_{P_r}(\cdot \mid b_{z_n},\Omega_{z_n}). \end{equation}\] We call this model the latent class mixed multinomial probit (LCMMNP) model.2

Latent class updating scheme

Updating the number \(C\) of latent classes is done within the algorithm by executing the following weight-based updating scheme.

Specifying a LCMMNP model in RprobitB

RprobitB specifications are grouped in the named lists

You can either specify none, all, or selected parameters. Unspecified parameters are set to default values.



If data = NULL, data is simulated from the model defined by model and parm.

To model empirical data, specify





A priori, parm[["alpha"]] ~ Normal(eta,Psi) with

A priori, parm[["s"]] ~ Dirichlet(delta) with

A priori, parm[["b"]][,c] ~ Normal(xi,D) with

A priori, matrix(parm[["Omega"]][,c],nrow=model[["P_r"]],ncol=model[["P_r"]]) ~ Inverse_Wishart(nu,Theta) with

A priori, parm[["Sigma"]] ~ Inverse_Wishart(kappa,E) with



RprobitB automatically normalizes with respect to level by computing utility differences, where model[["J"]] is the base alternative. The normalization with respect to scale can be specified:


Default specifications of RprobitB





Per default, parameters are randomly drawn.







Example: Simulated data

The code below fits a mixed multinomial probit model with

to simulated data with

The number of latent classes is updated, because do_lcus = TRUE is set. The Gibbs sampler draws R = 20000 samples. By default, the model is named id = "test" and results are saved in rdir = "tempdir()" (the path of the per-session temporary directory).


### model specification
model = list("N" = 100, "T" = sample(10:20,100,replace=TRUE), "J" = 3, "P_f" = 1, "P_r" = 2, "C" = 2)
lcus  = list("do_lcus" = TRUE)
mcmc  = list("R" = 20000)

### start estimation (about 3 minutes computation time)
RprobitB::rpb("model" = model, "lcus" = lcus, "mcmc" = mcmc)

Example: Empirical data

The code below fits a mixed multinomial probit model with P_f = 2 fixed and P_r = 2 random coefficients to the “Train” dataset of the mlogit package with

For normalization, the price coefficient ("parameter" = "a", "index" = 1) is fixed to "value" = -1.

### load Train data
data("Train", package = "mlogit")

### model specification
model = list("P_f" = 2, "P_r" = 2)
data  = list("data_raw" = Train, "cov_col" = 4:11, "cov_ord" = c("price","comfort","time","change"), "cov_zst" = TRUE)
lcus  = list("do_lcus" = TRUE)
mcmc  = list("R" = 1e5)
norm  = list("parameter" = "a", "index" = 1, "value" = -1)
out   = list("id" = "train", "pp" = TRUE)

### start estimation (about 20 minutes computation time)
RprobitB::rpb("model" = model, "data" = data, "lcus" = lcus, "mcmc" = mcmc, "norm" = norm, "out" = out)

On-screen information

During estimation, you get the following on-screen information:

Model results

In the output folder out[["rdir"]]/out[["id"]], you can find the files


Albert, James H., and Siddhartha Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” Journal of the American Statistical Association 88.
Allenby, Greg M., and Peter Rossi. 1998. “Marketing Models of Consumer Heterogeneity.” Journal of Econometrics 89.
Bauer, Dietmar, Sebastian Büscher, and Manuel Batram. 2019. “Non-Parameteric Estiation of Mixed Discrete Choice Models.” Second International Choice Modelling Conference in Kobe.
Croissant, Yves. 2020. “Estimation of Random Utility Models in R: The mlogit Package.” Journal of Statistical Software 95 (11): 1–41.
Geweke, John. 1998. “Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints and the Evaluation of Constraint Probabilities.” Comput. Sci. Statist. 23.
Imai, Kosuke, and David A. van Dyk. 2005. “A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation.” Journal of Econometrics 124.
McCulloch, Robert, and Peter Rossi. 1994. “An Exact Likelihood Analysis of the Multinomial Probit Model.” Journal of Econometrics 64.
Mori, Harunori. 2014. “Bayes Estimation in the Hierarchical Multinomial Probit Model.” Journal of the Japan Statistical Society 44.
Nobile, Agostino. 1998. “A Hybrid Markov Chain for the Bayesian Analysis of the Multinomial Probit Model.” Statistics and Computing 8.
Oelschläger, Lennart, and Dietmar Bauer. 2020. “Bayes Estimation of Latent Class Mixed Multinomial Probit Models.” TRB Annual Meeting 2021.
Rossi, Peter. 2019. “Bayesm: Bayesian Inference for Marketing/Micro-Econometrics.”
Scaccia, Luisa, and Edoardo Marcucci. 2010. “Bayesian Flexible Modelling of Mixed Logit Models.” Proceedings from the 19th International Conference on Computational Statistics.
Train, Kenneth. 2009. Discrete Choice Methods with Simulation. 2. ed. Cambridge Univ. Press.

  1. For notational simplicity, the number of choice occasions \(T\) is assumend to be the same for each decision maker here. However, RprobitB allows for a different number of choice occasions for each decision maker.↩︎

  2. We note that the model collapses to the (normally) mixed multinomial probit model if \(P_r>0\) and \(C=1\) and to the basic multinomial probit model if \(P_r=0\).↩︎

  3. The wide data format presents each different covariate in a separate column. See the vignette of the mlogit package for an example.↩︎