The purpose of this vignette is to discuss the parameterizations of
the families (i.e., response distributions) used in brms. For a more
general overview of the package see
`vignette("brms_overview")`

.

Throughout this vignette, we denote values of the response variable
as \(y\), a density function as \(f\), and use \(\mu\) to refer to the main model parameter,
which is usually the mean of the response distribution or some closely
related quantity. In a regression framework, \(\mu\) is not estimated directly but
computed as \(\mu = g(\eta)\), where
\(\eta\) is a predictor term (see
`help(brmsformula)`

for details) and \(g\) is the response function (i.e., inverse
of the link function).

The density of the **gaussian** family is given by \[
f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y -
\mu}{\sigma}\right)^2\right)
\]

where \(\sigma\) is the residual
standard deviation. The density of the **student** family
is given by \[
f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)}
\frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y -
\mu}{\sigma}\right)^2\right)^{-(\nu+1)/2}
\]

\(\Gamma\) denotes the gamma
function and \(\nu > 1\) are the
degrees of freedom. As \(\nu \rightarrow
\infty\), the student distribution becomes the gaussian
distribution. The density of the **skew_normal** family is
given by \[
f(y) = \frac{1}{\sqrt{2\pi}\omega}
\exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2 \right)
\left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}}
\right) \right) \right)
\]

where \(\xi\) is the location parameter, \(\omega\) is the positive scale parameter, \(\alpha\) the skewness parameter, and \(\text{erf}\) denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean \(\mu\) and standard deviation \(\sigma\), \(\omega\) and \(\xi\) are computed as \[ \omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}} \]

\[ \xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}} \]

If \(\alpha = 0\), the skew-normal distribution becomes the gaussian distribution. For location shift models, \(y\) can be any real value.

The density of the **binomial** family is given by \[
f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y}
\] where \(N\) is the number of
trials and \(y \in \{0, ... , N\}\).
When all \(N\) are \(1\) (i.e., \(y
\in \{0,1\}\)), the **bernoulli** distribution for
binary data arises.

For \(y \in \mathbb{N}_0\), the
density of the **poisson** family is given by \[
f(y) = \frac{\mu^{y}}{y!} \exp(-\mu)
\] The density of the **negbinomial** (negative
binomial) family is \[
f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y}
\left(\frac{\phi}{\mu + \phi}\right)^\phi
\] where \(\phi\) is a positive
precision parameter. For \(\phi \rightarrow
\infty\), the negative binomial distribution becomes the poisson
distribution. The density of the **geometric** family
arises if \(\phi\) is set to \(1\).

With time-to-event models we mean all models that are defined on the
positive reals only, that is \(y \in
\mathbb{R}^+\). The density of the **lognormal**
family is given by \[
f(y) = \frac{1}{\sqrt{2\pi}\sigma y}
\exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right)
\] where \(\sigma\) is the
residual standard deviation on the log-scale. The density of the
**Gamma** family is given by \[
f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1}
\exp\left(-\frac{\alpha y}{\mu}\right)
\] where \(\alpha\) is a
positive shape parameter. The density of the **weibull**
family is given by \[
f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1}
\exp\left(-\left(\frac{y}{s}\right)^\alpha\right)
\] where \(\alpha\) is again a
positive shape parameter and \(s = \mu /
\Gamma(1 + 1 / \alpha)\) is the scale parameter to that \(\mu\) is the mean of the distribution. The
**exponential** family arises if \(\alpha\) is set to \(1\) for either the gamma or Weibull
distribution. The density of the **inverse.gaussian**
family is given by \[
f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp
\left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right)
\] where \(\alpha\) is a
positive shape parameter. The **cox** family implements Cox
proportional hazards model which assumes a hazard function of the form
\(h(y) = h_0(y) \mu\) with baseline
hazard \(h_0(y)\) expressed via
M-splines (which integrate to I-splines) in order to ensure
monotonicity. The density of the cox model is then given by \[
f(y) = h(y) S(y)
\] where \(S(y)\) is the
survival function implied by \(h(y)\).

Modeling extremes requires special distributions. One may use the
**weibull** distribution (see above) or the
**frechet** distribution with density \[
f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu}
\exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right)
\] where \(s = \mu / \Gamma(1 - 1 /
\nu)\) is a positive scale parameter and \(\nu > 1\) is a shape parameter so that
\(\mu\) predicts the mean of the
Frechet distribution. A generalization of both distributions is the
generalized extreme value distribution (family
**gen_extreme_value**) with density \[
f(y) = \frac{1}{\sigma} t(y)^{\xi + 1} \exp(-t(y))
\] where \[
t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 /
\xi}
\] with positive scale parameter \(\sigma\) and shape parameter \(\xi\).

One family that is especially suited to model reaction times is the
**exgaussian** (‘exponentially modified Gaussian’) family.
Its density is given by

\[ f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right) \] where \(\beta\) is the scale (inverse rate) of the exponential component, \(\xi\) is the mean of the Gaussian component, \(\sigma\) is the standard deviation of the Gaussian component, and \(\text{erfc}\) is the complementary error function. We parameterize \(\mu = \xi + \beta\) so that the main predictor term equals the mean of the distribution.

Another family well suited for modeling response times is the
**shifted_lognormal** distribution. It’s density equals
that of the **lognormal** distribution except that the
whole distribution is shifted to the right by a positive parameter
called *ndt* (for consistency with the **wiener**
diffusion model explained below).

A family concerned with the combined modeling of reaction times and
corresponding binary responses is the **wiener** diffusion
model. It has four model parameters each with a natural interpretation.
The parameter \(\alpha > 0\)
describes the separation between two boundaries of the diffusion
process, \(\tau > 0\) describes the
non-decision time (e.g., due to image or motor processing), \(\beta \in [0, 1]\) describes the initial
bias in favor of the upper alternative, and \(\delta \in \mathbb{R}\) describes the drift
rate to the boundaries (a positive value indicates a drift towards to
upper boundary). The density for the reaction time at the upper boundary
is given by

\[ f(y) = \frac{\alpha}{(y-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right) \]

where \(\phi(x)\) denotes the
standard normal density function. The density at the lower boundary can
be obtained by substituting \(1 -
\beta\) for \(\beta\) and \(-\delta\) for \(\delta\) in the above equation. In brms the
parameters \(\alpha\), \(\tau\), and \(\beta\) are modeled as auxiliary parameters
named *bs* (‘boundary separation’), *ndt* (‘non-decision
time’), and *bias* respectively, whereas the drift rate \(\delta\) is modeled via the ordinary model
formula that is as \(\delta =
\mu\).

Quantile regression is implemented via family
**asym_laplace** (asymmetric Laplace distribution) with
density

\[
f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y -
\mu}{\sigma}\right)\right)
\] where \(\rho_p\) is given by
\(\rho_p(x) = x (p - I_{x < 0})\)
and \(I_A\) is the indicator function
of set \(A\). The parameter \(\sigma\) is a positive scale parameter and
\(p\) is the *quantile*
parameter taking on values in \((0,
1)\). For this distribution, we have \(P(Y < g(\eta)) = p\). Thus, quantile
regression can be performed by fixing \(p\) to the quantile to interest.

The density of the **Beta** family for \(y \in (0,1)\) is given by \[
f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi,
(1-\mu) \phi)}
\] where \(B\) is the beta
function and \(\phi\) is a positive
precision parameter. A multivariate generalization of the
**Beta** family is the **dirichlet** family
with density \[
f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)}
\prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}.
\] The **dirichlet** family is implemented with the
multivariate logit link function so that \[
\mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})}
\] For reasons of identifiability, \(\eta_{\rm ref}\) is set to \(0\), where \({\rm
ref}\) is one of the response categories chosen as reference.

An alternative to the **dirichlet** family is the
**logistic_normal** family with density \[
f(y) = \frac{1}{\prod_{k=1}^K y_k} \times
\text{multivariate_normal}(\tilde{y} \, | \, \mu, \sigma, \Omega)
\] where \(\tilde{y}\) is the
multivariate logit transformed response \[
\tilde{y} = (\log(y_1 / y_{\rm ref}), \ldots, \log(y_{\rm ref-1} /
y_{\rm ref}),
\log(y_{\rm ref+1} / y_{\rm ref}), \ldots, \log(y_K /
y_{\rm ref}))
\] of dimension \(K-1\)
(excluding the reference category), which is modeled as multivariate
normally distributed with latent mean and standard deviation vectors
\(\mu\) and \(\sigma\), as well as correlation matrix
\(\Omega\).

The density of the **von_mises** family for \(y \in (-\pi,\pi)\) is given by \[
f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)}
\] where \(I_0\) is the modified
Bessel function of order 0 and \(\kappa\) is a positive precision
parameter.

For ordinal and categorical models, \(y\) is one of the categories \(1, ..., K\). The intercepts of ordinal
models are called thresholds and are denoted as \(\tau_k\), with \(k \in \{1, ..., K-1\}\), whereas \(\eta\) does not contain a fixed effects
intercept. Note that the applied link functions \(h\) are technically distribution functions
\(\mathbb{R} \rightarrow [0,1]\). The
density of the **cumulative** family (implementing the most
basic ordinal model) is given by \[
f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta)
\]

The densities of the **sratio** (stopping ratio) and
**cratio** (continuation ratio) families are given by \[
f(y) = g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} -
\eta))
\] and \[
f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k})
\]

respectively. Note that both families are equivalent for symmetric
link functions such as logit or probit. The density of the
**acat** (adjacent category) family is given by \[
f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k})
\prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k
g(\eta-\tau_{j})
\prod_{j=k+1}^K(1-g(\eta - \tau_{j}))}
\] For the logit link, this can be simplified to \[
f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)}
{\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)}
\] The linear predictor \(\eta\)
can be generalized to also depend on the category \(k\) for a subset of predictors. This leads
to category specific effects (for details on how to specify them see
`help(brm)`

). Note that **cumulative** and
**sratio** models use \(\tau -
\eta\), whereas **cratio** and **acat**
use \(\eta - \tau\). This is done to
ensure that larger values of \(\eta\)
increase the probability of *higher* response categories.

The **categorical** family is currently only implemented
with the multivariate logit link function and has density \[
f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})}
\] Note that \(\eta\) does also
depend on the category \(k\). For
reasons of identifiability, \(\eta_{1}\) is set to \(0\). A generalization of the
**categorical** family to more than one trial is the
**multinomial** family with density \[
f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}}
\prod_{k=1}^K \mu_{k}^{y_{k}}
\] where, for each category, \(\mu_{k}\) is estimated via the multivariate
logit link function shown above.

**Zero-inflated** and **hurdle** families
extend existing families by adding special processes for responses that
are zero. The density of a **zero-inflated** family is
given by \[
f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\
f_z(y) = (1 - z) f(y) \quad \text{if } y > 0
\] where \(z\) denotes the
zero-inflation probability. Currently implemented families are
**zero_inflated_poisson**,
**zero_inflated_binomial**,
**zero_inflated_negbinomial**, and
**zero_inflated_beta**.

The density of a **hurdle** family is given by \[
f_z(y) = z \quad \text{if } y = 0 \\
f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0
\] Currently implemented families are
**hurdle_poisson**, **hurdle_negbinomial**,
**hurdle_gamma**, and
**hurdle_lognormal**.

The density of a **zero-one-inflated** family is given
by \[
f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\
f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\
f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0,
1\}
\] where \(\alpha\) is the
zero-one-inflation probability (i.e. the probability that zero or one
occurs) and \(\gamma\) is the
conditional one-inflation probability (i.e. the probability that one
occurs rather than zero). Currently implemented families are
**zero_one_inflated_beta**.