The purpose of this vignette is to discuss the parameterizations of the families (i.e., response distributions) used in **brms**. For a more general overview of the package see `vignette("brms_overview")`

.

Throughout this vignette, we use \(y\) to refer to the response variable and \(\eta\) to refer to the (non-)linear predictor (see `help(brmsformula)`

for details on supported predictor terms). We write \(y_i\) and \(\eta_i\) for the response and linear predictor of observation \(i\). Furthermore, we use \(f\) for a density function, \(g\) for an inverse link function, and denote \(\mu = g(\eta)\).

The density of the **gaussian** family is given by \[
f(y_i) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right)
\] where \(\sigma\) is the residual standard deviation. The density of the **student** family is given by \[
f(y_i) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}
\left(1 + \frac{1}{\nu} \left(\frac{y_i - \mu_i}{\sigma}\right)^2\right)^{-(\nu+1)/2}
\] \(\Gamma\) denotes the gamma function and \(\nu > 1\) are the degrees of freedom. As \(\nu \rightarrow \infty\), the student distribution becomes the gaussian distribution. The density of the *skew_normal* family is given by \[
f(y_i) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y_i - \xi_i}{\omega}\right)^2\right) \left(1 + \text{erf} \left( \alpha \left(\frac{y_i - \xi_i}{\omega \sqrt{2}} \right) \right) \right)
\] where \(\xi\) is the location parameter, \(\omega\) is the positive scale parameter, \(\alpha\) the skewness parameter, and \(\text{erf}\) denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean \(\mu\) and standard deviation \(\sigma\), \(\omega\) and \(\xi\) are computed as \[
\omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}}
\] \[
\xi_i = \mu_i - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}}
\] If \(\alpha = 0\), the skew-normal distribution becomes the gaussian distribution. For location shift models, \(y_i\) can be any real value.

The density of the **binomial** family is given by \[
f(y_i) = {N_i \choose y_i} \mu_i^{y_i} (1-\mu_i)^{N_i - y_i}
\] where \(N_i\) is the number of trials and \(y_i \in \{0, ... , N_i\}\). When all \(N_i\) are \(1\) (i.e., \(y_i \in \{0,1\}\)), the bernoulli distribution for binary data arises. **binomial** and **bernoulli** families are distinguished in **brms** as the bernoulli distribution has its own implementation in **Stan** that is computationlly more efficient.

For \(y_i \in \mathbb{N}_0\), the density of the **poisson** family is given by \[
f(y_i) = \frac{\mu_i^{y_i}}{y_i!} exp(-\mu_i)
\] The density of the **negative binomial** family is \[
f(y_i) = {y_i + \phi - 1 \choose y_i} \left(\frac{\mu_i}{\mu_i + \phi}\right)^{y_i}
\left(\frac{\phi}{\mu_i + \phi}\right)^\phi
\] where \(\phi\) is a positive precision parameter. For \(\phi \rightarrow \infty\), the negative binomial distribution becomes the poisson distribution. The density of the **geometric** family arises if \(\phi\) is set to \(1\).

With survival models we mean all models that are defined on the positive reals only, that is \(y_i \in \mathbb{R}^+\). The density of the **lognormal** family is given by \[
f(y_i) = \frac{1}{\sqrt{2\pi}\sigma x} \exp\left(-\frac{1}{2}\left(\frac{\log(y_i) - \mu_i}{\sigma}\right)^2\right)
\] where \(\sigma\) is the residual standard deviation on the log-scale. The density of the **Gamma** family is given by \[
f(y_i) = \frac{(\alpha / \mu_i)^\alpha}{\Gamma(\alpha)} y_i^{\alpha-1}
\exp\left(-\frac{\alpha y_i}{\mu_i}\right)
\] where \(\alpha\) is a positive shape parameter. The density of the **weibull** family is given by \[
f(y_i) = \frac{\alpha}{g(\eta_i / \alpha)} \left(\frac{y_i}{g(\eta_i / \alpha)}\right)^{\alpha-1}
\exp\left(-\left(\frac{y_i}{g(\eta_i / \alpha)}\right)^\alpha\right)
\] where \(\alpha\) is again a positive shape parameter. The **exponential** family arises if \(\alpha\) is set to \(1\) for either the gamma or Weibull distribution. The density of the **inverse.gaussian** family is given by \[
f(y_i) = \left(\frac{\alpha}{2 \pi y_i^3}\right)^{1/2} \exp \left(\frac{-\alpha (y_i - \mu_i)^2}{2 \mu_i^2 y_i} \right)
\] where \(\alpha\) is a positive shape parameter.

Modeling extremes requires special distributions. One may use the **weibull** distribution (see above) or the **frechet** distribution with density \[
f(y_i) = \frac{\nu}{s_i} \left(\frac{y_i}{s_i}\right)^{-1-\nu} \exp\left(-\left(\frac{y_i}{s_i}\right)^{-\nu}\right)
\] where \(s_i = \mu_i / \Gamma(1 - 1 / \nu)\) is a positive scale parameter and \(\nu > 1\) is a shape parameter so that \(\mu_i\) predicts the mean of the Frechet distribution. A generalization of both distributions is the generalized extreme value distribution (family **gen_extreme_value**) with density \[
f(y_i) = \frac{1}{\sigma} t(y_i)^{-1 - 1 / \xi} \exp(-t(y_i))
\] where \[
t(y_i) = \left(1 + \xi \left(\frac{y_i - \mu_i}{\sigma} \right)\right)^{-1 / \xi}
\] with positive scale parameter \(\sigma\) and shape parameter \(\xi\).

One family that is especially suited to model reaction times is the **exgaussian** (‘exponentially modified Gaussian’) family. Its density is given by

\[ f(y_i) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\mu_i + \sigma^2 / \beta - 2 y_i \right) \right) \text{erfc}\left(\frac{\mu_i + \sigma^2 / \beta - y_i}{\sqrt{2} \sigma} \right) \] where \(\beta\) is the scale (inverse rate) of the exponential component, \(\mu_i = \mu\) is the mean of the Gaussian componenent, \(\sigma\) is the standard deviation of the Gaussian component, and \(\text{erfc}\) is the complementary error function.

A family concerned with the combined modelling of reaction times and corresponding binary responses is the **wiener** diffusion model. It has four model parameters each with a natural interpreation. The parameter \(\alpha > 0\) describes the separation between two boundaries of the diffusion process, \(\tau > 0\) describes the non-decision time (e.g., due to image or motor processing), \(\beta \in [0, 1]\) describes the initial bias in favor of the upper alternative, and \(\delta \in \mathbb{R}\) describes the drift rate to the boundaries (a positive value indicates a drift towards to upper boundary). The density for the reaction time at the upper boundary is given by

\[ f(y_i) = \frac{\alpha}{(y_i-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y_i-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y_i - \tau}}\right) \]

where \(\phi(x)\) denotes the standard normal density function. The density at the lower boundary can be obtained by substituting \(1 - \beta\) for \(\beta\) and \(-\delta\) for \(\delta\) in the above equation. In **brms** the parameters \(\alpha\), \(\tau\), and \(\beta\) are modeled as auxiliary parameters named *bs* (‘boundary separation’), *ndt* (‘non-decision time’), and *bias* respectively, whereas the drift rate \(\delta\) is modeled via the ordinary model formula that is as \(\delta = \mu_i\).

Quantile regression is implemented via family **asym_laplace** (asymmetric Laplace distribution) with density

\[
f(y_i) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y_i - \mu_i}{\sigma}\right)\right)
\] where \(\rho_p\) is given by \(\rho_p(x) = x (p - I_{x < 0})\) and \(I_A\) is the indicator function of set \(A\). The parameter \(\sigma\) is a positive scale parameter and \(p\) is the *quantile* parameter taking on values in \((0, 1)\). For this distribution, we have \(P(Y < g(\eta)) = p\). Thus, quantile regression can be performed by fixing \(p\) to the quantile to interest.

The density of the **Beta** family for \(y_i \in (0,1)\) is given by \[
f(y_i) = \frac{y_i^{\mu_i \phi - 1} (1-y_i)^{(1-\mu_i) \phi-1}}{B(\mu_i \phi, (1-\mu_i) \phi)}
\] where \(B\) is the beta function and \(\phi\) is a positive precision parameter.

The density of the **von_mises** family for \(y_i \in (-\pi,\pi)\) is given by \[
f(y_i) = \frac{\exp(\kappa \cos(y_i - \mu_i))}{2\pi I_0(\kappa)}
\] where \(I_0\) is the modified Bessel function of order 0 and \(\kappa\) is a positive precision parameter.

For ordinal and categorical models, \(y_i\) is one of the categories \(1, ..., K\). The intercepts of ordinal models are called thresholds and are denoted as \(\tau_k\), with \(k \in \{1, ..., K-1\}\), whereas \(\eta\) does not contain a fixed effects intercept. Note that the applied link functions \(h\) are technically distribution functions \(\mathbb{R} \rightarrow [0,1]\). The density of the **cumulative** family (implementing the most basic ordinal model) is given by \[
f(y_i) = g(\tau_{y_i + 1} - \eta_i) - g(\tau_{y_i} - \eta_i)
\] The densities of the **sratio** (stopping ratio) and **cratio** (continuation ratio) families are given by \[
f(y_i) = g(\tau_{y_i + 1} - \eta_i) \prod_{k = 1}^{y_i} (1 - g(\tau_{k} - \eta_i))
\] and \[
f(y_i) = (1 - g(\eta_i - \tau_{y_i + 1})) \prod_{k = 1}^{y_i} g(\eta_i - \tau_{k})
\] respectively. Note that both families are equivalent for symmetric link functions such as logit or probit. The density of the **acat** (adjacent category) family is given by \[
f(y_i) = \frac{\prod_{k=1}^{y_i} g(\eta_i - \tau_{k})
\prod_{k=y_i+1}^K(1-g(\eta_i - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta_i-\tau_{j})
\prod_{j=k+1}^K(1-g(\eta_i - \tau_{j}))}
\] For the logit link, this can be simplified to \[
f(y_i) = \frac{\exp \left(\sum_{k=1}^{y_i} (\eta_i - \tau_{k}) \right)}
{\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta_i - \tau_{j}) \right)}
\] The linear predictor \(\eta\) can be generalized to also depend on the category \(k\) for a subset of predictors. This leads to so called category specific effects (for details on how to specify them see `help(brm)`

). Note that **cumulative** and **sratio** models use \(\tau - \eta\), whereas **cratio** and **acat** use \(\eta - \tau\). This is done to ensure that larger values of \(\eta\) increase the probability of *higher* reponse categories.

The **categorical** family is currently only implemented with the logit link function and has density \[
f(y_i) = \frac{\exp(\eta_{iy_i})}{\sum_{k = 1}^{K} \exp(\eta_{ik})}
\] Note that \(\eta\) does also depend on the category \(k\). For reasons of identifiability, \(\eta_{i1}\) is set to \(0\).

**Zero-inflated** and **hurdle** families extend existing families by adding special processes for responses that are zero. The densitiy of a **zero-inflated** family is given by \[
f_z(y_i) = z + (1 - z) f(0) \quad \text{if } y_i = 0 \\
f_z(y_i) = (1 - z) f(y_i) \quad \text{if } y_i > 0
\] where \(z\) denotes the zero-inflation probability. Currently implemented families are **zero_inflated_poisson**, **zero_inflated_binomial**, **zero_inflated_negbinomial**, and **zero_inflated_beta**. The density of a **hurdle** family is given by \[
f_z(y_i) = z \quad \text{if } y_i = 0 \\
f_z(y_i) = (1 - z) f(y_i) / (1 - f(0)) \quad \text{if } y_i > 0
\] Currently implemented families are **hurdle_poisson**, **hurdle_negbinomial**, **hurdle_gamma**, and **hurdle_lognormal**. The density of a **zero-one-inflated** family is given by \[
f_{\alpha, \gamma}(y_i) = \alpha (1 - \gamma) \quad \text{if } y_i = 0 \\
f_{\alpha, \gamma}(y_i) = \alpha \gamma \quad \text{if } y_i = 1 \\
f_{\alpha, \gamma}(y_i) = (1 - \alpha) f(y_i) \quad \text{if } y_i \notin \{0, 1\}
\] where \(\alpha\) is the zero-one-inflation probability (i.e. the probability that zero or one occurs) and \(\gamma\) is the conditional one-inflation probability (i.e. the probability that one occurs rather than zero). Currently implemented families are **zero_one_inflated_beta**.