# Parameterization of Response Distributions in brms

#### 2021-03-11

The purpose of this vignette is to discuss the parameterizations of the families (i.e., response distributions) used in brms. For a more general overview of the package see vignette("brms_overview").

## Notation

Throughout this vignette, we denote values of the response variable as $$y$$, a density function as $$f$$, and use $$\mu$$ to refer to the main model parameter, which is usually the mean of the response distribution or some closely related quantity. In a regression framework, $$\mu$$ is not estimated directly but computed as $$\mu = g(\eta)$$, where $$\eta$$ is a predictor term (see help(brmsformula) for details) and $$g$$ is the response function (i.e., inverse of the link function).

## Location shift models

The density of the gaussian family is given by $f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right)$

where $$\sigma$$ is the residual standard deviation. The density of the student family is given by $f(y) = \frac{\Gamma((\nu + 1)/2)}{\Gamma(\nu/2)} \frac{1}{\sqrt{\nu\pi}\sigma}\left(1 + \frac{1}{\nu} \left(\frac{y - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2}$

$$\Gamma$$ denotes the gamma function and $$\nu > 1$$ are the degrees of freedom. As $$\nu \rightarrow \infty$$, the student distribution becomes the gaussian distribution. The density of the skew_normal family is given by $f(y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2} \left(\frac{y - \xi}{\omega}\right)^2 \right) \left(1 + \text{erf} \left( \alpha \left(\frac{y - \xi}{\omega \sqrt{2}} \right) \right) \right)$

where $$\xi$$ is the location parameter, $$\omega$$ is the positive scale parameter, $$\alpha$$ the skewness parameter, and $$\text{erf}$$ denotes the error function of the gaussian distribution. To parameterize the skew-normal distribution in terms of the mean $$\mu$$ and standard deviation $$\sigma$$, $$\omega$$ and $$\xi$$ are computed as $\omega = \frac{\sigma}{\sqrt{1 - \frac{2}{\pi} \frac{\alpha^2}{1 + \alpha^2}}}$

$\xi = \mu - \omega \frac{\alpha}{\sqrt{1 + \alpha^2}} \sqrt{\frac{2}{\pi}}$

If $$\alpha = 0$$, the skew-normal distribution becomes the gaussian distribution. For location shift models, $$y$$ can be any real value.

## Binary and count data models

The density of the binomial family is given by $f(y) = {N \choose y} \mu^{y} (1-\mu)^{N - y}$ where $$N$$ is the number of trials and $$y \in \{0, ... , N\}$$. When all $$N$$ are $$1$$ (i.e., $$y \in \{0,1\}$$), the bernoulli distribution for binary data arises.

For $$y \in \mathbb{N}_0$$, the density of the poisson family is given by $f(y) = \frac{\mu^{y}}{y!} \exp(-\mu)$ The density of the negbinomial (negative binomial) family is $f(y) = {y + \phi - 1 \choose y} \left(\frac{\mu}{\mu + \phi}\right)^{y} \left(\frac{\phi}{\mu + \phi}\right)^\phi$ where $$\phi$$ is a positive precision parameter. For $$\phi \rightarrow \infty$$, the negative binomial distribution becomes the poisson distribution. The density of the geometric family arises if $$\phi$$ is set to $$1$$.

## Time-to-event models

With time-to-event models we mean all models that are defined on the positive reals only, that is $$y \in \mathbb{R}^+$$. The density of the lognormal family is given by $f(y) = \frac{1}{\sqrt{2\pi}\sigma x} \exp\left(-\frac{1}{2}\left(\frac{\log(y) - \mu}{\sigma}\right)^2\right)$ where $$\sigma$$ is the residual standard deviation on the log-scale. The density of the Gamma family is given by $f(y) = \frac{(\alpha / \mu)^\alpha}{\Gamma(\alpha)} y^{\alpha-1} \exp\left(-\frac{\alpha y}{\mu}\right)$ where $$\alpha$$ is a positive shape parameter. The density of the weibull family is given by $f(y) = \frac{\alpha}{s} \left(\frac{y}{s}\right)^{\alpha-1} \exp\left(-\left(\frac{y}{s}\right)^\alpha\right)$ where $$\alpha$$ is again a positive shape parameter and $$s = \mu / \Gamma(1 + 1 / \alpha)$$ is the scale parameter to that $$\mu$$ is the mean of the distribution. The exponential family arises if $$\alpha$$ is set to $$1$$ for either the gamma or Weibull distribution. The density of the inverse.gaussian family is given by $f(y) = \left(\frac{\alpha}{2 \pi y^3}\right)^{1/2} \exp \left(\frac{-\alpha (y - \mu)^2}{2 \mu^2 y} \right)$ where $$\alpha$$ is a positive shape parameter. The cox family implements Cox proportional hazards model which assumes a hazard function of the form $$h(y) = h_0(y) \mu$$ with baseline hazard $$h_0(y)$$ expressed via M-splines (which integrate to I-splines) in order to ensure monotonicity. The density of the cox model is then given by $f(y) = h(y) S(y)$ where $$S(y)$$ is the survival function implied by $$h(y)$$.

## Extreme value models

Modeling extremes requires special distributions. One may use the weibull distribution (see above) or the frechet distribution with density $f(y) = \frac{\nu}{s} \left(\frac{y}{s}\right)^{-1-\nu} \exp\left(-\left(\frac{y}{s}\right)^{-\nu}\right)$ where $$s = \mu / \Gamma(1 - 1 / \nu)$$ is a positive scale parameter and $$\nu > 1$$ is a shape parameter so that $$\mu$$ predicts the mean of the Frechet distribution. A generalization of both distributions is the generalized extreme value distribution (family gen_extreme_value) with density $f(y) = \frac{1}{\sigma} t(y)^{-1 - 1 / \xi} \exp(-t(y))$ where $t(y) = \left(1 + \xi \left(\frac{y - \mu}{\sigma} \right)\right)^{-1 / \xi}$ with positive scale parameter $$\sigma$$ and shape parameter $$\xi$$.

## Response time models

One family that is especially suited to model reaction times is the exgaussian (‘exponentially modified Gaussian’) family. Its density is given by

$f(y) = \frac{1}{2 \beta} \exp\left(\frac{1}{2 \beta} \left(2\xi + \sigma^2 / \beta - 2 y \right) \right) \text{erfc}\left(\frac{\xi + \sigma^2 / \beta - y}{\sqrt{2} \sigma} \right)$ where $$\beta$$ is the scale (inverse rate) of the exponential component, $$\xi$$ is the mean of the Gaussian component, $$\sigma$$ is the standard deviation of the Gaussian component, and $$\text{erfc}$$ is the complementary error function. We parameterize $$\mu = \xi + \beta$$ so that the main predictor term equals the mean of the distribution.

Another family well suited for modeling response times is the shifted_lognormal distribution. It’s density equals that of the lognormal distribution except that the whole distribution is shifted to the right by a positive parameter called ndt (for consistency with the wiener diffusion model explained below).

A family concerned with the combined modeling of reaction times and corresponding binary responses is the wiener diffusion model. It has four model parameters each with a natural interpretation. The parameter $$\alpha > 0$$ describes the separation between two boundaries of the diffusion process, $$\tau > 0$$ describes the non-decision time (e.g., due to image or motor processing), $$\beta \in [0, 1]$$ describes the initial bias in favor of the upper alternative, and $$\delta \in \mathbb{R}$$ describes the drift rate to the boundaries (a positive value indicates a drift towards to upper boundary). The density for the reaction time at the upper boundary is given by

$f(y) = \frac{\alpha}{(y-\tau)^3/2} \exp \! \left(- \delta \alpha \beta - \frac{\delta^2(y-\tau)}{2}\right) \sum_{k = - \infty}^{\infty} (2k + \beta) \phi \! \left(\frac{2k + \alpha \beta}{\sqrt{y - \tau}}\right)$

where $$\phi(x)$$ denotes the standard normal density function. The density at the lower boundary can be obtained by substituting $$1 - \beta$$ for $$\beta$$ and $$-\delta$$ for $$\delta$$ in the above equation. In brms the parameters $$\alpha$$, $$\tau$$, and $$\beta$$ are modeled as auxiliary parameters named bs (‘boundary separation’), ndt (‘non-decision time’), and bias respectively, whereas the drift rate $$\delta$$ is modeled via the ordinary model formula that is as $$\delta = \mu$$.

## Quantile regression

Quantile regression is implemented via family asym_laplace (asymmetric Laplace distribution) with density

$f(y) = \frac{p (1 - p)}{\sigma} \exp\left(-\rho_p\left(\frac{y - \mu}{\sigma}\right)\right)$ where $$\rho_p$$ is given by $$\rho_p(x) = x (p - I_{x < 0})$$ and $$I_A$$ is the indicator function of set $$A$$. The parameter $$\sigma$$ is a positive scale parameter and $$p$$ is the quantile parameter taking on values in $$(0, 1)$$. For this distribution, we have $$P(Y < g(\eta)) = p$$. Thus, quantile regression can be performed by fixing $$p$$ to the quantile to interest.

## Probability models

The density of the Beta family for $$y \in (0,1)$$ is given by $f(y) = \frac{y^{\mu \phi - 1} (1-y)^{(1-\mu) \phi-1}}{B(\mu \phi, (1-\mu) \phi)}$ where $$B$$ is the beta function and $$\phi$$ is a positive precision parameter. A multivariate generalization of the Beta family is the dirichlet family with density $f(y) = \frac{1}{B((\mu_{1}, \ldots, \mu_{K}) \phi)} \prod_{k=1}^K y_{k}^{\mu_{k} \phi - 1}.$ The dirichlet distribution is only implemented with the multivariate logit link function so that $\mu_{j} = \frac{\exp(\eta_{j})}{\sum_{k = 1}^{K} \exp(\eta_{k})}$ For reasons of identifiability, $$\eta_{1}$$ is set to $$0$$.

## Circular models

The density of the von_mises family for $$y \in (-\pi,\pi)$$ is given by $f(y) = \frac{\exp(\kappa \cos(y - \mu))}{2\pi I_0(\kappa)}$ where $$I_0$$ is the modified Bessel function of order 0 and $$\kappa$$ is a positive precision parameter.

## Ordinal and categorical models

For ordinal and categorical models, $$y$$ is one of the categories $$1, ..., K$$. The intercepts of ordinal models are called thresholds and are denoted as $$\tau_k$$, with $$k \in \{1, ..., K-1\}$$, whereas $$\eta$$ does not contain a fixed effects intercept. Note that the applied link functions $$h$$ are technically distribution functions $$\mathbb{R} \rightarrow [0,1]$$. The density of the cumulative family (implementing the most basic ordinal model) is given by $f(y) = g(\tau_{y + 1} - \eta) - g(\tau_{y} - \eta)$

The densities of the sratio (stopping ratio) and cratio (continuation ratio) families are given by $f(y) = g(\tau_{y + 1} - \eta) \prod_{k = 1}^{y} (1 - g(\tau_{k} - \eta))$ and $f(y) = (1 - g(\eta - \tau_{y + 1})) \prod_{k = 1}^{y} g(\eta - \tau_{k})$

respectively. Note that both families are equivalent for symmetric link functions such as logit or probit. The density of the acat (adjacent category) family is given by $f(y) = \frac{\prod_{k=1}^{y} g(\eta - \tau_{k}) \prod_{k=y+1}^K(1-g(\eta - \tau_{k}))}{\sum_{k=0}^K\prod_{j=1}^k g(\eta-\tau_{j}) \prod_{j=k+1}^K(1-g(\eta - \tau_{j}))}$ For the logit link, this can be simplified to $f(y) = \frac{\exp \left(\sum_{k=1}^{y} (\eta - \tau_{k}) \right)} {\sum_{k=0}^K \exp\left(\sum_{j=1}^k (\eta - \tau_{j}) \right)}$ The linear predictor $$\eta$$ can be generalized to also depend on the category $$k$$ for a subset of predictors. This leads to category specific effects (for details on how to specify them see help(brm)). Note that cumulative and sratio models use $$\tau - \eta$$, whereas cratio and acat use $$\eta - \tau$$. This is done to ensure that larger values of $$\eta$$ increase the probability of higher response categories.

The categorical family is currently only implemented with the multivariate logit link function and has density $f(y) = \mu_{y} = \frac{\exp(\eta_{y})}{\sum_{k = 1}^{K} \exp(\eta_{k})}$ Note that $$\eta$$ does also depend on the category $$k$$. For reasons of identifiability, $$\eta_{1}$$ is set to $$0$$. A generalization of the categorical family to more than one trial is the multinomial family with density $f(y) = {N \choose y_{1}, y_{2}, \ldots, y_{K}} \prod_{k=1}^K \mu_{k}^{y_{k}}$ where, for each category, $$\mu_{k}$$ is estimated via the multivariate logit link function shown above.

## Zero-inflated and hurdle models

Zero-inflated and hurdle families extend existing families by adding special processes for responses that are zero. The density of a zero-inflated family is given by $f_z(y) = z + (1 - z) f(0) \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) \quad \text{if } y > 0$ where $$z$$ denotes the zero-inflation probability. Currently implemented families are zero_inflated_poisson, zero_inflated_binomial, zero_inflated_negbinomial, and zero_inflated_beta.

The density of a hurdle family is given by $f_z(y) = z \quad \text{if } y = 0 \\ f_z(y) = (1 - z) f(y) / (1 - f(0)) \quad \text{if } y > 0$ Currently implemented families are hurdle_poisson, hurdle_negbinomial, hurdle_gamma, and hurdle_lognormal.

The density of a zero-one-inflated family is given by $f_{\alpha, \gamma}(y) = \alpha (1 - \gamma) \quad \text{if } y = 0 \\ f_{\alpha, \gamma}(y) = \alpha \gamma \quad \text{if } y = 1 \\ f_{\alpha, \gamma}(y) = (1 - \alpha) f(y) \quad \text{if } y \notin \{0, 1\}$ where $$\alpha$$ is the zero-one-inflation probability (i.e. the probability that zero or one occurs) and $$\gamma$$ is the conditional one-inflation probability (i.e. the probability that one occurs rather than zero). Currently implemented families are zero_one_inflated_beta.