In this vignette we use the NHANES data for examples with cross-sectional data and the dataset simLong for examples with longitudinal data. For more info on these datasets, check out the vignette *Visualizing Incomplete Data*, in which the distributions of variables and missing values in both sets are explored.

**Note:**

In many of the examples we use `n.adapt = 0` (and `n.iter = 0`, which is the default) in order to prevent the MCMC sampling and, hence, reduce computational time. `mess = FALSE` is used to suppress messages that are not of interest in this vignette.

**JointAI** has five main functions:

- `lm_imp()`: linear regression
- `glm_imp()`: generalized linear regression
- `lme_imp()`: linear mixed effects regression
- `glme_imp()`: generalized linear mixed effects regression
- `survreg_imp()`: parametric survival regression

Specification of these functions is similar to the specification of the complete data versions `lm()`, `glm()`, `lme()` (from package **nlme**) and `survreg()` (from package **survival**). `glme_imp()` uses a combination of the specifications used for `lme()` and `glm()`.

`lm_imp()`, `glm_imp()` and `survreg_imp()` take the arguments `formula` and `data`, whereas `lme_imp()` and `glme_imp()` require the specification of a `fixed` effects and a `random` effects formula. Specification of the fixed effects formula is demonstrated in section Model formula, and specification of the random effects in section Multi level structure & longitudinal covariates.

Additionally, `glm_imp()` and `glme_imp()` require the specification of the model `family` and, optionally, the `link` function.

Implemented families and links are

family | links |
---|---|
`gaussian` | `identity`, `log` |
`binomial` | `logit`, `probit`, `log`, `cloglog` |
`Gamma` | `identity`, `log` |
`poisson` | `log`, `identity` |

The argument `family` can be given as a character string or as a function. If the link function is omitted, the default link is used.

**Example:**

The following three specifications are equivalent:

```
mod1a <- glm_imp(educ ~ age + gender + creat, data = NHANES,
family = "binomial", n.adapt = 0, mess = FALSE)
mod1b <- glm_imp(educ ~ age + gender + creat, data = NHANES,
family = binomial(), n.adapt = 0, mess = FALSE)
mod1c <- glm_imp(educ ~ age + gender + creat, data = NHANES,
family = binomial(link = 'logit'), n.adapt = 0, mess = FALSE)
mod1a$analysis_type
#> [1] "glm"
#> attr(,"family")
#> [1] "binomial"
#> attr(,"link")
#> [1] "logit"
```

To use a probit link instead, it needs to be specified explicitly, e.g., `family = binomial(link = 'probit')`.
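The default link comes with the family object, as in base R's `glm()`. A quick check with the family functions from base R's **stats** package (no JointAI needed) shows which link is used when it is omitted; this illustrates R's family objects only, not JointAI internals.

```r
# Family objects from base R (package stats) store their link function.
fam_default <- binomial()                 # link omitted: the default is used
fam_probit  <- binomial(link = "probit")  # link given explicitly

fam_default$link  # "logit"
fam_probit$link   # "probit"
poisson()$link    # "log"
```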

The arguments `formula` (in `lm_imp()`, `glm_imp()` and `survreg_imp()`) and `fixed` (in `lme_imp()` and `glme_imp()`) take a two-sided `formula` object, where `~` separates the response (outcome / dependent variable) from the linear predictor, in which covariates (independent variables) are separated by `+`. An intercept is added automatically.

`survreg_imp()` expects a survival object (`Surv()`) on the left hand side of the model formula. Currently, only right censored data can be handled.

Interactions between variables can be introduced using `:` or `*`. While `:` adds only the interaction term, `*` adds the interaction term AND the main effects, i.e., `y ~ a*b` is equivalent to `y ~ a + b + a:b`.
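How R expands `:` and `*` can be checked with base R's `terms()`, which lists the model terms a formula produces. The helper `term_labels()` is a hypothetical convenience wrapper, and `y`, `a` and `b` are placeholder names.

```r
# Hypothetical helper: list the terms a formula expands to (base R only)
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a:b)          # "a:b"  (interaction only)
term_labels(y ~ a * b)        # "a" "b" "a:b"
term_labels(y ~ a + b + a:b)  # "a" "b" "a:b"
```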

Interactions between multiple variables can be specified using parentheses:

```
mod2a <- glm_imp(educ ~ gender * (age + smoke + creat),
data = NHANES, family = binomial(), mess = FALSE, n.adapt = 0)
parameters(mod2a, mess = FALSE)
#> [1] "(Intercept)" "genderfemale" "age"
#> [4] "smokeformer" "smokecurrent" "creat"
#> [7] "genderfemale:age" "genderfemale:smokeformer" "genderfemale:smokecurrent"
#> [10] "genderfemale:creat"
```

The function `parameters()` returns the parameters that are specified to be monitored (even for models where no MCMC sampling was performed, i.e., when `n.iter = 0` and `n.adapt = 0`).

To specify all interactions up to a given order, `^` can be used:

```
# all two-way interactions:
mod2b <- glm_imp(educ ~ gender + (age + smoke + creat)^2,
data = NHANES, family = binomial(), mess = FALSE, n.adapt = 0)
parameters(mod2b, mess = FALSE)
#> [1] "(Intercept)" "genderfemale" "age" "smokeformer"
#> [5] "smokecurrent" "creat" "age:smokeformer" "age:smokecurrent"
#> [9] "age:creat" "smokeformer:creat" "smokecurrent:creat"
# all two- and three-way interactions:
mod2c <- glm_imp(educ ~ gender + (age + smoke + creat)^3,
data = NHANES, family = binomial(), mess = FALSE, n.adapt = 0)
parameters(mod2c, mess = FALSE)
#> [1] "(Intercept)" "genderfemale" "age"
#> [4] "smokeformer" "smokecurrent" "creat"
#> [7] "age:smokeformer" "age:smokecurrent" "age:creat"
#> [10] "smokeformer:creat" "smokecurrent:creat" "age:smokeformer:creat"
#> [13] "age:smokecurrent:creat"
```
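The same expansion can be previewed without fitting a model by applying base R's `terms()` to a formula with placeholder variables `a`, `b` and `c` (hypothetical names). Main effects come first, followed by the interactions, in the same order as in the parameter lists above.

```r
# Hypothetical helper: list the terms a formula expands to (base R only)
term_labels <- function(f) attr(terms(f), "term.labels")

# all two-way interactions
term_labels(y ~ (a + b + c)^2)
# "a" "b" "c" "a:b" "a:c" "b:c"

# all two- and three-way interactions
term_labels(y ~ (a + b + c)^3)
# "a" "b" "c" "a:b" "a:c" "b:c" "a:b:c"
```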

In **JointAI**, interactions between any type of variables (observed, incomplete, time-varying) are allowed. When an incomplete variable is involved, the interaction term is re-calculated within each iteration of the MCMC sampling, using the imputed values from the current iteration.

In practice, associations between outcome and covariates do not always meet the standard assumption that all covariate effects are linear. Often, assuming a logarithmic, quadratic or other non-linear effect is more appropriate.

Non-linear associations can be specified in the model formula using either functions such as `log()` (the natural logarithm), `sqrt()` (the square root) or `exp()` (the exponential function), or by specifying a function of a variable using `I()`; for example, `I(x^2)` would be the quadratic term of a variable `x`.
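`I()` is needed because, inside a formula, `^` is the interaction operator and not arithmetic. A small base R check with `model.matrix()` on a made-up data frame `d` shows the difference:

```r
d <- data.frame(y = c(1, 3, 2, 5), x = c(1, 2, 3, 4))

# I(x^2) is evaluated arithmetically: a quadratic term is added
mm <- model.matrix(y ~ x + I(x^2), data = d)
colnames(mm)  # "(Intercept)" "x" "I(x^2)"

# without I(), x^2 is read as formula syntax and reduces to x
colnames(model.matrix(y ~ x^2, data = d))  # "(Intercept)" "x"
```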

For *completely observed covariates*, **JointAI** can handle any common type of function implemented in R, including splines, e.g., using `ns()` or `bs()` from the package **splines** (which is automatically installed with R).

Since functions involving *variables that have missing values* need to be re-calculated in each iteration of the MCMC sampling, currently, only functions that are available in JAGS can be used for incomplete variables. Those functions include:

- `log()`, `exp()`
- `sqrt()`, polynomials (using `I()`)
- `abs()`
- `sin()`, `cos()`
- algebraic operations involving one or multiple (in)complete variables, as long as the formula can be interpreted by JAGS

The list of functions implemented in JAGS can be found in the JAGS user manual.

**Some examples:**^{1}

```
# Absolute difference between bili and creat
mod3a <- lm_imp(SBP ~ age + gender + abs(bili - creat),
                data = NHANES, mess = FALSE)

# Using a natural cubic spline for age (completely observed) and a quadratic
# and a cubic effect for bili
library(splines)
mod3b <- lm_imp(SBP ~ ns(age, df = 2) + gender + I(bili^2) + I(bili^3),
                data = NHANES, mess = FALSE, keep_model = TRUE)

# A function of creat and albu
mod3c <- lm_imp(SBP ~ age + gender + I(creat/albu^2),
                data = NHANES, mess = FALSE)
# This type of function would make more sense, e.g., to calculate BMI as
# weight/height^2, but the NHANES data do not contain those variables.

# Using the sine and cosine
mod3d <- lm_imp(SBP ~ bili + sin(creat) + cos(albu),
                data = NHANES, mess = FALSE)
```

When a function of a complete or incomplete variable is used in the model formula, the main effect of that variable is automatically added as an auxiliary variable (more on auxiliary variables in section Auxiliary variables), and only the main effects are used as predictors in the models for incomplete variables.

In `mod3b`, for example, the spline of `age` is used as predictor for `SBP`, but in the imputation model for `bili`, `age` enters with a linear effect.

```
list_impmodels(mod3b, priors = FALSE, regcoef = FALSE, otherpars = FALSE)
#> Normal imputation model for 'bili'
#> * Predictor variables:
#> (Intercept), genderfemale, age
```

The function `list_impmodels()` prints a list of the imputation models used in a JointAI object. Since, at the moment, we are only interested in the predictor variables, we suppress printing of information on prior distributions, regression coefficients and other parameters by setting `priors`, `regcoef` and `otherpars` to `FALSE`.

When a function of a variable is specified as auxiliary variable, this function is used in the imputation models. For example, in `mod3e`, waist circumference (`WC`) is not part of the model for `SBP`, and `I(WC^2)` is used in the linear predictor of the imputation model for `bili`:

```
mod3e <- lm_imp(SBP ~ age + gender + bili, auxvars = "I(WC^2)",
                data = NHANES, mess = FALSE)
list_impmodels(mod3e, priors = FALSE, regcoef = FALSE, otherpars = FALSE)
#> Normal imputation model for 'WC'
#> * Predictor variables:
#> (Intercept), age, genderfemale
#>
#> Normal imputation model for 'bili'
#> * Predictor variables:
#> (Intercept), age, genderfemale, I(WC^2)
```

Incomplete variables are always imputed on their original scale, i.e.,

- in `mod3b` the variable `bili` is imputed and the quadratic and cubic versions are calculated from the imputed values;
- likewise, `creat` and `albu` in `mod3c` are imputed separately, and `I(creat/albu^2)` is calculated from the imputed (and observed) values.

When different transformations of the same incomplete variable are used in one model, or interaction terms involve the main and interaction effect of an incomplete variable, it is strongly discouraged to calculate these transformations or interaction terms beforehand and supply them as different variables. If, for example, a model formula contains both `x` and `x2` (where `x2` = `x^2`), they are treated as separate variables and imputed with separate models. Imputed values of `x2` are thus not equal to the square of imputed values of `x`. Instead, `x + I(x^2)` should be used in the model formula. Then, only `x` is imputed and used in the linear predictor of models for other incomplete variables, and `x^2` is calculated from the imputed values of `x` internally.
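A toy illustration of the problem, using naive mean imputation as a stand-in for two separate imputation models (this is not how JointAI imputes; it only makes the inconsistency easy to see). The vector `x` is made up.

```r
x  <- c(1, 2, NA, 4)
x2 <- x^2  # pre-computed square; missing wherever x is missing

# x and x2 imputed separately (naive mean imputation as a stand-in)
x_imp  <- ifelse(is.na(x),  mean(x,  na.rm = TRUE), x)
x2_imp <- ifelse(is.na(x2), mean(x2, na.rm = TRUE), x2)

x2_imp[3]   # 7: imputed value of the separate variable x2
x_imp[3]^2  # about 5.44: square of the imputed value of x

# computing the square from the imputed x keeps both terms consistent
x2_from_x <- x_imp^2
```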

When a function has restricted support, e.g., `log(x)` is only defined for `x > 0`, the distribution used to impute that variable needs to comply with these restrictions. This can either be achieved by truncating the distribution, using the argument `trunc`, or by selecting an imputation method that meets the restrictions. For more information on imputation methods, see the section Imputation model types.

**Example**:

When using a `log()` transformation for the covariate `bili`, we can either use the default imputation method `norm` (a normal distribution) and truncate it by specifying `trunc = list(bili = c(lower, upper))` (where the lower and upper limits are the smallest and largest allowed values), or choose an imputation method (using the argument `meth`; for more details see the section Imputation model types) that only imputes positive values, such as a log-normal distribution (`lognorm`) or a Gamma distribution (`gamma`):

```
# truncation of the distribution of bili
mod4a <- lm_imp(SBP ~ age + gender + log(bili) + exp(creat),
                trunc = list(bili = c(1e-5, 1e10)),
                data = NHANES, mess = FALSE, keep_model = TRUE)
# log-normal model for bili
mod4b <- lm_imp(SBP ~ age + gender + log(bili) + exp(creat),
                meth = c(bili = 'lognorm', creat = 'norm'),
                data = NHANES, mess = FALSE, keep_model = TRUE)
# gamma model for bili
mod4c <- lm_imp(SBP ~ age + gender + log(bili) + exp(creat),
                meth = c(bili = 'gamma', creat = 'norm'),
                data = NHANES, mess = FALSE, keep_model = TRUE)
```

Truncation always requires specifying both limits. Since `-Inf` and `Inf` are not accepted, a value outside the range of the variable (here: `1e10`) can be selected for the second boundary of the truncation interval.
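To make clear what truncation does, here is a minimal base R sketch of drawing from a normal distribution truncated to `c(1e-5, 1e10)` with the inverse-CDF method. JAGS handles truncation internally; the helper `rtruncnorm_sketch()` and the chosen mean and standard deviation are hypothetical.

```r
# Draw from N(mean, sd^2) restricted to [lower, upper] via the inverse CDF
rtruncnorm_sketch <- function(n, mean, sd, lower, upper) {
  p_lower <- pnorm(lower, mean, sd)
  p_upper <- pnorm(upper, mean, sd)
  qnorm(runif(n, p_lower, p_upper), mean, sd)
}

set.seed(2024)
draws <- rtruncnorm_sketch(1000, mean = 0.5, sd = 1, lower = 1e-5, upper = 1e10)
all(draws > 0)  # TRUE, so log(draws) is defined
```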

It is possible to use functions that have different names in R and JAGS, or that exist in JAGS but not in R, by defining a new function in R that has the name of the function in JAGS.

**Example**:

In JAGS the inverse logit transformation is defined in the function `ilogit`. In R, there is no function `ilogit`, but the inverse logit is available as the distribution function of the logistic distribution, `plogis()`. We can therefore define `ilogit <- plogis` in R and use `ilogit()` in the model formula.
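Because `plogis()` computes the inverse logit, `1 / (1 + exp(-x))`, assigning it to the name `ilogit` gives R a function with the same name and meaning as the JAGS function. A quick check in base R with a few arbitrary values:

```r
# give R a function with the JAGS name ilogit
ilogit <- plogis

x <- c(-2, 0, 2)
ilogit(x)                                # approx. 0.119 0.500 0.881
all.equal(ilogit(x), 1 / (1 + exp(-x)))  # TRUE
```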

It is also possible to nest a function in another function.

**Example:**^{2}

The complementary log log transformation is restricted to values larger than 0 and smaller than 1. In order to use this function on a variable that exceeds this range, as is the case for `creat`, a second transformation might be used, for instance the inverse logit from the previous example.

In JAGS, the complementary log log transformation is implemented as `cloglog`, but since this function does not exist in (base) R, we first need to define it:

```
# define the complementary log log transformation
cloglog <- function(x) log(-log(1 - x))
# define the inverse logit (in case you have not done it in the previous example)
ilogit <- plogis
# nest ilogit inside cloglog
mod6a <- lm_imp(SBP ~ age + gender + cloglog(ilogit(creat)),
                data = NHANES, mess = FALSE)
```
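Why the nesting helps can be seen with a few made-up values on the scale of `creat` (hypothetical numbers, not taken from NHANES): applied directly, `cloglog()` is only finite inside (0, 1), whereas `cloglog(ilogit())` gives finite values for all three.

```r
cloglog <- function(x) log(-log(1 - x))
ilogit <- plogis

creat_values <- c(0.5, 1, 2.5)  # hypothetical values, some outside (0, 1)

suppressWarnings(cloglog(creat_values))  # approx. -0.367, Inf, NaN
cloglog(ilogit(creat_values))            # finite for all three values
```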

In multi-level models, in addition to the fixed effects structure specified by the argument `fixed`, a random effects structure needs to be provided via the argument `random`.

`random` takes a one-sided formula starting with a `~`. Variables for which a random effect should be included are usually separated by a `+`, and the grouping variable is separated by `|`. A random intercept is added automatically and only needs to be specified in a random intercept only model.

A few examples:

- `random = ~1|id`: random intercept only, with `id` as grouping variable
- `random = ~time|id`: random intercept and slope for variable `time`
- `random = ~time + I(time^2)|id`: random intercept, slope and quadratic random effect for `time`
- `random = ~time + x|id`: random intercept, random slope for `time` and random effect for variable `x`
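To see how such a formula is structured, it can be taken apart with base R: the part left of `|` defines the random effects design matrix, with the intercept added automatically, and the part right of `|` is the grouping variable. This only illustrates the formula syntax on a made-up data frame `long`; it is not JointAI's internal code.

```r
long <- data.frame(id   = c(1, 1, 1, 2, 2),
                   time = c(0, 1, 2, 0, 1.5))

random   <- ~ time + I(time^2) | id
bar      <- random[[2]]  # the call  time + I(time^2) | id
effects  <- bar[[2]]     # time + I(time^2)
grouping <- bar[[3]]     # id

# random effects design matrix; the intercept is added automatically
Z <- model.matrix(eval(call("~", effects)), data = long)
colnames(Z)         # "(Intercept)" "time" "I(time^2)"
deparse(grouping)   # "id"
```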

It is possible to use splines in the random effects structure, e.g.:

```
mod7a <- lme_imp(bmi ~ GESTBIR + ETHN + HEIGHT_M + ns(age, df = 2),
random = ~ns(age, df = 2)|ID,
data = simLong, mess = FALSE)
```

Currently, multiple levels of nesting (nested or crossed random effects) are not available.

Longitudinal (“time-varying”) covariates can be used in the model; however, they cannot yet be imputed.

**JointAI** automatically selects an imputation model for each of the incomplete variables, based on the `class` of the variable and the number of levels. The automatically selected types are:

name | model | variable type |
---|---|---|
`norm` | linear regression | continuous variables |
`logit` | logistic regression | factors with two levels |
`multilogit` | multinomial logit model | unordered factors with >2 levels |
`cumlogit` | cumulative logit model | ordered factors with >2 levels |

The imputation models that are chosen by default may not necessarily be appropriate for the data at hand, especially for continuous variables, which often do not comply with the assumptions of (conditional) normality.

Therefore, alternative imputation methods are available for continuous covariates:

name | model | variable type |
---|---|---|
`lognorm` | normal regression of the log-transformed variable | right-skewed variables >0 |
`gamma` | Gamma regression (with log-link) | right-skewed variables >0 |
`beta` | beta regression (with logit-link) | continuous variables with values in [0, 1] |
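The Gamma imputation model is printed (for example, by `list_impmodels()`) with shape `mu^2 * tau` and rate `mu * tau`. Under this parametrization the mean is `shape / rate = mu` and the variance is `shape / rate^2 = 1 / tau`, so `tau` acts as a precision. A short simulation with arbitrarily chosen values of `mu` and `tau` is consistent with this:

```r
mu  <- 2.5  # hypothetical mean
tau <- 4    # hypothetical precision

shape <- mu^2 * tau
rate  <- mu * tau

shape / rate    # mean: equals mu
shape / rate^2  # variance: equals 1 / tau

set.seed(1)
draws <- rgamma(1e5, shape = shape, rate = rate)
c(mean(draws), var(draws))  # close to 2.5 and 0.25
```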

`lognorm` assumes a normal distribution for the natural logarithm of the variable, but the variable enters the linear predictor of the analysis model (and possibly other imputation models) on its original scale.
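A rough, non-Bayesian sketch of this idea in base R: the normal model is fitted to `log(x)` and the imputed values are transformed back with `exp()`, so they are always positive and on the original scale. The simulated variable and the use of plug-in estimates are assumptions for illustration only; JointAI samples from the posterior instead.

```r
set.seed(42)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)  # simulated right-skewed variable > 0
x[sample(200, 20)] <- NA                    # introduce missing values

obs <- !is.na(x)
mu_log <- mean(log(x[obs]))  # normal model on the log scale
sd_log <- sd(log(x[obs]))

x_imp <- x
x_imp[!obs] <- exp(rnorm(sum(!obs), mu_log, sd_log))  # back to the original scale
```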

In models `mod4b` and `mod4c` we have already seen two examples in which the imputation model type was changed using the argument `meth`.

`meth` takes a named vector of imputation model types, where the names are the names of the incomplete variables. When the vector supplied to `meth` only contains specifications for a subset of the incomplete variables, default models are used for the remaining incomplete variables.

```
mod8a <- lm_imp(SBP ~ age + gender + WC + alc + bili + occup + smoke,
meth = c(bili = 'gamma', WC = 'lognorm'),
data = NHANES, n.iter = 100, mess = FALSE, progress.bar = 'none')
mod8a$meth
#> WC smoke bili occup alc
#> "lognorm" "cumlogit" "gamma" "multilogit" "logit"
```

The function `get_imp_meth()`, which finds and assigns the default imputation methods, can be called directly. `get_imp_meth()` has the arguments

- `fixed`: the fixed effects formula
- `random`: the random effects formula (if necessary)
- `data`: the dataset
- `auxvars`: an optional vector of auxiliary variables.

```
mod8a_meth <- get_imp_meth(SBP ~ age + gender + WC + alc + bili + occup + smoke,
data = NHANES)
mod8a_meth
#> WC smoke bili occup alc
#> "norm" "cumlogit" "norm" "multilogit" "logit"
```
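Comparing `mod8a_meth` with `mod8a$meth` above shows how a partially specified `meth` is combined with the defaults. The logic can be sketched in base R with the values copied from those outputs; this mimics the documented behavior and is not JointAI's source code.

```r
# default methods, as returned by get_imp_meth() above
defaults <- c(WC = "norm", smoke = "cumlogit", bili = "norm",
              occup = "multilogit", alc = "logit")
# user specification for a subset of the incomplete variables
user <- c(bili = "gamma", WC = "lognorm")

meth <- defaults
meth[names(user)] <- user  # overwrite only the specified variables
meth  # same as mod8a$meth: WC lognorm, bili gamma, the rest unchanged
```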

When a continuous incomplete variable has only two different values, it is assumed to be binary, and its coding and default imputation model are changed accordingly. This behavior can be overridden by specifying the imputation method for that variable directly.

Variables of type `logical` are automatically converted to unordered factors.
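The selection rules described above can be summarized in a small function. `default_imp_type()` is a hypothetical sketch of these rules for illustration, not the implementation of `get_imp_meth()`.

```r
default_imp_type <- function(x) {
  if (is.logical(x)) x <- factor(x)  # logicals become unordered factors
  if (is.factor(x)) {
    if (nlevels(x) == 2) return("logit")
    if (is.ordered(x)) return("cumlogit")
    return("multilogit")
  }
  # continuous variable with only two distinct values: treated as binary
  if (length(unique(na.omit(x))) == 2) return("logit")
  "norm"
}

default_imp_type(c(1.2, NA, 3.4, 5.6))                      # "norm"
default_imp_type(c(0, 1, NA, 1))                            # "logit"
default_imp_type(factor(c("a", "b", "c", NA)))              # "multilogit"
default_imp_type(factor(c("l", "m", "h"), ordered = TRUE))  # "cumlogit"
```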

By default, the imputation models are ordered by the number of missing values (increasing), and each model has the (cross-sectional) complete covariates and all incomplete variables that appear earlier in the sequence in its linear predictor:

```
# number of missing values in the covariates in mod8a
colSums(is.na(NHANES[, names(mod8a_meth)]))
#> WC smoke bili occup alc
#> 2 7 8 28 34
# print information on the imputation models (and omit everything but the predictor variables)
list_impmodels(mod8a, priors = FALSE, regcoef = FALSE, otherpars = FALSE, refcat = FALSE)
#> Log-normal imputation model for 'WC'
#> * Predictor variables:
#> (Intercept), age, genderfemale
#>
#> Cumulative logit imputation model for 'smoke'
#> * Predictor variables:
#> age, genderfemale, WC
#>
#> Gamma imputation model for 'bili'
#> * Parametrization:
#> - shape: shape_bili = mu_bili^2 * tau_bili
#> - rate: rate_bili = mu_bili * tau_bili
#> * Predictor variables:
#> (Intercept), age, genderfemale, WC, smokeformer, smokecurrent
#>
#> Multinomial logit imputation model for 'occup'
#> * Predictor variables:
#> (Intercept), age, genderfemale, WC, smokeformer, smokecurrent, bili
#>
#> Logistic imputation model for 'alc'
#> * Predictor variables:
#> (Intercept), age, genderfemale, WC, smokeformer, smokecurrent, bili, occuplooking for work, occupnot working
```

By re-ordering the elements in `meth`, the order of the sequence of imputation models will be changed. Note that this requires the user to specify imputation methods for all incomplete variables; when only a subset is specified, the default order is retained.
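The default ordering can be sketched in base R using the numbers of missing values from the `colSums()` output above, where the sequence starts with `WC`, the variable with the fewest missing values. This hypothetical snippet only reproduces the rule; it works with variable names rather than the dummy variables that JointAI prints.

```r
n_miss   <- c(WC = 2, smoke = 7, bili = 8, occup = 28, alc = 34)  # from colSums() above
complete <- c("age", "gender")                                    # complete covariates

imp_order <- names(sort(n_miss))  # increasing number of missing values
predictors <- lapply(seq_along(imp_order), function(i) {
  c(complete, imp_order[seq_len(i - 1)])  # complete covariates + earlier variables
})
names(predictors) <- imp_order

predictors$bili  # "age" "gender" "WC" "smoke"
```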

Auxiliary variables are variables that are not part of the analysis model but should be considered as predictor variables in the imputation models because they can inform the imputation of unobserved values.

Good auxiliary variables

- are associated with an incomplete variable of interest, or
- are associated with the missingness of that variable, and
- do not have too many missing values themselves. Importantly, they should be observed for a large proportion of the cases that have a missing value in the variable to be imputed.

In `lm_imp()`, `glm_imp()`, `lme_imp()`, `glme_imp()` and `survreg_imp()`, auxiliary variables can be specified with the argument `auxvars`, which is a vector containing the names of the auxiliary variables.

**Example:**

We might consider the variables `educ` and `smoke` as predictors for the imputation of `occup`:

```
mod9a <- lm_imp(SBP ~ gender + age + occup, auxvars = c('educ', 'smoke'),
data = NHANES, n.iter = 100, progress.bar = 'none', mess = FALSE)
```

The regression coefficients for `educ` and `smoke` in the analysis model are fixed to zero so that these two variables do not contribute to the model, and they are omitted from the model summary:

```
summary(mod9a)
#>
#> Linear model fitted with JointAI
#>
#> Call:
#> lm_imp(formula = SBP ~ gender + age + occup, data = NHANES, n.iter = 100,
#> progress.bar = "none", mess = FALSE, auxvars = c("educ",
#> "smoke"))
#>
#> Posterior summary:
#> Mean SD 2.5% 97.5% tail-prob. GR-crit
#> (Intercept) 105.652 3.1267 99.832 111.65 0.00000 1.00
#> genderfemale -5.624 2.1182 -9.778 -1.65 0.00667 1.02
#> age 0.373 0.0733 0.228 0.51 0.00000 1.05
#> occuplooking for work 4.082 5.9771 -7.727 15.89 0.53333 1.03
#> occupnot working -0.347 2.6793 -5.574 4.93 0.96000 1.07
#>
#> Posterior summary of residual std. deviation:
#> Mean SD 2.5% 97.5% GR-crit
#> sigma_SBP 14.4 0.742 13.1 15.9 1.01
#>
#>
#> MCMC settings:
#> Iterations = 101:200
#> Sample size per chain = 100
#> Thinning interval = 1
#> Number of chains = 3
#>
#> Number of observations: 186
```

They are, however, used as predictors in the imputation model for `occup` and are imputed themselves (if they have missing values):

```
list_impmodels(mod9a, priors = FALSE, regcoef = FALSE, otherpars = FALSE, refcat = FALSE)
#> Cumulative logit imputation model for 'smoke'
#> * Predictor variables:
#> genderfemale, age, educhigh
#>
#> Multinomial logit imputation model for 'occup'
#> * Predictor variables:
#> (Intercept), genderfemale, age, educhigh, smokeformer, smokecurrent
```

As shown above in `mod3e`, it is possible to specify functions of auxiliary variables. In that case, the auxiliary variable does not enter the imputation models as a linear effect, but in the form specified by the function:

```
mod9b <- lm_imp(SBP ~ gender + age + occup, data = NHANES,
auxvars = c('educ', 'smoke', 'log(WC)'),
trunc = list(WC = c(1e-10, 1e10)), n.adapt = 0, mess = FALSE)
```

```
list_impmodels(mod9b, priors = FALSE, regcoef = FALSE, otherpars = FALSE,
refcat = FALSE)
#> Normal imputation model for 'WC'
#> * Predictor variables:
#> (Intercept), genderfemale, age, educhigh
#>
#> Cumulative logit imputation model for 'smoke'
#> * Predictor variables:
#> genderfemale, age, educhigh, log(WC)
#>
#> Multinomial logit imputation model for 'occup'
#> * Predictor variables:
#> (Intercept), genderfemale, age, educhigh, log(WC), smokeformer, smokecurrent
```

Setting the regression coefficients of auxiliary variables to zero in the analysis model implies that the outcome is independent of these variables, conditional on the other variables in the model. If this is not true, the model is mis-specified, which may lead to biased results (similar to leaving a confounding variable out of a model).
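A small simulation makes this consequence concrete. Here `z` plays the role of an auxiliary variable that actually affects the outcome; fixing its coefficient to zero amounts to leaving it out of the analysis model. The data-generating values are arbitrary, and `lm()` is used as a simple stand-in for the analysis model.

```r
set.seed(123)
n <- 5000
z <- rnorm(n)                  # "auxiliary" variable
x <- 0.8 * z + rnorm(n)        # covariate of interest, correlated with z
y <- 1 * x + 2 * z + rnorm(n)  # true effect of x is 1

b_full    <- coef(lm(y ~ x + z))[["x"]]  # close to 1
b_reduced <- coef(lm(y ~ x))[["x"]]      # biased (around 2): z's effect forced to 0
```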

In **JointAI**, dummy coding is used when categorical variables enter a linear predictor, i.e., a binary variable is created for each category except the reference category. These binary variables have the value one when that category was observed and zero otherwise.

By default, the first category of a categorical variable (ordered or unordered) is used as reference; however, this may not always allow the desired interpretation of the regression coefficients. Moreover, when categories are unbalanced, setting the largest group as reference may result in better mixing of the MCMC chains.

Therefore, **JointAI** allows specification of the reference category separately for each variable, via the argument `refcats`.
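What dummy coding and the choice of reference category mean can be seen with base R's `model.matrix()`, which uses the same kind of coding for unordered factors. The toy factor `smoke` is made up for illustration:

```r
d <- data.frame(smoke = factor(c("never", "former", "current", "never"),
                               levels = c("never", "former", "current")))

# first level ("never") is the reference: one 0/1 column per other category
mm_first <- model.matrix(~ smoke, data = d)
colnames(mm_first)    # "(Intercept)" "smokeformer" "smokecurrent"

# making "current" the reference changes which dummies are created
d$smoke <- relevel(d$smoke, ref = "current")
mm_current <- model.matrix(~ smoke, data = d)
colnames(mm_current)  # "(Intercept)" "smokenever" "smokeformer"
```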

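To specify the choice of reference category globally for all variables in the model, `refcats` can be set as

- `refcats = "first"`: the first category of each variable is used as reference
- `refcats = "last"`: the last category of each variable is used as reference
- `refcats = "largest"`: the largest category of each variable is used as reference

For example, `lm_imp(SBP ~ gender + age + race + educ + occup + smoke, refcats = "largest", data = NHANES)` uses the largest category of each categorical variable as reference.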

Alternatively, `refcats` takes a named vector or list, in which the reference category for each variable can be specified either by its number or its name, or by one of the three global types: “first”, “last” or “largest”. For variables for which no reference category is specified, the default is used.

```
mod10b <- lm_imp(SBP ~ gender + age + race + educ + occup + smoke,
refcats = list(occup = "not working", race = 3, educ = 'largest'),
data = NHANES, n.adapt = 0, mess = FALSE)
```
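The three global options translate into a concrete category per variable roughly as follows. `pick_refcat()` is a hypothetical helper written for illustration, and `educ` here is a made-up factor; in JointAI the choice is made internally from the value of `refcats`.

```r
pick_refcat <- function(x, how = c("first", "last", "largest")) {
  how <- match.arg(how)
  switch(how,
         first   = levels(x)[1],
         last    = levels(x)[nlevels(x)],
         largest = names(which.max(table(x))))  # ties: first of the largest
}

educ <- factor(c("low", "high", "high", "low", "high"), levels = c("low", "high"))
pick_refcat(educ, "first")    # "low"
pick_refcat(educ, "last")     # "high"
pick_refcat(educ, "largest")  # "high"
```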

To help specify the reference categories, the function `set_refcat()` can be used. It prints the names of the categorical variables that are selected by

- a specified model formula and/or
- a vector of auxiliary variables, or
- a vector naming covariates,

or all categorical variables in the data if only `data` is provided, and asks a number of questions, which the user answers by entering a number.

```
refs_mod10 <- set_refcat(NHANES, formula = formula(mod10b))
#> The categorical variables are:
#> - "gender"
#> - "race"
#> - "educ"
#> - "occup"
#> - "smoke"
#>
#> How do you want to specify the reference categories?
#>
#> 1: Use the first category for each variable.
#> 2: Use the last category for each variabe.
#> 3: Use the largest category for each variable.
#> 4: Specify the reference categories individually.
```

When option 4 is chosen, a question for each categorical variable is asked, for example:

```
#> The reference category for “race” should be
#>
#> 1: Mexican American
#> 2: Other Hispanic
#> 3: Non-Hispanic White
#> 4: Non-Hispanic Black
#> 5: other
```

After specification of the reference categories for all categorical variables, the determined specification for the argument `refcats` is printed:

```
#> In the JointAI model specify:
#> refcats = c(gender = 'female', race = 'Non-Hispanic White', educ = 'low',
#> occup = 'not working', smoke = 'never')
#>
#> or use the output of this function.
```

`set_refcat()` also returns a named vector that can be passed to the argument `refcats`, e.g., `refcats = refs_mod10`.

Contrary to base R behavior, dummy coding (i.e., `contr.treatment` contrasts) is used for ordered factors in any linear predictor. Since the order of levels in an ordered factor contains information relevant to the imputation of missing values, it is important that incomplete ordinal variables are coded as such.
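The difference from base R's default can be seen with `model.matrix()`: base R codes ordered factors with orthogonal polynomial contrasts (`contr.poly`), whereas treatment (dummy) contrasts give one 0/1 column per non-reference category, the coding described above. The factor is a made-up example, and the first output assumes R's default `options("contrasts")`.

```r
smoke <- factor(c("never", "former", "current", "never"),
                levels = c("never", "former", "current"), ordered = TRUE)

# base R default for ordered factors: polynomial contrasts
colnames(model.matrix(~ smoke))  # "(Intercept)" "smoke.L" "smoke.Q"

# dummy coding with treatment contrasts
colnames(model.matrix(~ smoke, contrasts.arg = list(smoke = "contr.treatment")))
# "(Intercept)" "smokeformer" "smokecurrent"
```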

For unordered factors, the reference category could be changed in the dataset directly, outside of **JointAI**, e.g., using `relevel()`. For ordered factors, however, this is not possible without losing or changing the order of the levels. In this case, the `refcats` argument in **JointAI** has to be used.

Changing a reference category via the argument `refcats` does not change the order of levels in the dataset or in any of the data matrices inside **JointAI**. Only when a categorical variable is converted into dummy variables in the JAGS model is the reference category used to determine for which levels the dummies are created.

In the Bayesian framework, parameters are random variables for which a distribution needs to be specified. These distributions depend on parameters themselves, i.e., on hyperparameters.

The function `default_hyperpars()` returns a list containing the default hyperparameters used in a **JointAI** model. The arguments `family`, `link` and `nranef` allow specifying a model family and link, as well as the number of random effects.

```
default_hyperpars(nranef = 2)
#> $analysis_model
#> mu_reg_main tau_reg_main a_tau_main b_tau_main
#> 0e+00 1e-04 1e-02 1e-03
#>
#> $Z
#> $Z$RinvD
#> [,1] [,2]
#> [1,] NA 0
#> [2,] 0 NA
#>
#> $Z$KinvD
#> [1] 2
#>
#> $Z$a_diag_RinvD
#> [1] 0.1
#>
#> $Z$b_diag_RinvD
#> [1] 0.01
#>
#>
#> $norm
#> mu_reg_norm tau_reg_norm a_tau_norm b_tau_norm
#> 0e+00 1e-04 1e-02 1e-02
#>
#> $gamma
#> mu_reg_gamma tau_reg_gamma a_tau_gamma b_tau_gamma
#> 0e+00 1e-04 1e-02 1e-02
#>
#> $beta
#> mu_reg_beta tau_reg_beta a_tau_beta b_tau_beta
#> 0e+00 1e-04 1e-02 1e-02
#>
#> $logit
#> mu_reg_logit tau_reg_logit
#> 0.000 0.001
#>
#> $multinomial
#> mu_reg_multinomial tau_reg_multinomial
#> 0.000 0.001
#>
#> $ordinal
#> mu_reg_ordinal tau_reg_ordinal mu_delta_ordinal tau_delta_ordinal
#> 0.000 0.001 0.000 0.001
```

To change the hyperparameters in a **JointAI** model, the list obtained from `default_hyperpars()` can be edited and passed to the argument `hyperpars` in `lm_imp()`, `glm_imp()`, `lme_imp()`, `glme_imp()` or `survreg_imp()`.

`mu_reg_*` and `tau_reg_*` refer to the mean and precision of the prior distribution for regression coefficients. `a_tau_*` and `b_tau_*` are the shape and rate parameters of a Gamma distribution that is used as prior for precision parameters. `RinvD` is the scale matrix in the Wishart prior for the inverse of the random effects covariance matrix `D`, and `KinvD` is the number of degrees of freedom in that distribution. `a_diag_RinvD` and `b_diag_RinvD` are the shape and rate parameters of the Gamma prior for the diagonal elements of `RinvD`. In random effects models with only one random effect, a Gamma prior is used for the inverse of `D` instead of the Wishart distribution.
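Since the priors are specified in terms of precisions, it can help to translate the default values for the analysis model printed above into standard deviations and moments. A short calculation, assuming the normal prior is parametrized by mean and precision and the Gamma prior by shape and rate, as described in the text:

```r
tau_reg_main <- 1e-4  # precision of the normal prior for regression coefficients
a_tau_main   <- 1e-2  # shape of the Gamma prior for precision parameters
b_tau_main   <- 1e-3  # rate of the Gamma prior for precision parameters

prior_sd   <- 1 / sqrt(tau_reg_main)      # prior standard deviation: 100
prior_mean <- a_tau_main / b_tau_main     # prior mean of the precision: 10
prior_var  <- a_tau_main / b_tau_main^2   # prior variance of the precision: 10000
```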

When variables are measured on very different scales, this can result in slow convergence and bad mixing. Therefore, **JointAI** automatically scales continuous variables to have mean zero and standard deviation one. Results (parameters and imputed values) are transformed back to the original scale when the results are printed or imputed values are exported.

In some settings, however, it is not possible to scale continuous variables. This is the case for incomplete variables that enter a linear predictor in a function and for variables that are imputed with models that are defined on a subset of the real line (i.e., with a Gamma or a Beta distribution). Variables that are imputed with a log-normal distribution are scaled, but not centered.

To prevent scaling, the argument `scale_vars` can be set to `FALSE`. When a vector of variable names is supplied to `scale_vars`, only those variables are scaled.
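Why the back-transformation is possible can be seen with ordinary least squares as a stand-in for the Bayesian model: a coefficient estimated for a standardized covariate maps back to the original scale exactly. The simulated data are for illustration only.

```r
set.seed(7)
x <- rnorm(100, mean = 50, sd = 10)
y <- 3 + 0.2 * x + rnorm(100)

xs  <- (x - mean(x)) / sd(x)  # mean 0, standard deviation 1
b_s <- coef(lm(y ~ xs))

slope     <- b_s[["xs"]] / sd(x)                   # back to the original scale
intercept <- b_s[["(Intercept)"]] - slope * mean(x)

all.equal(c(intercept, slope), unname(coef(lm(y ~ x))))  # TRUE
```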
