Usage guidance

2021-03-14

Introduction

DescrTab2 is the successor to the DescrTab package. It supports a wide range of customization options and can be used in .Rmd files together with knitr.

Preamble settings

imbi_report

You’re all set. Everything is already included.

pdf_document

Here is what you need to include in the YAML header to use DescrTab2 inside an .Rmd file with pdf_document output:

---
title: "DescrTab2 tutorial"
header-includes:
   - \usepackage{needspace}
   - \usepackage{longtable}
   - \usepackage{booktabs}
output: pdf_document
---

html & word_document

No special preamble needed. Make sure you have pandoc version >= 2.0 installed on your system.

Global print_format option

For DescrTab2 to work properly with your document type of choice, you need to set the print_format option, preferably right at the start of your document. You can do this by typing:

options(print_format = "html") # or = "word" or "tex", depending on your document type
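
If the same .Rmd file is rendered to several output formats, the option can be derived from the pandoc target instead of being hard-coded. The following is a minimal sketch under stated assumptions: pick_print_format is a hypothetical helper that is not part of DescrTab2, and the commented usage assumes the document is rendered via rmarkdown, so that knitr stores the target under rmarkdown.pandoc.to.

```r
# Hypothetical helper (not part of DescrTab2): map a pandoc target name
# to one of the print_format values mentioned above.
pick_print_format <- function(pandoc_to) {
  if (is.null(pandoc_to)) return(NULL)                    # not knitting: keep the default
  if (pandoc_to %in% c("latex", "beamer")) return("tex")  # pdf_document and friends
  if (pandoc_to == "docx") return("word")                 # word_document
  if (grepl("^html", pandoc_to)) return("html")           # html, html4, html5
  NULL                                                    # unknown target: keep the default
}

# Possible use in a setup chunk (assumes rendering via rmarkdown):
# fmt <- pick_print_format(knitr::opts_knit$get("rmarkdown.pandoc.to"))
# if (!is.null(fmt)) options(print_format = fmt)
```

Returning NULL for unknown targets leaves the option unset, so DescrTab2 falls back to its default console output.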

Getting started

For instructive purposes, we will use the following dataset:

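library(magrittr) # provides %>% and %<>%
library(dplyr)    # provides mutate() and select()
library(forcats)  # provides fct_recode(), used in the paired example
dat <- iris[, c("Species", "Sepal.Length")]
dat %<>% mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor())
dat %<>% mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())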

Make sure you load the DescrTab2 package by typing

library(DescrTab2)

somewhere in the document before you use it. You are now ready to go! Producing beautiful descriptive tables in html and tex is now as easy as typing:

```{r, results='asis'}
descr(dat)
```

| Variables | Total (N=150) | p |
| --- | --- | --- |
| Species | | |
| setosa | 50 (33%) | >0.999 (chi1) |
| versicolor | 50 (33%) | |
| virginica | 50 (33%) | |
| Sepal.Length | | |
| N | 150 | <0.001 (tt1) |
| mean | 5.8 | |
| sd | 0.83 | |
| median | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | |
| animal | | |
| Fish | 75 (50%) | >0.999 (chi1) |
| Mammal | 75 (50%) | |
| food | | |
| fries | 79 (53%) | 0.514 (chi1) |
| wedges | 71 (47%) | |

chi1: Chi-squared goodness-of-fit test
tt1: Students one-sample t-test

Note the chunk option results='asis'. DescrTab2 produces raw LaTeX or HTML code. To get pandoc to render this properly, the results='asis' option is required. An alternative will be described later.

To produce descriptive tables for a word document, a bit more typing is required:

```{r}
descr(dat) %>% print() %>% knitr::knit_print()
```

When producing word tables in this fashion, you must not have the results='asis' chunk option set.

Note that DescrTab2 can also produce console output. In fact, this is the default setting (i.e. if the global print_format option is not specified).

Accessing table elements

The object returned from the descr function is basically just a named list. You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by descr:

my_table <- descr(dat)

You can then access the elements of the list using the $ operator.

my_table$variables$Sepal.Length$results$Total$mean
#> [1] 5.843333

RStudio's autocomplete suggestions are very helpful when navigating this list.

The print function returns a formatted version of this list, which you can also save and access using the same syntax.

my_table <- descr(dat) %>% print(silent=TRUE)
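
A typical use of such an extracted value is to quote it in the running text of a report. The sketch below only illustrates the idea: so that it runs without DescrTab2, my_table is a hand-built stand-in that mimics the single list path shown above, not the real object returned by descr.

```r
# Stand-in for the list returned by descr(dat): only the path used above is
# mimicked here so that the snippet runs without DescrTab2.
my_table <- list(variables = list(Sepal.Length = list(
  results = list(Total = list(mean = mean(iris$Sepal.Length)))
)))

# Extract the statistic and format it for use in a sentence.
total_mean <- my_table$variables$Sepal.Length$results$Total$mean
sentence <- sprintf("The overall mean sepal length was %.2f.", total_mean)
print(sentence)
#> [1] "The overall mean sepal length was 5.84."
```

In an .Rmd file, the same value could be placed directly in the text with knitr's inline R code syntax.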

Specifying a group

Use the group option to specify the name of a grouping variable in your data:

descr(dat, "Species")

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test

Assigning labels

Use the group_labels option to assign group labels and the var_labels option to assign variable labels:

descr(dat, "Species", group_labels = list(setosa = "My custom group label"), var_labels = list(Sepal.Length = "My custom variable label"))

| Variables | My custom group label (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| My custom variable label | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test
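
When many variables need labels, it can be convenient to keep them in one named character vector and convert it to the named list shape used above. This is a sketch only: label_dict and its label texts are made up for illustration, and the commented descr call assumes DescrTab2 is loaded and dat is defined as in this tutorial.

```r
# Hypothetical label dictionary: names must match column names in dat.
label_dict <- c(
  Sepal.Length = "Sepal length",
  animal       = "Animal class",
  food         = "Side dish"
)

# as.list() turns the named character vector into a named list.
var_labels <- as.list(label_dict)
str(var_labels)

# With DescrTab2 loaded, the list could then be passed on:
# descr(dat, "Species", var_labels = var_labels)
```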

Confidence intervals for two group comparisons

For two-group comparisons, DescrTab2 automatically calculates confidence intervals for differences in effect measures:

descr(dat, "animal")

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p | CI |
| --- | --- | --- | --- | --- | --- |
| Species | | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | >0.999 (chi2) | |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | | |
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.961 (tt2) | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.26, 0.27] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | 0.252 (chi2) | Prop. dif. CI |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | | [-0.25, 0.066] |

chi2: Pearsons chi-squared test
tt2: Welchs two-sample t-test
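
To get a feeling for what the mean difference interval represents, a Welch confidence interval can be computed with base R for comparison. This is a sketch and not a description of DescrTab2's internals: whether descr calls t.test itself is an assumption. The grouping column is rebuilt here exactly as in the example dataset, so the snippet runs on its own.

```r
# Rebuild the deterministic grouping column from the example dataset.
animal <- factor(rep(c("Mammal", "Fish"), 75))

# t.test() defaults to Welch's test (var.equal = FALSE).
welch <- t.test(iris$Sepal.Length ~ animal)

# 95% confidence interval for the difference in group means
# (first factor level minus second factor level).
ci <- as.numeric(welch$conf.int)
round(ci, 2)
welch$p.value
```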

Different tests

There are a lot of different tests available. Check out the test_choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/test_choice_tree_pdf.pdf

Here are some different tests in action:

descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p | CI |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.870 (MWU) | HL CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.3, 0.3] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | 0.278 (Bolo) | Prop. dif. CI |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | | [-0.25, 0.066] |

MWU: Mann-Whitney U test
Bolo: Boschloos test

descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (KW) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |

KW: Kruskal-Wallis one-way ANOVA
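
For orientation, the nonparametric tests named in the footnotes have direct counterparts in base R. The sketch below runs them on the same data. It is not a statement about how DescrTab2 computes its p-values (that is an assumption), and small numerical differences from the tables above are possible.

```r
# Rebuild the deterministic grouping column from the example dataset.
animal <- factor(rep(c("Mammal", "Fish"), 75))

# Mann-Whitney U test (called the Wilcoxon rank-sum test in base R).
# exact = FALSE uses the normal approximation, which is appropriate with ties.
mwu <- wilcox.test(iris$Sepal.Length ~ animal, exact = FALSE)
mwu$p.value

# Kruskal-Wallis test across the three species.
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
kw$p.value
```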

Paired observations

With paired data, the group variable usually denotes the timing of the measurement (e.g. “before” and “after” or “time 1”, “time 2”, etc.). In these scenarios, you need an additional index variable that specifies which observations from the different timepoints belong together. Use the test_options = list(paired=TRUE, indices = <name of the index variable as a character string, or a vector of indices>) option to specify the pairing indices; see the examples below. DescrTab2 only works with data in “long format”; see e.g. ?reshape or ?tidyr::pivot_longer for information on how to transform your data from wide to long format.

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal")) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices=rep(1:75, each=2)))

| Variables | Before (N=75) | After (N=75) | Total (N=150) | p | CI |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.937 (tpar) | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.16, 0.18] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | 0.324 (McN) | Prop. dif. CI |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | | [-0.25, 0.066] |

tpar: Students paired t-test
McN: McNemars test

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal"), idx = rep(1:75, each=2)) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices="idx"))

| Variables | Before (N=75) | After (N=75) | Total (N=150) | p | CI |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 75 | 75 | 150 | 0.937 (tpar) | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | | [-0.16, 0.18] |
| sd | 0.86 | 0.81 | 0.83 | | |
| median | 5.7 | 5.8 | 5.8 | | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | | |
| food | | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | 0.324 (McN) | Prop. dif. CI |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | | [-0.25, 0.066] |

tpar: Students paired t-test
McN: McNemars test
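
If your paired data arrive in wide format (one row per subject, one column per timepoint), they must be converted to long format first. The following base R sketch uses reshape on a small made-up data frame; the data and the column names id, before, after, time and value are hypothetical and chosen only for illustration.

```r
# Hypothetical wide-format data: one row per subject, one column per timepoint.
wide <- data.frame(
  id     = 1:3,
  before = c(5.1, 4.9, 6.0),
  after  = c(5.4, 5.0, 5.8)
)

# Stack the two measurement columns into one value column plus a time column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("before", "after"),
  v.names   = "value",
  timevar   = "time",
  times     = c("before", "after"),
  idvar     = "id"
)
long$time <- factor(long$time, levels = c("before", "after"))
rownames(long) <- NULL
long

# With DescrTab2 loaded, the id column could then serve as the pairing index:
# descr(long, "time", test_options = list(paired = TRUE, indices = "id"))
```

The resulting data frame has one row per subject and timepoint, with id identifying which rows belong together.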

Significant digits

Every summary statistic in DescrTab2 is formatted by a corresponding formatting function. You can exchange these formatting functions as you please:

descr(dat, "Species", format_summary_stats = list(mean = function(x) formatC(x, digits = 4)))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 5.006 | 5.936 | 6.588 | 5.843 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test
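
Note that formatC(x, digits = 4) controls significant digits, not decimal places. The sketch below contrasts the two behaviours in base R. fmt_fixed2 is a hypothetical formatter name, and passing it under the key sd in the commented call is an assumption based on the statistic names shown in this tutorial.

```r
x <- 5.843333

# Default format "g": digits means significant digits.
formatC(x, digits = 4)
#> [1] "5.843"

# Format "f": digits means decimal places.
formatC(x, digits = 2, format = "f")
#> [1] "5.84"

# Hypothetical formatter that always shows two decimal places.
fmt_fixed2 <- function(x) formatC(x, digits = 2, format = "f")
fmt_fixed2(c(5, 5.936))
#> [1] "5.00" "5.94"

# Possible use with DescrTab2 loaded (the key names are an assumption):
# descr(dat, "Species", format_summary_stats = list(mean = fmt_fixed2, sd = fmt_fixed2))
```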

Omitting summary statistics

Let’s say you don’t want to calculate quantiles for your numeric variables. You can specify the summary_stats_cont option to include all summary statistics but quantiles:

descr(dat, "Species", summary_stats_cont = list(
  N = DescrTab2:::.N, Nmiss = DescrTab2:::.Nmiss, mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd, median = DescrTab2:::.median,
  min = DescrTab2:::.min, max = DescrTab2:::.max
))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test

Adding summary statistics

Let’s say you have a categorical variable, but for some reason its levels are numerals and you want to calculate the mean. No problem:

# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat = list(mean = DescrTab2:::.factormean))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| cat_var | | | | | |
| mean | 1.6 | 1.6 | 1.5 | 1.5 | 0.898 (chi2) |
| 1 | 22 (44%) | 22 (44%) | 24 (48%) | 68 (45%) | |
| 2 | 28 (56%) | 28 (56%) | 26 (52%) | 82 (55%) | |

chi2: Pearsons chi-squared test
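
Averaging a factor needs some care in R, because converting a factor directly to numeric yields the internal level codes rather than the level labels. The sketch below shows the pitfall and one way around it. factor_mean is a hypothetical helper written for illustration; it is a guess at the general idea and not the implementation of DescrTab2:::.factormean.

```r
f <- factor(c(10, 20, 20))

# Pitfall: as.numeric() on a factor returns the level codes.
as.numeric(f)
#> [1] 1 2 2

# Converting via character recovers the original numbers.
as.numeric(as.character(f))
#> [1] 10 20 20

# Hypothetical helper: mean of the numeric level labels.
factor_mean <- function(x) mean(as.numeric(as.character(x)), na.rm = TRUE)
factor_mean(f)
```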

Combining mean and sd

Use the format_options = list(combine_mean_sd=TRUE) option:

descr(dat, "Species", format_options = list(combine_mean_sd=TRUE))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean ± sd | 5 ± 0.35 | 5.9 ± 0.52 | 6.6 ± 0.64 | 5.8 ± 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test

Omitting p-values

You can declare the format_options = list(print_p = FALSE) option to omit p-values:

descr(dat, "animal", format_options = list(print_p = FALSE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | CI |
| --- | --- | --- | --- | --- |
| Species | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | |
| Sepal.Length | | | | |
| N | 75 | 75 | 150 | Mean dif. CI |
| mean | 5.8 | 5.8 | 5.8 | [-0.26, 0.27] |
| sd | 0.86 | 0.81 | 0.83 | |
| median | 5.7 | 5.8 | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | |
| food | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | Prop. dif. CI |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | [-0.25, 0.066] |

Similarly, for confidence intervals:

descr(dat, "animal", format_options = list(print_CI = FALSE))

| Variables | Fish (N=75) | Mammal (N=75) | Total (N=150) | p |
| --- | --- | --- | --- | --- |
| Species | | | | |
| setosa | 25 (33%) | 25 (33%) | 50 (33%) | >0.999 (chi2) |
| versicolor | 25 (33%) | 25 (33%) | 50 (33%) | |
| virginica | 25 (33%) | 25 (33%) | 50 (33%) | |
| Sepal.Length | | | | |
| N | 75 | 75 | 150 | 0.961 (tt2) |
| mean | 5.8 | 5.8 | 5.8 | |
| sd | 0.86 | 0.81 | 0.83 | |
| median | 5.7 | 5.8 | 5.8 | |
| Q1 - Q3 | 5.1 – 6.4 | 5.1 – 6.5 | 5.1 – 6.4 | |
| min - max | 4.3 – 7.9 | 4.4 – 7.7 | 4.3 – 7.9 | |
| food | | | | |
| fries | 36 (48%) | 43 (57%) | 79 (53%) | 0.252 (chi2) |
| wedges | 39 (52%) | 32 (43%) | 71 (47%) | |

chi2: Pearsons chi-squared test
tt2: Welchs two-sample t-test

Printing without results='asis'

Sometimes, e.g. if you have a loop inside your R chunk and you want to plot graphics in between descriptive tables, you cannot use the results='asis' option. You can still use DescrTab2 with the following commands:

```{r}
capture.output(print(descr(dat, "Species"))) %>% knitr::raw_html() # or knitr::raw_latex() for tex
```

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 5 | 5.9 | 6.6 | 5.8 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| animal | | | | | |
| Fish | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | >0.999 (chi2) |
| Mammal | 25 (50%) | 25 (50%) | 25 (50%) | 75 (50%) | |
| food | | | | | |
| fries | 23 (46%) | 29 (58%) | 27 (54%) | 79 (53%) | 0.473 (chi2) |
| wedges | 27 (54%) | 21 (42%) | 23 (46%) | 71 (47%) | |

F: F-test (ANOVA)
chi2: Pearsons chi-squared test

In word documents this is irrelevant, because you never have to specify results='asis'.

Controlling options on a per-variable level

You can use the var_options list to control formatting and test options on a per-variable basis. Suppose that, in the iris dataset, only the Sepal.Length variable should show more digits in the mean and use a nonparametric test:

descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x)
      formatC(x, digits = 4)
  ),
  test_options = c(nonparametric = TRUE)
)))

| Variables | setosa (N=50) | versicolor (N=50) | virginica (N=50) | Total (N=150) | p |
| --- | --- | --- | --- | --- | --- |
| Sepal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (KW) |
| mean | 5.006 | 5.936 | 6.588 | 5.843 | |
| sd | 0.35 | 0.52 | 0.64 | 0.83 | |
| median | 5 | 5.9 | 6.5 | 5.8 | |
| Q1 - Q3 | 4.8 – 5.2 | 5.6 – 6.3 | 6.2 – 6.9 | 5.1 – 6.4 | |
| min - max | 4.3 – 5.8 | 4.9 – 7 | 4.9 – 7.9 | 4.3 – 7.9 | |
| Sepal.Width | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 3.4 | 2.8 | 3 | 3.1 | |
| sd | 0.38 | 0.31 | 0.32 | 0.44 | |
| median | 3.4 | 2.8 | 3 | 3 | |
| Q1 - Q3 | 3.2 – 3.7 | 2.5 – 3 | 2.8 – 3.2 | 2.8 – 3.3 | |
| min - max | 2.3 – 4.4 | 2 – 3.4 | 2.2 – 3.8 | 2 – 4.4 | |
| Petal.Length | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 1.5 | 4.3 | 5.6 | 3.8 | |
| sd | 0.17 | 0.47 | 0.55 | 1.8 | |
| median | 1.5 | 4.3 | 5.5 | 4.3 | |
| Q1 - Q3 | 1.4 – 1.6 | 4 – 4.6 | 5.1 – 5.9 | 1.6 – 5.1 | |
| min - max | 1 – 1.9 | 3 – 5.1 | 4.5 – 6.9 | 1 – 6.9 | |
| Petal.Width | | | | | |
| N | 50 | 50 | 50 | 150 | <0.001 (F) |
| mean | 0.25 | 1.3 | 2 | 1.2 | |
| sd | 0.11 | 0.2 | 0.27 | 0.76 | |
| median | 0.2 | 1.3 | 2 | 1.3 | |
| Q1 - Q3 | 0.2 – 0.3 | 1.2 – 1.5 | 1.8 – 2.3 | 0.3 – 1.8 | |
| min - max | 0.1 – 0.6 | 1 – 1.8 | 1.4 – 2.5 | 0.1 – 2.5 | |

KW: Kruskal-Wallis one-way ANOVA
F: F-test (ANOVA)