Using the table1 Package to Create HTML Tables of Descriptive Statistics

Benjamin Rich

2018-07-18

Introduction

It is standard practice in epidemiology and related fields that the first table of any journal article, referred to as “Table 1”, is a table that presents descriptive statistics of baseline characteristics of the study population stratified by exposure. This package makes it fairly straightforward to produce such a table using R. The output format is HTML (which has the advantage of being easy to copy into a Word document; Chrome browser works well). It is convenient to use this package in conjunction with knitr and R Markdown, as the HTML output is passed through untouched (note: as of version 1.1 it is no longer necessary to specify the results='asis' chunk option to have the HTML output appear correctly in the final document); in fact, this vignette serves as an example. The package does allow quite a bit of flexibility to customize the table’s contents and appearance, but this does come at the cost of ease-of-use (more programming, some knowledge of CSS).

Example 1

The first example is inspired by this blog post, which is about how to accomplish a similar task using the htmlTable package. It uses the melanoma data set from the boot package for illustration, and I have copied here the code used to prepare the data:

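library(table1)  # provides table1(), label() and units(), used throughout
library(boot)    # provides the melanoma data set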

melanoma2 <- melanoma
 
# Factor the basic variables that
# we're interested in
melanoma2$status <- 
  factor(melanoma2$status, 
         levels=c(2,1,3),
         labels=c("Alive", # Reference
                  "Melanoma death", 
                  "Non-melanoma death"))

As a first attempt, we can do the following:

table1(~ factor(sex) + age + factor(ulcer) + thickness | status, data=melanoma2)
                     Alive               Melanoma death      Non-melanoma death  Overall
                     (n=134)             (n=57)              (n=14)              (n=205)
factor(sex)
  0                  91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  1                  43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
age
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
factor(ulcer)
  0                  92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  1                  42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
thickness
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

Note that the table1 package uses a familiar formula interface, where the variables to include in the table are separated by ‘+’ symbols, the “stratification” variable (which creates the columns) appears to the right of a “conditioning” symbol ‘|’, and the data argument specifies a data.frame that contains the variables in the formula.

But because we don’t have nice labels for the variables and categories, it doesn’t look great. To improve things, we can create factors with descriptive labels for the categorical variables (sex and ulcer), label each variable the way we want, and specify units for the continuous variables (age and thickness), like this:

melanoma2$sex <- 
  factor(melanoma2$sex, levels=c(1,0),
         labels=c("Male", 
                  "Female"))
 
melanoma2$ulcer <- 
  factor(melanoma2$ulcer, levels=c(0,1),
         labels=c("Absent", 
                  "Present"))

label(melanoma2$sex)       <- "Sex"
label(melanoma2$age)       <- "Age"
label(melanoma2$ulcer)     <- "Ulceration"
label(melanoma2$thickness) <- "Thickness"

units(melanoma2$age)       <- "years"
units(melanoma2$thickness) <- "mm"

table1(~ sex + age + ulcer + thickness | status, data=melanoma2, overall="Total")
                     Alive               Melanoma death      Non-melanoma death  Total
                     (n=134)             (n=57)              (n=14)              (n=205)
Sex
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
Age (years)
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
Ulceration
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
Thickness (mm)
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

This looks better, but it is still not quite the same as the original blog post: in the blog post the “Total” column is on the left, while we have it on the right; the two “Death” strata (Melanoma and Non-melanoma) should be grouped together under a common heading; the continuous variables Age and Thickness show only Mean (SD) (with a ±), and not Median [Min, Max] like the table1 default output; and most values are displayed with two significant digits rather than three. (I will not concern myself with the footnote here, but it could be added as well.) To achieve the same result, we need to customize the output further, and in this case that involves using the slightly more complicated “default” (i.e. non-formula) interface to table1.

First, we set up our labels differently, using a list:

labels <- list(
    variables=list(sex="Sex",
                   age="Age (years)",
                   ulcer="Ulceration",
                   thickness="Thickness (mm)"),
    groups=list("", "", "Death"))

# Remove the word "death" from the labels, since it now appears above
levels(melanoma2$status) <- c("Alive", "Melanoma", "Non-melanoma")

Next, we set up our “strata”, or columns, as a list of data.frames, in the order we want them displayed:

strata <- c(list(Total=melanoma2), split(melanoma2, melanoma2$status))
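
To see what such a list looks like, here is a minimal sketch on a small made-up data frame (the names toy, grp and toy_strata are hypothetical and not part of the melanoma example). It only illustrates the shape of the object: the full data first, then one data.frame per level of the grouping factor.

```r
# Toy data; 'toy' and 'grp' are illustrative names only
toy <- data.frame(grp = factor(c("a", "b", "a", "b", "b")),
                  x   = c(1.2, 3.4, 2.2, 5.1, 4.0))

# The full data first, followed by one data.frame per level of grp
toy_strata <- c(list(Total = toy), split(toy, toy$grp))

names(toy_strata)         # element names become the column headings
sapply(toy_strata, nrow)  # number of rows in each stratum
```

The order of the list elements is the order of the columns, which is why the “Total” element is placed first here.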

Finally, we can customize the contents using custom renderers. A custom renderer can be a function that takes a vector as its first argument and returns a (named) character vector. There is also a simpler way to customize the table contents using an abbreviated code syntax instead of a render function, but it allows less control over rounding (see below). Here, for example, we specify render functions for the continuous and categorical variables as follows:

my.render.cont <- function(x) {
    with(stats.apply.rounding(stats.default(x), digits=2), c("",
        "Mean (SD)"=sprintf("%s (&plusmn; %s)", MEAN, SD)))
}
my.render.cat <- function(x) {
    c("", sapply(stats.default(x), function(y) with(y,
        sprintf("%d (%0.0f %%)", FREQ, PCT))))
}
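
To make the shape of a renderer's return value concrete, here is a minimal stand-alone sketch in base R. It deliberately avoids the package's stats.default and stats.apply.rounding helpers, and the name toy.render.cont is made up for this illustration.

```r
# A stand-alone renderer written in base R only (illustrative, not the package's own)
toy.render.cont <- function(x) {
    c("",  # first, unnamed element: left blank
      "Mean (SD)" = sprintf("%.1f (%.1f)",
                            mean(x, na.rm = TRUE),
                            sd(x, na.rm = TRUE)))
}

res <- toy.render.cont(c(1, 2, 3, 4))
res         # a character vector of length 2
names(res)  # "" and "Mean (SD)"
```

As I understand it, the first element is shown on the same row as the variable label (hence the empty string), while each named element becomes its own row, labeled by its name.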

And here is the result:

table1(strata, labels, groupspan=c(1, 1, 2),
       render.continuous=my.render.cont, render.categorical=my.render.cat)
                                              Death
              Total           Alive           Melanoma        Non-melanoma
              (n=205)         (n=134)         (n=57)          (n=14)
Sex
  Male        79 (39 %)       43 (32 %)       29 (51 %)       7 (50 %)
  Female      126 (61 %)      91 (68 %)       28 (49 %)       7 (50 %)
Age (years)
  Mean (SD)   52 (± 17)       50 (± 16)       55 (± 18)       65 (± 11)
Ulceration
  Absent      115 (56 %)      92 (69 %)       16 (28 %)       7 (50 %)
  Present     90 (44 %)       42 (31 %)       41 (72 %)       7 (50 %)
Thickness (mm)
  Mean (SD)   2.9 (± 3.0)     2.2 (± 2.3)     4.3 (± 3.6)     3.7 (± 3.6)

This is now looking pretty similar to the original blog post, but admittedly there are still some differences: the sexes are inverted (the original blog post got it wrong); I added units to the continuous variables; I include the number of individuals in each column under the column heading; the percentages are different, because I think they should add to 100% within a column, and in the original blog post they add to 100% along a row (except for the Total column, which adds to 100% within the column). This last point is the most contentious. In my version, it is easier to compare the different types of outcomes with respect to variables like sex, while in the original version it is easier to compare sexes with respect to outcomes. However, this is not really the standard application for these kinds of tables (at least not the one I have in mind). Usually, the columns would represent exposure or treatment groups, not outcomes, and we want to compare those groups with respect to the distribution of baseline characteristics, and for this purpose having percentages add up to 100% within columns makes the most sense. Let’s continue with an example of that nature, using simulated data.
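
The difference between the two percentage conventions can be checked directly from the sex-by-status counts reported in the tables above. The following is a small base R sketch; the object names counts, col_pct and row_pct are mine.

```r
# Counts of sex by status, copied from the tables above
counts <- matrix(c(43, 29, 7,
                   91, 28, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex    = c("Male", "Female"),
                                 status = c("Alive", "Melanoma death",
                                            "Non-melanoma death")))

# Column percentages: each status column sums to 100 (the convention used here)
col_pct <- 100 * prop.table(counts, margin = 2)
round(col_pct, 1)

# Row percentages: each sex row sums to 100 (the blog post's convention)
row_pct <- 100 * prop.table(counts, margin = 1)
round(row_pct, 1)
```

With column percentages, the Male/Alive cell is 43 out of 134, which is the 32.1% shown in the table; with row percentages the same cell would be 43 out of 79.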

Example 2

For this second example, we will use simulated data. We imagine a clinical trial where subjects have been randomized in a 2:1 ratio to receive an active treatment or placebo. For simplicity, we will only consider three baseline characteristics: age, sex and weight.

f <- function(x, n, ...) factor(sample(x, n, replace=TRUE, ...), levels=x)
set.seed(427)

n <- 146
dat <- data.frame(id=1:n)
dat$treat <- f(c("Placebo", "Treated"), n, prob=c(1, 2)) # 2:1 randomization
dat$age   <- sample(18:65, n, replace=TRUE)
dat$sex   <- f(c("Female", "Male"), n, prob=c(.6, .4))  # 60% female
dat$wt    <- round(exp(rnorm(n, log(70), 0.23)), 1)

# Add some missing data
dat$wt[sample.int(n, 5)] <- NA

label(dat$age)   <- "Age"
label(dat$sex)   <- "Sex"
label(dat$wt)    <- "Weight"
label(dat$treat) <- "Treatment Group"

units(dat$age)   <- "years"
units(dat$wt)    <- "kg"

Using the default settings, we obtain this table:

table1(~ age + sex + wt | treat, data=dat)
                     Placebo             Treated             Overall
                     (n=52)              (n=94)              (n=146)
Age (years)
  Mean (SD)          44.2 (13.5)         41.3 (13.8)         42.3 (13.7)
  Median [Min, Max]  44.0 [19.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
Sex
  Female             35 (67.3%)          58 (61.7%)          93 (63.7%)
  Male               17 (32.7%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         69.4 (17.9)         69.2 (16.9)
  Median [Min, Max]  66.3 [45.8, 102]    67.2 [37.5, 119]    66.6 [37.5, 119]
  Missing            3 (5.8%)            2 (2.1%)            5 (3.4%)

Note that when a variable contains missing values (here weight), be it continuous or categorical, these are reported as a distinct category (with count and percent).
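
As a rough illustration of how such a “Missing” row can be derived, here is a base R sketch on made-up weights (wt_toy is a hypothetical vector, not the simulated data above). The percentage uses the total number of observations as the denominator, which is consistent with the table, where 5 missing out of 146 gives 3.4%.

```r
wt_toy <- c(70.1, NA, 65.3, 80.2, NA, 72.4, 68.0, 59.9)  # made-up weights

n_missing <- sum(is.na(wt_toy))
missing_row <- sprintf("%d (%.1f%%)", n_missing, 100 * n_missing / length(wt_toy))
missing_row

# Summary statistics are then computed on the non-missing values only
mean(wt_toy, na.rm = TRUE)
```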

The “Overall” column can be easily removed (or relabeled):

table1(~ age + sex + wt | treat, data=dat, overall=FALSE)
                     Placebo             Treated
                     (n=52)              (n=94)
Age (years)
  Mean (SD)          44.2 (13.5)         41.3 (13.8)
  Median [Min, Max]  44.0 [19.0, 65.0]   43.0 [18.0, 65.0]
Sex
  Female             35 (67.3%)          58 (61.7%)
  Male               17 (32.7%)          36 (38.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         69.4 (17.9)
  Median [Min, Max]  66.3 [45.8, 102]    67.2 [37.5, 119]
  Missing            3 (5.8%)            2 (2.1%)

We can also have stratification by two variables, in which case they are nested. For example, to see each treatment group split by sex:

table1(~ age + wt | treat*sex, data=dat)
                     Placebo                                 Treated                                 Overall
                     Female              Male                Female              Male                Female              Male
                     (n=35)              (n=17)              (n=58)              (n=36)              (n=93)              (n=53)
Age (years)
  Mean (SD)          42.8 (14.0)         47.2 (12.4)         40.7 (13.8)         42.3 (13.8)         41.5 (13.9)         43.8 (13.4)
  Median [Min, Max]  43.0 [19.0, 65.0]   52.0 [29.0, 65.0]   41.0 [18.0, 65.0]   46.5 [21.0, 61.0]   42.0 [18.0, 65.0]   48.0 [21.0, 65.0]
Weight (kg)
  Mean (SD)          70.4 (15.9)         66.4 (13.4)         70.1 (15.3)         68.3 (21.6)         70.2 (15.4)         67.7 (19.2)
  Median [Min, Max]  66.4 [47.1, 102]    66.3 [45.8, 92.2]   67.9 [37.5, 111]    63.2 [40.0, 119]    67.4 [37.5, 111]    64.2 [40.0, 119]
  Missing            3 (8.6%)            0 (0%)              2 (3.4%)            0 (0%)              5 (5.4%)            0 (0%)

Or, switch the order:

table1(~ age + wt | sex*treat, data=dat)
                     Female                                  Male                                    Overall
                     Placebo             Treated             Placebo             Treated             Placebo             Treated
                     (n=35)              (n=58)              (n=17)              (n=36)              (n=52)              (n=94)
Age (years)
  Mean (SD)          42.8 (14.0)         40.7 (13.8)         47.2 (12.4)         42.3 (13.8)         44.2 (13.5)         41.3 (13.8)
  Median [Min, Max]  43.0 [19.0, 65.0]   41.0 [18.0, 65.0]   52.0 [29.0, 65.0]   46.5 [21.0, 61.0]   44.0 [19.0, 65.0]   43.0 [18.0, 65.0]
Weight (kg)
  Mean (SD)          70.4 (15.9)         70.1 (15.3)         66.4 (13.4)         68.3 (21.6)         69.0 (15.0)         69.4 (17.9)
  Median [Min, Max]  66.4 [47.1, 102]    67.9 [37.5, 111]    66.3 [45.8, 92.2]   63.2 [40.0, 119]    66.3 [45.8, 102]    67.2 [37.5, 119]
  Missing            3 (8.6%)            2 (3.4%)            0 (0%)              0 (0%)              3 (5.8%)            2 (2.1%)

Or, no stratification:

table1(~ treat + age + sex + wt, data=dat)
                     Overall
                     (n=146)
Treatment Group
  Placebo            52 (35.6%)
  Treated            94 (64.4%)
Age (years)
  Mean (SD)          42.3 (13.7)
  Median [Min, Max]  43.5 [18.0, 65.0]
Sex
  Female             93 (63.7%)
  Male               53 (36.3%)
Weight (kg)
  Mean (SD)          69.2 (16.9)
  Median [Min, Max]  66.6 [37.5, 119]
  Missing            5 (3.4%)

Finally, we may again consider something a bit more complicated, using the default (i.e., non-formula) interface. Suppose that instead of simply being assigned to placebo or active treatment, there were actually two doses of treatment randomized, 5 mg and 10 mg, and we want columns for each dose level separately, as well as for all treated subjects.

dat$dose <- (dat$treat != "Placebo")*sample(1:2, n, replace=TRUE)
dat$dose <- factor(dat$dose, labels=c("Placebo", "5 mg", "10 mg"))

strata <- c(split(dat, dat$dose), list("All treated"=subset(dat, treat=="Treated")), list(Overall=dat))

labels <- list(
    variables=list(age=render.varlabel(dat$age),
                   sex=render.varlabel(dat$sex),
                   wt=render.varlabel(dat$wt)),
    groups=list("", "Treated", ""))

table1(strata, labels, groupspan=c(1, 3, 1))
                                         Treated
                     Placebo             5 mg                10 mg               All treated         Overall
                     (n=52)              (n=49)              (n=45)              (n=94)              (n=146)
Age (years)
  Mean (SD)          44.2 (13.5)         42.6 (13.8)         39.9 (13.7)         41.3 (13.8)         42.3 (13.7)
  Median [Min, Max]  44.0 [19.0, 65.0]   45.0 [18.0, 63.0]   39.0 [21.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
Sex
  Female             35 (67.3%)          28 (57.1%)          30 (66.7%)          58 (61.7%)          93 (63.7%)
  Male               17 (32.7%)          21 (42.9%)          15 (33.3%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         66.6 (18.3)         72.5 (17.1)         69.4 (17.9)         69.2 (16.9)
  Median [Min, Max]  66.3 [45.8, 102]    63.3 [37.5, 119]    70.3 [48.1, 114]    67.2 [37.5, 119]    66.6 [37.5, 119]
  Missing            3 (5.8%)            0 (0%)              2 (4.4%)            2 (2.1%)            5 (3.4%)
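
The way the dose variable is built in this example may not be obvious at first glance: the logical comparison is coerced to 0/1 when multiplied, so placebo subjects always get code 0 and treated subjects get a random 1 or 2. Here is a minimal sketch on made-up data (treat_toy, code and dose_toy are hypothetical names, and the seed is arbitrary). Unlike the code above, this sketch passes levels=0:2 explicitly, which is my own addition so that the labels stay correctly aligned even if one of the codes happens not to occur.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

treat_toy <- factor(c("Placebo", "Treated", "Treated", "Placebo", "Treated"))

# FALSE * k is 0 (placebo); TRUE * k is k (a random dose code of 1 or 2)
code <- (treat_toy != "Placebo") * sample(1:2, length(treat_toy), replace = TRUE)

dose_toy <- factor(code, levels = 0:2, labels = c("Placebo", "5 mg", "10 mg"))
table(treat_toy, dose_toy)
```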

Using abbreviated code to specify a custom renderer

Suppose that for continuous variables, we want to display the percent coefficient of variation (CV%) instead of the standard deviation (SD). We also want to display the geometric mean and geometric coefficient of variation. We already discussed custom render functions that could be used to accomplish this, but a simpler alternative is to use abbreviated code. This is a character string that contains certain keywords which are substituted for computed values in the table output. The list of recognized keywords comes from the output of the stats.default function and includes: N, NMISS, MEAN, SD, CV, GMEAN, GCV, MEDIAN, MIN, MAX, IQR, Q1, Q2, Q3, T1, T2, FREQ, PCT. Keyword matching is case insensitive, and any text other than the keywords is left untouched. We can specify a vector of character strings, in which case each result will be displayed in its own row in the table. We can use a named vector to specify labels for each row; a dot (‘.’) can be used to indicate that the abbreviated code string itself be used as the row label. Significant digits can be controlled using the digits argument (default: 3). Here is a continuation of the example from the previous section that produces the desired result:

table1(strata, labels, groupspan=c(1, 3, 1),
       render.continuous=c(.="Mean (CV%)", .="Median [Min, Max]",
                           "Geo. mean (Geo. CV%)"="GMEAN (GCV%)"))
                                            Treated
                        Placebo             5 mg                10 mg               All treated         Overall
                        (n=52)              (n=49)              (n=45)              (n=94)              (n=146)
Age (years)
  Mean (CV%)            44.2 (30.6%)        42.6 (32.5%)        39.9 (34.3%)        41.3 (33.3%)        42.3 (32.4%)
  Median [Min, Max]     44.0 [19.0, 65.0]   45.0 [18.0, 63.0]   39.0 [21.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
  Geo. mean (Geo. CV%)  42.0 (34.4%)        40.1 (38.3%)        37.5 (37.2%)        38.8 (37.7%)        39.9 (36.6%)
Sex
  Female                35 (67.3%)          28 (57.1%)          30 (66.7%)          58 (61.7%)          93 (63.7%)
  Male                  17 (32.7%)          21 (42.9%)          15 (33.3%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (CV%)            69.0 (21.8%)        66.6 (27.5%)        72.5 (23.6%)        69.4 (25.8%)        69.2 (24.4%)
  Median [Min, Max]     66.3 [45.8, 102]    63.3 [37.5, 119]    70.3 [48.1, 114]    67.2 [37.5, 119]    66.6 [37.5, 119]
  Geo. mean (Geo. CV%)  67.5 (21.5%)        64.3 (27.3%)        70.7 (23.1%)        67.2 (25.7%)        67.3 (24.3%)
  Missing               3 (5.8%)            0 (0%)              2 (4.4%)            2 (2.1%)            5 (3.4%)
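
For reference, the statistics behind the CV, GMEAN and GCV keywords can be sketched in base R. The formulas below are the conventional definitions (the geometric CV uses the usual log-normal form); that the package computes them in exactly this way is an assumption on my part, and the data are made up.

```r
x <- c(52.3, 61.0, 70.4, 85.9, 66.2)  # made-up positive values

cv    <- 100 * sd(x) / mean(x)              # percent coefficient of variation
gmean <- exp(mean(log(x)))                  # geometric mean
gcv   <- 100 * sqrt(exp(sd(log(x))^2) - 1)  # geometric CV% (log-normal definition)

# Rounded to three significant digits, like the table1 default
signif(c(CV = cv, GMEAN = gmean, GCV = gcv), 3)
```

Note that the geometric statistics are only defined for strictly positive values, since they rely on log(x).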

Displaying different statistics for different variables

Suppose it is desired to show the median and range for age, but the mean and standard deviation for weight. This can be achieved using a custom render function as follows:

rndr <- function(x, name, ...) {
    if (!is.numeric(x)) return(render.categorical.default(x))
    what <- switch(name,
        age = "Median [Min, Max]",
        wt  = "Mean (SD)")
    parse.abbrev.render.code(c("", what))(x)
}

table1(~ age + sex + wt | treat, data=dat,
       render=rndr)
                     Placebo             Treated             Overall
                     (n=52)              (n=94)              (n=146)
Age (years)
  Median [Min, Max]  44.0 [19.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
Sex
  Female             35 (67.3%)          58 (61.7%)          93 (63.7%)
  Male               17 (32.7%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         69.4 (17.9)         69.2 (16.9)

Note that instead of overriding render.continuous and render.categorical separately, you can override render which handles both. The render function gets the name of the variable as its second argument, and should also accept ... to capture any other arguments passed to it. Note also that the function parse.abbrev.render.code can be used to turn abbreviated code into a corresponding render function.
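
One detail worth keeping in mind with this pattern: switch() returns NULL when the name matches none of the listed alternatives, so a numeric variable other than age or wt would presumably end up without any statistics (I have not checked how the package handles that case). A minimal sketch of a fallback, using a hypothetical helper called pick.stats:

```r
# Hypothetical helper: choose the abbreviated code for a given variable name
pick.stats <- function(name) {
    switch(name,
        age = "Median [Min, Max]",
        wt  = "Mean (SD)",
        "Mean (SD)")  # unnamed last argument acts as the default
}

pick.stats("age")     # matched explicitly
pick.stats("height")  # falls through to the default

# Without a default, an unmatched name yields NULL
is.null(switch("height", age = "Median [Min, Max]", wt = "Mean (SD)"))
```

The value returned by such a helper could then be passed to parse.abbrev.render.code in the same way as what in the render function above.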

Changing the table’s appearance

The default style of table1 uses an Arial font, and resembles the booktabs style commonly used in LaTeX. While this default style is not ugly, inevitably there will be a desire to customize the visual appearance of the table (fonts, colors, gridlines, etc). The package provides a limited number of built-in options for changing the style, while further customization can be achieved in R Markdown documents using CSS (see below).

Using built-in styles

The package includes a limited number of built-in styles, among them grid and center (both are used in the transposed-table example later in this vignette).

These styles can be selected using the topclass argument of table1. Some examples follow:

                     Placebo             Treated             Overall
                     (n=52)              (n=94)              (n=146)
Age (years)
  Mean (SD)          44.2 (13.5)         41.3 (13.8)         42.3 (13.7)
  Median [Min, Max]  44.0 [19.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
Sex
  Female             35 (67.3%)          58 (61.7%)          93 (63.7%)
  Male               17 (32.7%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         69.4 (17.9)         69.2 (16.9)
  Median [Min, Max]  66.3 [45.8, 102]    67.2 [37.5, 119]    66.6 [37.5, 119]
  Missing            3 (5.8%)            2 (2.1%)            5 (3.4%)

Note that the style name needs to be preceded by the prefix Rtable1-. Multiple styles can be applied in combination by separating them with a space.
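
As a small convenience, the topclass string can be assembled from bare style names. This is only a sketch in base R; the variable names styles and topclass are mine, and the two style names are the ones used in the transposed-table example later in this vignette.

```r
styles <- c("grid", "center")  # bare style names, without the prefix

# Add the "Rtable1-" prefix to each name and join them with spaces
topclass <- paste0("Rtable1-", styles, collapse = " ")
topclass
```

The resulting string can then be supplied as the topclass argument of table1.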

Using custom CSS to control the table’s appearance

Further customization of the table appearance is only possible in R Markdown documents, by using custom CSS which is specified in the document’s YAML header. For example, to include style.css in the output, the YAML header should contain the following:

output: 
  html_document:
    css: style.css

CSS allows fine control of the appearance of different elements in the table. For example, if style.css contains the following definitions:

then the output will be as follows:

                     Placebo             Treated             Overall
                     (n=52)              (n=94)              (n=146)
Age (years)
  Mean (SD)          44.2 (13.5)         41.3 (13.8)         42.3 (13.7)
  Median [Min, Max]  44.0 [19.0, 65.0]   43.0 [18.0, 65.0]   43.5 [18.0, 65.0]
Sex
  Female             35 (67.3%)          58 (61.7%)          93 (63.7%)
  Male               17 (32.7%)          36 (38.3%)          53 (36.3%)
Weight (kg)
  Mean (SD)          69.0 (15.0)         69.4 (17.9)         69.2 (16.9)
  Median [Min, Max]  66.3 [45.8, 102]    67.2 [37.5, 119]    66.6 [37.5, 119]
  Missing            3 (5.8%)            2 (2.1%)            5 (3.4%)

(Note: as an alternative to redefining the default CSS class Rtable1, a different custom CSS class name could be used, and the topclass argument used to select it.)

A column of p-values

A user asked if it was possible to add a column to the table showing the p-value associated with a univariate test for differences in each variable across strata. There is currently no facility built into the package to accomplish this, but with the use of a custom render function and an empty stratum, it can be “tricked” into producing the desired result.

The following example uses the lalonde data from the MatchIt package. In this case, a chi-square test of independence is used for categorical variables, and a t-test for continuous variables (other tests could be used if desired; this is just for illustration purposes).

library(MatchIt) 
data(lalonde)

# The extra level "P-value" (code 2) has no observations: it creates the empty stratum
lalonde$treat    <- factor(lalonde$treat, levels=c(0, 1, 2), labels=c("Control", "Treatment", "P-value"))

# Convert the 0/1 indicators to logical (displayed as Yes/No in the table)
lalonde$black    <- as.logical(lalonde$black == 1)
lalonde$hispan   <- as.logical(lalonde$hispan == 1)
lalonde$married  <- as.logical(lalonde$married == 1)
lalonde$nodegree <- as.logical(lalonde$nodegree == 1)

label(lalonde$black)    <- "Black"
label(lalonde$hispan)   <- "Hispanic"
label(lalonde$married)  <- "Married"
label(lalonde$nodegree) <- "No high school diploma"
label(lalonde$age)      <- "Age"
label(lalonde$re74)     <- "1974 Income"
label(lalonde$re75)     <- "1975 Income"
label(lalonde$re78)     <- "1978 Income"
units(lalonde$age)      <- "years"

rndr <- function(x, name, ...) {
    if (length(x) == 0) {
        y <- lalonde[[name]]
        s <- rep("", length(render.default(x=y, name=name, ...)))
        if (is.numeric(y)) {
            p <- t.test(y ~ lalonde$treat)$p.value
        } else {
            p <- chisq.test(table(y, droplevels(lalonde$treat)))$p.value
        }
        s[2] <- sub("<", "&lt;", format.pval(p, digits=3, eps=0.001))
        s
    } else {
        render.default(x=x, name=name, ...)
    }
}

rndr.strat <- function(label, n, ...) {
    ifelse(n==0, label, render.strat.default(label, n, ...))
}

table1(~ age + black + hispan + married + nodegree + re74 + re75 + re78 | treat,
    data=lalonde, droplevels=FALSE, render=rndr, render.strat=rndr.strat, overall=FALSE)
                     Control             Treatment           P-value
                     (n=429)             (n=185)
Age (years)
  Mean (SD)          28.0 (10.8)         25.8 (7.16)         0.00291
  Median [Min, Max]  25.0 [16.0, 55.0]   25.0 [17.0, 48.0]
Black
  Yes                87 (20.3%)          156 (84.3%)         <0.001
  No                 342 (79.7%)         29 (15.7%)
Hispanic
  Yes                61 (14.2%)          11 (5.9%)           0.00532
  No                 368 (85.8%)         174 (94.1%)
Married
  Yes                220 (51.3%)         35 (18.9%)          <0.001
  No                 209 (48.7%)         150 (81.1%)
No high school diploma
  Yes                256 (59.7%)         131 (70.8%)         0.0113
  No                 173 (40.3%)         54 (29.2%)
1974 Income
  Mean (SD)          5620 (6790)         2100 (4890)         <0.001
  Median [Min, Max]  2550 [0.00, 25900]  0.00 [0.00, 35000]
1975 Income
  Mean (SD)          2470 (3290)         1530 (3220)         0.00115
  Median [Min, Max]  1090 [0.00, 18300]  0.00 [0.00, 25100]
1978 Income
  Mean (SD)          6980 (7290)         6350 (7870)         0.349
  Median [Min, Max]  4980 [0.00, 25600]  4230 [0.00, 60300]

Admittedly, this is not very elegant, and is definitely a hack. But it is still a neat illustration of the flexibility of the package in terms of being able to accomplish something that we never really planned for.
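
The p-value formatting step inside rndr can be tried on its own. The sketch below wraps it in a hypothetical helper called fmt.p: format.pval prints anything below eps as “<0.001”, and the “<” is then replaced by the HTML entity because the table output is HTML.

```r
# Hypothetical helper isolating the formatting used in rndr above
fmt.p <- function(p) {
    sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001))
}

fmt.p(0.00291)  # an ordinary p-value is printed with 3 significant digits
fmt.p(0.0002)   # a value below eps is printed as "&lt;0.001"
```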

Transposed table

By default, the table produced by table1 will have strata or subgroups as columns, and variables as rows. In some cases, it may be desirable to transpose the table such that each column is a variable and the rows are strata. This makes most sense when all the variables are continuous and when a compact representation is desired. It can be achieved by using the transpose = TRUE option.

An example:

dat <- expand.grid(i=1:50, group=LETTERS[1:3])
dat <- cbind(dat, matrix(round(exp(rnorm(6*nrow(dat))), 1), nrow=nrow(dat)))
names(dat)[3:8] <- paste0("V", 1:6)

Default:

table1(~ V1 + V2 + V3 + V4 + V5 + V6 | group, data=dat,
       topclass="Rtable1-grid Rtable1-center",
       render="Mean (CV%)<br/>Median [Min, Max]<br/>GMean (GCV%)")
      A                    B                    C                    Overall
      (n=50)               (n=50)               (n=50)               (n=150)
V1    1.55 (95.7%)         1.64 (142.2%)        1.75 (154.4%)        1.64 (134.7%)
      1.25 [0.100, 7.80]   1.05 [0.100, 12.7]   1.05 [0.200, 13.6]   1.10 [0.100, 13.6]
      1.08 (110.8%)        0.789 (206.3%)       1.02 (122.3%)        0.952 (144.6%)
V2    1.17 (99.5%)         1.60 (128.2%)        1.45 (141.8%)        1.41 (128.3%)
      0.900 [0.200, 6.30]  0.900 [0.100, 11.4]  0.850 [0.100, 13.4]  0.900 [0.100, 13.4]
      0.781 (115.2%)       0.913 (145.0%)       0.829 (145.8%)       0.839 (134.1%)
V3    1.71 (129.4%)        2.48 (341.2%)        1.43 (107.5%)        1.87 (273.2%)
      0.900 [0.100, 10.2]  0.950 [0.100, 60.5]  1.10 [0.200, 9.70]   1.00 [0.100, 60.5]
      1.00 (134.2%)        0.961 (145.8%)       0.992 (102.5%)       0.985 (125.9%)
V4    1.34 (110.5%)        1.72 (158.5%)        1.69 (122.0%)        1.58 (135.3%)
      0.900 [0.200, 7.40]  0.850 [0.100, 18.0]  1.05 [0.100, 12.2]   0.900 [0.100, 18.0]
      0.890 (109.0%)       0.992 (125.9%)       1.02 (137.5%)        0.966 (123.1%)
V5    1.56 (127.8%)        1.79 (138.8%)        1.64 (65.8%)         1.66 (116.1%)
      0.800 [0.200, 9.60]  1.05 [0.100, 15.1]   1.30 [0.100, 4.50]   1.10 [0.100, 15.1]
      0.912 (129.7%)       1.00 (150.0%)        1.28 (90.9%)         1.05 (123.9%)
V6    1.60 (93.6%)         1.92 (201.4%)        2.32 (144.0%)        1.94 (157.5%)
      1.20 [0.100, 7.10]   0.800 [0.100, 26.4]  1.05 [0.100, 19.9]   1.00 [0.100, 26.4]
      1.07 (122.4%)        0.894 (168.6%)       1.21 (170.4%)        1.05 (153.1%)

Transposed:

table1(~ V1 + V2 + V3 + V4 + V5 + V6 | group, data=dat,
       topclass="Rtable1-grid Rtable1-center",
       render="Mean (CV%)<br/>Median [Min, Max]<br/>GMean (GCV%)",
       transpose=TRUE)
          V1                   V2                   V3                   V4                   V5                   V6
A         1.55 (95.7%)         1.17 (99.5%)         1.71 (129.4%)        1.34 (110.5%)        1.56 (127.8%)        1.60 (93.6%)
(n=50)    1.25 [0.100, 7.80]   0.900 [0.200, 6.30]  0.900 [0.100, 10.2]  0.900 [0.200, 7.40]  0.800 [0.200, 9.60]  1.20 [0.100, 7.10]
          1.08 (110.8%)        0.781 (115.2%)       1.00 (134.2%)        0.890 (109.0%)       0.912 (129.7%)       1.07 (122.4%)
B         1.64 (142.2%)        1.60 (128.2%)        2.48 (341.2%)        1.72 (158.5%)        1.79 (138.8%)        1.92 (201.4%)
(n=50)    1.05 [0.100, 12.7]   0.900 [0.100, 11.4]  0.950 [0.100, 60.5]  0.850 [0.100, 18.0]  1.05 [0.100, 15.1]   0.800 [0.100, 26.4]
          0.789 (206.3%)       0.913 (145.0%)       0.961 (145.8%)       0.992 (125.9%)       1.00 (150.0%)        0.894 (168.6%)
C         1.75 (154.4%)        1.45 (141.8%)        1.43 (107.5%)        1.69 (122.0%)        1.64 (65.8%)         2.32 (144.0%)
(n=50)    1.05 [0.200, 13.6]   0.850 [0.100, 13.4]  1.10 [0.200, 9.70]   1.05 [0.100, 12.2]   1.30 [0.100, 4.50]   1.05 [0.100, 19.9]
          1.02 (122.3%)        0.829 (145.8%)       0.992 (102.5%)       1.02 (137.5%)        1.28 (90.9%)         1.21 (170.4%)
Overall   1.64 (134.7%)        1.41 (128.3%)        1.87 (273.2%)        1.58 (135.3%)        1.66 (116.1%)        1.94 (157.5%)
(n=150)   1.10 [0.100, 13.6]   0.900 [0.100, 13.4]  1.00 [0.100, 60.5]   0.900 [0.100, 18.0]  1.10 [0.100, 15.1]   1.00 [0.100, 26.4]
          0.952 (144.6%)       0.839 (134.1%)       0.985 (125.9%)       0.966 (123.1%)       1.05 (123.9%)        1.05 (153.1%)