Subsetting Tables

Gabriel Becker and Adrian Waddell

2021-01-19

Introduction

As the name indicates, TableTree objects are based on a tree data structure. The package is written so that the user does not need to walk trees for many basic table manipulations. Walking trees will still be necessary for certain manipulations and will be the subject of a different vignette.

In this vignette we show some methods to subset tables and to extract cell values.

We will use the following table for illustrative purposes:

library(rtables)
library(dplyr)

tbl <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1")) %>%
  build_table(ex_adsl %>% filter(SEX %in% c("M", "F")))

tbl
            A: Drug X   B: Placebo   C: Combination
---------------------------------------------------
F                                                  
  AGE                                              
    Mean      32.76       34.12           35.2     
  STRATA1                                          
    A          21           24             18      
    B          25           27             21      
    C          33           26             27      
M                                                  
  AGE                                              
    Mean      35.57       37.44          35.38     
  STRATA1                                          
    A          16           19             20      
    B          21           17             21      
    C          14           19             19      

The [ accessor function returns a TableTree object unless drop = TRUE is set. The first argument gives the row indices and the second argument gives the column indices. Alternatively, logical subsetting can be used. The indices are based on visible rows and not on the tree structure. So:

tbl[1, 1]
    A: Drug X
-------------
F            

is a table with an empty cell because the first row is a label row. We need to access a cell with actual cell data:

tbl[3, 1]
       A: Drug X
----------------
Mean     32.76  

which is another TableTree and not an rcell. To extract the cell value itself, use the drop argument:

tbl[3, 1, drop = TRUE]
[1] 32.75949

One can access multiple rows and columns:

tbl[1:3, 1:2]
           A: Drug X   B: Placebo
---------------------------------
F                                
  AGE                            
    Mean     32.76       34.12   
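The logical subsetting mentioned earlier can be sketched as follows. This is a minimal illustration, assuming that `[` accepts full-length logical vectors for rows and columns and that nrow() counts visible rows; the names keep_rows, keep_cols and sub_tbl are introduced here for illustration only.

```r
library(rtables)
library(dplyr)

# Rebuild the example table so this snippet runs on its own.
tbl <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1")) %>%
  build_table(ex_adsl %>% filter(SEX %in% c("M", "F")))

# One logical flag per visible row: keep the first three rows only.
keep_rows <- seq_len(nrow(tbl)) <= 3

# One logical flag per column: keep the first two arms.
keep_cols <- c(TRUE, TRUE, FALSE)

# Expected to select the same rows and columns as tbl[1:3, 1:2].
sub_tbl <- tbl[keep_rows, keep_cols]
sub_tbl
```

Supplying one flag per row and per column avoids relying on any recycling behaviour of the logical index.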

Note that we do not repeat label rows for descending children, e.g.

tbl[2:4, ]
          A: Drug X   B: Placebo   C: Combination
-------------------------------------------------
AGE                                              
  Mean      32.76       34.12           35.2     
STRATA1                                          

does not show that these rows are nested under F. To repeat content/label information, use the pagination feature; please read the related vignette.

Path-Based Cell Value Access

Cell values can also be accessed via path information. The functions row_paths, col_paths, row_paths_summary, and col_paths_summary are helpful for getting information on the paths.

tbl2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_cols_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1")) %>%
  build_table(ex_adsl %>% filter(SEX %in% c("M", "F")))

tbl2
            A: Drug X      B: Placebo      C: Combination  
            F       M       F       M        F         M   
-----------------------------------------------------------
AGE                                                        
  Mean    32.76   35.57   34.12   37.44    35.2      35.38 
STRATA1                                                    
  A        21      16      24      19       18        20   
  B        25      21      27      17       21        21   
  C        33      14      26      19       27        19   

So the column paths are as follows:

col_paths_summary(tbl2)
label             path                       
---------------------------------------------
A: Drug X         ARM, A: Drug X             
  F               ARM, A: Drug X, SEX, F     
  M               ARM, A: Drug X, SEX, M     
B: Placebo        ARM, B: Placebo            
  F               ARM, B: Placebo, SEX, F    
  M               ARM, B: Placebo, SEX, M    
C: Combination    ARM, C: Combination        
  F               ARM, C: Combination, SEX, F
  M               ARM, C: Combination, SEX, M

and the row paths are as follows:

row_paths_summary(tbl2)
rowname    node_class    path      
-----------------------------------
AGE        LabelRow      AGE       
  Mean     DataRow       AGE, Mean 
STRATA1    LabelRow      STRATA1   
  A        DataRow       STRATA1, A
  B        DataRow       STRATA1, B
  C        DataRow       STRATA1, C

So, to get the average age of all female patients in arm A: Drug X:

value_at(tbl2, c("AGE",  "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
[1] 32.75949
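Because paths are plain character vectors, they can also be generated programmatically rather than typed by hand. The sketch below assumes that col_paths returns one path per leaf column (here, each ARM and SEX combination) and that each Mean cell holds a single numeric value; the names leaf_cols and mean_age are introduced here for illustration only.

```r
library(rtables)
library(dplyr)

# Rebuild the second example table so this snippet runs on its own.
tbl2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_cols_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1")) %>%
  build_table(ex_adsl %>% filter(SEX %in% c("M", "F")))

# List of column paths, assumed to contain one entry per leaf column.
leaf_cols <- col_paths(tbl2)

# Look up the mean age in every leaf column via value_at().
mean_age <- vapply(
  leaf_cols,
  function(cpath) value_at(tbl2, c("AGE", "Mean"), cpath),
  numeric(1)
)

# Label each value with its collapsed column path for readability.
names(mean_age) <- vapply(leaf_cols, paste, character(1), collapse = " / ")
mean_age
```

The same pattern applies to rows: row_paths(tbl2) returns the row paths listed by row_paths_summary.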

You can also request values for paths that do not identify a single cell by using the cell_values function:

cell_values(tbl2, c("AGE", "Mean"), c("ARM", "A: Drug X"))
$`A: Drug X.F`
[1] 32.75949

$`A: Drug X.M`
[1] 35.56863

Note that the return value of cell_values is always a list, even if you specify a path to a single cell:

cell_values(tbl2, c("AGE",  "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
$`A: Drug X.F`
[1] 32.75949

Hence, use value_at if you want to access data from a cell, and cell_values if you want to access data from multiple cells.
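Since cell_values returns a list, a flat vector is often more convenient for further computation. The following sketch flattens the result with base R's unlist, assuming each selected cell holds a single numeric value as it does for the Mean row; the names vals and vals_vec are introduced here for illustration only.

```r
library(rtables)
library(dplyr)

# Rebuild the second example table so this snippet runs on its own.
tbl2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_cols_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1")) %>%
  build_table(ex_adsl %>% filter(SEX %in% c("M", "F")))

# List with one element per column under arm A: Drug X.
vals <- cell_values(tbl2, c("AGE", "Mean"), c("ARM", "A: Drug X"))

# Flatten the list into a named numeric vector.
vals_vec <- unlist(vals)
vals_vec
```

For cells that hold more than one value, unlist would interleave the values from different cells, so keeping the list form is safer in that case.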