1. Irregular Layout

Chris Bailiss

2019-08-05

In This Vignette

Introductory Note

This is a more advanced topic. It requires a good understanding of the material in the previous vignettes.

What is Irregular Layout?

Let’s start by talking about regular layout. The following is an example pivot table that has appeared in previous vignettes:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

This is a regular pivot table: Each level of row and column headings is related to a single variable only, i.e.

Returning to the original question (What is irregular layout?), the definition for our purposes is: Irregular layout is any layout where a single level of row or column headings relates to more than one variable.

Caution

Irregular layouts inevitably require more effort to construct. Irregular layouts also tend to be harder to understand, so they need careful consideration before being used. Often using multiple separate regular pivot tables is a better idea.

Simple Example of an Irregular Layout

Constructing pivot tables that have an irregular layout typically requires more lines of R. This is because the helper functions addColumnDataGroups() and addRowDataGroups() that have been used throughout all of the previous vignettes to easily build-up the structure of a pivot table can no longer be used, since they generate a regular layout only. Instead the layout must be built in a more granular way.

For example, consider we are only interested in the number of express trains (i.e. TrainCategory=“Express Passenger”) and the number of DMU trains (i.e. PowerType=“DMU”), for each train operating company. Since these requirements relate to two different variables, we need to construct an irregular layout. To do this, these two data groups are added individually to the root column group:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

In the example above, pt$columnGroup refers to the root column group that exists by default in the pivot table. This is not rendered (i.e. not visible in the output pivot table). The first level of visible column groups are the children of this root group.

The equivalent root group for the row groups is pt$rowGroup. Although all of the examples in this vignette use the column groups, exactly the same principles apply to the row groups.

No totals column is added to the above pivot table. This wouldn’t make sense for this pivot table anyway, since some express passenger trains are also DMU trains, so a simple total would double count some trains.

Showing the cell context, as described in the Cell Context vignette, makes the irregularity clearer:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot(includeHeaderValues=TRUE, includeRCFilters=TRUE)

Extending the Simple Example

Further data groups can be added to the pivot table. These data groups can be regular or irregular. Several different examples are shown below. These examples are rather contrived for demonstration purposes.

If a regular level is desired, this can simply be added using the regular addColumnDataGroups() and addRowDataGroups() functions, for example adding the train status:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addColumnDataGroups("Status")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

The addChildGroup() function returns the new data group that has been added. The addChildGroup() function can be called on each of these groups to add further groups underneath:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
cg1$addChildGroup(variableName="Status", values="A")
cg1$addChildGroup(variableName="Status", values="R")
cg2$addChildGroup(variableName="SchedSpeedMPH", values="100")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Instead of adding groups one at a time, the addDataGroups() function can be called on these data groups to add different data groups for different variables underneath each:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
cg1$addDataGroups("Status")
cg2$addDataGroups("SchedSpeedMPH")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Another Example

The addColumnDataGroups(), addRowDataGroups() and addDataGroups() functions1 all return zero, one or multiple data groups, in the form of an R list. This list can be iterated or used with functions such as lapply() in the usual ways:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
cgrps <- pt$addColumnDataGroups("PowerType")
add2Groups <- function(grp) {
  grp$addChildGroup(variableName="Status", values="A")
  grp$addChildGroup(variableName="Status", values="R")
}
invisible(lapply(cgrps, add2Groups))
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Each data group has a set of properties that can be used to determine whether/how to add the child groups. For example, to skip adding the status groups to the totals:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
cgrps <- pt$addColumnDataGroups("PowerType")
add2Groups <- function(grp) {
  if(!grp$isTotal) {
    grp$addChildGroup(variableName="Status", values="A")
    grp$addChildGroup(variableName="Status", values="R")
  }
}
invisible(lapply(cgrps, add2Groups))
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

In the rather contrived example above, the totals do not add up to the sum of the individual cells. This is often the case with irregular layouts. There are a few different options for dealing with this, including:

Empty Groups

Sometimes when constructing irregular layouts, it is desirable to insert an “empty” group, i.e. a group that applies no filtering. This is typically needed when one set of column headings has more levels than another. Adding an empty group can be accomplished by simply calling addChildGroup() - either with no arguments or just supplying a caption for display purposes with the caption argument. An example of this can be found in the following section.

Multiple Calculation Groups

Sometimes it is desirable to use different calculations in different parts of the pivot table. Returning to the initial irregular example:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Suppose we require the number of express trains but the maximum scheduled speed of the DMU trains. We might naively try the following R:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", caption="Count", summariseExpression="n()")
pt$defineCalculation(calculationName="MaxSpeedMPH", caption="Maximum Speed", summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
pt$renderPivot()

This has replicated both calculations across the whole pivot table, which is not what we wanted.

Instead the solution is to define an additional2 calculation group, then explicitly add the two calculation groups to the relevant parts of the pivot table:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$defineCalculation(calculationGroupName="calcGrp2", calculationName="MaxSpeedMPH", 
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
cg1$addCalculationGroups("default")
cg2$addCalculationGroups("calcGrp2")
pt$renderPivot()

Pivot tables like the above are quite likely to cause confusion. Either the caption of the existing data groups needs changing or additional empty groups with captions should be added. Both of these are demonstrated below.

Changing the data group captions

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger", caption="Express Passenger (Count)")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU", caption="DMU (Maximum Speed")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$defineCalculation(calculationGroupName="calcGrp2", calculationName="MaxSpeedMPH", 
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
cg1$addCalculationGroups("default")
cg2$addCalculationGroups("calcGrp2")
pt$renderPivot()

Adding empty data groups

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$defineCalculation(calculationGroupName="calcGrp2", calculationName="MaxSpeedMPH", 
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
cg3 <- cg1$addChildGroup(caption="Count")
cg4 <- cg2$addChildGroup(caption="Maximum Speed")
cg3$addCalculationGroups("default")
cg4$addCalculationGroups("calcGrp2")
pt$renderPivot()

Custom Layout Changes

The asBasicTable() function allows a pivot table to be converted to a basic table - from the basictabler package.

The basictabler package allows free-form tables to be constructed, in contrast to pivottabler which creates pivot tables with relatively fixed structures. pivottabler contains calculation logic - to calculate the values of cells within the pivot table. basictabler contains no calculation logic - cell values must be provided either from a data frame, row-by-row, column-by-column or cell-by-cell.

Converting a pivot table to a basic table allows the structure of pivot tables to be altered after they have been created, e.g.

library(pivottabler)
library(dplyr)
library(lubridate)
trains <- mutate(bhmtrains, 
                 GbttDate=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
                 GbttMonth=make_date(year=year(GbttDate), month=month(GbttDate), day=1))

pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y"))
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()

# convert the pivot table to a basic table, insert a new row, merge cells and highlight
bt <- pt$asBasicTable()
bt$cells$insertRow(5)
bt$cells$setCell(5, 2, rawValue="The values below are significantly higher than expected.", 
                 styleDeclarations=list("text-align"="left", "background-color"="yellow",
                                        "font-weight"="bold", "font-style"="italic"))
bt$mergeCells(rFrom=5, cFrom=2, rSpan=1, cSpan=13)
bt$setStyling(rFrom=6, cFrom=2, rTo=6, cTo=14, 
              declarations=list("text-align"="left", "background-color"="yellow"))
bt$renderTable()

Further Reading

The full set of vignettes is:

  1. Introduction
  2. Data Groups
  3. Calculations
  4. Outputs
  5. Latex Output
  6. Styling
  7. Finding and Formatting
  8. Cell Context
  9. Irregular Layout
  10. Performance
  11. Shiny
  12. Excel Export
  13. Appendix: Details
  14. Appendix: Calculations
  15. Appendix: Class Overview

  1. Use the addColumnDataGroups() and addRowDataGroups() functions against the pivot table. Use the addDataGroups() function against data groups.

  2. Every pivot table has a default calculation group named “default”.