# A2. Appendix: Calculations

## In This Vignette

• Introduction
• Basic Principles
• Modifying filters
• Method 1: Further restricting the data using a PivotFilters object
• Method 2: Flexible filter changes using a PivotFilterOverrides object
• Examples of Modifying Filters
• % of Row Total, % of Column Total
• % of Grand Total
• Ratios/Multiples
• Subsets of Data
• More filter combination examples
• Examples using a custom filter override function
• Running Differences/Changes
• Rolling Average
• Cumulative Sum

## Introduction

This vignette describes how to construct more advanced calculations, such as percentages of totals, cumulative sums, running differences, rolling averages, and ratios/multiples.

Many of the examples in this vignette leave working columns (i.e. intermediate values not of interest) visible in the pivot tables. This is done deliberately to aid understanding. These calculations can be made invisible by specifying visible=FALSE as an additional parameter as part of pt$defineCalculation(...). ## Basic Principles A filter is simply the name of a variable and one or more allowed values of that variable (e.g. TrainCategory=“Express Passenger”). Every cell in a pivot table has a set of associated filters that specify what data is included in the calculation of the value of that cell, as explained in the Cell Context vignette. By changing these filters it is possible to perform different sorts of calculations. The filters associated with each cell can be made visible in the pivot table by specifying pt$renderPivot(includeWorkingData=TRUE) as shown in the example below.

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()") pt$renderPivot(includeWorkingData=TRUE)

## Modifying filters

There are two possible ways to change the filters that are used when calculating values.

### Method 1: Further restricting the data using a PivotFilters object

When defining a calculation, it is possible to specify a PivotFilters object as part of the calculation. These additional filters are applied on top of the filters already related to that cell, effectively further restricting the subset of data that is covered by the calculation.

This approach has been supported since v0.3.0 of the pivottabler package. A weakness of this method is that it can only be used to further restrict the data - it cannot be used to refer to data outside the scope of the current cell.

As an example of this method, the “Weekend Trains” calculation below specifies an extra filter that means only rows where the WeekdayOrWeekend variable equals “Weekend” are included in the “Weekend Trains” count.

library(dplyr)
library(lubridate)
library(pivottabler)

# get the date of each train and whether that date is a weekday or weekend
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
DayNumber=wday(GbttDateTime),
WeekdayOrWeekend=ifelse(DayNumber %in% c(1,7), "Weekend", "Weekday"))

# render the pivot table
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
weekendFilter <- PivotFilters$new(pt, variableName="WeekdayOrWeekend", values="Weekend") pt$defineCalculation(calculationName="WeekendTrains", summariseExpression="n()",
filters=weekendFilter, noDataValue=0,
caption="Weekend Trains")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", noDataValue=0, caption="Total Trains") pt$defineCalculation(calculationName="WeekendTrainsPercentage",
type="calculation", basedOn=c("WeekendTrains", "TotalTrains"),
format="%.1f %%", caption="% at Weekend",
calculationExpression="ifelse(values$TotalTrains>0, values$WeekendTrains/values$TotalTrains*100, NA)") pt$renderPivot()

### Method 2: Flexible filter changes using a PivotFilterOverrides object

Newly added in v1.0.0 of the pivottabler package is the ability to specify a PivotFilterOverrides object as part of a calculation definition. This provides a more flexible and more powerful way of controlling the filters used when calculating a cell value. In particular, using a PivotFilterOverrides object it is possible to add new filters or change/remove existing filters - this enables many new calculation possibilities. Many of the examples below use this approach.

A few different options can be specified when creating a PivotFilterOverrides object:

• removeAllFilters=TRUE|FALSE specifies whether to clear all existing filters for the calculation (default value is FALSE)
• keepOnlyFiltersFor=variableNames specifies a character vector of variable names to retain the filter criteria for. Filter criteria for all other variables will be cleared.
• removeFiltersFor=variableNames specifies a character vector of variable names for which the filter criteria will be cleared.
• overrideFunction=function specifies a custom R function which will be invoked for each cell to modify the filters before the calculation is carried out.
• filter=filter, action="intersect"|"replace"|"union" specifies a PivotFilter object containing filter criteria which will be combined with the current set of filters using the specified combine method (explained below).
• variableName=variableName, type="ALL"|"VALUES"|"NONE", values=values, action="intersect"|"replace"|"union" specifies a new filter condition which will be combined with the current set of filters using the specified combine method (explained below).

When we talk about combining filters, we mean combining the lists of allowed values in each filter. Three different methods of combining filter criteria are supported:

• intersect - this further restricts the data - i.e. it is additional filter criteria on top of the existing filter criteria for the variable.
• replace - this removes the existing criteria for the specified variable and replaces it with the newly specified criteria for the variable.
• union - this provides additional allowed values, alongside any existing criteria.

The tables below provide some examples of filter overrides applied to hypothetical GbttDate and TOC variables:

Existing Criteria Override Criteria Combine Resulting Criteria
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 2nd May 2018 intersect Gbtt = 2nd May 2018
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 2nd May 2018 replace Gbtt = 2nd May 2018
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 2nd May 2018 union Gbtt = 1st, 2nd, 3rd May 2018
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 9th May 2018 intersect NONE (i.e. no matching data)
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 9th May 2018 replace Gbtt = 9th May 2018
Gbtt = 1st, 2nd, 3rd May 2018 Gbtt = 9th May 2018 union Gbtt = 1st, 2nd, 3rd, 9th May 2018

Filters for different variables are combined completely independently. Where no filter criteria exists for a particular variable, then there is an implicit filter that allows ALL values, i.e. in the tables above and below, in the Existing Criteria column, there is an implicit “TOC = ALL”.

Existing Criteria Override Combine Resulting Criteria
Gbtt = 1st, 2nd, 3rd May 2018 TOC = c2c intersect Gbtt = 1st, 2nd, 3rd May 2018, TOC=c2c
Gbtt = 1st, 2nd, 3rd May 2018 TOC = c2c replace Gbtt = 1st, 2nd, 3rd May 2018, TOC=c2c
Gbtt = 1st, 2nd, 3rd May 2018 TOC = c2c union Gbtt = 1st, 2nd, 3rd May 2018, TOC=ALL

## Examples of Modifying Filters

The first example immediately below provides more explanation of the approach of overriding the filters. The other examples have less explanation, but the same approach of displaying the working data using pt$renderPivot(includeWorkingData=TRUE) can be used to display helpful filter details with these examples too. ### % of Row Total, % of Column Total Objective: To calculate the percentage of each train operating company’s (TOC’s) trains that are either of category express passenger or ordinary passenger: 1. Define the basic calculation (e.g. a count of trains) as normal. 2. Define a second calculation that will return the row total for every combination in the row. For example, if the row is about a specific TOC, then define a set of filter override that removes all filters except TOC for this new calculation. Everywhere in the row will then have the total for the TOC. 3. Define a third calculation that calculates the percentage based on the above two calculations. library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, keepOnlyFiltersFor="TOC") pt$defineCalculation(calculationName="TOCTotalTrains", filters=filterOverrides,
summariseExpression="n()", caption="TOC Total")
pt$defineCalculation(calculationName="PercentageOfTOCTrains", type="calculation", basedOn=c("CountTrains", "TOCTotalTrains"), calculationExpression="values$CountTrains/values$TOCTotalTrains*100", format="%.1f %%", caption="% of TOC") pt$renderPivot()

In the example above, the “TOC Total” value is the same everywhere in each row. This is because the filter criteria has been overriden to remove the “TrainCategory” filter for this calculation. This can be seen by displaying the context:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count") filterOverrides <- PivotFilterOverrides$new(pt, keepOnlyFiltersFor="TOC")
pt$defineCalculation(calculationName="TOCTotalTrains", filters=filterOverrides, summariseExpression="n()", caption="TOC Total") pt$renderPivot(includeWorkingData=TRUE)

### % of Grand Total

Objective:

The approach is similar to the previous example, except the filter override removes all of the existing filters:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count") filterOverrides <- PivotFilterOverrides$new(pt, removeAllFilters=TRUE)
pt$defineCalculation(calculationName="GrandTotalTrains", filters=filterOverrides, summariseExpression="n()", caption="Grand Total") pt$defineCalculation(calculationName="PercentageOfAllTrains", type="calculation",
basedOn=c("CountTrains", "GrandTotalTrains"),
calculationExpression="values$CountTrains/values$GrandTotalTrains*100",
format="%.1f %%", caption="% of All")
pt$renderPivot() ### Ratios/Multiples Objective: We wish to find the multiples of Cross Country Express Passenger: A “CrossCountryExpress” calculation is defined with a set of filters on TrainCategory=“Express Passenger” and TOC=“CrossCountry” that replace the existing filters on these variables: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
filterOverrides <- PivotFilterOverrides$new(pt, removeAllFilters=TRUE) filterOverrides$add(variableName="TrainCategory", values="Express Passenger", action="replace")
filterOverrides$add(variableName="TOC", values="CrossCountry", action="replace") pt$defineCalculation(calculationName="CrossCountryExpress", filters=filterOverrides,
summariseExpression="n()", caption="CrossCountry Express Trains")
pt$defineCalculation(calculationName="MultipleOfCCExpressTrains", type="calculation", basedOn=c("CountTrains", "CrossCountryExpress"), calculationExpression="values$CountTrains/values$CrossCountryExpress", format="%.2f", caption="Multiple of CC Express") pt$renderPivot()

### Subsets of Data

Objective:

We wish to find the percentage of trains with PowerType=“DMU”.

A “CountDMU” calculation is defined that applies an additional filter on PowerType=“DMU”, i.e. that is combined using action=intersect:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
filterDMU <- PivotFilter$new(pt, variableName="PowerType", values="DMU") filterOverrides <- PivotFilterOverrides$new(pt, filter=filterDMU, action="intersect")
pt$defineCalculation(calculationName="CountDMU", filters=filterOverrides, summariseExpression="n()", caption="DMU") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$defineCalculation(calculationName="PercentageDMU", type="calculation", basedOn=c("CountTrains", "CountDMU"), calculationExpression="values$CountDMU/values$CountTrains*100", format="%.1f %%", caption="% DMU") pt$renderPivot()

### More filter combination examples

While the above examples have practical use, the following couple of examples are more for illustrative purposes to show what happens when the filter overrides are specified using the same variable as in either the row or column headings. Again, these examples follow the rules described in the table above.

In the following examples, the TOC variable makes up the row headings and a filter override is specified where TOC="London Midland". This filter override is combined in three different ways with the existing filters from the row headings.

intersect:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland") filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="intersect")
pt$defineCalculation(calculationName="CountLM", filters=filterOverrides, summariseExpression="n()", caption="Test Count") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$renderPivot() In the above example the filter override in the “Test Count” calculation intersects the TOC criteria in every row with TOC="London Midland". The intersection of TOC="London Midland" with anything other than TOC="London Midland" results in an empty set, so only the “London Midland” row has any matching data. Another way of looking at this is to say that it is impossible for any row in the source data frame to simultaneously have two different values for TOC, e.g. in a single data frame row, TOC cannot equal both “Arriva Trains Wales” and “London Midland”. Therefore, the only row where the “Test Count” calculation has a value is the “London Midland” row. replace: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland")
filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="replace") pt$defineCalculation(calculationName="CountLM", filters=filterOverrides, summariseExpression="n()", caption="Test Count")
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count") pt$renderPivot()

In the above example the filter override in the “Test Count” calculation replaces the TOC criteria in every row with TOC="London Midland".

union:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
filterTOC <- PivotFilter$new(pt, variableName="TOC", values="London Midland") filterOverrides <- PivotFilterOverrides$new(pt, filter=filterTOC, action="union")
pt$defineCalculation(calculationName="CountLM", filters=filterOverrides, summariseExpression="n()", caption="Test Count") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Count")
pt$renderPivot() In the above example the filter override in the “Test Count” calculation unions the TOC criteria in every row with TOC="London Midland". So the London Midland count is included in every row for “Test Count”, e.g. the value of “Test Count” for “Arriva Trains Wales” includes the values for both “Arriva Trains Wales” and “London Midland”, i.e. 3079 + 14487 = 17566. ## Examples using a custom filter override function The above examples provide different basic ways of modifying filters. For relative / more dynamic filters, a custom filter override function can be used. The package will invoke this function once per cell. Within the function it is possible to examine the existing filters and selectively change them on a cell by cell basis. When invoked the custom function is provided with a reference to the existing filters and a reference to the current cell. In the majority of cases, working with the filters only will be sufficient. The reference to the cell is provided only for those rare cases where additional information is needed about the cell (e.g. to determine the exact cell location, whether it is a total cell, etc). No changes should be made to the cell, only to the filters. ### Running Differences/Changes Objective: We wish to show the change in the number of trains compared to the previous day for the first seven days of 2017. A custom filter override function is used to change the filter from the current date (i.e. the date in the current row) to the previous date: library(dplyr) trains <- bhmtrains %>% mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival), GbttDate=as.Date(GbttDateTime)) januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days") # comparison to yesterday # date filter function to return yesterday getYesterdayDateFilter <- function(pt, filters, cell) { # get the date filter filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# so the concept of yesterday has no meaning, so block all dates
newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE") filters$setFilter(newFilter, action="replace")
}
else {
# get the date value and subtract one day
date <- filter$values date <- date - 1 filter$values <- date
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates), visualTotals=TRUE)
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Current Day Count") filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getYesterdayDateFilter)
pt$defineCalculation(calculationName="CountPreviousDayTrains", filters=filterOverrides, summariseExpression="n()", caption="Previous Day Count") pt$defineCalculation(calculationName="Daily Change", type="calculation",
basedOn=c("CountTrains", "CountPreviousDayTrains"),
calculationExpression="values$CountTrains-values$CountPreviousDayTrains",
caption="Daily Change")
pt$renderPivot() The above example uses visualTotals=TRUE, since the data frame contains other dates outside of the range 1st to 7th January. If visual totals is not enabled, then the column totals include data for all dates, i.e. the column totals in the pivot table would be greater than the sum of the values in the pivot table column. ### Rolling Average Objective: We wish to show a three-day rolling average of train count for the first seven days of 2017. A custom filter override function is used to change the filter to include the previous, current and following date: library(dplyr) trains <- bhmtrains %>% mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival), GbttDate=as.Date(GbttDateTime)) januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days") # three-day rolling average # date filter function to a three day range of dates getThreeDayFilter <- function(pt, filters, cell) { # get the date filter filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# so the concept of previous/next day has no meaning, so block all dates
newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE") filters$setFilter(newFilter, action="replace")
}
else {
# get the date value and create three day filter
date <- filter$values newDates <- seq(date-1, date+1, by="days") filter$values <- newDates
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates), visualTotals=TRUE)
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Current Day Count") filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getThreeDayFilter)
pt$defineCalculation(calculationName="ThreeDayCount", filters=filterOverrides, summariseExpression="n()", caption="Three Day Total") pt$defineCalculation(calculationName="ThreeDayAverage", type="calculation",
basedOn="ThreeDayCount",
calculationExpression="values$ThreeDayCount/3", format="%.1f", caption="Three Day Rolling Average") pt$renderPivot()

The above example also uses visualTotals=TRUE for the same reasons as described in the previous example.

Another variation of the above that is useful when full data for the rolling average is not available for the first and last days:

library(dplyr)
library(dplyr)

# here the trains data frame does not contain data for 31st Dec and 8th Jan,
# so the rolling average for 1st Jan and 7th Jan will be incomplete.
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=as.Date(GbttDateTime)) %>%
filter((as.Date("2017-01-01") <= GbttDate) & (GbttDate <= as.Date("2017-01-07")))

# three-day rolling average
# date filter function to a three day range of dates
getThreeDayFilter <- function(pt, filters, cell) {
# get the date filter
filter <- filters$getFilter("GbttDate") if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) { # there is no filter on GbttDate in this cell # i.e. we are in one of the total cells that covers all dates, # so the concept of previous/next day has no meaning, so block all dates newFilter <- PivotFilter$new(pt, variableName="GbttDate", type="NONE")
filters$setFilter(newFilter, action="replace") } else { # get the date value and create three day filter date <- filter$values
newDates <- seq(date-1, date+1, by="days")
filter$values <- newDates } } # build the pivot library(pivottabler) pt <- PivotTable$new()
pt$addData(trains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("GbttDate") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Current Day Count")
filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getThreeDayFilter) pt$defineCalculation(calculationName="DaysWithDataCount", filters=filterOverrides,
summariseExpression="n_distinct(GbttDate)", caption="Days With Data")
pt$defineCalculation(calculationName="ThreeDayCount", filters=filterOverrides, summariseExpression="n()", caption="Three Day Total") pt$defineCalculation(calculationName="ThreeDayAverage", type="calculation",
basedOn=c("DaysWithDataCount", "ThreeDayCount"),
calculationExpression="ifelse(values$DaysWithDataCount==3, values$ThreeDayCount/3, NA)",
format="%.1f", caption="Three Day Rolling Average")
pt$renderPivot() ### Cumulative Sum Objective: We wish to show a cumulative sum of the number of trains for the first seven days of 2017. A custom filter override function is used to change the filter to include dates between the 1st January 2017 and the current date: library(dplyr) trains <- bhmtrains %>% mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival), GbttDate=as.Date(GbttDateTime)) %>% filter((as.Date("2017-01-01") <= GbttDate)&(GbttDate <= as.Date("2017-01-07"))) januaryDates <- seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by="days") # date filter function to all dates since 1st jan getCumulativeFilter <- function(pt, filters, cell) { # get the date filter filter <- filters$getFilter("GbttDate")
if(is.null(filter)||(filter$type=="ALL")||(length(filter$values)>1)) {
# there is no filter on GbttDate in this cell
# i.e. we are in one of the total cells that covers all dates,
# can allow this to just be the total
}
else {
# get the date value and modify the filter
date <- filter$values newDates <- seq(as.Date("2017-01-01"), date, by="days") filter$values <- newDates
}
}
# build the pivot
library(pivottabler)
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("GbttDate", fromData=FALSE,
explicitListOfValues=as.list(januaryDates))
pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()", caption="Current Day Count") filterOverrides <- PivotFilterOverrides$new(pt, overrideFunction=getCumulativeFilter)
pt$defineCalculation(calculationName="CumulativeCount", filters=filterOverrides, summariseExpression="n()", caption="Cumulative Count") pt$renderPivot()