A family of window functions

Earo Wang

Time series come with a strict temporal order that dictates which operations can be performed. One example is the moving average, where a window slides along the time order and the response is averaged over each temporal subset. The tsibble package provides three moving window operations, called verbs, that operate on temporal data objects (nouns): slide() for sliding windows with overlapping observations, tile() for tiling windows without overlapping observations, and stretch() for a fixed initial window that expands to include more observations.

These functions handle all sorts of objects and feature a purrr-like interface. In this vignette, I will walk you through slide() and its variants, but the example snippets also apply to tile() and stretch().
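To make the three window types concrete, here is a minimal base R sketch of the subsets each one would visit. It does not call tsibble; the vector x, the window size of 3, and the object names are illustrative assumptions, and the stretching example assumes an initial window of 3 that grows by one observation at a time.

```r
x <- 1:6
size <- 3

# Sliding: overlapping windows that advance one position at a time
sliding <- lapply(
  seq_len(length(x) - size + 1),
  function(i) x[i:(i + size - 1)]
)

# Tiling: consecutive blocks that share no observations
tiling <- split(x, ceiling(seq_along(x) / size))

# Stretching: the start stays fixed while the end expands
stretching <- lapply(size:length(x), function(i) x[1:i])

str(sliding)
str(tiling)
str(stretching)
```

Sliding visits 1:3, 2:4, 3:5 and 4:6; tiling visits 1:3 and 4:6; stretching visits 1:3, 1:4, 1:5 and 1:6.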

In the spirit of purrr::map(), slide() accepts one input, slide2() two inputs, and pslide() multiple inputs; all three always return lists for the sake of type stability. Other variants, including slide_lgl(), slide_int(), slide_dbl(), and slide_chr(), return vectors of the corresponding type, while slide_dfr() and slide_dfc() return data frames by row-binding and column-binding respectively. This full-fledged window family lets users build window-related workflows in many ways, from fixed window sizes to calendar periods, and from moving averages to model fitting.

The pedestrian dataset includes hourly pedestrian counts in the city of Melbourne, with Sensor as key and Date_Time as index. The window functions roll over observations by position (row order) rather than by time index, which keeps them applicable to general problems. Implicit missing values are therefore made explicit using fill_gaps(), and .full = TRUE ensures that every sensor spans the same length of time. This puts the data in the order the window functions expect.

library(tsibble)
library(dplyr)
pedestrian_full <- pedestrian %>% 
  fill_gaps(.full = TRUE)
pedestrian_full
#> # A tsibble: 70,176 x 5 [1h] <Australia/Melbourne>
#> # Key:       Sensor [4]
#>   Sensor         Date_Time           Date        Time Count
#>   <chr>          <dttm>              <date>     <int> <int>
#> 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630
#> 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826
#> 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567
#> 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264
#> 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139
#> # … with 7.017e+04 more rows

Fixed window size

The moving average is one of the most common techniques for smoothing time series. We can easily apply a daily smoother (a fixed window size of 24) to each sensor. The output has the same length as the input: incomplete windows are padded with .fill = NA (the default), and with .align = "center-left" the padding falls at both ends of the data range, so the result fits neatly into mutate(). slide_dbl() collects the numeric values returned by mean() into a double vector.

pedestrian_full %>% 
  group_by(Sensor) %>% 
  mutate(Daily_MA = slide_dbl(Count, 
    mean, na.rm = TRUE, .size = 24, .align = "center-left"
  ))
#> # A tsibble: 70,176 x 6 [1h] <Australia/Melbourne>
#> # Key:       Sensor [4]
#> # Groups:    Sensor [4]
#>   Sensor         Date_Time           Date        Time Count Daily_MA
#>   <chr>          <dttm>              <date>     <int> <int>    <dbl>
#> 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630       NA
#> 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826       NA
#> 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567       NA
#> 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264       NA
#> 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139       NA
#> # … with 7.017e+04 more rows
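The padding behaviour described above can be sketched in base R. The helper roll_mean_sketch() and its before argument are hypothetical and not part of tsibble; the sketch only illustrates how NA padding keeps the output the same length as the input, and how the position of the window relative to the current observation decides where the NAs fall. For an even window size, setting before = size / 2 - 1 corresponds to my reading of "center-left".

```r
# Hypothetical helper, not part of tsibble: sliding mean padded with NA
roll_mean_sketch <- function(x, size, before = size - 1) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    lo <- i - before      # first position in the window
    hi <- lo + size - 1   # last position in the window
    # Only complete windows are averaged; the rest stay NA
    if (lo >= 1 && hi <= n) out[i] <- mean(x[lo:hi])
  }
  out
}

x <- c(2, 4, 6, 8, 10, 12)
roll_mean_sketch(x, size = 4)              # NA NA NA 5 7 9
roll_mean_sketch(x, size = 4, before = 1)  # NA 5 7 9 NA NA
```

With the window ending at the current observation, all padding sits at the start; with one observation before and two after, the padding is split between both ends.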

To make this even-order moving average symmetric, a second moving average with .size = 2 should be applied to Daily_MA.
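The reasoning behind the second pass can be checked with a small base R sketch. It uses stats::filter() rather than tsibble, a made-up series x, and a window of 4 instead of 24 to keep the arithmetic readable. The first pass places one more observation after the centre than before it; averaging each value with its predecessor then gives a window that extends equally on both sides, with half weights at the two ends.

```r
x <- c(5, 3, 8, 6, 9, 4, 7, 10, 2, 6)  # made-up series
m <- 4                                  # even window size

# First pass: even-order moving average (one more value ahead than behind)
ma_m <- stats::filter(x, rep(1 / m, m), sides = 2)

# Second pass: average each value with the previous one
ma_2xm <- stats::filter(ma_m, c(0.5, 0.5), sides = 1)

# Equivalent single pass: symmetric weights of length m + 1
w <- c(1 / (2 * m), rep(1 / m, m - 1), 1 / (2 * m))
ma_direct <- stats::filter(x, w, sides = 2)

all.equal(as.numeric(ma_2xm), as.numeric(ma_direct))
```

The two-step result and the single symmetric filter agree, which is why the combination is often written as a 2 x m moving average.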

Flexible calendar period

What if the period we'd like to slide over is not a fixed window size, for example three calendar months, each with a different number of hours? The preprocessing step is to wrap observations into monthly subsets (a list of tsibbles) using nest().

pedestrian_mth <- pedestrian_full %>% 
  mutate(YrMth = yearmonth(Date_Time)) %>% 
  nest(-Sensor, -YrMth)
pedestrian_mth
#> # A tibble: 96 x 3
#>   Sensor            YrMth data               
#>   <chr>             <mth> <list>             
#> 1 Birrarung Marr 2015 Jan <tsibble [744 × 4]>
#> 2 Birrarung Marr 2015 Feb <tsibble [672 × 4]>
#> 3 Birrarung Marr 2015 Mar <tsibble [744 × 4]>
#> 4 Birrarung Marr 2015 Apr <tsibble [721 × 4]>
#> 5 Birrarung Marr 2015 May <tsibble [744 × 4]>
#> # … with 91 more rows

Now it’s ready to (rock and) roll. With .size = 1, slide() behaves exactly like purrr::map(), mapping over each element of the object. Here, however, a bundle of 3 subsets (.size = 3) is needed, and there are two ways to get the average counts: (1) bind the three subsets explicitly and then compute the mean; or (2) set .bind = TRUE, which takes care of binding the data frames by row. Gluing simple operations together like this makes complex tasks easier to follow.

pedestrian_mth %>% 
  group_by(Sensor) %>% 
  # (1)
  # mutate(Monthly_MA = slide_dbl(data, 
  #   ~ mean(bind_rows(.)$Count, na.rm = TRUE), .size = 3, .align = "center"
  # ))
  # (2) equivalent to (1)
  mutate(Monthly_MA = slide_dbl(data, 
    ~ mean(.$Count, na.rm = TRUE), .size = 3, .align = "center", .bind = TRUE
  ))
#> # A tibble: 96 x 4
#> # Groups:   Sensor [4]
#>   Sensor            YrMth data                Monthly_MA
#>   <chr>             <mth> <list>                   <dbl>
#> 1 Birrarung Marr 2015 Jan <tsibble [744 × 4]>        NA 
#> 2 Birrarung Marr 2015 Feb <tsibble [672 × 4]>       634.
#> 3 Birrarung Marr 2015 Mar <tsibble [744 × 4]>       546.
#> 4 Birrarung Marr 2015 Apr <tsibble [721 × 4]>       554.
#> 5 Birrarung Marr 2015 May <tsibble [744 × 4]>       397.
#> # … with 91 more rows
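The idea of binding a bundle of subsets before summarising can be sketched in base R. The toy list months and its values are made up for illustration and nothing here calls tsibble. The point is that binding rows first weights every observation equally, which differs from averaging the monthly means when the months have different lengths.

```r
# Four toy "monthly" subsets with different numbers of rows
months <- list(
  data.frame(Count = c(10, 20)),
  data.frame(Count = c(30, 40, 50)),
  data.frame(Count = 60),
  data.frame(Count = c(70, 80))
)

# Centred window of 3 subsets; the first and last positions stay NA
ma <- rep(NA_real_, length(months))
for (i in 2:(length(months) - 1)) {
  block <- do.call(rbind, months[(i - 1):(i + 1)])  # bind rows first
  ma[i] <- mean(block$Count)                        # then summarise
}
ma  # NA 35 55 NA

# Averaging the three monthly means instead gives about 38.3, not 35
mean(sapply(months[1:3], function(d) mean(d$Count)))
```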

Row-oriented workflow

We have had a glimpse of the row-oriented workflow when sliding over consecutive months with nest() in the preceding example. To take this workflow further, we can fit a linear model for each sensor simultaneously but independently, and in turn obtain its fitted values and residuals over weekly rolling windows. This is where pslide() comes into play. It takes a list or a data frame (multiple inputs) and applies the custom function my_diag() to every rolling block. We start with a tsibble and end up with a diagnostic tibble that is considerably larger, because every rolling block contributes its own set of rows.

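# Fit a linear model to one rolling block and return its diagnostics
my_diag <- function(...) {
  data <- tibble(...)  # reassemble the block's columns into a data frame
  fit <- lm(Count ~ Time, data = data)
  list(fitted = fitted(fit), resid = residuals(fit))
}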
pedestrian %>%
  filter_index(~ "2015-03") %>%
  nest(-Sensor) %>%
  mutate(diag = purrr::map(data, ~ pslide_dfr(., my_diag, .size = 24 * 7)))
#> # A tibble: 4 x 3
#>   Sensor                        data                  diag                 
#>   <chr>                         <list>                <list>               
#> 1 Birrarung Marr                <tsibble [2,160 × 4]> <tibble [334,825 × 2…
#> 2 Bourke Street Mall (North)    <tsibble [1,032 × 4]> <tibble [145,321 × 2…
#> 3 QV Market-Elizabeth St (West) <tsibble [2,160 × 4]> <tibble [334,825 × 2…
#> 4 Southern Cross Station        <tsibble [2,160 × 4]> <tibble [334,825 × 2…
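To see why the diagnostic tibble grows so much, here is a base R sketch of row-wise sliding with a model fit in each window. The toy data frame, the window size of 4, and the object names are illustrative assumptions; the sketch mimics the idea rather than the pslide_dfr() implementation.

```r
# Made-up data: six hourly counts
toy <- data.frame(Time = 0:5, Count = c(3, 5, 4, 8, 9, 12))
size <- 4

# One model per complete window of consecutive rows
starts <- seq_len(nrow(toy) - size + 1)
diag <- do.call(rbind, lapply(starts, function(i) {
  block <- toy[i:(i + size - 1), ]
  fit <- lm(Count ~ Time, data = block)
  data.frame(
    fitted = unname(fitted(fit)),
    resid = unname(residuals(fit))
  )
}))

nrow(diag)  # 3 windows x 4 rows each = 12
```

Each complete window contributes as many rows as it contains observations, so the result has roughly (number of windows) x (window size) rows.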

Why does slide() not work in this case? It is intended to work with a list, and a data frame is a list of columns, so slide() would move over it column by column. To slide over a data frame row by row, pslide() does the job.

Other features

The slide() examples above default to sliding over complete windows only. In some cases, you may find partial sliding more appropriate, which can be enabled with .partial = TRUE. Additionally, whereas a positive .size moves the window forward, a negative one moves it backward.