Customizing Wrapper Functions

Nickalus Redell

2019-11-22

Purpose

The purpose of this vignette is to take a closer look at how the user-supplied model training and predict wrapper functions can be modified to give greater control over the model-building process. The goal is to show how the wrapper functions can be written flexibly enough to keep a linear workflow in forecastML while modeling across multiple forecast horizons and validation datasets. The alternative would be to train models for a single forecast horizon and/or validation window and customize the wrapper functions for that specific setup.

Example 1 - Multiple forecast horizons and 1 model training function

library(DT)
library(dplyr)
library(ggplot2)
library(forecastML)
library(randomForest)

data("data_seatbelts", package = "forecastML")
data <- data_seatbelts

data <- data[, c("DriversKilled", "kms", "PetrolPrice", "law")]

dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = "1 month")
data_train <- forecastML::create_lagged_df(data,
                                           type = "train",
                                           outcome_col = 1, 
                                           lookback = 1:12,
                                           horizons = c(3, 12),
                                           dates = dates,
                                           frequency = "1 month")

# View the horizon 3 lagged dataset.
DT::datatable(head(data_train$horizon_3), options = list("scrollX" = TRUE))


windows <- forecastML::create_windows(data_train, window_length = 0, 
                                      window_start = as.Date("1984-01-01"),
                                      window_stop = as.Date("1984-12-01"))

plot(windows, data_train)

User-defined model-training function

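The code chunk that defined the training function did not survive rendering; only its output remains below. What follows is a reconstruction sketched from that output: one random forest is fit per forecast horizon, with `ntree` varying by horizon, and the fitted model is returned inside a list along with metadata. The argument names, the `n_tree` default, and reading the horizon from the lagged dataset's `horizons` attribute are assumptions inferred from the printed results, not taken from the original source.

```r
# Sketch of a user-defined training function consistent with the output
# below. Assumptions: argument names, defaults, and the "horizons"
# attribute lookup are inferred from the rendered output.
model_function <- function(data, n_tree = c(200, 100)) {

  # The outcome is the first column of each lagged dataset.
  outcome_name <- names(data)[1]
  model_formula <- formula(paste0(outcome_name, " ~ ."))

  # The forecast horizon for this dataset; printed as "[1] 3" and
  # "[1] 12" in the output below.
  horizon <- attributes(data)$horizons
  print(horizon)

  # Hyperparameters can differ by horizon: 200 trees for the 3-month
  # model, 100 trees for the 12-month model.
  if (horizon == 3) {
    my_trained_model <- randomForest::randomForest(formula = model_formula,
                                                   data = data,
                                                   ntree = n_tree[1])
    used_trees <- n_tree[1]
  } else {
    my_trained_model <- randomForest::randomForest(formula = model_formula,
                                                   data = data,
                                                   ntree = n_tree[2])
    used_trees <- n_tree[2]
  }

  # Returning a list lets the predict wrapper access both the model and
  # any metadata needed at prediction time.
  list("my_trained_model" = my_trained_model,
       "n_tree" = used_trees,
       "meta_data" = horizon)
}
```

Returning a list rather than the bare model object is what allows the horizon-specific metadata to reappear in the output below.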
## [1] 3
## [1] 12
## $my_trained_model
## 
## Call:
##  randomForest(formula = model_formula, data = data, ntree = n_tree[1]) 
##                Type of random forest: regression
##                      Number of trees: 200
## No. of variables tried at each split: 13
## 
##           Mean of squared residuals: 255.05
##                     % Var explained: 60.22
## 
## $n_tree
## [1] 200
## 
## $meta_data
## [1] 3
## 
## $my_trained_model
## 
## Call:
##  randomForest(formula = model_formula, data = data, ntree = n_tree[2]) 
##                Type of random forest: regression
##                      Number of trees: 100
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 366.4603
##                     % Var explained: 42.84
## 
## $n_tree
## [1] 100
## 
## $meta_data
## [1] 12

User-defined prediction function
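The prediction-function code chunk is also missing from this rendering. As a sketch of what the matching predict wrapper might look like (the function and argument names here are illustrative, not from the original source): the wrapper receives the list returned by the training function plus a data.frame of model features, and returns a 1-column data.frame of predictions.

```r
# Sketch of a matching predict wrapper (names are illustrative).
# 'model_list' is the list returned by the training function above;
# 'data_features' is a data.frame of features supplied at prediction time.
prediction_function <- function(model_list, data_features) {

  # Predict with the stored random forest and return the predictions
  # as a 1-column data.frame.
  data_pred <- data.frame(
    "y_pred" = predict(model_list$my_trained_model, data_features)
  )
  data_pred
}
```

Because the training function returned a list, the predict wrapper must reach into `model_list$my_trained_model` rather than calling `predict()` on the list itself.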