Augmented Dynamic Adaptive Model

Ivan Svetunkov

2024-06-19

This vignette explains briefly how to use the function adam() and the related auto.adam() in smooth package. It does not aim at covering all aspects of the function, but focuses on the main ones.

ADAM is Augmented Dynamic Adaptive Model. It is a model that underlies ETS, ARIMA and regression, connecting them in a unified framework. The underlying model for ADAM is a Single Source of Error state space model, which is explained in detail separately in an online textbook.

The main philosophy of adam() function is to be agnostic of the provided data. This means that it will work with ts, msts, zoo, xts, data.frame, numeric and other classes of data. The specification of seasonality in the model is done using a separate parameter lags, so you are not obliged to transform the existing data to something specific, and can use it as is. If you provide a matrix, or a data.frame, or a data.table, or any other multivariate structure, then the function will use the first column for the response variable and the others for the explanatory ones. One thing that is currently assumed in the function is that the data is measured at a regular frequency. If this is not the case, you will need to introduce missing values manually.

In order to run the experiments in this vignette, we need to load the following packages:

require(greybox)
require(smooth)

ADAM ETS

First and foremost, ADAM implements ETS model, although in a more flexible way than (Hyndman et al. 2008): it supports different distributions for the error term, which are regulated via distribution parameter. By default, the additive error model relies on Normal distribution, while the multiplicative error one assumes Inverse Gaussian. If you want to reproduce the classical ETS, you would need to specify distribution="dnorm". Here is an example of ADAM ETS(MMM) with Normal distribution on AirPassengers data:

testModel <- adam(AirPassengers, "MMM", lags=c(1,12), distribution="dnorm",
                  h=12, holdout=TRUE)
summary(testModel)
#> 
#> Model estimated using adam() function: ETS(MMM)
#> Response variable: AirPassengers
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 475.7278
#> Coefficients:
#>             Estimate Std. Error Lower 2.5% Upper 97.5%  
#> alpha         0.4143     0.0764     0.2630      0.5653 *
#> beta          0.0005     0.0044     0.0000      0.0092  
#> gamma         0.0264     0.0394     0.0000      0.1041  
#> level       108.5160     3.1894   102.1985    114.8183 *
#> trend         1.0114     0.0014     1.0087      1.0141 *
#> seasonal_1    0.9090     0.0083     0.8946      0.9348 *
#> seasonal_2    0.9031     0.0095     0.8887      0.9289 *
#> seasonal_3    1.0237     0.0108     1.0094      1.0495 *
#> seasonal_4    0.9963     0.0072     0.9819      1.0221 *
#> seasonal_5    0.9921     0.0083     0.9777      1.0179 *
#> seasonal_6    1.1169     0.0115     1.1026      1.1427 *
#> seasonal_7    1.2339     0.0131     1.2196      1.2598 *
#> seasonal_8    1.2267     0.0127     1.2123      1.2525 *
#> seasonal_9    1.0620     0.0105     1.0477      1.0879 *
#> seasonal_10   0.9221     0.0092     0.9077      0.9479 *
#> seasonal_11   0.8051     0.0085     0.7907      0.8309 *
#> 
#> Error standard deviation: 0.0393
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  985.4556  990.8240 1034.4632 1047.5697
plot(forecast(testModel,h=12,interval="prediction"))

You might notice that the summary contains more than what is reported by other smooth functions. This one also produces standard errors for the estimated parameters based on Fisher Information calculation. Note that this is computationally expensive, so if you have a model with more than 30 variables, the calculation of standard errors might take plenty of time. As for the default print() method, it will produce a shorter summary from the model, without the standard errors (similar to what es() does):

testModel
#> Time elapsed: 0.13 seconds
#> Model estimated using adam() function: ETS(MMM)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 475.7278
#> Persistence vector g:
#>  alpha   beta  gamma 
#> 0.4143 0.0005 0.0264 
#> 
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  985.4556  990.8240 1034.4632 1047.5697 
#> 
#> Forecast errors:
#> ME: -14.479; MAE: 17.456; RMSE: 24.368
#> sCE: -66.193%; Asymmetry: -85%; sMAE: 6.65%; sMSE: 0.862%
#> MASE: 0.725; RMSSE: 0.778; rMAE: 0.23; rRMSE: 0.237

Also, note that the prediction interval in case of multiplicative error models are approximate. It is advisable to use simulations instead (which is slower, but more accurate):

plot(forecast(testModel,h=18,interval="simulated"))

If you want to do the residuals diagnostics, then it is recommended to use plot function, something like this (you can select, which of the plots to produce):

par(mfcol=c(3,4))
plot(testModel,which=c(1:11))
par(mfcol=c(1,1))
plot(testModel,which=12)

By default ADAM will estimate models via maximising likelihood function. But there is also a parameter loss, which allows selecting from a list of already implemented loss functions (again, see documentation for adam() for the full list) or using a function written by a user. Here is how to do the latter on the example of BJsales:

lossFunction <- function(actual, fitted, B){
  return(sum(abs(actual-fitted)^3))
}
testModel <- adam(BJsales, "AAN", silent=FALSE, loss=lossFunction,
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.02 seconds
#> Model estimated using adam() function: ETS(AAN)
#> Distribution assumed in the model: Normal
#> Loss function type: custom; Loss function value: 599.224
#> Persistence vector g:
#>  alpha   beta 
#> 1.0000 0.2265 
#> 
#> Sample size: 138
#> Number of estimated parameters: 4
#> Number of degrees of freedom: 134
#> Information criteria are unavailable for the chosen loss & distribution.
#> 
#> Forecast errors:
#> ME: 3.016; MAE: 3.13; RMSE: 3.867
#> sCE: 15.923%; Asymmetry: 91.7%; sMAE: 1.377%; sMSE: 0.029%
#> MASE: 2.627; RMSSE: 2.521; rMAE: 1.01; rRMSE: 1.009

Note that you need to have parameters actual, fitted and B in the function, which correspond to the vector of actual values, vector of fitted values on each iteration and a vector of the optimised parameters.

loss and distribution parameters are independent, so in the example above, we have assumed that the error term follows Normal distribution, but we have estimated its parameters using a non-conventional loss because we can. Some of distributions assume that there is an additional parameter, which can either be estimated or provided by user. These include Asymmetric Laplace (distribution="dalaplace") with alpha, Generalised Normal and Log-Generalised normal (distribution=c("gnorm","dlgnorm")) with shape and Student’s T (distribution="dt") with nu:

testModel <- adam(BJsales, "MMN", silent=FALSE, distribution="dgnorm", shape=3,
                  h=12, holdout=TRUE)

The model selection in ADAM ETS relies on information criteria and works correctly only for the loss="likelihood". There are several options, how to select the model, see them in the description of the function: ?adam(). The default one uses branch-and-bound algorithm, similar to the one used in es(), but only considers additive trend models (the multiplicative trend ones are less stable and need more attention from a forecaster):

testModel <- adam(AirPassengers, "ZXZ", lags=c(1,12), silent=FALSE,
                  h=12, holdout=TRUE)
#> Forming the pool of models based on... ANN , ANA , MNM , MAM , Estimation progress:    71 %86 %100 %... Done!
testModel
#> Time elapsed: 0.57 seconds
#> Model estimated using adam() function: ETS(MAM)
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 467.7828
#> Persistence vector g:
#>  alpha   beta  gamma 
#> 0.7487 0.0178 0.0006 
#> 
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  969.5655  974.9339 1018.5731 1031.6796 
#> 
#> Forecast errors:
#> ME: 3.005; MAE: 16.312; RMSE: 22.838
#> sCE: 13.737%; Asymmetry: 37.5%; sMAE: 6.214%; sMSE: 0.757%
#> MASE: 0.677; RMSSE: 0.729; rMAE: 0.215; rRMSE: 0.222

Note that the function produces point forecasts if h>0, but it won’t generate prediction interval. This is why you need to use forecast() method (as shown in the first example in this vignette).

Similarly to es(), function supports combination of models, but it saves all the tested models in the output for a potential reuse. Here how it works:

testModel <- adam(AirPassengers, "CXC", lags=c(1,12),
                  h=12, holdout=TRUE)
testForecast <- forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95))
testForecast
#>          Point forecast Lower bound (5%) Lower bound (2.5%) Upper bound (95%)
#> Jan 1960       414.8649         391.5810           387.2529          438.7114
#> Feb 1960       410.6283         381.2299           375.8122          440.9392
#> Mar 1960       473.2942         432.6534           425.2245          515.4579
#> Apr 1960       456.5806         414.0166           406.2690          500.8822
#> May 1960       459.5624         415.6970           407.7229          505.2633
#> Jun 1960       522.3406         470.1741           460.7155          576.7958
#> Jul 1960       581.7566         521.2214           510.2724          645.0648
#> Aug 1960       580.7394         518.9643           507.8064          645.4114
#> Sep 1960       508.5326         454.2424           444.4388          565.3792
#> Oct 1960       444.7669         396.6700           387.9920          495.1600
#> Nov 1960       388.7625         346.3430           338.6940          433.2272
#> Dec 1960       438.1822         389.0777           380.2390          489.7228
#> Jan 1961       448.1516         395.0905           385.5760          504.0040
#> Feb 1961       443.3565         387.4364           377.4563          502.4240
#> Mar 1961       510.7678         441.3690           429.0568          584.3951
#> Apr 1961       492.4934         421.8667           409.3953          567.6815
#> May 1961       495.4734         423.5192           410.8286          572.1420
#> Jun 1961       562.8933         480.1278           465.5488          651.1627
#>          Upper bound (97.5%)
#> Jan 1960            443.4143
#> Feb 1960            446.9646
#> Mar 1960            523.9010
#> Apr 1960            509.7868
#> May 1960            514.4596
#> Jun 1960            587.7786
#> Jul 1960            657.8605
#> Aug 1960            658.4983
#> Sep 1960            576.8850
#> Oct 1960            505.3670
#> Nov 1960            442.2381
#> Dec 1960            500.1837
#> Jan 1961            515.3770
#> Feb 1961            514.4998
#> Mar 1961            599.5226
#> Apr 1961            583.1896
#> May 1961            587.9710
#> Jun 1961            669.4059
plot(testForecast)

Yes, now we support vectors for the levels in case you want to produce several. In fact, we also support side for prediction interval, so you can extract specific quantiles without a hustle:

forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95,0.99), side="upper")
#>          Point forecast Upper bound (90%) Upper bound (95%) Upper bound (99%)
#> Jan 1960       414.8649          433.3305          438.7114          448.9243
#> Feb 1960       410.6283          434.0594          440.9392          454.0387
#> Mar 1960       473.2942          505.8360          515.4579          533.8323
#> Apr 1960       456.5806          490.7443          500.8822          520.2711
#> May 1960       459.5624          494.7963          505.2633          525.2904
#> Jun 1960       522.3406          564.3029          576.7958          600.7209
#> Jul 1960       581.7566          630.5178          645.0648          672.9475
#> Aug 1960       580.7394          630.5379          645.4114          673.9335
#> Sep 1960       508.5326          552.3033          565.3792          590.4561
#> Oct 1960       444.7669          483.5624          495.1600          517.4083
#> Nov 1960       388.7625          422.9900          433.2272          452.8698
#> Dec 1960       438.1822          477.8430          489.7228          512.5310
#> Jan 1961       448.1516          491.0992          504.0040          528.8123
#> Feb 1961       443.3565          488.7360          502.4240          528.7796
#> Mar 1961       510.7678          567.2700          584.3951          617.4337
#> Apr 1961       492.4934          550.1430          567.6815          601.5694
#> May 1961       495.4734          554.2450          572.1420          606.7360
#> Jun 1961       562.8933          630.5419          651.1627          691.0383

A brand new thing in the function is the possibility to use several frequencies (double / triple / quadruple / … seasonal models). In order to show how it works, we will generate an artificial time series, inspired by half-hourly electricity demand using sim.gum() function:

ordersGUM <- c(1,1,1)
lagsGUM <- c(1,48,336)
initialGUM1 <- -25381.7
initialGUM2 <- c(23955.09, 24248.75, 24848.54, 25012.63, 24634.14, 24548.22, 24544.63, 24572.77,
                 24498.33, 24250.94, 24545.44, 25005.92, 26164.65, 27038.55, 28262.16, 28619.83,
                 28892.19, 28575.07, 28837.87, 28695.12, 28623.02, 28679.42, 28682.16, 28683.40,
                 28647.97, 28374.42, 28261.56, 28199.69, 28341.69, 28314.12, 28252.46, 28491.20,
                 28647.98, 28761.28, 28560.11, 28059.95, 27719.22, 27530.23, 27315.47, 27028.83,
                 26933.75, 26961.91, 27372.44, 27362.18, 27271.31, 26365.97, 25570.88, 25058.01)
initialGUM3 <- c(23920.16, 23026.43, 22812.23, 23169.52, 23332.56, 23129.27, 22941.20, 22692.40,
                 22607.53, 22427.79, 22227.64, 22580.72, 23871.99, 25758.34, 28092.21, 30220.46,
                 31786.51, 32699.80, 33225.72, 33788.82, 33892.25, 34112.97, 34231.06, 34449.53,
                 34423.61, 34333.93, 34085.28, 33948.46, 33791.81, 33736.17, 33536.61, 33633.48,
                 33798.09, 33918.13, 33871.41, 33403.75, 32706.46, 31929.96, 31400.48, 30798.24,
                 29958.04, 30020.36, 29822.62, 30414.88, 30100.74, 29833.49, 28302.29, 26906.72,
                 26378.64, 25382.11, 25108.30, 25407.07, 25469.06, 25291.89, 25054.11, 24802.21,
                 24681.89, 24366.97, 24134.74, 24304.08, 25253.99, 26950.23, 29080.48, 31076.33,
                 32453.20, 33232.81, 33661.61, 33991.21, 34017.02, 34164.47, 34398.01, 34655.21,
                 34746.83, 34596.60, 34396.54, 34236.31, 34153.32, 34102.62, 33970.92, 34016.13,
                 34237.27, 34430.08, 34379.39, 33944.06, 33154.67, 32418.62, 31781.90, 31208.69,
                 30662.59, 30230.67, 30062.80, 30421.11, 30710.54, 30239.27, 28949.56, 27506.96,
                 26891.75, 25946.24, 25599.88, 25921.47, 26023.51, 25826.29, 25548.72, 25405.78,
                 25210.45, 25046.38, 24759.76, 24957.54, 25815.10, 27568.98, 29765.24, 31728.25,
                 32987.51, 33633.74, 34021.09, 34407.19, 34464.65, 34540.67, 34644.56, 34756.59,
                 34743.81, 34630.05, 34506.39, 34319.61, 34110.96, 33961.19, 33876.04, 33969.95,
                 34220.96, 34444.66, 34474.57, 34018.83, 33307.40, 32718.90, 32115.27, 31663.53,
                 30903.82, 31013.83, 31025.04, 31106.81, 30681.74, 30245.70, 29055.49, 27582.68,
                 26974.67, 25993.83, 25701.93, 25940.87, 26098.63, 25771.85, 25468.41, 25315.74,
                 25131.87, 24913.15, 24641.53, 24807.15, 25760.85, 27386.39, 29570.03, 31634.00,
                 32911.26, 33603.94, 34020.90, 34297.65, 34308.37, 34504.71, 34586.78, 34725.81,
                 34765.47, 34619.92, 34478.54, 34285.00, 34071.90, 33986.48, 33756.85, 33799.37,
                 33987.95, 34047.32, 33924.48, 33580.82, 32905.87, 32293.86, 31670.02, 31092.57,
                 30639.73, 30245.42, 30281.61, 30484.33, 30349.51, 29889.23, 28570.31, 27185.55,
                 26521.85, 25543.84, 25187.82, 25371.59, 25410.07, 25077.67, 24741.93, 24554.62,
                 24427.19, 24127.21, 23887.55, 24028.40, 24981.34, 26652.32, 28808.00, 30847.09,
                 32304.13, 33059.02, 33562.51, 33878.96, 33976.68, 34172.61, 34274.50, 34328.71,
                 34370.12, 34095.69, 33797.46, 33522.96, 33169.94, 32883.32, 32586.24, 32380.84,
                 32425.30, 32532.69, 32444.24, 32132.49, 31582.39, 30926.58, 30347.73, 29518.04,
                 29070.95, 28586.20, 28416.94, 28598.76, 28529.75, 28424.68, 27588.76, 26604.13,
                 26101.63, 25003.82, 24576.66, 24634.66, 24586.21, 24224.92, 23858.42, 23577.32,
                 23272.28, 22772.00, 22215.13, 21987.29, 21948.95, 22310.79, 22853.79, 24226.06,
                 25772.55, 27266.27, 28045.65, 28606.14, 28793.51, 28755.83, 28613.74, 28376.47,
                 27900.76, 27682.75, 27089.10, 26481.80, 26062.94, 25717.46, 25500.27, 25171.05,
                 25223.12, 25634.63, 26306.31, 26822.46, 26787.57, 26571.18, 26405.21, 26148.41,
                 25704.47, 25473.10, 25265.97, 26006.94, 26408.68, 26592.04, 26224.64, 25407.27,
                 25090.35, 23930.21, 23534.13, 23585.75, 23556.93, 23230.25, 22880.24, 22525.52,
                 22236.71, 21715.08, 21051.17, 20689.40, 20099.18, 19939.71, 19722.69, 20421.58,
                 21542.03, 22962.69, 23848.69, 24958.84, 25938.72, 26316.56, 26742.61, 26990.79,
                 27116.94, 27168.78, 26464.41, 25703.23, 25103.56, 24891.27, 24715.27, 24436.51,
                 24327.31, 24473.02, 24893.89, 25304.13, 25591.77, 25653.00, 25897.55, 25859.32,
                 25918.32, 25984.63, 26232.01, 26810.86, 27209.70, 26863.50, 25734.54, 24456.96)
y <- sim.gum(orders=ordersGUM, lags=lagsGUM, nsim=1, frequency=336, obs=3360,
             measurement=rep(1,3), transition=diag(3), persistence=c(0.045,0.162,0.375),
             initial=cbind(initialGUM1,initialGUM2,initialGUM3))$data

We can then apply ADAM to this data:

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE)
testModel
#> Time elapsed: 1.11 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19500.06
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0350 0.0000 0.1803 0.3057 
#> Damping parameter: 0.9936
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39012.12 39012.14 39048.20 39048.31 
#> 
#> Forecast errors:
#> ME: 135.704; MAE: 186.674; RMSE: 229.875
#> sCE: 150.815%; Asymmetry: 74%; sMAE: 0.617%; sMSE: 0.006%
#> MASE: 0.252; RMSSE: 0.222; rMAE: 0.027; rRMSE: 0.027

Note that the more lags you have, the more initial seasonal components the function will need to estimate, which is a difficult task. This is why we used initial="backcasting" in the example above - this speeds up the estimation by reducing the number of parameters to estimate. Still, the optimiser might not get close to the optimal value, so we can help it. First, we can give more time for the calculation, increasing the number of iterations via maxeval (the default value is 40 iterations for each estimated parameter, e.g. \(40 \times 5 = 200\) in our case):

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, maxeval=10000)
testModel
#> Time elapsed: 1.24 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19500.06
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0351 0.0000 0.1797 0.3055 
#> Damping parameter: 0.9935
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39012.11 39012.14 39048.20 39048.31 
#> 
#> Forecast errors:
#> ME: 135.758; MAE: 186.691; RMSE: 229.89
#> sCE: 150.876%; Asymmetry: 74.1%; sMAE: 0.618%; sMSE: 0.006%
#> MASE: 0.252; RMSSE: 0.222; rMAE: 0.027; rRMSE: 0.027

This will take more time, but will typically lead to more refined parameters. You can control other parameters of the optimiser as well, such as algorithm, xtol_rel, print_level and others, which are explained in the documentation for nloptr function from nloptr package (run nloptr.print.options() for details). Second, we can give a different set of initial parameters for the optimiser, have a look at what the function saves:

testModel$B

and use this as a starting point for the reestimation (e.g. with a different algorithm):

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, B=testModel$B)
testModel
#> Time elapsed: 0.45 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19500.06
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0351 0.0000 0.1799 0.3059 
#> Damping parameter: 0.996
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39012.11 39012.14 39048.20 39048.31 
#> 
#> Forecast errors:
#> ME: 135.789; MAE: 186.716; RMSE: 229.915
#> sCE: 150.91%; Asymmetry: 74.1%; sMAE: 0.618%; sMSE: 0.006%
#> MASE: 0.252; RMSSE: 0.222; rMAE: 0.027; rRMSE: 0.027

If you are ready to wait, you can change the initialisation to the initial="optimal", which in our case will take much more time because of the number of estimated parameters - 389 for the chosen model. The estimation process in this case might take 20 - 30 times more than in the example above.

In addition, you can specify some parts of the initial state vector or some parts of the persistence vector, here is an example:

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE, persistence=list(beta=0.1))
testModel
#> Time elapsed: 0.71 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 21804.83
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.9531 0.1000 0.0284 0.0468 
#> Damping parameter: 0.7836
#> Sample size: 3024
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 3019
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 43619.66 43619.68 43649.73 43649.81 
#> 
#> Forecast errors:
#> ME: 470.35; MAE: 889.006; RMSE: 1101.571
#> sCE: 522.727%; Asymmetry: 56.3%; sMAE: 2.94%; sMSE: 0.133%
#> MASE: 1.199; RMSSE: 1.063; rMAE: 0.129; rRMSE: 0.13

The function also handles intermittent data (the data with zeroes) and the data with missing values. This is partially covered in the vignette on the oes() function. Here is a simple example:

testModel <- adam(rpois(120,0.5), "MNN", silent=FALSE, h=12, holdout=TRUE,
                  occurrence="odds-ratio")
testModel
#> Time elapsed: 0.02 seconds
#> Model estimated using adam() function: iETS(MNN)[O]
#> Occurrence model type: Odds ratio
#> Distribution assumed in the model: Mixture of Bernoulli and Gamma
#> Loss function type: likelihood; Loss function value: 68.1295
#> Persistence vector g:
#> alpha 
#>     0 
#> 
#> Sample size: 108
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 103
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 293.5997 293.8304 307.0103 298.1863 
#> 
#> Forecast errors:
#> Asymmetry: -19.201%; sMSE: 21.708%; rRMSE: 0.794; sPIS: 405.811%; sCE: -51.375%

Finally, adam() is faster than es() function, because its code is more efficient and it uses a different optimisation algorithm with more finely tuned parameters by default. Let’s compare:

adamModel <- adam(AirPassengers, "CCC",
                  h=12, holdout=TRUE)
esModel <- es(AirPassengers, "CCC",
              h=12, holdout=TRUE)
"adam:"
#> [1] "adam:"
adamModel
#> Time elapsed: 2.23 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#> 
#> Number of models combined: 30
#> Sample size: 132
#> Average number of estimated parameters: 18.204
#> Average number of degrees of freedom: 113.796
#> 
#> Forecast errors:
#> ME: 1.196; MAE: 15.085; RMSE: 21.536
#> sCE: 5.469%; sMAE: 5.747%; sMSE: 0.673%
#> MASE: 0.626; RMSSE: 0.687; rMAE: 0.198; rRMSE: 0.209
"es():"
#> [1] "es():"
esModel
#> Time elapsed: 2.14 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#> 
#> Number of models combined: 30
#> Sample size: 132
#> Average number of estimated parameters: 18.0014
#> Average number of degrees of freedom: 113.9986
#> 
#> Forecast errors:
#> ME: 5.112; MAE: 17.588; RMSE: 23.73
#> sCE: 23.368%; sMAE: 6.7%; sMSE: 0.817%
#> MASE: 0.73; RMSSE: 0.757; rMAE: 0.231; rRMSE: 0.23

ADAM ARIMA

As mentioned above, ADAM does not only contain ETS, it also contains ARIMA model, which is regulated via orders parameter. If you want to have a pure ARIMA, you need to switch off ETS, which is done via model="NNN":

testModel <- adam(BJsales, "NNN", silent=FALSE, orders=c(0,2,2),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.04 seconds
#> Model estimated using adam() function: ARIMA(0,2,2)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 240.5926
#> ARMA parameters of the model:
#>         Lag 1
#> MA(1) -0.7483
#> MA(2) -0.0144
#> 
#> Sample size: 138
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 133
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 491.1852 491.6398 505.8215 506.9413 
#> 
#> Forecast errors:
#> ME: 2.958; MAE: 3.084; RMSE: 3.809
#> sCE: 15.615%; Asymmetry: 90.1%; sMAE: 1.357%; sMSE: 0.028%
#> MASE: 2.589; RMSSE: 2.483; rMAE: 0.995; rRMSE: 0.994

Given that both models are implemented in the same framework, they can be compared using information criteria.

The functionality of ADAM ARIMA is similar to the one of msarima function in smooth package, although there are several differences.

First, changing the distribution parameter will allow switching between additive / multiplicative models. For example, distribution="dlnorm" will create an ARIMA, equivalent to the one on logarithms of the data:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dlnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.4 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Log-Normal
#> Loss function type: likelihood; Loss function value: 583.8752
#> ARMA parameters of the model:
#>        Lag 1  Lag 12
#> AR(1) 0.9112 -0.9926
#>         Lag 1  Lag 12
#> MA(1) -0.1469  0.0125
#> MA(2) -0.5459 -0.4798
#> 
#> Sample size: 132
#> Number of estimated parameters: 33
#> Number of degrees of freedom: 99
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1233.750 1256.648 1328.883 1384.786 
#> 
#> Forecast errors:
#> ME: 1.908; MAE: 10.74; RMSE: 15.611
#> sCE: 8.723%; Asymmetry: 20.8%; sMAE: 4.091%; sMSE: 0.354%
#> MASE: 0.446; RMSSE: 0.498; rMAE: 0.141; rRMSE: 0.152

Second, if you want the model with intercept / drift, you can do it using constant parameter:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12), constant=TRUE,
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.36 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12] with drift
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 492.4511
#> Intercept/Drift value: 2.6411
#> ARMA parameters of the model:
#>         Lag 1  Lag 12
#> AR(1) -0.7057 -0.1664
#>         Lag 1 Lag 12
#> MA(1)  0.5066 0.1320
#> MA(2) -0.0358 0.0282
#> 
#> Sample size: 132
#> Number of estimated parameters: 34
#> Number of degrees of freedom: 98
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1052.902 1077.438 1150.917 1210.820 
#> 
#> Forecast errors:
#> ME: -25.161; MAE: 25.212; RMSE: 31.03
#> sCE: -115.026%; Asymmetry: -98.7%; sMAE: 9.605%; sMSE: 1.397%
#> MASE: 1.047; RMSSE: 0.99; rMAE: 0.332; rRMSE: 0.301

If the model contains non-zero differences, then the constant acts as a drift. Third, you can specify parameters of ARIMA via the arma parameter in the following manner:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  arma=list(ar=c(0.1,0.1), ma=c(-0.96, 0.03, -0.12, 0.03)),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.18 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 534.852
#> ARMA parameters of the model:
#>       Lag 1 Lag 12
#> AR(1)   0.1    0.1
#>       Lag 1 Lag 12
#> MA(1) -0.96  -0.12
#> MA(2)  0.03   0.03
#> 
#> Sample size: 132
#> Number of estimated parameters: 27
#> Number of degrees of freedom: 105
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1123.704 1138.243 1201.540 1237.034 
#> 
#> Forecast errors:
#> ME: 9.575; MAE: 17.082; RMSE: 19.148
#> sCE: 43.773%; Asymmetry: 61.9%; sMAE: 6.508%; sMSE: 0.532%
#> MASE: 0.709; RMSSE: 0.611; rMAE: 0.225; rRMSE: 0.186

Finally, the initials for the states can also be provided, although getting the correct ones might be a challenging task (you also need to know how many of them to provide; checking testModel$initial might help):

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,0)), distribution="dnorm",
                  initial=list(arima=AirPassengers[1:24]),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.28 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,0)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 491.2151
#> ARMA parameters of the model:
#>        Lag 1 Lag 12
#> AR(1) 0.2131 0.0479
#>         Lag 1
#> MA(1) -0.4438
#> MA(2)  0.0371
#> 
#> Sample size: 132
#> Number of estimated parameters: 31
#> Number of degrees of freedom: 101
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1044.430 1064.270 1133.797 1182.234 
#> 
#> Forecast errors:
#> ME: -18.326; MAE: 19.56; RMSE: 24.974
#> sCE: -83.778%; Asymmetry: -92.4%; sMAE: 7.452%; sMSE: 0.905%
#> MASE: 0.812; RMSSE: 0.797; rMAE: 0.257; rRMSE: 0.243

If you work with ADAM ARIMA model, then there is no such thing as “usual” bounds for the parameters, so the function will use the bounds="admissible", checking the AR / MA polynomials in order to make sure that the model is stationary and invertible (aka stable).

Similarly to ETS, you can use different distributions and losses for the estimation. Note that the order selection for ARIMA is done in auto.adam() function, not in the adam()! However, if you do orders=list(..., select=TRUE) in adam(), it will call auto.adam() and do the selection.

Finally, ARIMA is typically slower than ETS, mainly because its initial states are more difficult to estimate due to an increased complexity of the model. If you want to speed things up, use initial="backcasting" and reduce the number of iterations via maxeval parameter.

ADAM ETSX / ARIMAX / ETSX+ARIMA

Another important feature of ADAM is introduction of explanatory variables. Unlike in es(), adam() expects a matrix for data and can work with a formula. If the latter is not provided, then it will use all explanatory variables. Here is a brief example:

BJData <- cbind(BJsales,BJsales.lead)
testModel <- adam(BJData, "AAN", h=18, silent=FALSE)