IM532 3.0 Applied Time Series ForecastingDynamic regression modelsDr. Thiyanga Talagala21 June 20201 / 22

Introduction

ARIMA models
ETS models
Dynamic regression models: How to include external variables

Regression model

$y_{t} = β_{0} + β_{1} x_{1, t} + . . . + β_{k} x_{k, t} + ϵ_{t}$

Allow the errors from a regression to contain autocorrelation.

$y_{t} = β_{0} + β_{1} x_{1, t} + . . . + β_{k} x_{k, t} + η_{t}$ $(1 - ϕ_{1} B) (1 - B) η_{t} = (1 + θ_{1} B) ϵ_{t}$

where $ϵ_{t}$ is a white noise series.

$η_{t}$ follows an $A R I M A (1, 1, 1)$

2 / 22

Stationary variables vs Nonstationary Variables

If all of the variables in the model are stationary, then we only need to consider ARMA errors for the residuals
Regression model with ARIMA errors is equivalent to a regression model in differences with ARMA errors.

$y_{t}^{'} = β_{0} + β_{1} x_{1, t}^{'} + . . . + β_{k} x_{k, t}^{'} + η_{t}^{'}$

$(1 - ϕ_{1} B) η_{t}^{'} = (1 + θ_{1} B) ϵ_{t}$

$y_{t}^{'} = y_{t} - y_{t - 1}$

$x_{t, i}^{'} = x_{t, i} - x_{t - 1, i}$

$η_{t}^{'} = η_{t} - η_{t - 1}$

3 / 22

Regression with ARIMA errors in R

To fit

$y_{t}^{'} = β_{1} x_{t}^{'} + η_{t}^{'},$

where $η_{t}^{'} = ϕ_{1} η_{t - 1}^{'} + ϵ_{t}$ is an $A R (1)$ error.

This is equivalent to

$y_{t} = β_{0} + β_{1} x_{t} + η_{t}$ where $η_{t}$ is an $A R I M A (1, 1, 0)$ error.

Constant term disappears due to the differencing.

fit <- Arima(y, xreg=x, order=c(1,1,0))

4 / 22

Using automated ARIMA algorithm

library(forecast)
library(fpp2)
head(uschange)

        Consumption     Income Production   Savings Unemployment
1970 Q1   0.6159862  0.9722610 -2.4527003 4.8103115          0.9
1970 Q2   0.4603757  1.1690847 -0.5515251 7.2879923          0.5
1970 Q3   0.8767914  1.5532705 -0.3587079 7.2890131          0.5
1970 Q4  -0.2742451 -0.2552724 -2.1854549 0.9852296          0.7
1971 Q1   1.8973708  1.9871536  1.9097341 3.6577706         -0.1
1971 Q2   0.9119929  1.4473342  0.9015358 6.0513418         -0.1

5 / 22

Using automated ARIMA algorithm

autoplot(uschange[,1:2], facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("Quarterly changes in US consumption
and personal income")

6 / 22

Using automated ARIMA algorithm

fit <- auto.arima(uschange[,"Consumption"],
xreg=uschange[,"Income"])
fit

Series: uschange[, "Consumption"] 
Regression with ARIMA(1,0,2) errors 
Coefficients:
         ar1      ma1     ma2  intercept    xreg
      0.6922  -0.5758  0.1984     0.5990  0.2028
s.e.  0.1159   0.1301  0.0756     0.0884  0.0461
sigma^2 estimated as 0.3219:  log likelihood=-156.95
AIC=325.91   AICc=326.37   BIC=345.29

The fitted model is

${^y}_{t} = 0.599 + 0.203 x_{t} + η_{t}$

$η_{t} = 0.692 η_{t - 1} + ϵ_{t} - 0.576 ϵ_{t - 1} + 0.198 ϵ_{t - 2}$

$ϵ_{t} \sim N I D (0, 0.322)$

7 / 22

plot $η_{t}$ and $ϵ_{t}$

residuals function can be used to extract $η_{t}$ and $ϵ_{t}$ .

library(magrittr)
cbind("Regression Errors" = residuals(fit, type="regression"),
"ARIMA errors" = residuals(fit, type="innovation")) %>%
autoplot(facets=TRUE)

8 / 22

It is the ARIMA errors that should resemble a white noise series.

checkresiduals(fit)


    Ljung-Box test
data:  Residuals from Regression with ARIMA(1,0,2) errors
Q* = 5.8916, df = 3, p-value = 0.117
Model df: 5.   Total lags used: 8

9 / 22

Calculate forecasts

We first need to forecast predictors

fcast <- forecast(fit, xreg=rep(mean(uschange[,2]),8))
autoplot(fcast) + xlab("Year") + ylab("Percentage change")

10 / 22

Forecasting electricity demand

Model daily electricity demand as a function of temperature using quadratic regression with ARMA errors.

11 / 22

Forecasting electricity demand

12 / 22

Forecasting electricity demand

xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
MaxTempSq = elecdaily[, "Temperature"]^2,
Workday = elecdaily[, "WorkDay"])
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)
fit

Series: elecdaily[, "Demand"] 
Regression with ARIMA(2,1,2)(2,0,0)[7] errors 
Coefficients:
          ar1     ar2      ma1      ma2    sar1    sar2    drift  MaxTemp
      -0.0622  0.6731  -0.0234  -0.9301  0.2012  0.4021  -0.0191  -7.4996
s.e.   0.0714  0.0667   0.0413   0.0390  0.0533  0.0567   0.1091   0.4409
      MaxTempSq  Workday
         0.1789  30.5695
s.e.     0.0084   1.2891
sigma^2 estimated as 43.72:  log likelihood=-1200.7
AIC=2423.4   AICc=2424.15   BIC=2466.27

Forecasting

forecast(fit, xreg = cbind(20, 20^2, 1)) # Forecast one day ahead

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
53.14286       185.4008 176.9271 193.8745 172.4414 198.3602

13 / 22

Forecasting electricity demand (cont.)

checkresiduals(fit)


    Ljung-Box test
data:  Residuals from Regression with ARIMA(2,1,2)(2,0,0)[7] errors
Q* = 28.229, df = 4, p-value = 1.121e-05
Model df: 10.   Total lags used: 14

14 / 22

Lagged predictors

$y_{t} = β_{0} + γ_{0} x_{t} + γ_{1} x_{t - 1} + . . . + γ_{k} x_{t - k} + η_{t}$

where $η_{t}$ is an $A R I M A$ process.

How to select $k$ ?

Use AICc along with the values of $p$ and $q$ for the ARIMA error.

15 / 22

autoplot(insurance, facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("Insurance advertising and quotations")

16 / 22

Lagged predictors

Advert <- cbind(
AdLag0 = insurance[,"TV.advert"],
AdLag1 = stats::lag(insurance[,"TV.advert"],-1),
AdLag2 = stats::lag(insurance[,"TV.advert"],-2),
AdLag3 = stats::lag(insurance[,"TV.advert"],-3)) %>%
head(NROW(insurance))
head(Advert)

           AdLag0   AdLag1   AdLag2   AdLag3
Jan 2002 7.212725       NA       NA       NA
Feb 2002 9.443570 7.212725       NA       NA
Mar 2002 7.534250 9.443570 7.212725       NA
Apr 2002 7.212725 7.534250 9.443570 7.212725
May 2002 9.443570 7.212725 7.534250 9.443570
Jun 2002 6.415215 9.443570 7.212725 7.534250

17 / 22

Lagged predictors (cont.)

fit1 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1],
stationary=TRUE)
fit1[["aicc"]]

[1] 68.49968

fit2 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:2],
stationary=TRUE)
fit2[["aicc"]]

[1] 60.02357

fit3 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:3],
stationary=TRUE)
fit3[["aicc"]]

[1] 62.83253

fit4 <- auto.arima(insurance[4:40,1], xreg=Advert[4:40,1:4],
stationary=TRUE)
fit4[["aicc"]]

[1] 65.45747

18 / 22

Lagged predictors (cont.)

Re-estimate the model

fit <- auto.arima(insurance[,1], xreg=Advert[,1:2],
stationary=TRUE); fit

Series: insurance[, 1] 
Regression with ARIMA(3,0,0) errors 
Coefficients:
         ar1      ar2     ar3  intercept  AdLag0  AdLag1
      1.4117  -0.9317  0.3591     2.0393  1.2564  0.1625
s.e.  0.1698   0.2545  0.1592     0.9931  0.0667  0.0591
sigma^2 estimated as 0.2165:  log likelihood=-23.89
AIC=61.78   AICc=65.4   BIC=73.43

The fitted model is

$y_{t} = 2.039 + 1.25 x_{t} + 0.162 x_{t - 1} + η_{t}$ $η_{t} = 1.412 η_{t - 1} - 0.932 η_{t - 2} + 0.359 η_{t - 3} + ϵ_{t}$

$y_{t}$ - the number of quotations provided in month $t$

$x_{t}$ - the advertising expenditure in month $t$

$ϵ_{t}$ is white noise.

19 / 22

Generate forecasts

fc8 <- forecast(fit, h=20,
xreg=cbind(AdLag0 = rep(8,20),
AdLag1 = c(Advert[40,1], rep(8,19))))
autoplot(fc8) + ylab("Quotes") +
ggtitle("Forecast quotes with future advertising set to 8")

20 / 22

checkresiduals(fc8)


    Ljung-Box test
data:  Residuals from Regression with ARIMA(3,0,0) errors
Q* = 2.0324, df = 3, p-value = 0.5657
Model df: 6.   Total lags used: 9

21 / 22

Other approaches

Neural network models
Machine learning approaches
TBATS models
Forecasting with decomposition
Vector autoregressions

22 / 22

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Start & Stop the presentation timer

Reset the presentation timer

?, h

Toggle this help

IM532 3.0 Applied Time Series Forecasting

Dynamic regression models

Dr. Thiyanga Talagala

21 June 2020

Introduction

Regression model

Allow the errors from a regression to contain autocorrelation.

Stationary variables vs Nonstationary Variables

Regression with ARIMA errors in R

Using automated ARIMA algorithm

Using automated ARIMA algorithm

Using automated ARIMA algorithm

plot ηtηt and ϵtϵt

Calculate forecasts

Forecasting electricity demand

Forecasting electricity demand

Forecasting electricity demand

Forecasting

Forecasting electricity demand (cont.)

Lagged predictors

How to select kk ?

Lagged predictors

Lagged predictors (cont.)

Lagged predictors (cont.)

Generate forecasts

Other approaches

Introduction

Regression model

Allow the errors from a regression to contain autocorrelation.

Help

plot $η_{t}$ and $ϵ_{t}$

How to select $k$ ?