class: center, middle, inverse, title-slide

# IM532 3.0 Applied Time Series Forecasting
## MSc in Industrial Mathematics
### Introduction to Time Series Analysis and Forecasting
### 1 March 2020

---

## Time series

- A time series is a sequence of observations taken sequentially in time.

## Time series data vs cross-sectional data

.pull-left[

- Time series data

<table>
 <thead>
  <tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

- A set of observations, together with information about the times at which they were recorded.

- Usually recorded at discrete, equally spaced time intervals.

]

.pull-right[

- Cross-sectional data

<table>
 <thead>
  <tr> <th style="text-align:left;"> ID </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 200 </td> </tr>
  <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 350 </td> </tr>
  <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 480 </td> </tr>
  <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 250 </td> </tr>
</tbody>
</table>

- Observations on different individuals or groups at a single point in time.

]

---

## Deterministic vs Non-deterministic time series

.pull-left[

- **Deterministic time series:** future values can be determined exactly by a mathematical function.

![](timeseries1_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

`$$y_t = \cos(2\pi t)$$`

]

.pull-right[

- **Non-deterministic time series:** future values can only be described by a probability distribution.

![](timeseries1_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

]

---

## Stochastic processes

"A statistical phenomenon that evolves in time according to probabilistic laws is called a stochastic process." (Box, G. E. P., et al., *Time Series Analysis: Forecasting and Control*)

--

## Non-deterministic time series or statistical time series

A sample realization from an infinite population of time series that could have been generated by a stochastic process.

<table>
 <thead>
  <tr> <th style="text-align:left;"> Description </th> <th style="text-align:left;"> Notation </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Sequence of random variables </td> <td style="text-align:left;"> `\(\{X_1, X_2, \dots, X_t, \dots, X_T\}\)` </td> </tr>
  <tr> <td style="text-align:left;"> Realized values of the random variables </td> <td style="text-align:left;"> `\(\{x_1, x_2, \dots, x_t, \dots, x_T\}\)` </td> </tr>
  <tr> <td style="text-align:left;"> Examples of realizations </td> <td style="text-align:left;"> {20, 40, 20, ..., 190} </td> </tr>
  <tr> <td style="text-align:left;">  </td> <td style="text-align:left;"> {15, 20, 100, ..., 490} </td> </tr>
  <tr> <td style="text-align:left;">  </td> <td style="text-align:left;"> {15, 44, 39, ..., 329} </td> </tr>
  <tr> <td style="text-align:left;">  </td> <td style="text-align:left;"> ...
</td> </tr>
</tbody>
<tfoot>
<tr>
<td style = 'padding: 0; border:0;' colspan='100%'><sup>a</sup> T: length of the time series; t: time index</td>
</tr>
</tfoot>
</table>

---

## Time series to be analyzed

.pull-left[

```
Time Series:
Start = c(2008, 51) 
End = c(2014, 17) 
Frequency = 52 
  [1]  15  44  39  57  53  29  45  47  34  28  26  27  25  28  23  20  31  57
 [19]  75  78  94  88 158 110 209 166 288 153 161 203 163 110 106  94  78  61
 [37]  28  58  51  53  80  72  68  54  94  88  64  38  85  48  24 124 126 176
 [55] 137 128  97  81 145 136  72  92  65  65  43  16   8  25  42  47  58  44
 [73]  61  85 152  49 145 213 308 334 257 158 223 223 270 171 134 111  50  68
 [91]  25  56  51  60  49  25  22  26  19  21  14  35  23  18  38  77  72  60
[109]  72  77  82  78 101  77  73  69  81  99  54  46 116 138 157 112 108 250
[127] 235  99 384 306 475 301 161 341 277 130 149 135  97 178  25 143 118  92
[145]  66  88 126 128  25  44  91 162 188 281 168 338 207 219 242 232 232 215
[163] 171 128  94 156  75 126  33  99  42  47 131  38   0   0  64  69  67  70
[181]   0  72  91 159  99   0 206 128 136 297 152  78  91 107 106 126 144 105
[199] 220  55  45   0  36  51  70 127 148 135  95 127 186 109 162 113  75 172
[217] 102 133 124  80 100  77 113  62  59  60 112  72  55  91  94 169 130  41
[235] 231 232 218 184 154 180 144 219 132 109 108 112  57  48 164 145 177 139
[253] 123 215 254 256 249 329 244 193 299 200 142 118 143 182 136 144  69 128
[271]  60  71  52  42 159 115 146 154 329
```

]

.pull-right[

![](timeseries1_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

]

- The observed time series (the time series to be analyzed) is **a particular realization of a stochastic process**.
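---

## Realizations of a stochastic process in R

To make "realization" concrete, here is a minimal base-R sketch that draws several realizations from one stochastic process. The process is an assumption chosen only for illustration: a Gaussian random walk `\(X_t = X_{t-1} + \varepsilon_t\)` with `\(X_0 = 0\)` and iid `\(\varepsilon_t \sim N(0, 1)\)`. It is not the process behind the series above.

```r
set.seed(2020)          # for reproducibility
n_obs  <- 10            # T: length of each realization
n_real <- 3             # number of realizations to draw

# Each call to cumsum(rnorm(n_obs)) yields one realization x_1, ..., x_T
# of the (assumed) random walk X_t = X_{t-1} + e_t, X_0 = 0, e_t ~ iid N(0, 1)
realizations <- replicate(n_real, cumsum(rnorm(n_obs)))
colnames(realizations) <- paste0("realization_", seq_len(n_real))

# Store as a multivariate ts: one time index, one column per realization
x <- ts(realizations, start = 1)
round(x, 2)
```

Each column is a different sample path of the same process, just as the observed series above is one realization of its own, unknown, process.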

---

## Set of numbers or set of images taken sequentially in time

![](figsat.png)

---

## Time series forecasting

![](timeseries1_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

## Types of methods

- Qualitative forecast

- Quantitative forecast

---

# Basic steps in a forecasting task

- Problem definition

- Collect data

- Data visualization

- Modelling

- Evaluate the fitted model

---

## Frequency of a time series: Seasonal periods

- Frequency: number of observations per natural time interval of measurement (usually a year, but sometimes a week, a day or an hour)

<table>
 <thead>
  <tr> <th style="text-align:left;"> Data </th> <th style="text-align:left;"> Frequency </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Annual </td> <td style="text-align:left;"> 1 </td> </tr>
  <tr> <td style="text-align:left;"> Quarterly </td> <td style="text-align:left;"> 4 </td> </tr>
  <tr> <td style="text-align:left;"> Monthly </td> <td style="text-align:left;"> 12 </td> </tr>
  <tr> <td style="text-align:left;"> Weekly </td> <td style="text-align:left;"> 52 or 52.18 </td> </tr>
</tbody>
</table>

- Multiple frequency setting

<table>
 <thead>
  <tr> <th style="text-align:left;"> Data </th> <th style="text-align:left;"> Minute </th> <th style="text-align:left;"> Hour </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Year </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Daily </td> <td style="text-align:left;">  </td> <td style="text-align:left;">  </td> <td style="text-align:left;">  </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 365.25 </td> </tr>
  <tr> <td style="text-align:left;"> Hourly </td> <td style="text-align:left;">  </td> <td style="text-align:left;">  </td> <td style="text-align:left;"> 24 </td> <td style="text-align:left;"> 168 </td> <td style="text-align:left;"> 8766 </td> </tr>
  <tr> <td style="text-align:left;"> Half-Hourly </td> <td style="text-align:left;">  </td> <td style="text-align:left;">  </td> <td style="text-align:left;"> 48 </td> <td style="text-align:left;"> 336 </td> <td style="text-align:left;"> 17532 </td> </tr>
  <tr> <td style="text-align:left;"> Minutes </td> <td style="text-align:left;">  </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 1440 </td> <td style="text-align:left;"> 10080 </td> <td style="text-align:left;"> 525960 </td> </tr>
  <tr> <td style="text-align:left;"> Seconds </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 3600 </td> <td style="text-align:left;"> 86400 </td> <td style="text-align:left;"> 604800 </td> <td style="text-align:left;"> 31557600 </td> </tr>
</tbody>
</table>

---

.pull-left[

## Monthly time series

![](timeseries1_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

- Length of the series: 72

- Monthly seasonality

]

.pull-right[

## Half-hourly time series

![](timeseries1_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

- Length of the series: 4032

- Daily seasonality and weekly seasonality

]

--

Note: Monthly seasonality in high-frequency data (daily, hourly, etc.) is tricky because month lengths vary, so it cannot be specified as a fixed seasonal period. It could possibly be handled using dummy variables.

---

class: duke-orange

# Your turn

- What are the frequencies for a monthly time series with semi-annual and annual patterns?
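---

## Seasonal periods by arithmetic

The seasonal periods in the multiple-frequency table come from counting observations per cycle. A minimal base-R sketch, assuming an average year of 365.25 days (which is where non-integer values such as 365.25 and 52.18 come from):

```r
# Observations per hour for each sampling interval (hourly and finer)
obs_per_hour <- c("Hourly" = 1, "Half-Hourly" = 2, "Minutes" = 60, "Seconds" = 3600)

# Hours in each seasonal cycle, using an average year of 365.25 days
hours_per_cycle <- c(Day = 24, Week = 24 * 7, Year = 24 * 365.25)

# Seasonal period = observations per hour x hours per cycle
# (rows: sampling interval, columns: seasonal cycle)
periods <- outer(obs_per_hour, hours_per_cycle)
print(periods)

# Weekly data: number of weeks in an average year
weeks_per_year <- 365.25 / 7
round(weeks_per_year, 2)
```

The Day, Week and Year columns of `periods` reproduce the corresponding table entries; the Minute and Hour entries follow the same way (60 seconds per minute, 60 minutes per hour).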

---

# `ts` object in R

- Annual time series

<table>
 <thead>
  <tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start=2012)
y
```

```
Time Series:
Start = 2012 
End = 2015 
Frequency = 1 
[1] 120 122 140 150
```

---

# `ts` object in R

- Quarterly time series

<table>
 <thead>
  <tr> <th style="text-align:left;"> Quarter </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012-Q1 </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-Q2 </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-Q3 </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-Q4 </td> <td style="text-align:left;"> 150 </td> </tr>
  <tr> <td style="text-align:left;"> 2013-Q1 </td> <td style="text-align:left;"> 200 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150, 200), start=c(2012, 1), frequency = 4)
y
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2012  120  122  140  150
2013  200               
```

---

# `ts` object in R

- Monthly time series

<table>
 <thead>
  <tr> <th style="text-align:left;"> Month </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012-Jan </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-Feb </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-March </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;">
2012-April </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start=c(2012, 1), frequency = 12)
y
```

```
     Jan Feb Mar Apr
2012 120 122 140 150
```

---

# `ts` object in R

- Weekly time series

<table>
 <thead>
  <tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012-W1 </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-W2 </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-W3 </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;"> 2012-W4 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start=c(2012, 1), frequency = 52)
y
```

```
Time Series:
Start = c(2012, 1) 
End = c(2012, 4) 
Frequency = 52 
[1] 120 122 140 150
```

---

## Time series plots

.pull-left[

<table>
 <thead>
  <tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
  <tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
  <tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
  <tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
  <tr> <td style="text-align:left;"> 2016 </td> <td style="text-align:left;"> 200 </td> </tr>
  <tr> <td style="text-align:left;"> 2017 </td> <td style="text-align:left;"> 250 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150, 200, 250), start=2012)
y
```

```
Time Series:
Start = 2012 
End = 2017 
Frequency = 1 
[1] 120 122 140 150 200 250
```

```r
class(y)
```

```
[1] "ts"
```

]

.pull-right[

```r
library(forecast) # provides the autoplot() method for ts objects
autoplot(y)
```

![](timeseries1_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

]

---

## Add title and labels

.pull-left[

```r
autoplot(y)
```

![](timeseries1_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

]

.pull-right[

```r
library(ggplot2) # ylab(), xlab(), ggtitle()
autoplot(y) + ylab("Number of sales") + xlab("Year") +
  ggtitle("Time series plot of sales from 2012 to 2017")
```

![](timeseries1_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

]

---

class: duke-orange

## Your turn

Create plots of the following time series: dengue counts in Gampaha (`mozzie` package) and the `a10` series (`fpp2` package). Use `help()` to find out about the data in each series. Modify the axis labels and title.

---

# Time series patterns

### Trend

- Long-term increase or decrease in the data.

### Seasonal

- A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week). Seasonality is always of a **fixed** and **known period**. Hence, seasonal time series are sometimes called periodic time series.

- The period is unchanging and associated with some aspect of the calendar.

### Cyclic

- A cyclic pattern exists when data exhibit rises and falls that are not of a fixed period. The duration of these fluctuations is usually at least 2 years.

In general,

- the average length of cycles is longer than the length of a seasonal pattern.

- the magnitude of cycles tends to be more variable than the magnitude of seasonal patterns.

---

## Cyclic pattern

![](timeseries1_files/figure-html/unnamed-chunk-26-1.png)<!-- -->

---

## Cyclic and seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-27-1.png)<!-- -->

---

## Multiple seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-28-1.png)<!-- -->

---

## Seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-29-1.png)<!-- -->

---

## Trend

![](timeseries1_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

---

## Trend and seasonal

![](timeseries1_files/figure-html/unnamed-chunk-31-1.png)<!-- -->

---

## What about this?

- Daily morning gold prices in US dollars, 1 January 1985 – 31 March 1989.

![](timeseries1_files/figure-html/unnamed-chunk-32-1.png)<!-- -->

---

## What about this?

![](timeseries1_files/figure-html/unnamed-chunk-33-1.png)<!-- -->

---

## Autocorrelation function: ACF

- **Correlation** measures the strength of the linear relationship between two variables.

- **Autocorrelation** measures the strength of the linear relationship between lagged values of a time series.

.pull-left[

## Lagged values

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:center;"> `\(t\)` </th> <th style="text-align:center;"> `\(Y_t\)` </th> <th style="text-align:center;"> `\(Y_{t-1}\)` </th> <th style="text-align:center;"> `\(Y_{t-2}\)` </th> <th style="text-align:center;"> `\(Y_{t-3}\)` </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> NA </td> </tr>
  <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> NA </td> </tr>
  <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> NA </td> </tr>
  <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 3 </td> </tr>
  <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 9 </td> </tr>
  <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 8 </td> </tr>
</tbody>
</table>

]

.pull-right[

## Autocorrelation coefficient

`$$r_k=\frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^T(y_t-\bar{y})^2}$$`

measures the linear relationship between `\(y_t\)` and `\(y_{t-k}\)`.

]

---

## Quarterly beer production

.pull-left[

Time series plot

![](timeseries1_files/figure-html/unnamed-chunk-35-1.png)<!-- -->

]

.pull-right[

Correlogram

![](timeseries1_files/figure-html/unnamed-chunk-36-1.png)<!-- -->

]

```
Autocorrelations of series 'beer', by lag

     0      1      2      3      4      5      6      7      8      9     10 
 1.000 -0.102 -0.657 -0.060  0.869 -0.089 -0.635 -0.054  0.832 -0.108 -0.574 
    11     12     13     14     15     16     17     18 
-0.055  0.774 -0.080 -0.568 -0.066  0.706 -0.065 -0.528 
```

---

.pull-left[

Time series plot

![](timeseries1_files/figure-html/unnamed-chunk-38-1.png)<!-- -->

]

.pull-right[

Correlogram

![](timeseries1_files/figure-html/unnamed-chunk-39-1.png)<!-- -->

]

```
Autocorrelations of series 'a', by lag

     0      1      2      3      4      5      6      7      8      9     10 
 1.000  0.044 -0.063 -0.184 -0.084  0.219 -0.121 -0.084 -0.165 -0.008 -0.082 
    11     12     13     14     15     16 
-0.094  0.155  0.096 -0.024 -0.098 -0.001 
```

**This is a white noise process: a time series of iid observations.**

---

## Sampling distribution of autocorrelations

- The sampling distribution of `\(r_k\)` for white noise data is asymptotically `\(N(0,1/T)\)`.

- 95% confidence bands: `\(\pm 1.96/\sqrt{T}\)`

## The autocorrelation plot can provide answers to the following questions:

- Are the data random?

- Is the observed time series white noise?

- Does the time series contain strong seasonal patterns?

- Does the time series contain a trend?
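---

## Computing `\(r_k\)` by hand

The formula for `\(r_k\)` can be checked numerically. A minimal base-R sketch using the six values from the lagged-values table; `acf()` in the stats package uses the same definition, so the two rows should agree. With only `\(T = 6\)` observations the white-noise bands are very wide, so this is for checking the arithmetic only.

```r
# The series from the lagged-values table
y <- c(3, 9, 8, 4, 2, 7)
n <- length(y)   # T

# r_k: sum over t = k+1, ..., T of (y_t - ybar)(y_{t-k} - ybar),
# divided by the total sum of squares about the mean
r_k <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  num <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar))
  num / sum((y - ybar)^2)
}

r_manual <- sapply(1:3, function(k) r_k(y, k))
# acf() returns lags 0, ..., lag.max; elements 2:4 are lags 1 to 3
r_stats <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]
round(rbind(manual = r_manual, stats_acf = r_stats), 3)

# Approximate 95% bands for white noise: +/- 1.96 / sqrt(T)
band <- 1.96 / sqrt(n)
band
```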

---

# Trend

.pull-left[

![](timeseries1_files/figure-html/unnamed-chunk-41-1.png)<!-- -->

]

.pull-right[

![](timeseries1_files/figure-html/unnamed-chunk-42-1.png)<!-- -->

]

---

# Seasonal and Trend

.pull-left[

![](timeseries1_files/figure-html/unnamed-chunk-43-1.png)<!-- -->

]

.pull-right[

![](timeseries1_files/figure-html/unnamed-chunk-44-1.png)<!-- -->

]

---

# Is this a white noise process?

.pull-left[

![](timeseries1_files/figure-html/unnamed-chunk-45-1.png)<!-- -->

]

.pull-right[

![](timeseries1_files/figure-html/unnamed-chunk-46-1.png)<!-- -->

]

---

class: duke-orange, center, middle

# Your turn

---

## Which is which?

![](acf.png)

Reference: Forecasting: Principles and Practice, Hyndman & Athanasopoulos (3rd ed., 2020)

---

## Portmanteau tests for autocorrelation

- Test the first `\(h\)` autocorrelations jointly, rather than checking each one separately against the confidence bands.

## Box-Pierce test

The test statistic of the Box-Pierce test is

`$$Q = T\sum_{k=1}^{h} r_k^2$$`

where `\(h\)` is the maximum lag being considered and `\(T\)` is the number of observations.

## Ljung-Box test

The test statistic of the Ljung-Box test is

`$$Q^* = T(T+2)\sum_{k=1}^{h}(T-k)^{-1}r_k^2$$`

- Under the null hypothesis of no autocorrelation, both `\(Q\)` and `\(Q^*\)` approximately follow a `\(\chi^2\)` distribution with `\((h-K)\)` degrees of freedom, where `\(K\)` is the number of parameters in the model. If `\(Q\)` and `\(Q^*\)` are calculated from raw data (rather than the residuals from a model), then set `\(K=0\)`.

---

## Portmanteau tests for autocorrelation

H0: Data are not serially correlated.

H1: Data are serially correlated.
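
Both statistics can be computed directly from the sample autocorrelations. A minimal base-R sketch using the trend series `b_t = 5t + 2`, `t = 1, ..., 20`, with `h = 10` and `K = 0` because the test is applied to raw data; `Box.test()` from the stats package is used only as a cross-check.

```r
# Trend series: b_t = 5t + 2, t = 1, ..., 20
x <- 1:20
b <- ts(5 * x + 2, frequency = 1)

h <- 10          # maximum lag considered
K <- 0           # no fitted parameters: raw data, not model residuals
n <- length(b)   # T, the number of observations

# Sample autocorrelations r_1, ..., r_h (element 1 is lag 0, so drop it)
r <- acf(b, lag.max = h, plot = FALSE)$acf[2:(h + 1)]
k <- 1:h

Q      <- n * sum(r^2)                       # Box-Pierce statistic
Q_star <- n * (n + 2) * sum(r^2 / (n - k))   # Ljung-Box statistic

# Upper-tail p-value from the chi-squared distribution with h - K df
p_value <- pchisq(Q_star, df = h - K, lower.tail = FALSE)
round(c(Q = Q, Q_star = Q_star, p_value = p_value), 4)

# Cross-check against the stats implementation
Box.test(b, lag = h, fitdf = K, type = "Ljung-Box")
```

Because every weight `\(T(T+2)/(T-k)\)` exceeds `\(T\)`, `\(Q^*\)` is always larger than `\(Q\)`; the Ljung-Box correction matters most for short series.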

```r
set.seed(132020)
a <- ts(rnorm(20), frequency = 1) # WN process
Box.test(a, lag=10, fitdf=0, type="Lj")
```

```

	Box-Ljung test

data:  a
X-squared = 4.061, df = 10, p-value = 0.9446
```

```r
x <- 1:20
y <- 5*x+2
b <- ts(y, frequency = 1) # trend
Box.test(b, lag=10, fitdf=0, type="Lj")
```

```

	Box-Ljung test

data:  b
X-squared = 48.713, df = 10, p-value = 4.598e-07
```

---

.pull-left[

```r
set.seed(132020)
a <- ts(rnorm(20), frequency = 1) # WN process
autoplot(a)
```

![](timeseries1_files/figure-html/unnamed-chunk-49-1.png)<!-- -->

]

.pull-right[

```r
x <- 1:20
y <- 5*x+2
b <- ts(y, frequency = 1) # trend
autoplot(b)
```

![](timeseries1_files/figure-html/unnamed-chunk-50-1.png)<!-- -->

]

---

# Some simple forecasting methods

## Notation

Training set: `\(\{y_1, y_2, y_3, ..., y_T\}\)`

Test set: `\(\{y_{T+1}, y_{T+2}, y_{T+3}, ..., y_{T+h}, ...\}\)`

Forecasts: `\(\{\hat{y}_{T+1|T}, \hat{y}_{T+2|T}, \hat{y}_{T+3|T}, ...\}\)`

---

## Average method

- All future values are forecast to be equal to the average (mean) of the historical data.

`$$\hat{y}_{T+h|T} = \bar{y}=(y_1 + y_2 + ... + y_T)/T$$`

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
y
```

```
Time Series:
Start = 1 
End = 6 
Frequency = 1 
[1] 10 20 50 70 60 40
```

```r
meanf(y, h=2)
```

```
  Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
7       41.66667 4.736783 78.59655 -22.65498 105.9883
8       41.66667 4.736783 78.59655 -22.65498 105.9883
```

---

## Naive method / random walk

- Simply set all forecasts to be the value of the last observation.

`$$\hat{y}_{T+h|T} = y_T$$`

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
y
```

```
Time Series:
Start = 1 
End = 6 
Frequency = 1 
[1] 10 20 50 70 60 40
```

```r
naive(y, h=2)
```

```
  Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
7             40 15.017961 64.98204   1.793268 78.20673
8             40  4.670061 75.32994 -14.032478 94.03248
```

```r
rwf(y, h=2)
```

```
  Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
7             40 15.017961 64.98204   1.793268 78.20673
8             40  4.670061 75.32994 -14.032478 94.03248
```

---

## Seasonal naive method

Set each forecast equal to the last observed value from the same season of the year (e.g., the sales forecast for this December is set to the sales observed last December).

`$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$`

`\(m\)`: the seasonal period

`\(k\)`: the integer part of `\((h-1)/m\)` (i.e., the number of complete years in the forecast period prior to time `\(T+h\)`)

```r
y <- ts(c(10, 20, 50, 70, 60, 40, 60, 100), frequency = 4)
y
```

```
  Qtr1 Qtr2 Qtr3 Qtr4
1   10   20   50   70
2   60   40   60  100
```

```r
snaive(y, h=2)
```

```
     Point Forecast       Lo 80     Hi 80      Lo 95    Hi 95
3 Q1             60 19.98356519 100.01643  -1.199856 121.1999
3 Q2             40 -0.01643481  80.01643 -21.199856 101.1999
```

---

## Drift method

`$$\hat{y}_{T+h|T} = y_T + h\frac{y_T-y_1}{T-1}$$`

- Equivalent to drawing a line between the first and last observations and extrapolating it into the future. The line through `\((1, y_1)\)` and `\((T, y_T)\)` is

`$$\frac{y-y_T}{x-T}=\frac{y_T-y_1}{T-1},$$`

and setting `\(x = T+h\)` gives the drift forecast above.

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
rwf(y, h=2, drift=TRUE)
```

```
  Point Forecast    Lo 80    Hi 80      Lo 95     Hi 95
7             46 19.42518 72.57482   5.357322  86.64268
8             52 10.83047 93.16953 -10.963366 114.96337
```

---

## Comparison between methods

![](timeseries1_files/figure-html/unnamed-chunk-55-1.png)<!-- -->

---

## Training and test sets

**Training set / in-sample data:** used to estimate the parameters of a forecasting method (typically about 80% of the total sample).

**Test set / out-of-sample data / hold-out set:** used to evaluate the accuracy of the forecasts (typically about 20% of the total sample, or at least as long as the forecast horizon).

![](traintest.png)

---

## Functions to subset a time series

```r
tsexample <- ts(c(10, 20, 50, 60, 70, 80, 90, 100, 150, 160, 180, 200),
                frequency = 4, start = c(2015,1))
tsexample
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2015   10   20   50   60
2016   70   80   90  100
2017  150  160  180  200
```

```r
window(tsexample, start=c(2016, 2))
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2016        80   90  100
2017  150  160  180  200
```

```r
subset(tsexample, start=length(tsexample)-2*4)
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2015                  60
2016   70   80   90  100
2017  150  160  180  200
```

---

## Functions to subset a time series (cont.)

```r
subset(tsexample, quarter =2)
```

```
Time Series:
Start = 2015.25 
End = 2017.25 
Frequency = 1 
[1]  20  80 160
```

```r
head(tsexample, 2)
```

```
     Qtr1 Qtr2
2015   10   20
```

```r
tail(tsexample, 2)
```

```
     Qtr3 Qtr4
2017  180  200
```

---

## Error measures

**Error** (the unpredictable part of an observation): the difference between an observed value and its forecast.

`$$e_{T+h} = y_{T+h}-\hat{y}_{T+h|T}$$`

## Mean absolute error: MAE

`$$\text{MAE} = \text{mean}(|e_t|)$$`

## Root mean squared error: RMSE

`$$\text{RMSE} = \sqrt{\text{mean}(e_t^2)}$$`

## Percentage errors: MAPE

`$$\text{MAPE} = \text{mean}(|100e_t/y_t|)$$`

## Symmetric MAPE: sMAPE

`$$\text{sMAPE} = \text{mean}(200|y_t-\hat{y}_t|/(y_t+\hat{y}_t))$$`

---

## Scaled errors

`$$q_j = \frac{e_j}{\frac{1}{T-m}\sum_{t=m+1}^T|y_t-y_{t-m}|}$$`

`$$\text{MASE} = \text{mean}(|q_j|)$$`

---

## Forecast accuracy calculation

```r
library(forecast) # meanf(), rwf(), snaive(), accuracy()
library(mozzie)   # dengue counts data set (mozzie)
moz <- ts(mozzie$Colombo, frequency = 52, start=c(2008, 52))
moz_training <- window(moz, start=c(2008, 52), end=c(2013, 52))
moz_test <- window(moz, start=c(2014, 1))
length(moz_test)
```

```
[1] 18
```

```r
fit1 <- meanf(moz_training,h=18)
fit2 <- rwf(moz_training,h=18)
fit3 <- snaive(moz_training,h=18)
```

![](timeseries1_files/figure-html/unnamed-chunk-62-1.png)<!-- -->

---

## Forecast accuracy

```r
accuracy(fit1, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Mean
```

```
                  RMSE      MAE     MAPE      MASE
Training set  80.17061 62.46083      Inf 0.8098711
Test set      67.34743 49.25032 45.59053 0.6385828
```

```r
accuracy(fit2, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Naive
```

```
                  RMSE       MAE     MAPE      MASE
Training set  62.14487  42.98462      Inf 0.5573413
Test set     178.40715 169.55556 191.3185 2.1984683
```

```r
accuracy(fit3, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Seasonal naive
```

```
                  RMSE      MAE     MAPE      MASE
Training set 103.27783 77.12440      Inf 1.0000000
Test set      72.34639 55.22222 44.38492 0.7160149
```

The training-set MAPE is `Inf` because the training data contain zero counts, and percentage errors are undefined when `\(y_t = 0\)`.

---

class: center, middle

All rights reserved by [Thiyanga S. Talagala](https://thiyanga.netlify.com/)
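
---

## Appendix: simple forecasts and error measures by hand

The point forecasts of the four simple methods are plain arithmetic, so they can be checked without any package. This base-R sketch reproduces only the point forecasts of `meanf()`, `naive()`/`rwf()`, `rwf(drift=TRUE)` and `snaive()` for the small example series; prediction intervals are not computed. The error-measure part uses a hypothetical split of the same six values (first four for training, last two for testing), chosen only so the arithmetic is easy to check.

```r
y <- c(10, 20, 50, 70, 60, 40)   # example series used with meanf(), naive(), rwf()
n <- length(y)                    # T
h <- 1:2                          # forecast horizons

mean_fc  <- rep(mean(y), length(h))              # average method
naive_fc <- rep(y[n], length(h))                 # naive method
drift_fc <- y[n] + h * (y[n] - y[1]) / (n - 1)   # drift method

# Seasonal naive: y_{T + h - m(k + 1)}, with k = floor((h - 1) / m)
ys <- c(10, 20, 50, 70, 60, 40, 60, 100)         # quarterly example, m = 4
m  <- 4
snaive_fc <- sapply(h, function(hh) {
  k <- floor((hh - 1) / m)
  ys[length(ys) + hh - m * (k + 1)]
})

# Hypothetical split: train on the first 4 values, test on the last 2
train <- y[1:4]
test  <- y[5:6]
fc    <- rep(train[length(train)], length(test))  # naive forecasts
e     <- test - fc                                # forecast errors

mae   <- mean(abs(e))
rmse  <- sqrt(mean(e^2))
mape  <- mean(abs(100 * e / test))
smape <- mean(200 * abs(test - fc) / (test + fc))

# MASE: scale by the in-sample MAE of the naive method with lag m_ns
m_ns       <- 1   # non-seasonal data, so m = 1
mase_scale <- mean(abs(diff(train, lag = m_ns)))
mase       <- mean(abs(e / mase_scale))

round(c(MAE = mae, RMSE = rmse, MAPE = mape, sMAPE = smape, MASE = mase), 3)
```

The point forecasts match the `Point Forecast` columns printed by the forecast package earlier; for seasonal data, set `m_ns` to the seasonal period so that the MASE scale uses the seasonal naive method.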