# IM532 3.0 Applied Time Series Forecasting
## MSc in Industrial Mathematics
### Introduction to Time Series Analysis and Forecasting
### 1 March 2020

---

## Time series

- A time series is a sequence of observations taken sequentially in time.

## Time series data vs Cross sectional data

- Time series data

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Year </th>
   <th style="text-align:left;"> Values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2012 </td>
   <td style="text-align:left;"> 120 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2013 </td>
   <td style="text-align:left;"> 122 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2014 </td>
   <td style="text-align:left;"> 140 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2015 </td>
   <td style="text-align:left;"> 150 </td>
  </tr>
</tbody>
</table>

- A set of observations, together with information about the times at which they were recorded.

- Observations are usually recorded at discrete, equally spaced time intervals.

- Cross-sectional data

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 200 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 350 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> 480 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> 250 </td>
  </tr>
</tbody>
</table>

- Observations that come from different individuals or groups at a single point in time.

---

## Deterministic vs non-deterministic time series

.pull-left[
- **Deterministic time series:** future values can be determined exactly by some mathematical function.

![](timeseries1_files/figure-html/unnamed-chunk-3-1.png)

`$$y_t = \cos(2\pi t)$$`

]

.pull-right[
- **Non-deterministic time series:** future values can be described only in terms of a probability distribution.

![](timeseries1_files/figure-html/unnamed-chunk-4-1.png)

]

---

## Stochastic processes

"A statistical phenomenon that evolves in time according to probabilistic laws is called a stochastic process." (Box, George EP, et al. Time series analysis: forecasting and control.)

--
 
## Non-deterministic time series or statistical time series

A sample realization from an infinite population of time series that could have been generated by a stochastic process.

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Description </th>
   <th style="text-align:left;"> Notation </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Sequence of random variables </td>
   <td style="text-align:left;"> `$\{X_1, X_2, \dots, X_t, \dots, X_T\}$` </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Realized values of the random variables </td>
   <td style="text-align:left;"> `$\{x_1, x_2, \dots, x_t, \dots, x_T\}$` </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Examples of realizations </td>
   <td style="text-align:left;"> {20, 40, 20..., 190} </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> {15, 20, 100..., 490} </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> {15, 44, 39, ..., 329} </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> ... </td>
  </tr>
</tbody>
<tfoot>
<tr>
<td style = 'padding: 0; border:0;' colspan='100%'><sup>a</sup> T - length of the time series, t - time index</td>
</tr>
</tfoot>
</table>

---

## Time series to be analyzed

```
Time Series:
Start = c(2008, 51) 
End = c(2014, 17) 
Frequency = 52 
  [1]  15  44  39  57  53  29  45  47  34  28  26  27  25  28  23  20  31  57
 [19]  75  78  94  88 158 110 209 166 288 153 161 203 163 110 106  94  78  61
 [37]  28  58  51  53  80  72  68  54  94  88  64  38  85  48  24 124 126 176
 [55] 137 128  97  81 145 136  72  92  65  65  43  16   8  25  42  47  58  44
 [73]  61  85 152  49 145 213 308 334 257 158 223 223 270 171 134 111  50  68
 [91]  25  56  51  60  49  25  22  26  19  21  14  35  23  18  38  77  72  60
[109]  72  77  82  78 101  77  73  69  81  99  54  46 116 138 157 112 108 250
[127] 235  99 384 306 475 301 161 341 277 130 149 135  97 178  25 143 118  92
[145]  66  88 126 128  25  44  91 162 188 281 168 338 207 219 242 232 232 215
[163] 171 128  94 156  75 126  33  99  42  47 131  38   0   0  64  69  67  70
[181]   0  72  91 159  99   0 206 128 136 297 152  78  91 107 106 126 144 105
[199] 220  55  45   0  36  51  70 127 148 135  95 127 186 109 162 113  75 172
[217] 102 133 124  80 100  77 113  62  59  60 112  72  55  91  94 169 130  41
[235] 231 232 218 184 154 180 144 219 132 109 108 112  57  48 164 145 177 139
[253] 123 215 254 256 249 329 244 193 299 200 142 118 143 182 136 144  69 128
[271]  60  71  52  42 159 115 146 154 329
```

- The observed time series (the time series to be analyzed) is **a particular realization of a stochastic process**.
---

## Set of numbers or set of images taken sequentially in time

![](figsat.png)
---

## Time series forecasting

![](timeseries1_files/figure-html/unnamed-chunk-8-1.png)

## Types of methods

- Qualitative forecast

- Quantitative forecast

---

# Basic steps in a forecasting task

- Problem definition

- Collect data

- Data visualization

- Modelling

- Evaluate the fitted model

---

## Frequency of a time series: Seasonal periods

- Frequency: number of observations per natural time interval of measurement (usually a year, but sometimes a week, a day or an hour)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Data </th>
   <th style="text-align:left;"> Frequency </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Annual </td>
   <td style="text-align:left;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Quarterly </td>
   <td style="text-align:left;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monthly </td>
   <td style="text-align:left;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weekly </td>
   <td style="text-align:left;"> 52 or 52.18 </td>
  </tr>
</tbody>
</table>

- Multiple frequency setting

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Data </th>
   <th style="text-align:left;"> Minute </th>
   <th style="text-align:left;"> Hour </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Week </th>
   <th style="text-align:left;"> Year </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Daily </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> 365.25 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Hourly </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> 24 </td>
   <td style="text-align:left;"> 168 </td>
   <td style="text-align:left;"> 8766 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Half-Hourly </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> 48 </td>
   <td style="text-align:left;"> 336 </td>
   <td style="text-align:left;"> 17532 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Minutes </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> 60 </td>
   <td style="text-align:left;"> 1440 </td>
   <td style="text-align:left;"> 10080 </td>
   <td style="text-align:left;"> 525960 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Seconds </td>
   <td style="text-align:left;"> 60 </td>
   <td style="text-align:left;"> 3600 </td>
   <td style="text-align:left;"> 86400 </td>
   <td style="text-align:left;"> 604800 </td>
   <td style="text-align:left;"> 31557600 </td>
  </tr>
</tbody>
</table>
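The year-based frequencies in the two tables come from multiplying the number of observations per day by an average year length of 365.25 days (which allows for leap years); the weekly value 52.18 is 365.25 / 7. A minimal base-R check of that arithmetic (the vector names are illustrative only):

```r
# Average number of days in a year, allowing for leap years
days_per_year <- 365.25

# Observations per day for each sampling interval
per_day <- c(daily = 1, hourly = 24, half_hourly = 48,
             minutes = 24 * 60, seconds = 24 * 60 * 60)

# Yearly frequency = observations per day x days per year
yearly <- per_day * days_per_year
print(yearly)

# Weekly data: average number of weeks in a year
weekly <- days_per_year / 7
print(round(weekly, 2))
```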

---
.pull-left[

## Monthly time series

![](timeseries1_files/figure-html/unnamed-chunk-11-1.png)

- Length of the series: 72

- Annual seasonality (seasonal period 12)

]

## Half-hourly time series

![](timeseries1_files/figure-html/unnamed-chunk-12-1.png)

- Length of the series: 4032

- Daily seasonality and weekly seasonality


--
Note: Monthly seasonality in high-frequency data (daily, hourly, etc.) is tricky because months differ in length, so it cannot be specified as a fixed seasonal period. It can possibly be handled with dummy variables.

---

# Your turn

- What are the frequencies for a monthly time series with semi-annual and annual patterns?

---

# `ts` object in R

- Annual time series

```r
y <- ts(c(120, 122, 140, 150), start=2012)
y
```

```
Time Series:
Start = 2012 
End = 2015 
Frequency = 1 
[1] 120 122 140 150
```

---

# `ts` object in R

- Quarterly time series

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Quarter </th>
   <th style="text-align:left;"> Values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2012-Q1 </td>
   <td style="text-align:left;"> 120 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Q2 </td>
   <td style="text-align:left;"> 122 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Q3 </td>
   <td style="text-align:left;"> 140 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Q4 </td>
   <td style="text-align:left;"> 150 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2013-Q1 </td>
   <td style="text-align:left;"> 200 </td>
  </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150, 200), start=c(2012, 1), frequency = 4)
y
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2012  120  122  140  150
2013  200               
```
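R stores the time index inside the `ts` object itself. As a small sketch, the standard `stats` accessors `frequency()`, `start()`, `end()` and `time()` applied to the quarterly series above show how `start = c(2012, 1)` and `frequency = 4` become decimal-year time points:

```r
# Quarterly series starting in 2012 Q1 (same data as above)
y <- ts(c(120, 122, 140, 150, 200), start = c(2012, 1), frequency = 4)

frequency(y)  # number of observations per year: 4
start(y)      # c(year, quarter) of the first observation: 2012 1
end(y)        # c(year, quarter) of the last observation: 2013 1
time(y)       # time index as decimal years: 2012.00, 2012.25, ...
```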

---
# `ts` object in R

- Monthly time series

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Month </th>
   <th style="text-align:left;"> Values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2012-Jan </td>
   <td style="text-align:left;"> 120 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Feb </td>
   <td style="text-align:left;"> 122 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Mar </td>
   <td style="text-align:left;"> 140 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-Apr </td>
   <td style="text-align:left;"> 150 </td>
  </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start=c(2012, 1), frequency = 12)
y
```

```
     Jan Feb Mar Apr
2012 120 122 140 150
```

---
# `ts` object in R

- Weekly time series

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Week </th>
   <th style="text-align:left;"> Values </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2012-W1 </td>
   <td style="text-align:left;"> 120 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-W2 </td>
   <td style="text-align:left;"> 122 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-W3 </td>
   <td style="text-align:left;"> 140 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2012-W4 </td>
   <td style="text-align:left;"> 150 </td>
  </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start=c(2012, 1), frequency = 52)
y
```

```
Time Series:
Start = c(2012, 1) 
End = c(2012, 4) 
Frequency = 52 
[1] 120 122 140 150
```

---

## Time series plots

```r
y <- ts(c(120, 122, 140, 150, 200, 250), start=2012)
y
```

```
Time Series:
Start = 2012 
End = 2017 
Frequency = 1 
[1] 120 122 140 150 200 250
```

```r
class(y)
```

```
[1] "ts"
```

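```r
library(forecast)  # autoplot() for ts objects; also meanf(), naive(), rwf(), snaive()
library(ggplot2)   # ylab(), xlab(), ggtitle() used on the next slide
autoplot(y)
```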

![](timeseries1_files/figure-html/unnamed-chunk-23-1.png)


---

## Add title and labels

```r
autoplot(y)
```

![](timeseries1_files/figure-html/unnamed-chunk-24-1.png)


```r
autoplot(y)+ylab("Number of sales")+
  xlab("Year")+
  ggtitle("Time series plot of sales from 2012 to 2017")
```

![](timeseries1_files/figure-html/unnamed-chunk-25-1.png)


---

## Your turn

Create plots of the following time series: dengue counts in Gampaha (use the `mozzie` package) and the `a10` series (`fpp2` package).

Use `help()` to find out about the data in each series.

Modify the axis labels and title.

---

# Time series patterns

### Trend

- Long-term increase or decrease in the data.

### Seasonal

- A seasonal pattern exists when a series is influenced by seasonal factors (e.g.,
the quarter of the year, the month, or day of the week). Seasonality is always
of a **fixed** and **known period**. Hence, seasonal time series are sometimes called
periodic time series.

- The period is unchanging and associated with some aspect of the calendar.

### Cyclic

- A cyclic pattern exists when data exhibit rises and falls that are not of a fixed
period. The duration of these fluctuations is usually at least 2 years.

In general,

- the average length of cycles is longer than the length of a seasonal pattern.

- the magnitude of cycles tends to be more variable than the magnitude of seasonal patterns.
  
---

## Cyclic pattern

![](timeseries1_files/figure-html/unnamed-chunk-26-1.png)
  
---

## Cyclic and seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-27-1.png)

---

## Multiple seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-28-1.png)
  
---

## Seasonal pattern

![](timeseries1_files/figure-html/unnamed-chunk-29-1.png)
---

## Trend

![](timeseries1_files/figure-html/unnamed-chunk-30-1.png)

---

## Trend and seasonal

![](timeseries1_files/figure-html/unnamed-chunk-31-1.png)

---

## What about this?

- Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.

![](timeseries1_files/figure-html/unnamed-chunk-32-1.png)

---

## What about this?

![](timeseries1_files/figure-html/unnamed-chunk-33-1.png)

---

## Autocorrelation function: ACF

- **Correlation** measures the strength of the linear relationship between two variables.

- **Autocorrelation** measures the strength of the linear relationship between lagged values of a time series.

## Autocorrelation coefficient

`$$r_k=\frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^T(y_t-\bar{y})^2}$$`

measures the linear relationship between `$y_t$` and `$y_{t-k}$`.
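To connect the formula with software output, the sketch below computes `$r_k$` directly from the definition and compares it with base R's `acf()`. The helper `r_k()` and the small series `y` are illustrative only; `acf()` uses the same divisor in numerator and denominator, so the two should agree up to floating-point error.

```r
# Sample autocorrelation r_k computed directly from the formula above
r_k <- function(y, k) {
  n_obs <- length(y)                 # T in the formula
  y_bar <- mean(y)
  num <- sum((y[(k + 1):n_obs] - y_bar) * (y[1:(n_obs - k)] - y_bar))
  den <- sum((y - y_bar)^2)
  num / den
}

y <- c(10, 20, 50, 70, 60, 40)       # small illustrative series

manual  <- sapply(1:3, function(k) r_k(y, k))
builtin <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]   # drop lag 0

all.equal(manual, builtin)           # TRUE
```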


---
## Quarterly beer production

Time series plot

![](timeseries1_files/figure-html/unnamed-chunk-35-1.png)


Correlogram

![](timeseries1_files/figure-html/unnamed-chunk-36-1.png)


```

Autocorrelations of series 'beer', by lag

0      1      2      3      4      5      6      7      8      9     10 
 1.000 -0.102 -0.657 -0.060  0.869 -0.089 -0.635 -0.054  0.832 -0.108 -0.574 
    11     12     13     14     15     16     17     18 
-0.055  0.774 -0.080 -0.568 -0.066  0.706 -0.065 -0.528 
```

---

Time series plot

![](timeseries1_files/figure-html/unnamed-chunk-38-1.png)


Correlogram

![](timeseries1_files/figure-html/unnamed-chunk-39-1.png)


```

Autocorrelations of series 'a', by lag

0      1      2      3      4      5      6      7      8      9     10 
 1.000  0.044 -0.063 -0.184 -0.084  0.219 -0.121 -0.084 -0.165 -0.008 -0.082 
    11     12     13     14     15     16 
-0.094  0.155  0.096 -0.024 -0.098 -0.001 
```

**This is a white noise process, i.e., a time series of iid data.**

---

## Sampling distribution of autocorrelations

- Sampling distribution of `$r_k$` for white noise data is
asymptotically `$N(0,1/T)$`.

- 95% confidence bands: `$\pm 1.96/\sqrt{T}$`
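To see the bands in action, the sketch below simulates Gaussian white noise (the seed and the length `T = 100` are arbitrary choices for illustration) and counts how many of the first 20 sample autocorrelations fall outside `$\pm 1.96/\sqrt{T}$`. For white noise, roughly 5% of them (about 1 in 20) are expected to do so purely by chance.

```r
set.seed(1)                        # arbitrary seed for reproducibility
n_obs <- 100                       # T, the series length
wn <- rnorm(n_obs)                 # simulated Gaussian white noise

bound <- 1.96 / sqrt(n_obs)        # 95% band: +/- 0.196 when T = 100
r <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]   # r_1, ..., r_20

# Number of lags whose autocorrelation lies outside the band
sum(abs(r) > bound)
```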

## The autocorrelation plot can provide answers to the following questions:

- Are the data random?

- Is the observed time series white noise?

- Does the time series contain strong seasonal patterns?

- Does the time series contain a trend?

---

# Trend

![](timeseries1_files/figure-html/unnamed-chunk-41-1.png)


![](timeseries1_files/figure-html/unnamed-chunk-42-1.png)


---

# Seasonal and Trend

![](timeseries1_files/figure-html/unnamed-chunk-43-1.png)


![](timeseries1_files/figure-html/unnamed-chunk-44-1.png)


---

# Is this a white noise process?

![](timeseries1_files/figure-html/unnamed-chunk-45-1.png)

.pull-right[

![](timeseries1_files/figure-html/unnamed-chunk-46-1.png)

]
---

# Your turn

---
## Which is which?

![](acf.png)

Reference: Forecasting: Principles and Practice, Hyndman & Athanasopoulos (3rd ed., 2020)

---

## Portmanteau tests for autocorrelation

- Consider the first `$h$` autocorrelation values together, rather than checking each `$r_k$` separately.

## Box-Pierce test

The test statistic of the Box-Pierce test is

`$$Q = T\sum_{k=1}^hr_k^2$$`

where `$h$` is the maximum lag being considered and `$T$` is the number of observations.

## Ljung-Box test

The test statistic of the Ljung-Box test is

`$$Q^* = T(T+2)\sum_{k=1}^{h}(T-k)^{-1}r_k^2$$`

- If the data are white noise, both `$Q$` and `$Q^*$` approximately follow a `$\chi^2$` distribution with `$(h-K)$` degrees of freedom, where `$K$` is the number of parameters in the model. If `$Q$` and `$Q^*$` are calculated from raw data (rather than from the residuals of a model), set `$K=0$`.
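Both statistics can be computed by hand from the sample autocorrelations. The sketch below does this for a short simulated series (seed and length mirror the example on the next slide) and checks the result against `stats::Box.test()`; since the data are raw, `$K = 0$` and the degrees of freedom equal `$h$`.

```r
set.seed(132020)
a <- rnorm(20)                     # simulated white noise
n_obs <- length(a)                 # T
h <- 10                            # maximum lag

r <- acf(a, lag.max = h, plot = FALSE)$acf[-1]    # r_1, ..., r_h
k <- seq_len(h)

Q      <- n_obs * sum(r^2)                                 # Box-Pierce
Q_star <- n_obs * (n_obs + 2) * sum(r^2 / (n_obs - k))     # Ljung-Box

# Compare with the statistics reported by stats::Box.test()
bp <- Box.test(a, lag = h, type = "Box-Pierce")$statistic
lb <- Box.test(a, lag = h, type = "Ljung-Box")$statistic
all.equal(unname(c(Q, Q_star)), unname(c(bp, lb)))   # TRUE

# p-value for raw data (K = 0, so df = h)
pchisq(Q_star, df = h, lower.tail = FALSE)
```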

---

## Portmanteau tests for autocorrelation

H0: Data are not serially correlated.

H1: Data are serially correlated.

```r
set.seed(132020)
a <- ts(rnorm(20), frequency = 1) #WN process
Box.test(a, lag=10, fitdf=0, type="Lj")
```

```

Box-Ljung test

data:  a
X-squared = 4.061, df = 10, p-value = 0.9446
```

```r
x <- 1:20
y <- 5*x+2
b <- ts(y, frequency = 1) #trend
Box.test(b, lag=10, fitdf=0, type="Lj")
```

```

Box-Ljung test

data:  b
X-squared = 48.713, df = 10, p-value = 4.598e-07
```

---

```r
set.seed(132020)
a <- ts(rnorm(20), frequency = 1) #WN process
autoplot(a)
```

![](timeseries1_files/figure-html/unnamed-chunk-49-1.png)


```r
x <- 1:20
y <- 5*x+2
b <- ts(y, frequency = 1) #trend
autoplot(b)
```

![](timeseries1_files/figure-html/unnamed-chunk-50-1.png)

---

# Some simple forecasting methods

## Notation

Training set: `$\{y_1, y_2, y_3, ..., y_T\}$`

Test set: `$\{y_{T+1}, y_{T+2}, y_{T+3}, ..., y_{T+h}, ...\}$`

Forecasts: `$\{\hat{y}_{T+1|T}, \hat{y}_{T+2|T}, \hat{y}_{T+3|T}, ...\}$`
---

## Average method

- All forecasts are equal to the average (mean) of the historical data.

`$$\hat{y}_{T+h|T} = \bar{y}=(y_1 + y_2 + ... + y_T)/T$$`

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
y
```

```
Time Series:
Start = 1 
End = 6 
Frequency = 1 
[1] 10 20 50 70 60 40
```

```r
meanf(y, h=2)
```

```
  Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
7       41.66667 4.736783 78.59655 -22.65498 105.9883
8       41.66667 4.736783 78.59655 -22.65498 105.9883
```
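The point forecasts from `meanf()` are just the sample mean repeated `h` times. A base-R sketch of the point forecasts only (the prediction intervals are not reproduced here):

```r
y <- c(10, 20, 50, 70, 60, 40)
h <- 2

# Average method: every forecast equals the mean of y_1, ..., y_T
y_bar <- sum(y) / length(y)        # 250 / 6
rep(y_bar, h)                      # 41.66667 41.66667, as from meanf()
```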

---
## Naive method / random walk

- Simply set all forecasts to be the value of the last observation.

`$$\hat{y}_{T+h|T} = y_T$$`

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
y
```

```
Time Series:
Start = 1 
End = 6 
Frequency = 1 
[1] 10 20 50 70 60 40
```

```r
naive(y, h=2) 
```

```
  Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
7             40 15.017961 64.98204   1.793268 78.20673
8             40  4.670061 75.32994 -14.032478 94.03248
```

```r
rwf(y, h=2)
```

```
  Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
7             40 15.017961 64.98204   1.793268 78.20673
8             40  4.670061 75.32994 -14.032478 94.03248
```
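The naive point forecasts need nothing more than the last observation. A base-R sketch of the point forecasts (intervals omitted):

```r
y <- c(10, 20, 50, 70, 60, 40)
h <- 2

# Naive method: every forecast equals the last observation y_T
y_T <- y[length(y)]
rep(y_T, h)                        # 40 40, matching naive() and rwf()
```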

---
## Seasonal naive method

Set each forecast equal to the last observed value from the same season of the year (e.g., this year's forecast for December sales is set to the sales observed in December of the previous year).

`$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$`
`$m$` - the seasonal period

`$k$` - the integer part of `$(h-1)/m$` (i.e., the number of complete years in the forecast period prior to time `$T+h$`).

```r
y <- ts(c(10, 20, 50, 70, 60, 40, 60, 100), frequency = 4)
y
```

```
  Qtr1 Qtr2 Qtr3 Qtr4
1   10   20   50   70
2   60   40   60  100
```

```r
snaive(y, h=2)
```

```
     Point Forecast       Lo 80     Hi 80      Lo 95    Hi 95
3 Q1             60 19.98356519 100.01643  -1.199856 121.1999
3 Q2             40 -0.01643481  80.01643 -21.199856 101.1999
```
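The index formula `$y_{T+h-m(k+1)}$` can be applied directly. In the sketch below, `snaive_point()` is a hypothetical helper (not part of any package) that returns point forecasts only; with `h = 6` it shows how the forecasts cycle through the last observed year.

```r
# Seasonal naive forecast from y_{T+h-m(k+1)}, with k = floor((h-1)/m)
snaive_point <- function(y, m, h) {
  n_obs <- length(y)                       # T
  sapply(seq_len(h), function(i) {
    k <- floor((i - 1) / m)                # complete years before T+i
    y[n_obs + i - m * (k + 1)]
  })
}

y <- c(10, 20, 50, 70, 60, 40, 60, 100)    # quarterly data from above
snaive_point(y, m = 4, h = 6)              # 60 40 60 100 60 40
```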

---

## Drift method

`$$\hat{y}_{T+h|T} = y_T + h\frac{y_T-y_1}{T-1}$$`

- Equivalent to drawing a line between the first and last observations and extrapolating it into the future:

`$$\frac{\hat{y}_{T+h|T}-y_T}{(T+h)-T}=\frac{y_T-y_1}{T-1}$$`

```r
y <- ts(c(10, 20, 50, 70, 60, 40), frequency = 1)
rwf(y, h=2, drift=TRUE)
```

```
  Point Forecast    Lo 80    Hi 80      Lo 95     Hi 95
7             46 19.42518 72.57482   5.357322  86.64268
8             52 10.83047 93.16953 -10.963366 114.96337
```
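The drift point forecasts follow from the slope of the line through the first and last observations. A base-R sketch (intervals omitted):

```r
y <- c(10, 20, 50, 70, 60, 40)
n_obs <- length(y)                          # T
h <- 1:2

# Slope of the line through (1, y_1) and (T, y_T)
slope <- (y[n_obs] - y[1]) / (n_obs - 1)    # (40 - 10) / 5 = 6
y[n_obs] + h * slope                        # 46 52, as from rwf(drift = TRUE)
```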

---

## Comparison between methods

![](timeseries1_files/figure-html/unnamed-chunk-55-1.png)

---

## Training and test sets

**Training set / in-sample data:** used to estimate any parameters of a forecasting method (typically about 80% of the total sample).

**Test set / out-of-sample data / hold-out set:** used to evaluate the accuracy of the forecasts (typically about 20% of the total sample, or at least as large as the maximum forecast horizon).

![](traintest.png)
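An 80/20 split can be made with base R's `window()` by locating the time point of the last training observation. This is a minimal sketch on a short quarterly series; the `floor()` rounding rule is an illustrative choice.

```r
y <- ts(c(10, 20, 50, 60, 70, 80, 90, 100, 150, 160, 180, 200),
        frequency = 4, start = c(2015, 1))

n_train <- floor(0.8 * length(y))             # 80% of 12 -> 9 observations
train <- window(y, end = time(y)[n_train])    # 2015 Q1 to 2017 Q1
test  <- window(y, start = time(y)[n_train + 1])

length(train)   # 9
length(test)    # 3
```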

---

## Functions to subset a time series

```r
tsexample <- ts(c(10, 20, 50, 60, 70, 80, 90, 100, 150, 160, 180, 200), frequency = 4, start = c(2015,1))
tsexample
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2015   10   20   50   60
2016   70   80   90  100
2017  150  160  180  200
```

```r
window(tsexample, start=c(2016, 2))
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2016        80   90  100
2017  150  160  180  200
```

```r
subset(tsexample, start=length(tsexample)-2*4)
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2015                  60
2016   70   80   90  100
2017  150  160  180  200
```

---

## Functions to subset a time series (cont.)

```r
subset(tsexample, quarter =2)
```

```
Time Series:
Start = 2015.25 
End = 2017.25 
Frequency = 1 
[1]  20  80 160
```

```r
head(tsexample, 2)
```

```
     Qtr1 Qtr2
2015   10   20
```

```r
tail(tsexample, 2)
```

```
     Qtr3 Qtr4
2017  180  200
```

---

## Error measures

**Forecast error** (the unpredictable part of an observation): the difference between an observed value and its forecast.

`$$e_{T+h} = y_{T+h}-\hat{y}_{T+h|T}$$`

## Mean absolute error: MAE

`$$MAE = mean(|e_t|)$$`

## Root mean squared error

`$$RMSE = \sqrt{mean(e_t^2)}$$`

## Percentage errors

`$$MAPE = mean(|100e_t/y_t|)$$`

## Symmetric MAPE

`$$sMAPE = mean(200|y_t-\hat{y}_t|/(y_t+\hat{y}_t))$$`
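The four measures above can be computed by hand in base R. The observed values and forecasts below are hypothetical numbers chosen only to make the arithmetic easy to follow:

```r
y     <- c(100, 200, 300)   # hypothetical observed values
y_hat <- c(110, 190, 330)   # hypothetical forecasts
e <- y - y_hat              # forecast errors: -10 10 -30

mae   <- mean(abs(e))                                  # 16.67
rmse  <- sqrt(mean(e^2))                               # 19.15
mape  <- mean(abs(100 * e / y))                        # 8.33
smape <- mean(200 * abs(y - y_hat) / (y + y_hat))      # 8.06

round(c(MAE = mae, RMSE = rmse, MAPE = mape, sMAPE = smape), 2)
```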

---

## Scaled errors

`$$q_j = \frac{e_j}{\frac{1}{T-m}\sum_{t={m+1}}^T|y_t-y_{t-m}|}$$`

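`$$MASE = mean(|q_j|)$$`

where `$m$` is the seasonal period (`$m = 1$` for non-seasonal data). The denominator is the in-sample MAE of the seasonal naive method (the naive method when `$m = 1$`) on the training set, so a MASE below 1 indicates smaller errors than the average in-sample one-step (seasonal) naive forecast.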
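A small worked sketch of the scaled errors: the training data, test data and forecasts below are hypothetical, and `m = 1` is assumed (non-seasonal), so the scale is the mean absolute first difference of the training set.

```r
y_train <- c(10, 20, 50, 70, 60, 40)   # hypothetical training data
y_test  <- c(45, 30)                    # hypothetical test data
y_hat   <- c(40, 40)                    # naive forecasts for the test period
m <- 1                                  # non-seasonal: lag-1 differences

n_train <- length(y_train)
# Scale: in-sample MAE of the (seasonal) naive method on the training set
q_scale <- sum(abs(y_train[(m + 1):n_train] - y_train[1:(n_train - m)])) /
  (n_train - m)                         # 90 / 5 = 18

q <- (y_test - y_hat) / q_scale         # scaled errors q_j
mean(abs(q))                            # MASE = 15 / 36 = 0.4167
```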

---

## Forecast accuracy calculation

```r
library(mozzie)  # provides the dengue counts data set `mozzie`
moz <- ts(mozzie$Colombo, frequency = 52, start=c(2008, 52))
moz_training <- window(moz, start=c(2008, 52), end=c(2013, 52))
moz_test <- window(moz, start=c(2014, 1))
length(moz_test)
```

```
[1] 18
```

```r
fit1 <- meanf(moz_training,h=18)
fit2 <- rwf(moz_training,h=18)
fit3 <- snaive(moz_training,h=18)
```

![](timeseries1_files/figure-html/unnamed-chunk-62-1.png)

---

## Forecast accuracy

```r
accuracy(fit1, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Mean
```

```
                 RMSE      MAE     MAPE      MASE
Training set 80.17061 62.46083      Inf 0.8098711
Test set     67.34743 49.25032 45.59053 0.6385828
```

```r
accuracy(fit2, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Naive
```

```
                  RMSE       MAE     MAPE      MASE
Training set  62.14487  42.98462      Inf 0.5573413
Test set     178.40715 169.55556 191.3185 2.1984683
```

```r
accuracy(fit3, moz_test)[,c("RMSE", "MAE", "MAPE", "MASE")] # Seasonal naive
```

```
                  RMSE      MAE     MAPE      MASE
Training set 103.27783 77.12440      Inf 1.0000000
Test set      72.34639 55.22222 44.38492 0.7160149
```

---
class: center, middle