Forecasting: principles and practice. Welcome to our online textbook on forecasting.
This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details. 13 Resources. Time series - AR(1) selection using sample ACF-PACF. Forecasting within limits. It is common to want forecasts to be positive, or to require them to be within some specified range.
Both of these situations are relatively easy to handle using transformations. Positive forecasts. To impose a positivity constraint, simply work on the log scale. Interpreting noise. When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise.
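The transformation idea for constrained forecasts can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names, the toy drift forecaster and the limits a = 0, b = 10 are all illustrative and do not come from the original post. The log transform handles positivity, and a scaled logit is one standard way to keep forecasts inside an interval.

```python
import math

def to_log(y):
    """Map strictly positive observations to the log scale."""
    return [math.log(v) for v in y]

def from_log(z):
    """Back-transform from the log scale; exp() is always positive."""
    return [math.exp(v) for v in z]

def to_scaled_logit(y, a, b):
    """Map values strictly inside (a, b) onto the whole real line."""
    return [math.log((v - a) / (b - v)) for v in y]

def from_scaled_logit(z, a, b):
    """Back-transform; the result always lies between a and b."""
    return [a + (b - a) / (1.0 + math.exp(-v)) for v in z]

def drift_forecast(z, horizon):
    """Toy forecaster (an assumption): extend the average historical change."""
    slope = (z[-1] - z[0]) / (len(z) - 1)
    return [z[-1] + slope * h for h in range(1, horizon + 1)]

# A falling series: forecasting on the raw scale goes negative,
# forecasting on the log scale and back-transforming cannot.
y = [8.0, 4.0, 2.0, 1.0]
raw = drift_forecast(y, 3)
positive = from_log(drift_forecast(to_log(y), 3))

# A rising series that must stay inside hypothetical limits (a, b).
a, b = 0.0, 10.0
share = [5.0, 7.0, 8.5, 9.5]
bounded = from_scaled_logit(drift_forecast(to_scaled_logit(share, a, b), 5), a, b)

print(raw)
print(positive)
print(bounded)
```

Any forecasting method could replace the toy drift forecaster; the point is only that the model is fitted on the transformed scale and the forecasts are mapped back afterwards.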
For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.” Errors on percentage errors. The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as $\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t| / |y_t|)$, where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”.
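To make the definition concrete, here is a minimal Python sketch of the MAPE as 100 times the mean of |observation - forecast| / |observation|. The function name is an illustrative assumption. The two single-point calls use the same absolute error of 50 in both directions and show the asymmetry numerically.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error: 100 * mean(|y - yhat| / |y|)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and equal in length")
    ratios = [abs(y - f) / abs(y) for y, f in zip(actuals, forecasts)]
    return 100.0 * sum(ratios) / len(ratios)

# Same absolute error (50), opposite directions.
under = mape([150.0], [100.0])  # forecast below the actual
over = mape([100.0], [150.0])   # forecast above the actual

print(round(under, 2), round(over, 2))
```

Because the observation sits in the denominator, the sketch raises a ZeroDivisionError when any observation is exactly zero, which is itself a known practical weakness of percentage errors.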
A few years later, Armstrong and Collopy (1992) argued that the MAPE “puts a heavier penalty on forecasts that exceed the actual than those that are less than the actual”. For example, with an observation $y_t = 150$ and forecast $\hat{y}_t = 100$, the relative error is 50÷150=0.33, in contrast to the situation where $y_t = 100$ and $\hat{y}_t = 150$, when the relative error would be 50÷100=0.50. Modelling seasonal data with GAMs. In previous posts I have looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data.
At the time a number of readers commented that they were interested in modelling data that had more than just a trend component; how do you model data collected throughout the year over many years with a GAM? In this post I will show one way that I have found particularly useful in my research. First an equation. Monthly seasonality. I regularly get asked why I don’t consider monthly seasonality in my models for daily or sub-daily time series.
For example, this recent comment on my post on seasonal periods, or this comment on my post on daily data. The fact is, I’ve never seen a time series with monthly seasonality, although that does not mean it does not exist. Monthly seasonality would occur if there is some regular activity that takes place every month and which affects the time series. For example, some companies try to average their expenditure across the month and often have to spend more at the end of the month to justify the budget. So daily expenditure tends to increase at the end of each month, producing a monthly seasonal pattern. Or imagine a situation where a company always stocks up on supplies on the second Tuesday in every month. Rmnppt/FruitAndVeg - HTML - GitHub. Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance. Why time series forecasts prediction intervals aren't as good as we'd hope. Five different sources of error. When it comes to time series forecasts from a statistical model we have five sources of error:
1. Random individual errors
2. Random estimates of parameters (eg the coefficients for each autoregressive term)
3. Uncertain meta-parameters (eg number of autoregressive terms)
4. Unsure if the model was right for the historical data
5. Even given #4, unsure if the model will continue to be right
A confidence interval is an estimate of the statistical uncertainty of the estimated parameters in the model.
It usually estimates uncertainty source #2 above; it ignores #1 and is conditional on sources #3, #4 and #5 all being taken out of the picture. A prediction interval should ideally take all five sources into account (see Rob Hyndman for more on the distinction between prediction and confidence intervals). Here’s a simple simulation to show the cost of estimating the meta-parameters, even when sources of error #4 and #5 can be discounted. Code. Prophet.
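The author's simulation is not reproduced in this excerpt. As a much simpler stand-in, the Python sketch below isolates only source #2 (estimated parameters) in a deliberately trivial setting: forecasting the next draw from independent normal data using a sample of five points. The setting, names and constants are assumptions chosen for illustration, and it says nothing about meta-parameters. It compares a naive interval that treats the estimated mean and standard deviation as known against one that widens for estimation error using the Student's t quantile.

```python
import random
import statistics

N_OBS = 5            # tiny training sample, so estimation error matters
Z_975 = 1.96         # 97.5% quantile of the standard normal
T_975_DF4 = 2.776    # 97.5% quantile of Student's t with N_OBS - 1 = 4 d.f.

def coverage(n_reps=20000, seed=42):
    """Share of future values falling inside each nominal 95% interval."""
    rng = random.Random(seed)
    naive_hits = 0
    adjusted_hits = 0
    for _ in range(n_reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
        future = rng.gauss(0.0, 1.0)
        m = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1)
        # Naive: pretend m and s are the true parameters.
        naive_half = Z_975 * s
        # Adjusted: allow for error in both m and s.
        adjusted_half = T_975_DF4 * s * (1.0 + 1.0 / N_OBS) ** 0.5
        naive_hits += abs(future - m) <= naive_half
        adjusted_hits += abs(future - m) <= adjusted_half
    return naive_hits / n_reps, adjusted_hits / n_reps

naive_cov, adjusted_cov = coverage()
print(naive_cov, adjusted_cov)
```

In this toy setting the naive interval covers noticeably less often than its nominal 95%, while the adjusted interval comes close to it. Sources #3 to #5 would widen the gap further, and they have no equally tidy correction.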
Rolling Origins and Fama French. By Jonathan Regenstein Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets.
In the machine learning prediction world, these two data sets are often called training data and testing data, but we’re not going to do any machine learning prediction today. Cross-validation for time series forecast. Time series cross-validation is an important part of the toolkit for good evaluation of forecasting models. forecast::tsCV makes it straightforward to implement, even with different combinations of explanatory regressors in the different candidate models for evaluation.
Spurious correlation between time series is a well documented and mocked problem, with Tyler Vigen’s educational website on the topic (“per capita cheese consumption correlated with number of people dying by becoming entangled in their bedsheets”) even spawning a whole book of humorous examples. Identifying genuinely-correlated series can be immensely helpful for time series forecasting.
Forecasting is hard, and experience generally shows that complex causal models don’t do as well as much simpler methods. However, a well chosen small set of “x regressors” can improve forecasting performance in many situations. I have been investigating one of those situations for a future blog post on forecasting unemployment rates.
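The time series cross-validation idea mentioned above (a rolling forecast origin) can be sketched in a few lines of Python. This is a generic illustration, not a reimplementation of forecast::tsCV: the function names, the toy series and the two candidate forecasters are assumptions. Each forecast uses only the observations before its origin, so no future information leaks into the evaluation.

```python
def rolling_origin_errors(y, forecaster, h=1, min_train=3):
    """Forecast y[t + h - 1] from y[:t] for every feasible origin t."""
    errors = []
    for t in range(min_train, len(y) - h + 1):
        train = y[:t]                      # data available at the origin
        prediction = forecaster(train, h)  # h-step-ahead point forecast
        errors.append(y[t + h - 1] - prediction)
    return errors

def naive(train, h):
    """Candidate 1: repeat the last observed value."""
    return train[-1]

def mean_forecast(train, h):
    """Candidate 2: forecast the historical mean."""
    return sum(train) / len(train)

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

# Toy trending series (made up for illustration).
y = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0]
naive_errors = rolling_origin_errors(y, naive)
naive_rmse = rmse(naive_errors)
mean_rmse = rmse(rolling_origin_errors(y, mean_forecast))

print(naive_rmse, mean_rmse)
```

On a trending series like this one, the naive forecaster scores the lower error of the two. A candidate model with explanatory regressors would be compared in the same way, provided its forecaster is only given regressor values that would be known at each origin.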
SimITS. Tidy forecasting in R. The fable package for doing tidy forecasting in R is now on CRAN.
Like tsibble and feasts, it is also part of the tidyverts family of packages for analysing, modelling and forecasting many related time series (stored as tsibbles). For a brief introduction to tsibbles, see this post from last month. Here we will forecast Australian tourism data by state/region and purpose. This data is stored in the tourism tsibble where Trips contains domestic visitor nights in thousands.

    library(tidyverse)
    library(tsibble)
    library(lubridate)
    library(fable)
    tourism

There are 304 combinations of Region, State and Purpose, each one defining a time series of 80 observations. Fable: Tidy forecasting in R. Reintroducing tsibble: data tools that melt the clock. Preface. I have introduced tsibble before in comparison with another package.
Now I’d like to reintroduce tsibble (bold for package) to you and highlight the role tsibble (italic for data structure) plays in tidy time series analysis. The development of the tsibble package has been taking place since July 2017, and v0.6.2 landed on CRAN in mid-December. Yup, there have been 14 CRAN releases since the initial release, and it has evolved substantially. Motivation. Figure 1: Can we flatten the lumpy path of converting raw temporal data to model-ready objects? If data comes with a time variable, it is referred to as “temporal data”. However, the temporal data pre-processing can be largely formalised, and the tools provided by tsibble do this, and more. Time series graphics using feasts. This is the second post on the new tidyverts packages for tidy time series analysis. The previous post is here. For users migrating from the forecast package, it might be useful to see how to get similar graphics to those they are used to.
The forecast package is built for ts objects, while the feasts package provides features, statistics and graphics for tsibbles. Tidyrisk - Making Quant Risk Tidy. Cleaning Anomalies to Reduce Forecast Error by 9% with anomalize. Written by Matt Dancho on September 30, 2019 In this tutorial, we’ll show how we used clean_anomalies() from the anomalize package to reduce forecast error by 9%. R Packages Covered: Forecasting the next decade in the stock market using time series models. This post will introduce one way of forecasting the stock index returns on the US market. Typically, single measures such as CAPE have been used to do this, but they lack accuracy compared to using many variables and can also have different relationships with returns on different markets.
Furthermore, it is possible to train different types of models and combine them to increase the accuracy even more, as is done in this post. We'll use a variety of time series models, with a goal of forecasting future returns for the S&P 500. The variable to be forecasted is the annual future ten-year return, and all of the models used except for ETS are dynamic, i.e. they also use some regressors such as valuation multiples, which are mostly the same ones as in this post.