Forecasting

Forecasting: principles and practice

Welcome to our online textbook on forecasting.

This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don't attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details.

Time series - AR(1) selection using sample ACF-PACF

Forecasting within limits

It is common to want forecasts to be positive, or to require them to be within some specified range. Both of these situations are relatively easy to handle using transformations. Positive forecasts: to impose a positivity constraint, simply work on the log scale.
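
As a minimal sketch of the log-scale approach (my own illustration, using the forecast package's Box-Cox lambda argument; the dataset and horizon are arbitrary):

library(forecast)

# Modelling on the log scale (Box-Cox lambda = 0) and back-transforming
# guarantees positive point forecasts and prediction intervals.
fit <- auto.arima(AirPassengers, lambda = 0)
fc <- forecast(fit, h = 24)
plot(fc)  # the intervals cannot cross zero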

Interpreting noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says "The unemployment rate has dropped by 0.1% for no apparent reason."

Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy, defined as MAPE = 100 × mean(|y_t − ŷ_t| / |y_t|), where y_t denotes an observation, ŷ_t denotes its forecast, and the mean is taken over t. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that "it has a bias favoring estimates that are below the actual values". A few years later, Armstrong and Collopy (1992) argued that the MAPE "puts a heavier penalty on forecasts that exceed the actual than those that are less than the actual". For example, if y_t = 150 and ŷ_t = 100, the relative error is 50/150 = 0.33, in contrast to the situation where y_t = 100 and ŷ_t = 150, when the relative error would be 50/100 = 0.50.
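
A quick numeric check of that asymmetry (an illustrative snippet, not from the post):

# Absolute percentage error for a single observation/forecast pair.
ape <- function(y, yhat) 100 * abs(y - yhat) / abs(y)
ape(150, 100)  # actual 150, forecast 100: 33.3
ape(100, 150)  # actual 100, forecast 150: 50.0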

Modelling seasonal data with GAMs

In previous posts I have looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data. At the time, a number of readers commented that they were interested in modelling data that had more than just a trend component; how do you model data collected throughout the year over many years with a GAM? In this post I will show one way that I have found particularly useful in my research. First, an equation.
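
The post develops the model from that equation; as a rough sketch of the general idea (simulated data and invented names, not the post's code), a GAM can combine a smooth long-term trend with a cyclic within-year smooth:

library(mgcv)

# Simulated monthly data: a slow trend plus an annual cycle.
set.seed(1)
df <- data.frame(time = 1:240)
df$month <- ((df$time - 1) %% 12) + 1
df$y <- 0.02 * df$time + sin(2 * pi * df$month / 12) + rnorm(240, sd = 0.3)

# s(time) captures the trend; s(month, bs = "cc") is a cyclic spline, so
# December joins smoothly back onto January.
m <- gam(y ~ s(time) + s(month, bs = "cc", k = 12),
         data = df, knots = list(month = c(0.5, 12.5)))
summary(m)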

Monthly seasonality

I regularly get asked why I don't consider monthly seasonality in my models for daily or sub-daily time series. For example, this recent comment on my post on seasonal periods, or this comment on my post on daily data. The fact is, I've never seen a time series with monthly seasonality, although that does not mean it does not exist. Monthly seasonality would occur if there were some regular activity that takes place every month and affects the time series. For example, some companies try to average their expenditure across the month and often have to spend more at the end of the month to justify the budget. So daily expenditure tends to increase at the end of each month, producing a monthly seasonal pattern. Or imagine a situation where a company always stocks up on supplies on the second Tuesday in every month.
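
To make that concrete, here is an illustrative simulation of the end-of-month pattern (the ~30.44-day average month length and the Fourier-term regressors are my additions, not the post's):

library(forecast)

# Two years of daily spending with a spike in the last few days of each month.
set.seed(42)
days <- seq(as.Date("2015-01-01"), by = "day", length.out = 730)
spike <- ifelse(as.integer(format(days, "%d")) >= 28, 5, 0)
spend <- msts(100 + spike + rnorm(730), seasonal.periods = c(7, 365.25 / 12))

# The non-integer "monthly" period is handled with Fourier terms as regressors.
fit <- auto.arima(spend, xreg = fourier(spend, K = c(3, 5)), seasonal = FALSE)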

Rmnppt/FruitAndVeg - HTML - GitHub

Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance

Why time series forecasts prediction intervals aren't as good as we'd hope

Five different sources of error. When it comes to time series forecasts from a statistical model, we have five sources of error:

1. Random individual errors
2. Random estimates of parameters (e.g. the coefficients for each autoregressive term)
3. Uncertain meta-parameters (e.g. number of autoregressive terms)
4. Unsure if the model was right for the historical data
5. Even given #4, unsure if the model will continue to be right

A confidence interval is an estimate of the statistical uncertainty of the estimated parameters in the model. It usually estimates uncertainty source #2 above; it is not interested in #1, and it is conditional on the uncertainty from sources #3, #4 and #5 all being taken out of the picture. A prediction interval should ideally take all five sources into account (see Rob Hyndman for more on the distinction between prediction and confidence intervals). Here's a simple simulation to show the cost of estimating the meta-parameters, even when sources of error #4 and #5 can be discounted.
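
The post's own code isn't reproduced here, but a toy version of such a simulation might look like this (all settings are my assumptions): generate AR(1) data, forecast one step ahead with the true model order and with auto.arima(), and compare how often the 80% interval covers the realised value.

library(forecast)

set.seed(123)
covers <- function(fc, actual) actual >= fc$lower[1] && actual <= fc$upper[1]
res <- replicate(200, {
  y <- arima.sim(list(ar = 0.6), n = 101)
  train <- ts(y[1:100])
  actual <- y[101]
  fc_known <- forecast(Arima(train, order = c(1, 0, 0)), h = 1, level = 80)  # true order
  fc_auto <- forecast(auto.arima(train), h = 1, level = 80)  # order estimated too
  c(known = covers(fc_known, actual), auto = covers(fc_auto, actual))
})
rowMeans(res)  # empirical coverage of the nominal 80% intervals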

Prophet

Rolling Origins and Fama French

By Jonathan Regenstein. Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets.

In the machine learning prediction world, these two data sets are often called training data and testing data, but we're not going to do any machine learning prediction today.
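
A minimal sketch of a rolling origin split with the rsample package (the data and window sizes here are invented for illustration):

library(rsample)

# Ten years of monthly returns: train on 60 months, assess on the next 12,
# then roll the origin forward one month at a time.
df <- data.frame(
  date = seq(as.Date("2010-01-01"), by = "month", length.out = 120),
  ret = rnorm(120)
)
rolls <- rolling_origin(df, initial = 60, assess = 12, cumulative = FALSE)
nrow(analysis(rolls$splits[[1]]))    # 60 training rows
nrow(assessment(rolls$splits[[1]]))  # 12 testing rows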

Cross-validation for time series forecast

Time series cross-validation is an important part of the toolkit for good evaluation of forecasting models. forecast::tsCV makes it straightforward to implement, even with different combinations of explanatory regressors in the different candidate models for evaluation.

Spurious correlation between time series is a well-documented and much-mocked problem, with Tyler Vigen's educational website on the topic ("per capita cheese consumption correlated with number of people dying by becoming entangled in their bedsheets") even spawning a whole book of humorous examples. Identifying genuinely correlated series can be immensely helpful for time series forecasting.

Forecasting is hard, and experience generally shows that complex causal models don't do as well as much simpler methods. However, a well-chosen small set of "x regressors" can improve forecasting performance in many situations. I have been investigating one of those situations for a future blog post on forecasting unemployment rates.
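
A minimal tsCV() example (the models and data are illustrative, not the unemployment series mentioned above):

library(forecast)

# One-step-ahead cross-validated forecast errors for two candidate models.
e_ses <- tsCV(lynx, function(x, h) ses(x, h = h), h = 1)
e_arima <- tsCV(lynx, function(x, h) forecast(auto.arima(x), h = h), h = 1)
sqrt(mean(e_ses^2, na.rm = TRUE))    # RMSE, exponential smoothing
sqrt(mean(e_arima^2, na.rm = TRUE))  # RMSE, ARIMA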

SimITS

Tidy forecasting in R

The fable package for doing tidy forecasting in R is now on CRAN.

Like tsibble and feasts, it is also part of the tidyverts family of packages for analysing, modelling and forecasting many related time series (stored as tsibbles). For a brief introduction to tsibbles, see this post from last month. Here we will forecast Australian tourism data by state/region and purpose. This data is stored in the tourism tsibble, where Trips contains domestic visitor nights in thousands.

library(tidyverse)
library(tsibble)
library(lubridate)
library(fable)
tourism

There are 304 combinations of Region, State and Purpose, each one defining a time series of 80 observations.
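
From there, fitting and forecasting look roughly like this (a sketch consistent with the post's setup; the particular model choices are mine):

library(fable)
library(tsibble)
library(dplyr)

# One ETS and one ARIMA model per Region/State/Purpose series,
# then forecasts two years ahead for every series and model.
fit <- tourism %>%
  model(ets = ETS(Trips), arima = ARIMA(Trips))
fc <- fit %>% forecast(h = "2 years")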

Fable: Tidy forecasting in R

Reintroducing tsibble: data tools that melt the clock

Preface. I have introduced tsibble before in comparison with another package.
Now I'd like to reintroduce tsibble (bold for package) to you and highlight the role tsibble (italic for data structure) plays in tidy time series analysis. The development of the tsibble package has been taking place since July 2017, and v0.6.2 landed on CRAN in mid-December. Yup, there have been 14 CRAN releases since the initial release, and it has evolved substantially.

Motivation

Figure 1: Can we flatten the lumpy path of converting raw temporal data to model-ready objects?

If data comes with a time variable, it is referred to as "temporal data". However, temporal data pre-processing can be largely formalised, and the tools provided by tsibble do this, and more.

Time series graphics using feasts

This is the second post on the new tidyverts packages for tidy time series analysis. The previous post is here. For users migrating from the forecast package, it might be useful to see how to get similar graphics to those they are used to.

The forecast package is built for ts objects, while the feasts package provides features, statistics and graphics for tsibbles.
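
For example, a seasonal plot in feasts looks something like this (an illustrative sketch reusing the tourism tsibble from above):

library(feasts)
library(tsibble)
library(dplyr)

# Aggregate holiday trips across series, then draw a seasonal plot,
# the feasts counterpart of forecast::ggseasonplot().
tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips)) %>%
  gg_season(Trips)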

Tidyrisk - Making Quant Risk Tidy

Cleaning Anomalies to Reduce Forecast Error by 9% with anomalize

Written by Matt Dancho on September 30, 2019. In this tutorial, we'll show how we used clean_anomalies() from the anomalize package to reduce forecast error by 9%. R Packages Covered:
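
The workflow around clean_anomalies() looks roughly like this (a sketch following the anomalize package's documented pipeline, using its bundled example data rather than the tutorial's):

library(dplyr)
library(anomalize)

# Decompose the series, flag anomalies in the remainder, then replace the
# anomalous observations with trend + season so they don't distort a
# downstream forecast model.
tidyverse_cran_downloads %>%
  filter(package == "lubridate") %>%
  ungroup() %>%
  time_decompose(count) %>%
  anomalize(remainder) %>%
  clean_anomalies()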

Forecasting the next decade in the stock market using time series models

This post will introduce one way of forecasting stock index returns on the US market. Typically, single measures such as CAPE have been used to do this, but they lack accuracy compared to using many variables and can also have different relationships with returns on different markets. Furthermore, it is possible to train different types of models and combine them to increase the accuracy even more, as is done in this post. We'll use a variety of time series models, with the goal of forecasting future returns for the S&P 500. The variable to be forecasted is the annual future ten-year return, and all of the models used except for ETS are dynamic, i.e. they also use regressors such as valuation multiples, mostly the same ones as in this post.
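
A toy sketch of the model-combination idea (everything here, including the simulated CAPE series, is invented for illustration and is not the post's code):

library(forecast)

# Simulated valuation multiple and a future ten-year return tied to it.
set.seed(1)
n <- 100
cape <- ts(20 + arima.sim(list(ar = 0.9), n = n))
ret <- ts(0.2 - 0.005 * cape + rnorm(n, sd = 0.02))

fit_dyn <- auto.arima(ret, xreg = cape)  # dynamic regression on CAPE
fit_ets <- ets(ret)                      # univariate ETS

# Simple average of the two forecasts (recent CAPE values stand in
# for unknown future ones).
fc <- (forecast(fit_dyn, xreg = tail(cape, 10))$mean +
       forecast(fit_ets, h = 10)$mean) / 2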