Forecasting: Principles and Practice

An online textbook by Rob J Hyndman and George Athanasopoulos

Welcome to our online textbook on forecasting. This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don't attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details. For most sections, we only assume that readers are familiar with algebra; high school mathematics should be sufficient background. At the end of each chapter we provide a list of "further reading". We use R throughout the book and we intend students to learn how to forecast with R. The book is free and online, making it accessible to a wide audience; it uses R, which is free, open-source, and extremely powerful software; and it is continuously updated. Use the table of contents on the right to browse the book. Happy forecasting!
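As a small taste of the kind of R workflow the book teaches (a minimal sketch using the forecast package and a built-in R dataset, not an excerpt from the book itself):

library(forecast)                  # load the forecast package
fit <- auto.arima(AirPassengers)   # fit an ARIMA model to a built-in monthly series
fc  <- forecast(fit, h = 24)       # forecast two years ahead
plot(fc)                           # point forecasts with prediction intervals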

What is volatility? Some facts and some speculation.

Definition: Volatility is the annualized standard deviation of returns, often expressed in percent. A volatility of 20 means that there is about a one-third probability that an asset's price a year from now will have fallen or risen by more than 20% from its present value. In R the computation, given a series of daily prices, looks like:

sqrt(252) * sd(diff(log(priceSeriesDaily))) * 100

Usually, as here, log returns are used (though it is unlikely to make much difference).

Historical estimation: What frequency of returns should be used when estimating volatility? There is folklore that it is better to use monthly data than daily data because daily data are noisier. However, this is finance, so things aren't that easy. Another complication arises when there are assets from around the globe.

Through time: Volatility would be more boring if finance were like other fields where standard deviations never change. But why does it change? Later sections cover volatility across assets, implied volatility, and why volatility is not risk.
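The same computation, wrapped as a small function (a sketch; the 252 trading days per year and the name priceSeriesDaily for the vector of daily closing prices are assumptions carried over from the one-liner above):

annualized_vol <- function(prices, periods_per_year = 252) {
  rets <- diff(log(prices))                 # log returns from the price series
  sqrt(periods_per_year) * sd(rets) * 100   # annualized standard deviation, in percent
}
# e.g. annualized_vol(priceSeriesDaily)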

Forecasting within limits

It is common to want forecasts to be positive, or to require them to be within some specified range [a, b]. Both of these situations are relatively easy to handle using transformations.

Positive forecasts: To impose a positivity constraint, simply work on the log scale: model the logged data and back-transform the forecasts by exponentiating.

Forecasts constrained to an interval: To see how to handle data constrained to an interval, imagine that the egg prices were constrained to lie between a lower bound a and an upper bound b. Then we can use a scaled logit transformation, which maps the interval (a, b) to the whole real line:

y = log((x - a) / (b - x)),

where x is on the original scale and y is the transformed data. The prediction intervals from these transformations have the same coverage probability as on the transformed scale, because quantiles are preserved under monotonically increasing transformations.
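A rough R sketch of the interval-constrained case (the bound values, the series x, and the use of ets() from the forecast package are illustrative assumptions, not the original post's code):

library(forecast)
a <- 50; b <- 400             # hypothetical lower and upper bounds
y <- log((x - a) / (b - x))   # scaled logit: map (a, b) onto the real line
fit <- ets(y)                 # model on the transformed scale
fc  <- forecast(fit, h = 10)
# back-transform the point forecasts (and, similarly, the interval limits)
x_fc <- (b - a) * exp(fc$mean) / (1 + exp(fc$mean)) + a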

D3.js Resources to Level Up | Engineering Blog

I have gotten a lot better at D3.js development over the past few years, and can trace most of my improvement to coming across a few key tutorials, blogs, books and other resources on the topic. They've been a huge help for me, and I've gathered a bunch of my favorites in this post to hopefully help others improve their D3 experience. Here it goes.

Assessing your level. First, let's define four general D3.js levels:

- Complete Beginner: You have no previous experience with D3.js or any front-end technologies (HTML/CSS).
- Basic: You have some HTML/CSS/JS skills and have played around with some D3.js examples, but don't completely understand the patterns and mechanics it uses.
- Intermediate: You know how to customize D3.js graphs using examples found in search engines, but you struggle to reuse them and aren't quite happy with the quality of the code itself.
- Proficient: You have built a lot of different graphs, tested them, and integrated them with different technologies or libraries.

Complete Beginner: Books

Time Series Analysis | R Statistics.Net

Any metric that is measured over time is a time series. Time series are of high importance because of their industrial relevance, especially for forecasting (demand, sales, supply, etc.), and a series can be broken down into its components so as to forecast it systematically. This is a beginner's introduction to time series analysis, answering fundamental questions such as: what is a stationary time series, how to decompose it, how to de-trend and de-seasonalize a time series, what is autocorrelation, and so on.

What Is A Time Series? Any metric that is measured over regular time intervals makes a time series.

How To Create A Time Series In R? Upon importing your data into R, use the ts() function as follows:

ts(inputData, frequency = 4, start = c(1959, 2))  # frequency 4 => quarterly data
ts(1:10, frequency = 12, start = 1990)            # frequency 12 => monthly data

Understanding Your Time Series: for an additive time series, Y_t = S_t + T_t + e_t; for a multiplicative time series, Y_t = S_t * T_t * e_t (seasonal, trend and error components).

What Is A Stationary Time Series?
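To see those components in practice, here is a quick sketch using the built-in AirPassengers series (decompose() and stl() are standard functions in R's stats package; choosing the multiplicative form for this particular series is my assumption, based on its growing seasonal swings):

ap  <- AirPassengers                           # built-in monthly series, frequency = 12
dec <- decompose(ap, type = "multiplicative")  # Y_t = S_t * T_t * e_t
plot(dec)                                      # observed, trend, seasonal and remainder panels
# alternative: loess-based decomposition of the logged series (additive on the log scale)
plot(stl(log(ap), s.window = "periodic"))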

Interpreting noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says "The unemployment rate has dropped by 0.1% for no apparent reason." What is going on here is that the commentators are assuming we live in a noise-free world.

The finance news: Every night on the nightly TV news bulletins, a supposed expert will go through the changes in share prices, stock price indexes, currency rates, and economic indicators from the past 24 hours. A good rule-of-thumb would be that a change should not be interpreted unless it is at least two standard deviations in magnitude, where the standard deviation is that of recent daily changes in the same quantity. Sadly, that's unlikely to happen.

Seasonally adjusted data
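A back-of-the-envelope version of that rule-of-thumb in R (a sketch; the name series stands for whatever daily index or rate is being reported, which is an assumption here):

changes   <- diff(series)          # day-to-day changes in the reported quantity
threshold <- 2 * sd(changes)       # roughly two standard deviations of daily noise
abs(tail(changes, 1)) > threshold  # is today's movement big enough to interpret?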

CSV To SQL Converter

Use this tool to convert CSV to SQL statements.

What can this tool do? INSERT, UPDATE, DELETE, MERGE, and SELECT statements can be created.

What are my options? You can specify which fields to include and specify the name of each field.

Step 1: Select your input (choose a CSV file, enter a URL, or paste CSV into the text box).
Step 2: Choose input options (optional).
Step 3: Choose output options.
Step 4: Generate the .sql output.
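The converter itself is a web tool, but the idea is easy to reproduce in R; a rough sketch that turns a CSV into INSERT statements (the file name, table name, and the quoting of every value as text are simplifying assumptions, not how the tool necessarily works):

df   <- read.csv("input.csv", stringsAsFactors = FALSE)   # hypothetical input file
tab  <- "my_table"                                         # hypothetical target table
cols <- paste(names(df), collapse = ", ")
# quote every value as a string; a real converter would also escape quotes and handle NULLs
rows <- apply(df, 1, function(r) paste0("('", paste(r, collapse = "', '"), "')"))
sql  <- paste0("INSERT INTO ", tab, " (", cols, ") VALUES ", rows, ";")
writeLines(sql, "output.sql")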

R Video tutorial for Spatial Statistics: Introductory Time-Series analysis of US Environmental Protection Agency (EPA) pollution data

Download EPA air pollution data. The US Environmental Protection Agency (EPA) provides tons of free data about air pollution and other weather measurements through their website. The data are provided as hourly, daily and annual averages for the following parameters: Ozone, SO2, CO, NO2, PM2.5 FRM/FEM Mass, PM2.5 non-FRM/FEM Mass, PM10, Wind, Temperature, Barometric Pressure, RH and Dewpoint, HAPs (Hazardous Air Pollutants), VOCs (Volatile Organic Compounds) and Lead. All the files are accessible from a single download page, and the links to the zip files are very similar to each other: they share a common starting URL, and the name of each file has the format type_property_year.zip, where the type can be hourly, daily or annual.

data <- download.EPA(year = 2013, property = "ozone", type = "daily")
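The download.EPA() function is defined in the blog post itself and its code is not reproduced in this excerpt; the following is only a guess at its general shape, following the type_property_year.zip naming described above (the base_url argument and the assumption that each archive holds a single CSV are mine):

download.EPA <- function(year, property, type = c("daily", "hourly", "annual"), base_url) {
  # build the file name in the format type_property_year.zip
  type <- match.arg(type)
  file <- paste0(type, "_", property, "_", year, ".zip")
  tmp  <- tempfile(fileext = ".zip")
  download.file(paste0(base_url, "/", file), destfile = tmp, mode = "wb")
  # assume the archive contains a single CSV and read it
  csv <- unzip(tmp, exdir = tempdir())
  read.csv(csv[1], stringsAsFactors = FALSE)
}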

Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure of forecast accuracy and is defined as

MAPE = 100 * mean(|y_t - ŷ_t| / |y_t|),

where y_t denotes an observation, ŷ_t denotes its forecast, and the mean is taken over t. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that "it has a bias favoring estimates that are below the actual values". For example, when y_t = 150 and ŷ_t = 100, the relative error is 50/150 = 0.33, in contrast to the situation where y_t = 100 and ŷ_t = 150, when the relative error would be 50/100 = 0.50. Thus, the MAPE puts a heavier penalty on negative errors (when y_t < ŷ_t) than on positive errors. Here the error is y_t - ŷ_t, so positive errors arise only when the forecast is too small.

To avoid the asymmetry of the MAPE, Armstrong (1985, p.348) proposed the "adjusted MAPE", which he defined as

adjusted MAPE = 100 * mean(2 |y_t - ŷ_t| / (y_t + ŷ_t)).

By that definition, the adjusted MAPE can be negative (if y_t + ŷ_t < 0) or infinite (if y_t + ŷ_t = 0). Of course, the true range of the adjusted MAPE is (-∞, ∞), as is easily seen by considering the two cases y_t + ŷ_t = ε and y_t + ŷ_t = -ε with |y_t - ŷ_t| held fixed, and letting ε → 0.
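The asymmetry is easy to verify numerically; a small sketch using the two example forecasts above (single observations, just to show the arithmetic):

mape     <- function(y, f) 100 * mean(abs(y - f) / abs(y))
adj_mape <- function(y, f) 100 * mean(2 * abs(y - f) / (y + f))
mape(150, 100)      # 33.3: actual 150, forecast 100 (positive error)
mape(100, 150)      # 50.0: actual 100, forecast 150 (negative error, penalized more)
adj_mape(150, 100)  # 40: symmetric in actual and forecast
adj_mape(100, 150)  # 40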

21 tools that will help your remote team work better together - Page 20 of 20

Meldium: Securely sharing passwords with people in your team across the Internet is no easy feat. Getting your team on Meldium means you have control over who has access to what, and passwords are never exposed to team members. Meldium works with Internet Explorer, Firefox, Chrome, iOS and Android. ➤ Meldium

Time series outlier detection (a simple R function) (By Andrea Venturini)

Imagine you have a lot of time series (they may be short ones) related to a lot of different measures, and very little time to find outliers. You need something not too sophisticated to sort out the mess quickly. This is, very briefly, the typical situation in which you can adopt the washer.AV() function in R.

> dati
    phen        time zone value
1   Temperature    1  a01    2.0
2   Temperature    1  a02   20.0
...
160 Rain           4  a20    8.5

The example of 20 meteorological stations measuring rainfall and temperature is useful to understand in which situations you can implement the washer() methodology.

> out = washer.AV(dati)
[1] phenomenon: 1
[1] phenomenon: 2
> out[out[, "test.AV"] > 5, ]
   fen         t.2 series  y.1  y.2  y.3 test.AV    AV  n median.AV mad.AV madindex.AV
18 Rain          2    a18  5.5  6.3 17.0    5.43 -22.2 20     7.580   5.49       36.58
38 Rain          3    a18  6.3 17.0  5.9   24.25  47.2 20    -4.978   2.15       14.34
59 Temperature   2    a19 22.0 21.0  9.0    5.25  10.7 20     0.000   2.04       13.63
79 Temperature   3    a19 21.0  9.0 18.0   14.92 -21.2 20    -0.917   1.36        9.07
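washer.AV() itself is defined in the linked post and is not reproduced here, but the mad.AV and madindex.AV columns above suggest a median/MAD-style robust statistic; a tiny illustration of that general idea (this is not the washer.AV() code):

# robust outlier score: distance from the median measured in units of the MAD
mad_score <- function(x) abs(x - median(x)) / mad(x)
x <- c(5.5, 6.3, 17.0, 5.9, 6.1, 5.8, 6.0)  # hypothetical rainfall readings across stations
round(mad_score(x), 2)                      # the 17.0 observation stands out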

Modelling seasonal data with GAMs

In previous posts I have looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data. At the time, a number of readers commented that they were interested in modelling data that had more than just a trend component: how do you model data collected throughout the year over many years with a GAM? In this post I will show one way that I have found particularly useful in my research. First, an equation. The model needs to capture:

- any trend or long-term change in the level of the time series, and
- any seasonal or within-year variation, and
- any variation or interaction in the trend and seasonal features of the data.

I'm not going to cover point 3 in this post, but it is a relatively simple extension to what I will discuss here.

y = \beta_0 + f_{\mathrm{seasonal}}(x_1) + f_{\mathrm{trend}}(x_2) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2\mathbf{\Lambda})

> mod <- gam(y ~ s(x1) + s(x2), data = foo)

Data preparation, and we are good to go. Load mgcv and fit the naive model.
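Building on the gam() call shown above, here is a minimal sketch of the seasonal version (the data frame foo and its columns month, taking values 1 to 12, and time, a numeric index over the whole series, are hypothetical names; the cyclic cubic spline bs = "cc" keeps December and January joined up):

library(mgcv)
mod <- gam(y ~ s(month, bs = "cc", k = 12) + s(time),
           data = foo,
           knots = list(month = c(0.5, 12.5)))  # cycle endpoints placed between Dec and Jan
summary(mod)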

Programming for Data Science, the Polyglot approach: Python + R + SQL

Guest blog post by Ajit Jaokar. In this post, I discuss a possible new approach to teaching programming for data science. Programming for data science is usually focussed on the R vs. Python question, and everyone seems to have a view, including the venerable Nature journal (Programming: Pick up Python). Here, I argue that we should look beyond Python vs. R and take a polyglot approach combining Python, R and SQL. On first impressions, this polyglot approach (the ability to work across multiple languages) sounds complex. Why teach three languages together? Here is some background: outside of data science, I also co-founded Feynlabs, a social enterprise to teach computer science to kids. To learn programming for data science, it would thus help learners to build up from an existing foundation they are already familiar with and then relate new ideas to that foundation. But first, we address the problem we are trying to solve and how that problem can be broken down. The post then covers: Data Science – the problem we are trying to solve; Tools, IDEs and Packages; Data management.

Introducing practical and robust anomaly detection in a time series

Both last year and this year, we saw a spike in the number of photos uploaded to Twitter on Christmas Eve, Christmas and New Year's Eve (in other words, an anomaly occurred in the corresponding time series). Today, we're announcing AnomalyDetection, our open-source R package that automatically detects anomalies like these in big data in a practical and robust way.

(Plots in the original post: time series from Christmas Eve 2014 and from Christmas Eve 2013.)

Early detection of anomalies plays a key role in ensuring high-fidelity data is available to our own product teams and those of our data partners. This package helps us monitor spikes in user engagement on the platform surrounding holidays, major sporting events or breaking news. Recently, we open-sourced BreakoutDetection, a complementary R package for automatic detection of one or more breakouts in a time series. Broadly, an anomaly can be characterized as global or local, and as positive or negative.

How does the package work? The underlying algorithm, Seasonal Hybrid ESD (S-H-ESD), builds on the Generalized ESD test for outliers and combines it with time series decomposition and robust statistics, so that anomalies can be detected even in the presence of seasonality and an underlying trend.
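Basic usage, as I recall it from the package's README (the function name AnomalyDetectionTs, the bundled raw_data example and the argument names are reproduced from memory, so treat this as an approximation and check the repository):

# devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)
data(raw_data)   # example Twitter time series shipped with the package
res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02, direction = "both", plot = TRUE)
res$plot         # the series with detected anomalies highlighted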
