
Stats Books (Including R)


16 Free Data Science Books. ONLINE OPEN-ACCESS TEXTBOOKS. Forecasting: principles and practice (Rob J Hyndman, George Athanasopoulos). Statistical foundations of machine learning (Gianluca Bontempi, Souhaib Ben Taieb). Electric load forecasting: fundamentals and best practices (Tao Hong).


Forecasting: Principles and Practice. Book: stats done wrong. Mining of Massive Datasets. The book has now been published by Cambridge University Press.

Mining of Massive Datasets

The publisher is offering a 20% discount to anyone who buys the hardcopy here. By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3. --- Jure Leskovec, Anand Rajaraman (@anand_raj), and Jeff Ullman. Download Version 2.1. The following is the second edition of the book, which we expect to be published soon.

There is a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice, rather than how it was described in the original paper. RAP Companion. Producing official statistics for publications is a key function of many teams across Government.

RAP Companion

It’s a time-consuming and meticulous process to ensure that statistics are accurate and timely. With open source software becoming more widely used, there’s now a range of tools and techniques that can be used to reduce production time, whilst maintaining and even improving the quality of the publications. This book is about these techniques: what they are, and how we can use them.

Discovery Something is better than nothing. This is NOT TRUE in the agile world; consider the opportunity cost. Data Science at the Command Line. The Unix Workbench. Backtesting Strategies with R. This book is designed to not only produce statistics on many of the most common technical patterns in the stock market, but to show actual trades in such scenarios.

Backtesting Strategies with R

Test a strategy; reject if results are not promising. Apply a range of parameters to strategies for optimization. Attempt to kill any strategy that looks promising. Let me explain that last one a bit. Just because you may find a strategy that seems to outperform the market and has good profit and low drawdown, this doesn’t mean you’ve found a strategy to put to work. On the contrary, you must work to disprove it. Nothing is worse than putting a non-profitable strategy to work because it wasn’t rigorously tested. Preview: Intro to Reproducible Science in R. I’m pleased to share Part I of my new book “Introduction to Reproducible Science in R”.

Preview: Intro to Reproducible Science in R

The purpose of this book is to approach model development and software development holistically to help make science and research more reproducible. The need for such a book arose from observing some of the challenges that I’ve seen teaching graduate courses in natural language processing and machine learning, as well as training my own staff to become effective data scientists. While quantitative reasoning and mathematics are important, often I found that the primary obstacle to good data science was reproducibility and repeatability: it’s difficult to quickly reproduce someone else’s results. R for Data Science. One of the best ways to improve your reach as a data scientist is to write functions.

R for Data Science

Functions allow you to automate common tasks. Writing a function has three big advantages over using copy-and-paste: You drastically reduce the chances of making incidental mistakes when you copy and paste. As requirements change, you only need to update code in one place, instead of many. You can give a function an evocative name that makes your code easier to understand. Writing good functions is a lifetime journey. R for Data Science Solutions. Yet another study guide to ‘R for Data Science’. What They Forgot to Teach You About R. YaRrr! The Pirate’s Guide to R. R Programming for Data Science. Data science has taken the world by storm.
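As a rough illustration of the R for Data Science excerpt on functions versus copy-and-paste, here is a minimal sketch. It is written in Python rather than R, and the function name `rescale01` and the sample data are assumptions made for this example, not taken from the excerpt.

```python
def rescale01(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to spread over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns that would otherwise each need a pasted formula.
heights = [150, 160, 170]
weights = [50, 75, 100]

# One well-named call per column; a change to the rescaling rule
# now only has to be made inside rescale01.
scaled_heights = rescale01(heights)
scaled_weights = rescale01(weights)
```

The rescaling logic lives in one place, so a fix such as the constant-column guard applies to every column at once, and the function name documents the intent.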

R Programming for Data Science

Every field of study and area of business has been affected as people increasingly realize the value of the incredible quantities of data being generated. But to extract value from those data, one needs to be trained in the proper data science skills. The R programming language has become the de facto programming language for data science. Its flexibility, power, sophistication, and expressiveness have made it an invaluable tool for data scientists around the world. Advanced R 2nd Edition. Advanced R Programming. Advanced R Solutions. Efficient R programming. Colin Gillespie is Senior lecturer (Associate professor) at Newcastle University, UK.

Efficient R programming

Mastering Software Development in R. The R Inferno. Advanced R Stats - Thesis: practical tools for exploring data and models. This thesis describes three families of tools for exploring data and models.

Thesis: practical tools for exploring data and models

It is organised in roughly the same way that you perform a data analysis. First, you get the data in a form that you can work with. Some Free R Books on CRAN. Digital History Methods in R. An R "meta" book. By Joseph Rickert I am a book person.

An R "meta" book

I collect books on all sorts of subjects that interest me and consequently I have a fairly extensive collection of R books, many of which I find to be of great value. Creating APIs in R with Plumber. R security practices. Geocomputation with R. Development Inspired by the bookdown R package we are developing this book in the open. We decided to make the book open source to encourage contributions, ensure reproducibility and provide access to the material as it evolves. We’re developing the book in 3 main phases. We’re in phase 1 and focussed on the first 5 main chapters, which we aim to be complete by September. Drafts of other chapters will be added to this website as the project progresses. The latest version is hosted at

The version of the book you are reading now was built on 2017-08-06 and was built on Travis. bookdown makes editing a book as easy as editing a wiki. To raise an issue about the book’s content (e.g. code not running) or make a feature request, check out the issue tracker. Reproducibility. The caret Package. The caret package (short for _C_lassification _A_nd _RE_gression _T_raining) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: data splitting, pre-processing, feature selection, model tuning using resampling, and variable importance estimation, as well as other functionality.

There are many different modeling functions in R. Some have different syntax for model training and/or prediction. The current release version can be found on CRAN and the project is hosted on github. Feature Engineering and Selection: A Practical Approach for Predictive Models. A note to readers: this text is a work in progress. It will eventually be published in this format as well as a more traditional physical medium by Chapman & Hall/CRC. We’ve released this initial version to get more feedback beyond what our excellent reviewers and editor have already provided. Feedback can be given at the GitHub repo. Copyediting has not been done yet, so read at your own risk. Right now, we are primarily interested in the quality and organization of the content but are open to all of your thoughts. Hands-On Machine Learning with R. This book is sold by Taylor & Francis Group, who owns the copyright. The physical copies are available at Taylor & Francis and Amazon.

Welcome to Hands-On Machine Learning with R. This book provides hands-on modules for many of the most common machine learning methods to include: generalized low rank models, clustering algorithms, autoencoders, regularized models, random forests, gradient boosting machines, deep neural networks, stacking / super learners, and more! Little Book of R for Time Series.

An Intro to Statistical and Data Sciences via R. Happy Git and GitHub for the useR. Mastering Shiny. RMarkdown for Scientists. Ggplot2: Elegant Graphics for Data Analysis. Data Skills for Reproducible Science. Mastering Spark with R. R Cookbook, 2nd Edition. Mixed Models in R. Twitter for R programmers. JavaScript for R.