Finding the Best Subset of a GAM using Tabu Search and Visualizing It in R
The famous probabilist and statistician Persi Diaconis wrote an article not too long ago about the "Markov chain Monte Carlo (MCMC) Revolution." The paper describes how MCMC can solve a diverse set of problems. The first example he gives is a text decryption problem solved with a simple Metropolis-Hastings sampler. I was always stumped by those cryptograms in the newspaper and thought it would be pretty cool if I could crack them with statistics, so I decided to try it out on my own. The example Diaconis gives is fleshed out in more detail by its original authors in their own article. The decryption I will be attempting is of a substitution cipher, where each letter of the alphabet corresponds to another letter (possibly the same one). The strategy is to use a reference text to estimate transition probabilities from each letter to the next. To create the transition matrix, I downloaded War and Peace from Project Gutenberg.
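
The strategy described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the post's own code: the short `reference` string is a toy stand-in for War and Peace, and the helper names (`to_symbols`, `transition_matrix`, `log_score`, `decrypt_mh`) are hypothetical. The sampler proposes swapping two letters of the key and accepts the swap with the usual Metropolis rule on the log-likelihood.

```r
# Toy stand-in for the reference text (assumption: the post uses War and Peace).
reference <- paste("the quick brown fox jumps over the lazy dog and the dog",
                   "barks at the fox while the sun sets over the quiet hills")
symbols <- c(letters, " ")

# Lower-case the text and map every non-letter to a space.
to_symbols <- function(txt) {
  chars <- strsplit(tolower(txt), "")[[1]]
  chars[!chars %in% letters] <- " "
  chars
}

# Row-normalised counts of symbol-to-symbol transitions (add-one smoothing).
transition_matrix <- function(txt) {
  chars <- to_symbols(txt)
  counts <- matrix(1, 27, 27, dimnames = list(symbols, symbols))
  from <- chars[-length(chars)]
  to <- chars[-1]
  for (i in seq_along(from)) {
    counts[from[i], to[i]] <- counts[from[i], to[i]] + 1
  }
  counts / rowSums(counts)
}

# Log-likelihood of a text under the transition matrix P.
log_score <- function(txt, P) {
  chars <- to_symbols(txt)
  sum(log(P[cbind(chars[-length(chars)], chars[-1])]))
}

# Metropolis sampler over substitution keys (cipher must be lower case).
decrypt_mh <- function(cipher, P, iters = 500) {
  alphabet <- paste(letters, collapse = "")
  apply_key <- function(k) chartr(alphabet, paste(k, collapse = ""), cipher)
  key <- letters
  cur <- log_score(apply_key(key), P)
  for (i in seq_len(iters)) {
    swap <- sample(26, 2)            # propose swapping two letters of the key
    prop <- key
    prop[swap] <- key[rev(swap)]
    new <- log_score(apply_key(prop), P)
    if (log(runif(1)) < new - cur) { # accept with probability min(1, ratio)
      key <- prop
      cur <- new
    }
  }
  list(key = key, score = cur, text = apply_key(key))
}

set.seed(1)
P <- transition_matrix(reference)
cipher <- chartr(paste(letters, collapse = ""),
                 paste(c(letters[-1], "a"), collapse = ""), reference)
result <- decrypt_mh(cipher, P)
```

With a reference text this short the sampler will not reliably recover the plaintext; a long corpus is what makes the transition probabilities informative.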

Brandon Foltz, M.Ed.
This is the first video in what will be, or is (depending on when you are watching this) a multipart video series about Simple Linear Regression. In the next few minutes we will cover the basics of Simple Linear Regression starting at square one. And for the record, from now on if I say "regression" I am referring to simple linear regression as opposed to multiple regression or models that are not linear. Regression allows us to model, mathematically, the relationship between two or more variables. For now, we will be working with just two variables: an independent variable and a dependent variable. So in this video, we are going to talk about that idea. So if you are new to Regression or are still trying to figure out exactly what it even IS… this video is for you. So sit back, relax, and let's go ahead and get to work. For my complete video library organized by playlist, please go to my video page here:

Formulae in R: ANOVA and other models, mixed and fixed | Just the kind of thing you were expecting
R’s formula interface is sweet but sometimes confusing. ANOVA is seldom sweet and almost always confusing. And random (a.k.a. mixed) versus fixed effects decisions seem to hurt people’s heads too. In the following, assume that Y is a dependent variable and A, B, C, etc. are predictors, all contained in data frame d.

Formula Recap

If you use R then you probably already know this, but let’s recap anyway.

    lm(Y ~ A + B, data=d)

Interactions are expressed succinctly with the asterisk

    lm(Y ~ A * B, data=d)

or equivalently but more explicitly by specifying component parts using the colon notation, like

    lm(Y ~ A + B + A:B, data=d)

This is useful for more complex interaction structures, e.g.

    lm(Y ~ A * B * C, data=d)

which contains all main effects, all two way interactions, and a three way interaction.

    lm(Y ~ A + B + C + A:B + A:C + B:C, data=d)

is the same except for having no three way interaction.

    lm(Y ~ (A + B + C)**2, data=d)
    lm(Y ~ (A + B + C)^2, data=d)
    lm(Y ~ A + B + B**2, data=d)

Classical ANOVA
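
One way to check how R expands these formulas is to inspect the term labels directly. The sketch below is an illustration rather than part of the original post: the data frame `d` is simulated, and `term_labels` is a hypothetical helper wrapping `terms()`. It also shows that a power inside a formula is interpreted as crossing, not arithmetic, unless wrapped in `I()`.

```r
# Simulated data frame with the names used in the recap (assumption).
set.seed(42)
d <- data.frame(A = gl(2, 12), B = gl(3, 4, 24), C = gl(2, 2, 24),
                Y = rnorm(24))

# Hypothetical helper: the terms R derives from a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(Y ~ A * B)          # "A" "B" "A:B"
term_labels(Y ~ A + B + A:B)    # same terms, written out explicitly

# ^2 (and **2, which the parser treats as ^) means "all terms up to
# two-way interactions", so these two specifications agree.
two_way_power    <- term_labels(Y ~ (A + B + C)^2)
two_way_explicit <- term_labels(Y ~ A + B + C + A:B + A:C + B:C)

# Inside a formula B^2 is B crossed with itself, which is just B.
term_labels(Y ~ A + B + B^2)    # "A" "B"
# Wrap in I() to request an arithmetic square of a numeric predictor.
term_labels(Y ~ A + B + I(B^2)) # "A" "B" "I(B^2)"

# The asterisk and colon forms fit the same model.
fit_star  <- lm(Y ~ A * B, data = d)
fit_colon <- lm(Y ~ A + B + A:B, data = d)
```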

Simply Statistics
Using Google Analytics with R - ThinkToStart
For the most part, SMBs tend to utilize free analytics solutions like Google Analytics for their web and digital strategy. A powerful platform in its own right, it can be combined with R to create custom visualizations, deep dives into data, and statistical inferences. This article will focus on the usage of R and the Google Analytics API. We will go over connecting to the API, querying data and making a quick time series graph of a metric. To make an API call, you’ll need two things: a Client ID and a Client Secret.

1. Log in to your Google Analytics account.
2. Go to the Google Developers page, create a New Project and enable the Google Analytics API.
3. On the Credentials screen (under the APIs and auth menu), create a new Client ID for Application Type “Installed Application”.
4. Copy the Client ID and Client Secret.

In R (I’ll be using RStudio), load the necessary packages:

    library(ggplot2)
    library(RGoogleAnalytics)
    library(scales)

New R package for K-S goodness-of-fit tests
This is a re-post from the R packages mailing list.

Greetings, We wanted to announce a new R package ‘KScorrect’ that carries out the Lilliefors correction to the Kolmogorov-Smirnov test for use in (one-sample) goodness-of-fit tests. It is well established that the K-S test is inappropriate when sample statistics are used to estimate parameters, because this substantially increases Type II errors. This warning is mentioned in the ks.test Help page, but no general solution is currently available for non-normal distributions. The ‘KScorrect’ package corrects for the bias by using Monte Carlo simulation, a solution first recommended by Lilliefors (1967) but not widely heeded. Distribution functions are provided in the package for the loguniform and univariate mixture of normal distributions, which are not included in the R base installation. Simple examples are provided by calling example(KScorrect) or example(LcKS). Sincerely, Phil Novack-Gottshall and Steve Wang
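
The Monte Carlo idea behind the correction can be sketched in base R for the normal case. This is an illustration under stated assumptions and does not use or reproduce the KScorrect API: the function name `lilliefors_mc` is hypothetical, and only `ks.test` from the stats package is called. The null distribution of the K-S statistic is simulated with parameters re-estimated from every simulated sample, mirroring what was done to the observed data.

```r
# Hypothetical helper: Monte Carlo p-value for a K-S normality test when the
# mean and standard deviation are estimated from the sample itself.
lilliefors_mc <- function(x, nreps = 999) {
  n <- length(x)
  ks_stat <- function(v) {
    unname(ks.test(v, "pnorm", mean(v), sd(v))$statistic)
  }
  d_obs <- ks_stat(x)
  # Under the null the statistic does not depend on the true mean or sd,
  # so standard normal samples are enough for the simulation.
  d_sim <- replicate(nreps, ks_stat(rnorm(n)))
  list(D = d_obs, p.value = (sum(d_sim >= d_obs) + 1) / (nreps + 1))
}

set.seed(123)
x <- rexp(40)                  # clearly non-normal sample
corrected <- lilliefors_mc(x)
# Naive test that ignores the fact that the parameters were estimated.
naive <- ks.test(x, "pnorm", mean(x), sd(x))
```

Comparing `corrected$p.value` with `naive$p.value` shows how much the uncorrected test can overstate the p-value on a given sample.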

Common statistical tests are linear models (or: how to teach stats)
By Jonas Kristoffer Lindeløv (blog, profile). Last updated: 28 June, 2019 (See changelog). Check out the Python version and the Twitter summary. This document is summarised in the table below. Most of the common statistical models (t-test, correlation, ANOVA; chi-square, etc.) are special cases of linear models or a very close approximation. The needless complexity of treating each test as a separate tool multiplies when students try to rote learn the parametric assumptions underlying each test separately rather than deducing them from the linear model. For this reason, I think that teaching linear models first and foremost and then name-dropping the special cases along the way makes for an excellent teaching strategy, emphasizing understanding over rote learning. Concerning the teaching of “non-parametric” tests in intro-courses, I think that we can justify lying-to-children and teach “non-parametric” tests as if they are merely ranked versions of the corresponding parametric tests.
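
The claim that common tests are special cases of linear models can be checked numerically. The sketch below uses simulated data and variable names of my own choosing (assumptions, not taken from the original document) and compares p-values from the built-in tests with those from the matching `lm` fits.

```r
set.seed(1)
y <- rnorm(30, mean = 0.4)
x <- rnorm(30)
group <- gl(2, 15, labels = c("a", "b"))

# One-sample t-test is an intercept-only linear model.
p_t_one  <- t.test(y)$p.value
p_lm_one <- summary(lm(y ~ 1))$coefficients[1, 4]

# Student's two-sample t-test (equal variances) is a model with one
# dummy-coded predictor; the slope tests the difference in means.
p_t_two  <- t.test(y ~ group, var.equal = TRUE)$p.value
p_lm_two <- summary(lm(y ~ group))$coefficients[2, 4]

# The Pearson correlation test matches the slope test in simple regression.
p_cor    <- cor.test(x, y)$p.value
p_lm_cor <- summary(lm(y ~ x))$coefficients[2, 4]

isTRUE(all.equal(p_t_one, p_lm_one))
isTRUE(all.equal(p_t_two, p_lm_two))
isTRUE(all.equal(p_cor, p_lm_cor))
```

Welch's t-test, the default in `t.test`, does not assume equal variances, which is why `var.equal = TRUE` is needed for the exact match.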