
Data R Value: Machine Learning. Linear Regression Full Example (Boston Housing). It is important to mention that the present posts began as a personal way of practicing R programming and machine learning.

Data R Value: Machine Learning. Linear Regression Full Example (Boston Housing).

Subsequently, feedback from the community urged me to continue performing these exercises and sharing them. The bibliography and corresponding authors are cited at all times; it is a way of honoring them and giving them the credit they deserve for their work. Regtools. Let’s take a look at the data set prgeng, some Census data for California engineers and programmers in the year 2000.


The response variable in this example is wage income, and the predictors are age, number of weeks worked, and dummy variables for MS and PhD degrees. (Some data wrangling was performed first; type ?knnest for the details.) The fit assessment techniques in regtools gauge the fit of parametric models by comparing them to nonparametric ones. Going Deeper into Regression Analysis with Assumptions, Plots & Solutions. Introduction. "All models are wrong, but some are useful" (George Box). Regression analysis marks the first step in predictive modeling.

Going Deeper into Regression Analysis with Assumptions, Plots & Solutions

No doubt, it’s fairly easy to implement. Neither its syntax nor its parameters create any confusion. But merely running one line of code doesn’t solve the purpose. First steps with Non-Linear Regression in R. Using segmented regression to analyse world record running times. By Andrie de Vries. A week ago my high school friend, @XLRunner, sent me a link to the article "How Zach Bitter Ran 100 Miles in Less Than 12 Hours".

Using segmented regression to analyse world record running times

Applied Statistical Theory: Quantile Regression. This is part two of the ‘applied statistical theory’ series that will cover the bare essentials of various statistical techniques.

Applied Statistical Theory: Quantile Regression

As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” A function to help graphical model checks of lm and ANOVA. As always a more colourful version of this post is available on rpubs.

A function to help graphical model checks of lm and ANOVA

Even though linear models (LM) are very simple models that underlie many more complex ones, they still rest on assumptions that, if not met, would render any interpretation of the models plainly wrong. In my field of research most people were taught to check ANOVA assumptions using tests like Levene & co. This is, however, not the best way to check whether a model meets its assumptions, because p-values depend on the sample size: with a small sample we will almost never reject the null hypothesis, while with a big sample even small deviations will lead to significant p-values (discussion). Using and interpreting different contrasts in linear models in R. When building a regression model with categorical variables with more than two levels (e.g. “Cold”, “Freezing”, “Warm”), R internally applies a transformation in order to compute the regression coefficients.
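The graphical checks described above can be sketched in base R as follows. This is only a minimal illustration, not the function from the original post: the built-in `cars` data set and the names `fit`, `res` and `fitted_vals` are assumptions chosen for the example.

```r
# Fit a simple linear model on the built-in `cars` data
# (an assumed stand-in; the original post uses its own data).
fit <- lm(dist ~ speed, data = cars)

res <- residuals(fit)        # raw residuals
fitted_vals <- fitted(fit)   # fitted values

# Two side-by-side diagnostic plots instead of formal tests.
op <- par(mfrow = c(1, 2))

# 1. Residuals vs fitted: look for curvature or a funnel shape
#    (non-linearity or non-constant variance).
plot(fitted_vals, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# 2. Normal Q-Q plot: points far from the line suggest
#    non-normal residuals.
qqnorm(res)
qqline(res)

par(op)  # restore the previous graphical settings
```

Calling `plot(fit)` on the fitted model produces a similar set of built-in diagnostic plots.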

Using and interpreting different contrasts in linear models in R

What R does is turn each categorical variable into a set of contrasts; the number of contrasts is the number of levels in the variable (3 in the example above) minus 1. Here I will present three ways to set the contrasts; depending on your research question and your variables, one might be more appropriate than the others.

## $f
## [1] "contr.treatment"

# simulate some ys

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   11.498      0.463   24.84  < 2e-16 ***
## flow           3.037      0.655    4.64  2.1e-05 ***
## fhigh          6.163      0.655    9.41  3.3e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.07 on 57 degrees of freedom
## Multiple R-squared: 0.609, Adjusted R-squared: 0.595
## F-statistic: 44.3 on 2 and 57 DF, p-value: 2.45e-12

R Tutorial Series: Graphic Analysis of Regression Assumptions. An important aspect of regression involves assessing the tenability of the assumptions upon which its analyses are based.
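To make the contrasts discussion concrete, here is a small sketch comparing R's default treatment contrasts with sum contrasts. The factor levels ("control", "low", "high"), the group means and the simulated response are assumptions for illustration, not the original post's data.

```r
set.seed(1)

# A balanced three-level factor with 20 observations per level (assumed design).
f <- gl(3, 20, labels = c("control", "low", "high"))

# Simulated response with assumed group means 10, 13 and 16.
y <- c(10, 13, 16)[as.integer(f)] + rnorm(60)

# Default for unordered factors: treatment contrasts.
# A 3-level factor is coded with 3 - 1 = 2 contrast columns.
contrasts(f)

# Intercept = mean of the reference level ("control");
# the other coefficients are differences from that reference.
m_treat <- lm(y ~ f)
coef(m_treat)

# Sum contrasts: with a balanced design the intercept is the mean of the
# group means, and the coefficients are deviations from it.
f_sum <- f
contrasts(f_sum) <- contr.sum(3)
m_sum <- lm(y ~ f_sum)
coef(m_sum)
```

Both models give identical fitted values; only the meaning of the individual coefficients changes with the contrast coding.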

R Tutorial Series: Graphic Analysis of Regression Assumptions

This tutorial will explore how R can help one scrutinize the regression assumptions of a model via its residuals plot, normality histogram, and PP plot. Tutorial Files Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Pre-Analysis Steps. Linear Models. Notes on chapter 3 of Introduction to Statistical Learning and the Stanford Online Statistical Learning class.

Linear Models

The chapter uses the Advertising data set available from the book's website. Testing the assumptions of linear regression. R tips pages. This page provides tips and recommendations for fitting linear and nonlinear models to data.

R tips pages

Updated and revised frequently (click the reload button on your browser to make sure you are seeing the most recent version). The main model-fitting commands covered on this page are: lm (linear models for fixed effects); lme (linear models for mixed effects); glm (generalized linear models); nls (nonlinear least squares); gam (generalized additive models). Regression on categorical variables. This morning, Stéphane asked me a tricky question about extracting coefficients from a regression with categorical explanatory variates.

More precisely, he asked me if it was possible to store the coefficients in a nice table, with information on the variable and the modality (those two pieces of information being in two different columns). Here is some code I wrote to produce the table he was looking for, but I guess that some (much) smarter techniques can be used (comments – see below – are open). Consider the following dataset:

> base
   x sex   hair
1  1   H  Black
2  4   F  Brown
3  6   F  Black
4  6   H  Black
5 10   H  Brown
6  5   H Blonde

Model Validation: Interpreting Residual Plots. When conducting any statistical analysis it is important to evaluate how well the model fits the data and whether the data meet the assumptions of the model. There are numerous ways to do this and a variety of statistical tests to evaluate deviations from model assumptions.
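For the coefficient-table question above, one possible approach (not necessarily the one used in the original post) is to rely on the `xlevels` component of a fitted `lm` object. The sketch below uses the six-row dataset quoted above and assumes the default treatment contrasts, under which coefficient names are the variable name pasted to the level name; the names `fit` and `tab` are illustrative.

```r
# The small dataset quoted in the text.
base <- data.frame(
  x    = c(1, 4, 6, 6, 10, 5),
  sex  = factor(c("H", "F", "F", "H", "H", "H")),
  hair = factor(c("Black", "Brown", "Black", "Black", "Brown", "Blonde"))
)

fit <- lm(x ~ sex + hair, data = base)

# fit$xlevels lists the levels of every factor in the model.
# With treatment contrasts, the first level is the reference and
# each remaining level has a coefficient named <variable><level>.
tab <- do.call(rbind, lapply(names(fit$xlevels), function(v) {
  lev <- fit$xlevels[[v]][-1]            # drop the reference level
  data.frame(
    variable    = v,
    modality    = lev,
    coefficient = unname(coef(fit)[paste0(v, lev)]),
    stringsAsFactors = FALSE
  )
}))

print(tab)
```

The reference modalities ("F" for sex and "Black" for hair) do not appear in the table because they are absorbed into the intercept.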

However, there is little general acceptance of any of the statistical tests. GLM – Evaluating Logistic Regression Models (part 3). Third part on logistic regression (first here, second here). There are two steps in assessing the fit of the model: the first is to determine whether the model fits, using summary measures of goodness of fit or by assessing the predictive ability of the model; the second is to determine whether there are any observations that do not fit the model or that have an influence on the model. Covariate pattern. A covariate pattern is a unique combination of values of predictor variables. There are 30 covariate patterns in the dataset. Continuous piecewise linear regression. When talking about smoothing splines, a simple place to start is a continuous piecewise linear regression with fixed knots.
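As a small illustration of the covariate-pattern definition above, the following sketch counts the unique combinations of predictor values in a toy data frame. The data frame `d` and its columns are hypothetical and are not the dataset with 30 patterns mentioned in the text.

```r
# Hypothetical toy data: two categorical predictors and a binary outcome.
d <- data.frame(
  age_group = c("young", "young", "old", "old", "old"),
  smoker    = c("no",    "no",    "no",  "yes", "yes"),
  y         = c(0, 1, 0, 1, 1)
)

# Each unique row of predictor values is one covariate pattern.
patterns <- unique(d[, c("age_group", "smoker")])
n_patterns <- nrow(patterns)
n_patterns

# Number of observations sharing each pattern
# (the count is returned in the column named `y`).
obs_per_pattern <- aggregate(y ~ age_group + smoker, data = d, FUN = length)
obs_per_pattern
```

Several observations can share one pattern, which is why the number of patterns can be much smaller than the number of rows.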

I did not find any simple example showing how to estimate it in GNU R, so I have created a little snippet that does the job. Visualization in regression analysis. Visualization is a key to success in regression analysis. Binary classif. eval. in R via ROCR.
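The continuous piecewise linear regression with fixed knots mentioned above can be sketched with hinge terms of the form max(x - k, 0). This is one common way to do it and not necessarily the original author's snippet; the knot locations, the simulated data and the variable names are all assumptions.

```r
set.seed(42)

# Simulated data with slope changes at x = 3 and x = 7 (assumed values).
x <- seq(0, 10, length.out = 200)
knots <- c(3, 7)   # fixed, known knot locations
y <- 1 + 0.5 * x + 1.5 * pmax(x - 3, 0) - 2 * pmax(x - 7, 0) +
  rnorm(200, sd = 0.3)

# One hinge column per knot: zero to the left of the knot, linear to the right.
# Because each hinge is continuous in x, the fitted curve is continuous too.
hinge <- sapply(knots, function(k) pmax(x - k, 0))
colnames(hinge) <- paste0("knot_", knots)

fit <- lm(y ~ x + hinge)
coef(fit)

# Each hinge coefficient is the change in slope at its knot.
plot(x, y, col = "grey60", main = "Continuous piecewise linear fit")
lines(x, fitted(fit), lwd = 2)
abline(v = knots, lty = 3)
```

The slope in any segment is the coefficient of `x` plus the hinge coefficients of all knots to its left.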