# OpenIntro Data Science Wars: Python vs. R

As I frequently travel in data science circles, I’m hearing more and more about a new kind of tech war: Python vs. R. I’ve lived through many tech wars in the past, e.g. Windows vs. Linux, iPhone vs.

While R has traditionally been the programming language of choice for data scientists, some believe it is ceding ground to Python.

## R is Too Complex

The most frequently stated argument I’ve heard is the view that Python is general purpose and comparatively easy to learn whereas R remains a somewhat complex programming environment to master. When I first learned R, I did not find it particularly complex; it was a lot easier for me to learn R than C++ or Java with their mammoth frameworks.

## R Isn’t Really a Language

Another argument says that part of the reason people struggle to learn R is that it’s not really a language.

## Python is More Approachable

Some feel that Python is more approachable. Remember, R is a very old statistical environment that has an incredible global following.

# Detecting multicollinearity using variance inflation factors | STAT 501 - Regression Methods

Okay, now that we know the effects that multicollinearity can have on our regression analyses and subsequent conclusions, how do we tell when it exists? That is, how can we tell if multicollinearity is present in our data?

Some of the common methods used for detecting multicollinearity include: The analysis exhibits the signs of multicollinearity, such as estimates of the coefficients varying from model to model. Looking at correlations only among pairs of predictors, however, is limiting.

## What is a variance inflation factor?

As the name suggests, a variance inflation factor (VIF) quantifies how much the variance is inflated. Let's be a little more concrete. It can be shown that the variance of the estimated coefficient $b_k$ is:

Note that we add the subscript "min" in order to denote that it is the smallest the variance can be.

Let's consider such a model with correlated predictors:

How much larger?

## An example

[Figure: matrix plot of BP, Dur, Pulse, and Stress]
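For reference, here is a sketch of the standard variance inflation factor definitions. The notation is assumed rather than taken from the text above: $\sigma^2$ is the error variance, $x_{ik}$ is the value of predictor $k$ for observation $i$, $\bar{x}_k$ is its mean, and $R_k^2$ is the $R^2$ obtained by regressing predictor $k$ on the remaining predictors.

$$
\operatorname{Var}(b_k)_{\min} = \frac{\sigma^2}{\sum_{i=1}^{n}\left(x_{ik}-\bar{x}_k\right)^2},
\qquad
\operatorname{Var}(b_k) = \frac{\sigma^2}{\sum_{i=1}^{n}\left(x_{ik}-\bar{x}_k\right)^2}\cdot\frac{1}{1-R_k^2},
\qquad
VIF_k = \frac{\operatorname{Var}(b_k)}{\operatorname{Var}(b_k)_{\min}} = \frac{1}{1-R_k^2}
$$

As a quick check on the arithmetic, a predictor with $R_k^2 = 0.9$ has $VIF_k = 1/(1-0.9) = 10$, meaning the variance of its coefficient is ten times what it would be if the predictor were uncorrelated with the others. With $R_k^2 = 0$ the factor is 1, so there is no inflation.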

# Crime data exploration in R using ggplot2 - Active Analytics

## Introduction

The purpose of this blog post is to outline some exploratory plots using crime data, available from the data.gov.uk website, and the ggplot2 package in R. The ggplot2 package is a plotting and graphics package written for R by Hadley Wickham. Its great-looking plots and impressive flexibility have made it popular among R coders. Though this blog post has been created for crime data, the principles can be extended to the analysis of many different data sets. Before I begin there are two items to cover: 1. 2.

## The Data

The data used in this plotting tutorial was from the data.gov.uk website.

    # We load some packages
    # Our plotting tool
    require(ggplot2)
    # For arranging the plots
    require(gridExtra)
    # For manipulating the plot scales
    require(scales)
    # For generating our svg files
    require(grDevices)
    options("stringsAsFactors" = TRUE)
    # Path to the folder holding the data csv
    path <- "C:\\
    btpData <- read.csv(file = paste(path, "BTP-Dec-2012.csv", sep = ""), header = TRUE)

The dimensions of the table ...