
Statistics


R

DAYTUM.

Billionaire SAS co-founder keeps on coding

By Eric Lai, Computerworld, September 18, 2009, 05:39 PM ET

John Sall doesn't have to work. As one of SAS Institute Inc.'s four co-founders, the 60-year-old has a net worth of $3.1 billion, according to Forbes' estimate, ranking him the 196th-richest person in the world (just ahead of George Lucas, Steven Spielberg and Ralph Lauren). So why does he stay at the Cary, N.C.-based company? "It's always been my job to be a statistical software developer," Sall said in an interview earlier this week in Chicago, where he was attending a SAS user conference. "I've invested my life in it." Perhaps we can credit Sall's Midwestern roots for his drive. Sall was one of the lead developers for SAS during its first decade. He started working on a data visualization tool called JMP (pronounced "Jump"), an acronym for "John's Macintosh Project." The software took off with the release of a Windows version in 1994. Elixir Strings uses JMP to help create its good-sounding, non-breaking guitar strings.

Statistical Learning in Clojure Part 1: LDA & QDA Classifier

This will hopefully be the first in a series of posts based on a book that has substantially influenced me over the last several years, The Elements of Statistical Learning (EoSL) by Hastie, Tibshirani, and Friedman (I went and got a degree in statistics essentially to better understand this book). Best of all, the PDF version of EoSL is now available free of charge at the book's website, along with data, code, errata, and more.

This post will demonstrate the use of Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) for classification, as described in chapter 4, "Linear Methods for Classification", of EoSL. I will implement the classifiers in Clojure and Incanter, and use the same data set as EoSL to train and test them. The data set has 11 classes, each representing a vowel sound, and 10 predictors, each representing processed audio information captured from eight male and seven female speakers.

Both classifiers start from Bayes' theorem (EoSL eq. 4.7):

$$\Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{\ell=1}^{K} f_\ell(x)\,\pi_\ell}$$

where $f_k(x)$ is the class-conditional density of $X$ in class $G = k$ and $\pi_k$ is the prior probability of class $k$.

Lisp and statistics for 21. century: some articles, books and so.

Suggestions for learning statistics (& prob) really.
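The post's own Clojure/Incanter implementation is not included in this excerpt. As a hedged, dependency-free sketch, the Python below implements the standard EoSL discriminant functions obtained from Bayes' theorem with Gaussian class densities. LDA uses a pooled covariance $\Sigma$ and scores $\delta_k(x) = x^T\Sigma^{-1}\mu_k - \tfrac12\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k$. QDA uses per-class covariances $\Sigma_k$ and scores $\delta_k(x) = -\tfrac12\log|\Sigma_k| - \tfrac12(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) + \log\pi_k$. Priors $\pi_k$ are estimated as class frequencies. The function names (`fit`, `predict`, and so on) and the toy two-class data are hypothetical; the vowel data set is not used.

```python
import math
from collections import defaultdict


def solve_and_logdet(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Also returns log|det(a)|, which QDA needs for its -0.5 * log|Sigma_k| term.
    """
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]  # a row swap only flips the sign of det
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x, logdet


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit(X, y):
    """Estimate priors, class means, per-class scatter and the pooled covariance."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    n_total, p = len(X), len(X[0])
    model = {}
    for k, rows in groups.items():
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        s = [[0.0] * p for _ in range(p)]
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
        model[k] = {"prior": len(rows) / n_total, "mu": mu, "scatter": s, "n": len(rows)}
    # Pooled within-class covariance: summed scatter / (N - K).
    dof = n_total - len(model)
    pooled = [[sum(c["scatter"][i][j] for c in model.values()) / dof
               for j in range(p)] for i in range(p)]
    return model, pooled


def lda_score(x, c, pooled):
    w, _ = solve_and_logdet(pooled, c["mu"])  # w = Sigma^{-1} mu_k
    return dot(x, w) - 0.5 * dot(c["mu"], w) + math.log(c["prior"])


def qda_score(x, c):
    cov = [[v / (c["n"] - 1) for v in row] for row in c["scatter"]]  # Sigma_k
    d = [xi - mi for xi, mi in zip(x, c["mu"])]
    z, logdet = solve_and_logdet(cov, d)  # z = Sigma_k^{-1} (x - mu_k)
    return -0.5 * logdet - 0.5 * dot(d, z) + math.log(c["prior"])


def predict(x, model, pooled, method="lda"):
    """Assign x to the class with the largest discriminant score."""
    if method == "lda":
        return max(model, key=lambda k: lda_score(x, model[k], pooled))
    return max(model, key=lambda k: qda_score(x, model[k]))


X = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (5, 4), (4, 5), (5, 5)]
y = ["a"] * 4 + ["b"] * 4
model, pooled = fit(X, y)
print(predict((0.2, 0.3), model, pooled, "lda"), predict((4.8, 4.6), model, pooled, "qda"))
```

Both rules share the same fitted means and priors; they differ only in whether one covariance is shared (linear decision boundaries) or each class gets its own (quadratic boundaries).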

For Today's Graduate, Just One Word - Statistics.

Entropy, Order Parameters, and Complexity.

Example of efficiency for mean vs. median

Two common ways to estimate the center of a set of data are the sample mean and the sample median. The sample mean is sometimes more efficient, but the sample median is always more robust. (I'm going to cut to the chase first, then go back and define basic terms like "median" and "robust" below.) When the data come from distributions with thick tails, the sample median is more efficient. When the data come from distributions with thin tails, like the normal, the sample mean is more efficient.
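These efficiency claims can be checked with a hedged Python sketch (not the original post's code). It estimates by simulation the sampling variance of the mean and the median for normal and Cauchy (thick-tailed) data. It also evaluates the closed-form asymptotic relative efficiency (ARE) of the median versus the mean for the Student-t, using the standard large-sample results Var(mean) ≈ σ²/n and Var(median) ≈ 1/(4 n f(m)²), where f is the density and m the median. The sample size, replication count, seed and function names are arbitrary illustrative choices.

```python
import math
import random
import statistics


def sampling_variances(draw, n=101, reps=2000, seed=1):
    """Monte Carlo variances of the sample mean and sample median.

    draw(rng) returns one observation; each replication uses n observations.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.pvariance(means), statistics.pvariance(medians)


def normal(rng):
    return rng.gauss(0.0, 1.0)


def cauchy(rng):
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy.
    return math.tan(math.pi * (rng.random() - 0.5))


def are_median_vs_mean_t(nu):
    """ARE of the median relative to the mean for Student-t with nu d.o.f.

    ARE = Var(mean) / Var(median) = 4 * f(0)**2 * sigma**2, where f(0) is the
    t density at its median and sigma**2 = nu / (nu - 2) (infinite for nu <= 2).
    Values above 1 mean the median is the more efficient estimator.
    """
    if nu <= 2:
        return math.inf
    log_f0 = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
              - 0.5 * math.log(nu * math.pi))
    return 4 * math.exp(2 * log_f0) * nu / (nu - 2)


for name, draw in [("normal", normal), ("cauchy", cauchy)]:
    v_mean, v_median = sampling_variances(draw)
    print(f"{name}: var(mean)={v_mean:.4g}, var(median)={v_median:.4g}")
for nu in (3, 4, 5, 30):
    print(nu, round(are_median_vs_mean_t(nu), 4))
```

With these formulas the ARE is 16/π² ≈ 1.62 at ν = 3 and exactly 9/8 at ν = 4, falls below 1 by ν = 5 (about 0.96), and tends to 2/π ≈ 0.64 as ν grows, which is the normal-case value. In the simulation, the normal data give var(median)/var(mean) near π/2, while for Cauchy data the variance of the sample mean is dominated by a few huge draws.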

The Student-t distribution illustrates both, since it goes from having thick tails to having thinner tails as the degrees of freedom, denoted ν, increase. When ν = 1, the Student-t is a Cauchy distribution and the sample mean wanders around without converging to anything, though the sample median behaves well.

[Plot not included: asymptotic relative efficiency (ARE) of the median compared to the mean for samples from a Student-t distribution, as a function of the degrees of freedom ν.]

Backing up.

COMPSTAT 2008 - Proceedings in ... - Google Book Search.

Arithmetic Mean, Geometric Mean, Harmonic Mean and Weighted Arit.

Posted August 10, 2008, 4:34 pm by John Haugeland in Erlang, Math, Programming, Statistics, Tools and Libraries.

I'll be putting up some statistical functions I've had to write recently. This is the first batch. I confess, I find the Erlang implementations far more readable than the pure math definitions one finds around; I've been thinking about writing tutorials, but with this code here, I'm not entirely sure it's necessary.

At any rate, the code follows. As with so much of my Erlang code, this code is part of the ScUtil library. ScUtil is free and MIT-licensed, because the GPL is evil. There are more statistics coming; I just don't want to make any one post too huge, and I don't want keyword saturation. This partially closes issue 119.
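The ScUtil Erlang code is not reproduced in this excerpt. For reference only, here is a hedged Python sketch of the textbook definitions of the means named in the post title. It reads the truncated last item as a weighted arithmetic mean, which is an assumption. The function names are hypothetical and are not ScUtil's API. Python 3.8+ also provides `statistics.geometric_mean` and `statistics.harmonic_mean` for cross-checking.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean log avoids overflow from multiplying many values;
    # only defined here for strictly positive inputs.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # Reciprocal of the mean reciprocal; inputs must be nonzero (usually positive).
    return len(xs) / sum(1 / x for x in xs)


def weighted_arithmetic_mean(xs, ws):
    # Assumed reading of the truncated "Weighted Arit." in the post title.
    if len(xs) != len(ws):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


print(arithmetic_mean([1, 2, 3, 4]))                    # 2.5
print(geometric_mean([1, 2, 4]))
print(harmonic_mean([1, 2, 4]))
print(weighted_arithmetic_mean([1, 2, 3], [3, 1, 1]))  # 1.6
```

For positive data these always satisfy harmonic ≤ geometric ≤ arithmetic, which makes a quick sanity check on any implementation.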