
Probability and Statistics


William Chen's answer to Data Science: How do data scientists use statistics?

Blogs about machine learning, statistics, recommendations, and related topics.

Probability and Random Variables | Mathematics.

An Introduction to Bayesian Networks with Jayes | Codetrails. At Eclipse Code Recommenders, most of our recommendation engines use Bayesian networks, which are a compact representation of probability distributions. They thus serve to express relationships between variables in a partially observable world.

Our recommenders use these networks to predict what the developer wants to use next, based on what he has done previously. When the Code Recommenders project first started, there was a need for a new open-source, pure-Java Bayesian network library. As part of my bachelor thesis, I created such a library, called Jayes. This post describes how to use Jayes for your own inference tasks.

Guest Blogger: Michael Kutschke is currently completing his Master of Science at the Computer Science department of Technische Universität Darmstadt.

What Jayes is, and what it isn’t: Jayes is a library for Bayesian networks and for inference in such networks.
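To make "a compact representation of probability distributions" concrete, here is a minimal sketch of inference in a tiny Bayesian network, written in plain Python. It does not use or imitate Jayes's API. The network (Rain, Sprinkler, WetGrass), every probability value, and the function names are hypothetical illustration choices. The joint distribution is stored as one small table per node, and a query is answered by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical three-node network: Rain -> WetGrass <- Sprinkler.
# All probabilities below are made-up illustration values.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass=True | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """Joint probability via the network factorization P(R) * P(S) * P(W | R, S)."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet(wet=True):
    """P(Rain=True | WetGrass=wet), obtained by summing out Sprinkler."""
    numerator = sum(joint(True, s, wet) for s in (True, False))
    evidence = sum(joint(r, s, wet) for r, s in product((True, False), repeat=2))
    return numerator / evidence


if __name__ == "__main__":
    print(posterior_rain_given_wet(True))
```

With these assumed numbers, observing wet grass raises the probability of rain above its prior of 0.2. Dedicated libraries typically replace the enumeration step with more efficient exact or approximate inference algorithms.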

Where can I get it? There are two sources for getting your hands on Jayes’ source code: How do I use it?

StatPrimer © B. Gerstman 2003. StatPrimer (Version 6.4), B. Burt Gerstman.

Part A (Introductory)
(1) Measurement and sampling [Exercises]
(2) Frequency distributions [Exercises]
(3) Summary statistics [Exercises]
(4) Probability [Exercises Part A] [Exercises Part B]
(5) Introduction to estimation [Exercises]
(6) Introduction to hypothesis testing [Exercises]
(7) Paired samples [Exercises]
(8) Comparing independent means [Exercises]
(9) Inference about a proportion [Exercises]
(9.5) Comparing two proportions (*.ppt) [Exercises]
(10) Cross-tabulated counts [Exercises]

Part B (Intermediate)
(11) Variances and means [Exercises]
(12) ANOVA [Exercises]
(13) ANOVA topics (post hoc comparisons, Levene's test, non-parametric tests) [Exercises]
(14) Correlation [Exercises]
(15) Regression [Exercises]
(16) Risk ratios and prevalence ratios [Exercises]
(17) Case-control odds ratios [Exercises]

Additional notes
Power and sample size [Exercises]
How To Know What to Use [Exercises]
Approaches Toward Data Analysis
Data Files

HyperStat Online: An Introductory Statistics Textbook and Online Tutorial for Help in Statistics Courses.

Other Sources
NIST/SEMATECH e-Handbook of Statistical Methods
Stat Primer by Bud Gerstman of San Jose State University
Statistical forecasting notes by Robert Nau of Duke University (related: RegressIt Excel add-in by Robert Nau)
CADDIS Volume 4: Data Analysis (EPA)
The little handbook of statistical practice by Gerard E. Dallal of Tufts University
Stat Trek Tutorial
Statistics at square 1 by T.
Concepts and applications of inferential statistics by Richard Lowry of Vassar College
CAST by W.
SticiGui by P.

Online Statistics Education: A Free Resource for Introductory Statistics.

OpenIntro.

Statistics, Probability, and Survey Sampling.

Writing Better Statistical Programs in R. A while back a friend asked me for advice about speeding up some R code that they’d written. Because they were running an extensive Monte Carlo simulation of a model they’d been developing, the poor performance of their code had become an impediment to their work. After I looked through their code, it was clear that the performance hurdles they were stumbling upon could be overcome by adopting a few best practices for statistical programming. This post tries to describe some of the simplest best practices for statistical programming in R. Following these principles should make it easier for you to write statistical programs that are both highly performant and correct.

Write Out a DAG: Whenever you’re running a simulation study, you should appreciate the fact that you are working with a probabilistic model. When you want to write efficient code, almost certainly the most important concept in probabilistic modeling is the notion of conditional independence. Let’s go through an example.

Speed.

Bayes' Theorem. An Intuitive Explanation of Bayes' Theorem: Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction. Your friends and colleagues are talking about something called "Bayes' Theorem" or "Bayes' Rule", or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and... It's this equation. That's all. So you came here. Why does a mathematical concept generate this strange enthusiasm in its students?

Soon you will know. While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Or so they claim. And let's begin. Here's a story problem about a situation that doctors often encounter: What do you think the answer is? Next, suppose I told you that most doctors get the same wrong answer on this problem - usually, only around 15% of doctors get it right.

Stats 329 - Winter 2009/2010.

SticiGui Statistics.

OpenIntro. Statistics.
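The Bayes' Theorem excerpt above mentions an equation and a diagnostic story problem without reproducing either. As a rough illustration only, the sketch below applies the standard form of the theorem, P(A|X) = P(X|A)P(A) / (P(X|A)P(A) + P(X|not A)P(not A)), to a screening-test scenario. The function name and every number are hypothetical and are not taken from the linked article.

```python
def bayes_posterior(prior, p_pos_given_cond, p_pos_given_not):
    """Return P(condition | positive test) using Bayes' theorem.

    prior            -- P(condition) before seeing the test result
    p_pos_given_cond -- P(positive | condition), the test's sensitivity
    p_pos_given_not  -- P(positive | no condition), the false positive rate
    """
    # Total probability of a positive result across both groups.
    evidence = p_pos_given_cond * prior + p_pos_given_not * (1.0 - prior)
    return p_pos_given_cond * prior / evidence


if __name__ == "__main__":
    # Hypothetical values: 2% prevalence, 90% sensitivity, 10% false positives.
    print(bayes_posterior(prior=0.02, p_pos_given_cond=0.9, p_pos_given_not=0.1))
```

Under these assumed values the posterior is about 0.155, far lower than the 90% sensitivity might suggest. The rare condition means most positive results come from the much larger group without it.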