
OpenIntro


Mining of Massive Datasets. The book has now been published by Cambridge University Press, and the publisher is offering a 20% discount to anyone who buys the hardcopy here. By agreement with the publisher, you can still download it free from this page. Authors: Jure Leskovec, Anand Rajaraman (@anand_raj), and Jeff Ullman. Version 2.1 is the forthcoming second edition; it has a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice rather than how it was described in the original paper, and it adds Section 10.5 on finding overlapping communities in social graphs. The latest version (511 pages, approximately 3 MB) can be downloaded whole or chapter by chapter. Version 1.0 is equivalent to the published book, with errata corrected to July 4, 2012 (340 pages, approximately 2 MB). The page also links Gradiance support, other materials, and Jure's materials from the most recent CS246.
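The excerpt's mention of map-reduce programming can be made concrete with a toy sketch (not code from the book): a pure-Python word count with explicit map, shuffle, and reduce phases, which is the shape real map-reduce jobs take.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values; for word count, just sum them.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In a real framework the shuffle is distributed across machines; here it is just a dictionary, but the programmer-visible contract is the same.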

Bayes' Theorem: An Intuitive Explanation of Bayes' Theorem. Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction. Your friends and colleagues are talking about something called "Bayes' Theorem" or "Bayes' Rule", or about something called Bayesian reasoning. It's this equation, so you came here. Why does a mathematical concept generate this strange enthusiasm in its students? Soon you will know. While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. So let's begin with a story problem about a situation that doctors often encounter: 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening: what is the probability that she actually has breast cancer? What do you think the answer is? One way to build intuition is to imagine Group 1: 100 women with breast cancer.
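Running the story problem's numbers through Bayes' rule directly (a minimal sketch; the variable names are ours, the probabilities are from the excerpt):

```python
# Numbers from the story problem above.
p_cancer = 0.01              # 1% base rate among women screened at age forty
p_pos_given_cancer = 0.80    # 80% of women with cancer test positive
p_pos_given_healthy = 0.096  # 9.6% of women without cancer also test positive

# Total probability of a positive mammography (law of total probability).
p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy

# Bayes' rule: P(cancer | positive test).
p_cancer_given_pos = p_cancer * p_pos_given_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # 0.078
```

Only about 7.8%: the counterintuitive answer the essay builds toward, because most positive tests come from the much larger cancer-free group.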

Interactive Statistical Calculation Pages / Writing Better Statistical Programs in R. A while back a friend asked me for advice about speeding up some R code they'd written. Because they were running an extensive Monte Carlo simulation of a model they'd been developing, the poor performance of their code had become an impediment to their work. After I looked through their code, it was clear that the performance hurdles they were stumbling over could be overcome by adopting a few best practices for statistical programming. This post describes some of the simplest best practices for statistical programming in R; following these principles should make it easier to write statistical programs that are both performant and correct. Write out a DAG: whenever you're running a simulation study, you should appreciate that you are working with a probabilistic model, and almost certainly the most important concept in probabilistic modeling for writing efficient code is conditional independence. Let's go through an example.
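The post's two key ideas, writing the model out as a DAG and exploiting conditional independence, can be illustrated with a toy simulation (sketched in Python rather than R, with made-up probabilities): once the variables are in topological order, each one is sampled using only its parents.

```python
import random

rng = random.Random(42)

def sample():
    # Two-node DAG: Rain -> WetGrass. Given Rain, WetGrass depends on
    # nothing else, so we sample in topological order: parent first.
    rain = rng.random() < 0.2
    p_wet = 0.9 if rain else 0.1
    wet = rng.random() < p_wet
    return rain, wet

draws = [sample() for _ in range(10_000)]
frac_rain = sum(r for r, _ in draws) / len(draws)
print(round(frac_rain, 2))  # close to the marginal P(rain) = 0.2
```

Because each node's sampler reads only its parents, the draws are independent across replications and the whole loop can be vectorized or parallelized, which is the efficiency payoff the post is pointing at.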

Resources for Statistical Computing: Other Resources for Help with Statistical Computing. The primary mission of the IDRE Statistical Consulting Group is to support UCLA researchers in statistical computing using statistical packages such as SAS, Stata, SPSS, HLM, MLwiN, Mplus, and so forth. We provide this support through our web pages, our walk-in consulting services, classes and seminars, and email consulting. Below, we provide a list of commonly used statistical software packages along with sources of support, including newsgroups/mailing lists, web pages provided by the vendors, and each vendor's technical support email address. Other lists: news:sci.stat.consult - general issues in statistics. The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.

HyperStat Online: An Introductory Statistics Textbook and Online Tutorial for Help in Statistics Courses. Click here for more cartoons by Ben Shabad. Other sources: NIST/SEMATECH e-Handbook of Statistical Methods; Stat Primer by Bud Gerstman of San Jose State University; statistical forecasting notes by Robert Nau of Duke University (related: RegressIt Excel add-in by Robert Nau); CADDIS Volume 4: Data Analysis (EPA); The Little Handbook of Statistical Practice by Gerard E.; Stat Trek Tutorial; Statistics at Square 1 by T.; Concepts and Applications of Inferential Statistics by Richard Lowry of Vassar College; CAST by W.; SticiGui by P. StatPrimer (Version 6.4) © B. Burt Gerstman (email). Part A (Introductory), each chapter with exercises: (1) Measurement and sampling; (2) Frequency distributions; (3) Summary statistics; (4) Probability; (5) Introduction to estimation; (6) Introduction to hypothesis testing; (7) Paired samples; (8) Comparing independent means; (9) Inference about a proportion; (9.5) Comparing two proportions (*.ppt); (10) Cross-tabulated counts. Part B (Intermediate), each chapter with exercises: (11) Variances and means; (12) ANOVA; (13) ANOVA topics (post hoc comparisons, Levene's test, non-parametric tests); (14) Correlation; (15) Regression; (16) Risk ratios and prevalence ratios; (17) Case-control odds ratios. Additional notes: power and sample size; how to know what to use; approaches toward data analysis; data files.

An Introduction to Bayesian Networks with Jayes | Codetrails. At Eclipse Code Recommenders, most of our recommendation engines use Bayesian networks, which are a compact representation of probability distributions; they serve to express relationships between variables in a partially observable world. Our recommenders use these networks to predict what the developer wants to use next, based on what they have done previously. When the Code Recommenders project first started, there was a need for a new open-source, pure-Java Bayesian network library. As part of my bachelor thesis, I created such a library, called Jayes, which has since become the backend of most of Code Recommenders' recommendation engines. This post describes how to use Jayes for your own inference tasks. Guest blogger: Michael Kutschke is currently completing his Master of Science in the Computer Science department of Technische Universität Darmstadt. What Jayes is, and what it isn't: Jayes is a library for Bayesian networks and inference in such networks.
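The excerpt cuts off before showing Jayes itself, so as a stand-in, here is a sketch of what such a library computes under the hood: exact inference by enumeration on a tiny two-parent network. The structure and CPT numbers are made up for illustration; this is not Jayes code or its API.

```python
from itertools import product

# Toy network: Rain -> WetGrass <- Sprinkler, with illustrative CPTs.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.3, False: 0.7}

def p_wet(rain, sprinkler):
    # P(WetGrass=True | Rain, Sprinkler)
    if rain and sprinkler:
        return 0.99
    if rain:
        return 0.90
    if sprinkler:
        return 0.80
    return 0.05

# P(Rain=True | WetGrass=True): sum the joint over the hidden Sprinkler
# variable, then normalize -- enumeration, the simplest exact inference.
numerator = evidence = 0.0
for rain, sprinkler in product([True, False], repeat=2):
    joint = p_rain[rain] * p_sprinkler[sprinkler] * p_wet(rain, sprinkler)
    evidence += joint
    if rain:
        numerator += joint
posterior = numerator / evidence
print(round(posterior, 3))  # 0.457
```

Enumeration is exponential in the number of hidden variables, which is why real libraries like Jayes implement smarter exact-inference algorithms; the posterior they return for a small network like this is the same.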
