
The Endeavour — The blog of John D. Cook
I help people make decisions in the face of uncertainty. Sounds interesting. I’m a data scientist. Not sure what that means, but it sounds cool. I study machine learning. I’m into big data. Even though each of these descriptions makes a different impression, they’re all essentially the same thing. There are distinctions. “Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Despite the hype around the term data science, it’s growing on me. Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. “Big data” is a big can of worms. Bayesian statistics is much older than what is now sometimes called “classical” statistics.

Fishing in the Bay: Why I am in favour of logging
A colleague recently brought to me some alternative fits he had done for a paper he was writing. The alternative fits looked very strange but had been strongly suggested by a referee. He was fitting a regression model to inter-country trade data and trying to explain patterns in terms of various measures of cultural fit. The referee was pointing to some papers in econometrics that had argued about the relative merits of multiplicative regression models fitted on the direct scale, rather than on the log-scale. The referee wanted a direct fit on the basis that the random errors may be more normal and additive on the direct scale. One of the papers he was pointing to is here, and it contains the unequivocal recommendation: "Overall, except under very special circumstances, estimation based on the log-linear model cannot be recommended." Sounds like complete bollocks to me. Why is the log-transform better? Leverage effects can be huge on the direct scale. So there you have it.
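The leverage point can be illustrated with a small simulation. This is only a sketch under assumed conditions: the multiplicative model y = 2 * x^0.8 * exp(noise), the sample size, and the helper names ols and max_leverage are all hypothetical and have nothing to do with the colleague's trade data. It fits the model on the log-scale and compares the largest leverage (hat-matrix diagonal) of a simple regression on x with that of a regression on log(x).

```python
import math
import random


def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def max_leverage(xs):
    """Largest hat-matrix diagonal for a simple regression on xs."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return max(1 / n + (x - mx) ** 2 / sxx for x in xs)


random.seed(0)

# Assumed data: x spans four orders of magnitude, errors are multiplicative.
x = [10 ** random.uniform(0, 4) for _ in range(200)]
y = [2.0 * xi ** 0.8 * math.exp(random.gauss(0, 0.3)) for xi in x]

# Log-scale fit: log(y) = log(2) + 0.8 * log(x) + noise.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
intercept, slope = ols(log_x, log_y)

# Leverage of the most extreme observation on each scale.
lev_direct = max_leverage(x)
lev_log = max_leverage(log_x)

print(f"estimated exponent: {slope:.3f}")
print(f"max leverage, direct scale: {lev_direct:.3f}")
print(f"max leverage, log scale:    {lev_log:.3f}")
```

In this setup the few largest x values carry far more leverage on the direct scale than any observation does on the log-scale, which is the effect the post describes. Real data with a different error structure could behave differently.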

The R programming language for programmers coming from other programming languages
Contents: Introduction, Assignment and underscore, Variable name gotchas, Vectors, Sequences, Types, Boolean operators, Lists, Matrices, Missing values and NaNs, Comments, Functions, Scope, Misc., Other resources
Introduction
I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R. R is more than a programming language. This document is a work in progress.
Assignment and underscore
The assignment operator in R is <- as in e <- m*c^2. It is also possible, though uncommon, to reverse the arrow and put the receiving variable on the right, as in m*c^2 -> e. It is sometimes possible to use = for assignment, though I don't understand when this is and is not allowed. However, when supplying default function arguments or calling functions with named arguments, you must use the = operator and cannot use the arrow. At some time in the past R, or its ancestor S, used underscore as assignment.

Social Science Statistics Blog
28 April 2013
App Stats: Roberts, Stewart, and Tingley on "Topic models for open ended survey responses with applications to experiments"
We hope you can join us this Wednesday, May 1, 2013 for the Applied Statistics Workshop. Molly Roberts, Brandon Stewart, and Dustin Tingley, all from the Department of Government at Harvard University, will give a presentation entitled "Topic models for open ended survey responses with applications to experiments". A light lunch will be served at 12 pm and the talk will begin at 12.15.
"Topic models for open ended survey responses with applications to experiments"
Molly Roberts, Brandon Stewart, and Dustin Tingley
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Abstract: Despite broad use of surveys and survey experiments by political science, the vast majority of survey analysis deals with responses to options along a scale or from pre-established categories.
Posted by Konstantin Kashin at 11:25 PM

Impatient R
Translations: français, translated by Kate Bondareva. Serbo-Croatian, translated by Jovana Milutinovich from Geeks Education.
Preface
This is a tutorial (previously known as “Some hints for the R beginner”) for beginning to learn the R programming language. You are probably impatient to learn R; most people are. This page has several sections, which can be put into four categories: General, Objects, Actions, Help.
General: Introduction; Blank screen syndrome; Misconceptions because of a previous language; Helpful computer environments; R vocabulary; Epilogue
Objects: Key objects; Reading data into R; Seeing objects; Saving objects; Magic functions, magic objects; Some file types; Packages
Actions: What happens at R startup; Key actions; Errors and such; Graphics; Vectorization; Make mistakes on purpose
Help
Introduction
I asked R users what their biggest stumbling blocks were in learning R.
> search()

R-statistics blog

Data Sorcery with Clojure

Statistics for a changing world: Google Public Data Explorer in Labs
Last year, we released a public data search feature that enables people to quickly find useful statistics in search. More recently, we expanded this service to include information from the World Bank, such as population data for every region in the world. More and more public agencies, non-profits and other organizations are looking for ways to open up their data and expand global access to this kind of information. We want to help keep that momentum going, so today we're sharing a snapshot of some of the most popular public data search topics on Google. We're also launching the Google Public Data Explorer, an experimental visualization tool in Google Labs.
Popular public data topics on Google
We know people want to be able to find reliable data and statistics on a variety of subjects. You can read the complete list at this link (PDF), but here's the top 20 to get you started:
You'll notice some interesting entries in the list. Animated charts can bring data to life.

Understanding Shakespeare / Approaches

A guide to querying 'references' in the Content API | Open Platform
We have recently extended the ways that you can search our Content API to include queries with 'references'. You can query the API with an ISBN number, and see articles about the corresponding book, or by a MusicBrainz ID, and see articles about the artist or composer. Here are some answers to frequently asked questions about this feature.
What is the 'show-references' parameter?
The show-references= parameter has been added on content search, tag search and item endpoints.
show-references=isbn => display ISBN references where available.
show-references=musicbrainz,isbn => display MusicBrainz and ISBN references where available.
show-references=all => display all available references.
What is the 'reference-type' parameter?
The reference-type= parameter has been added on content search, tag search and item endpoints.
reference-type=musicbrainz,isbn => return content which has both an ISBN and MusicBrainz identifier associated.
What is the 'reference=' parameter?
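The two documented parameters can be combined into a query string as in the sketch below. The parameter names show-references and reference-type and their values come from the guide. The base URL is a placeholder, because the excerpt does not give the real one, and the helper build_search_url is hypothetical. The reference= parameter is left out because its answer is missing from the excerpt.

```python
from urllib.parse import urlencode

# Placeholder host: the excerpt does not state the API's real base URL.
BASE_URL = "https://content-api.example.invalid/search"


def build_search_url(show_references=None, reference_type=None, base_url=BASE_URL):
    """Build a content-search URL (hypothetical helper).

    Each argument is a list of reference type names, e.g. ["isbn"].
    """
    params = {}
    if show_references:
        params["show-references"] = ",".join(show_references)
    if reference_type:
        params["reference-type"] = ",".join(reference_type)
    # Keep commas literal so the values match the forms shown in the guide.
    query = urlencode(params, safe=",")
    return f"{base_url}?{query}" if query else base_url


# Display ISBN references where available.
url_isbn = build_search_url(show_references=["isbn"])

# Display MusicBrainz and ISBN references where available.
url_both = build_search_url(show_references=["musicbrainz", "isbn"])

# Return only content with both identifiers, and display all references.
url_all = build_search_url(
    show_references=["all"],
    reference_type=["musicbrainz", "isbn"],
)

for url in (url_isbn, url_both, url_all):
    print(url)
```

The sketch only builds URLs and sends no request. A real call would also need whatever authentication and base path the API requires, which the excerpt does not cover.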