
Simply Statistics

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon. Last week, I took a break from my normal PhD student schedule to participate in a hackathon in San Francisco. The two-day event was hosted by rOpenSci, an organization committed to developing R tools for open science. Working with several wonderful people from the R community was inspiring, humbling, and incredibly fun. So many great things happened in a two-day whirlwind: it would be impossible now to capture the whole thing in a narrative that would do it justice. So instead of a play-by-play, here are some of the quotes from the event that I've recently been reflecting on:


statistics by Terry M. Therneau, Ph.D., Faculty, Mayo Clinic. About a year ago there was a query about how to do "type 3" tests for a Cox model on the R help list, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand the issue more deeply. It took far longer than I expected but has been illuminating.

Leo Breiman Leo Breiman passed away on July 5, 2005. Professor Breiman was a member of the National Academy of Sciences. His research in later years focussed on computationally intensive multivariate analysis, especially the use of nonlinear methods for pattern recognition and prediction in high dimensional spaces.

Our top 10 Data Science articles in 2014 2014 has been a year of growth for us. We now get 10x the traffic we got 12 months ago. It gives us immense satisfaction to be able to create something which is helping more and more people every day. We only hope that we could get some more time to create more content for our audience! Not only did we write more and better articles in 2014, we also started a jobs listing and a trainings listing page.

Shiny - Tutorial You can teach yourself to use Shiny in two ways. You can watch the “How to Start Shiny” webinar series, or you can work through the self-paced Shiny tutorial below. Who should take the tutorial? You will get the most out of the webinar or tutorial if you already know how to program in R, but not Shiny. If R is new to you, you may want to check out the learning resources at before taking this tutorial.

Must read books for Analysts (or people interested in Analytics) One of the ways I continue my learning is reading. I read for 30 minutes before going to bed every day. This not only makes sure that I learn something daily, but also ends my day in a fulfilling manner. Over the years, I have read a variety of books on various subjects.

Linked Data - Design Issues The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data. Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext, where links are relationship anchors in hypertext documents written in HTML, for data they are links between arbitrary things described by RDF.

A deterministic statistical machine As Roger pointed out, the most recent batch of Y Combinator startups included a bunch of data-focused companies. One of these companies, StatWing, is a web-based tool for data analysis that looks like an improvement on SPSS with more plain text, more visualization, and a lot of the technical statistical details “under the hood”. I first read about StatWing on TechCrunch, where the title was “How Statwing Makes It Easier To Ask Questions About Data So You Don’t Have To Hire a Statistical Wizard”. StatWing looks super user-friendly and the idea of democratizing statistical analysis so more people can access these ideas is something that appeals to me. But, as one of the aforementioned statistical wizards, this had me freaked out for a minute.

Standards W3C standards define an Open Web Platform for application development that has the unprecedented potential to enable developers to build rich interactive experiences, powered by vast data stores, that are available on any device. Although the boundaries of the platform continue to evolve, industry leaders speak nearly in unison about how HTML5 will be the cornerstone for this platform. But the full strength of the platform relies on many more technologies that W3C and its partners are creating, including CSS, SVG, WOFF, the Semantic Web stack, XML, and a variety of APIs. W3C develops these technical specifications and guidelines through a process designed to maximize consensus about the content of a technical report, to ensure high technical and editorial quality, and to earn endorsement by W3C and the broader community. If you are learning about Web technology, you may wish to start with the introduction below, and follow links for greater detail.

aggregate {stats} Compute Summary Statistics of Data Subsets

Description
Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

Usage
aggregate(x, ...)

## S3 method for class 'default':
aggregate(x, ...)

## S3 method for class 'data.frame':
aggregate(x, by, FUN, ..., simplify = TRUE)

## S3 method for class 'formula':
aggregate(formula, data, FUN, ..., subset, na.action = na.omit)

## S3 method for class 'ts':
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1, ts.eps = getOption("ts.eps"), ...)
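As a rough illustration of the formula and data.frame methods of aggregate listed in the excerpt above, the sketch below computes a per-group mean two ways. The data frame `scores` and its columns `group` and `value` are made-up names for this example and are not taken from the help page.

```r
# Hypothetical toy data: two groups with two measurements each.
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 2, 3, 5)
)

# Formula method: mean of `value` within each level of `group`.
by_formula <- aggregate(value ~ group, data = scores, FUN = mean)

# data.frame method: the same summary, with the grouping supplied
# through `by` as a named list.
by_list <- aggregate(scores["value"], by = list(group = scores$group), FUN = mean)

print(by_formula)
print(by_list)
```

Both calls return a data frame with one row per group. The formula form is usually shorter to write, while the `by` form is handy when the grouping vector is not a column of the data frame being summarised.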

Installing swirl on Linux · swirldev/swirl Wiki · GitHub swirl and its dependencies require R version 3.0.2 or later as well as a recent version of libcurl. This page is our attempt to collect any information that might be helpful to Linux users wanting to install swirl. Ubuntu and its derivatives These instructions have been successfully tested on:

Statistical inference for data science About this book This book is written as a companion book to the Statistical Inference Coursera class as part of the Data Science Specialization. However, if you do not take the class, the book mostly stands on its own. A useful component of the book is a series of YouTube videos that comprise the Coursera class. The book is intended to be a low cost introduction to the important field of statistical inference.

How to install R, JGR and Deducer in Ubuntu A step-by-step guide on how to install the free statistical software GNU R, JGR and Deducer in Ubuntu (>= 10.04.4) and other Ubuntu-derived distributions. (see Romanian version) UPDATED: Mar 19, 2014 The combination of GNU R, JGR and Deducer is a powerful free alternative to proprietary, commercially distributed statistical programs like SPSS. Together, they provide a wide variety of statistical and graphical techniques, combined with intuitive graphical menus and dialogues that guide the user efficiently through the data manipulation and analysis process.

Why you should start by learning data visualization and manipulation One of the biggest issues that comes up when I talk to people who want to get started learning data science is the following: I don’t know where to get started! Recently, I argued that R is the best programming language to learn when you’re getting started with data science. While this helps you select a programming language, it still doesn’t tell you what skills to focus on. Just like when you select a programming language, selecting the skills to start with can be overwhelming.
