
Storytelling with data

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.


A tidy model pipeline with twidlr and broom

(This article was first published on blogR, and kindly contributed to R-bloggers) @drsimonj here to show you how to go from data in a data.frame to a tidy data.frame of model output by combining twidlr and broom in a single, tidy model pipeline. The problem: Different model functions take different types of inputs (data.frames, matrices, etc.) and produce different types of output! Thus, we’re often confronted with the very untidy challenge presented in this Figure:

Visualization

Visualizing data through charts, graphs, and diagrams helps you deliver bite-sized information that viewers will understand at a glance and retain for the long run. During my workshops, webinars, and training videos, we focus on researcher-specific considerations: designing with stakeholders’ information needs front and center, using readily available software like Microsoft Excel, and thinking through dozens of chart types (dot plots, small multiples, heat maps, and more) that can be applied to the social sciences. My goal is to equip you with the critical thinking skills and technical know-how to create visualizations faster and more easily than you ever thought possible. Read my latest articles about selecting appropriate chart types, applying best practices to your charts, and more. View excerpts from my latest conference presentations and read my articles that are guest-published through other organizations’ blogs.

Writing for the Professions

Welcome to the course. We shall be using this page to organise WFP. Please check this page weekly before you come to class. All classes will take place in the computer labs, but students are invited to bring their own laptops if they wish. If you are unable to make your tutorial for whatever reason, you are expected to complete the weekly tutorial tasks in your own time. The following are contact details for the unit coordinator Myra Gurney:

Why Data Scientists Need to be Good Data Storytellers

Guest blog by Khushbu Shah. Storytelling is data with a soul. Data Scientists are extremely good with numbers, but numbers alone are not sufficient to convey the results to the end user. Being a good data storyteller is an art as well as a science. Data Scientists use various data visualization tools like Tableau to present the data in a visually appealing format. A Data Scientist not only understands the data but also understands the business and the end user very well.

U.S. Residential Energy Use: Machine Learning on the RECS Dataset

Contributed by Thomas Kassel. He is currently enrolled in the NYC Data Science Academy remote bootcamp program taking place from January-May 2017. This post is based on his final capstone project, focusing on the use of machine learning techniques learned throughout the course. Introduction: The residential sector accounts for up to 40% of annual U.S. electricity consumption, representing a large opportunity for energy efficiency and conservation.

Visual Business Intelligence

For data sensemakers and others who are concerned with the integrity of data sensemaking and its outcomes, the most important book published in 2016 was Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. This book is much more than a clever title. It is a clarion call of imminent necessity. Data can be used in harmful ways.

Simple Visualizations with D3plus

I’ve been using D3, a JavaScript library for data visualizations (the three ‘D’s stand for Data-Driven Documents), for my own projects and with my students for some time. It’s a particularly cool tool for working with dynamic data or information from a database and giving it life in a visual format through charts, graphs, and interactive data displays. Information visualization can be a powerful way to represent complex or otherwise inaccessible data. However, the learning curve for it is a little high, so I’ve never recommended it as an entry tool for this type of visualization. Last week I followed Miriam Posner’s mention on Twitter to something that may change that: D3plus. D3plus is described as an extension for D3, but it’s really a simplification of the library’s at times overwhelming options and data structures that makes it easy to build visualizations of data sets.

Complete Subset Regressions, simple and powerful

By Gabriel Vasconcelos. Complete subset regressions (CSR) is a forecasting method proposed by Elliott, Gargano and Timmermann in 2013. It is a very simple but powerful technique. Suppose you have a set of variables and you want to forecast one of them using information from the others.

The Work of Edward Tufte and Graphics Press

Edward Tufte is a statistician and artist, and Professor Emeritus of Political Science, Statistics, and Computer Science at Yale University. He wrote, designed, and self-published 4 classic books on data visualization. The New York Times described ET as the "Leonardo da Vinci of data," and Business Week as the "Galileo of graphics." He is now writing a book/film The Thinking Eye and constructing a 234-acre tree farm and sculpture park in northwest Connecticut, which will show his artworks and remain open space in perpetuity.

No one could see the colour blue until modern times

This isn’t another story about that dress, or at least, not really. It’s about the way that humans see the world, and how until we have a way to describe something, even something so fundamental as a colour, we may not even notice that it’s there. Until relatively recently in human history, “blue” didn’t exist.

Oakland Real Estate – Full EDA

Living in the Bay Area has led me to think more and more about real estate (and how amazingly expensive it is here). I’ve signed up for trackers on Zillow and Redfin, but the data analyst in me always wants to dive deeper, to look back historically, to quantify, to visualize the trends, and so on. With that in mind, here is my first look at Oakland real estate prices over the past decade. I’ll only be looking at multi-tenant units (duplexes, triplexes, etc.). The first plot is simply looking at the number of sales each month: You can clearly see a strong uptick in the number of units sold from 2003 to 2005, followed by a steep decline in sales that bottoms out during the financial crisis in 2008. Interestingly, sales pick up again very quickly in 2009 and 2010 (a time when I expected to see low sales figures) before stabilizing at the current rate of ~30 properties sold per month. The next plot shows the price distribution for multi-tenant buildings sold in Oakland: