Refine - Google Refine, a power tool for working with messy data (formerly Freebase Gridworks)

TimelineSetter: Easy Timelines From Spreadsheets, Now Open to All Talking Points Memo used TimelineSetter to create a timeline featuring events in Wisconsin’s public-sector union struggle. Last week we announced TimelineSetter, our new tool for creating beautiful interactive HTML timelines. Today, after a short private beta with some of our fellow news application developers, we’re opening the code to everyone. How to Install If you’ve got Ruby and Rubygems installed, you can get the package by running: sudo gem install timeline_setter

Chapter 1. Using Google Refine to Clean Messy Data Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a “power tool for working with messy data” but could very well be advertised as “remedy for eye fatigue, migraines, depression, and other symptoms of prolonged data-cleaning.” Even journalists with little database expertise should be using Refine to organize and analyze data; it doesn't require much more technical skill than clicking through a webpage. For skilled programmers, and journalists well-versed in Access and Excel, Refine can greatly reduce the time spent doing the most tedious part of data-management. Other reasons why you should try Google Refine: It’s free. It works in any browser and uses a point-and-click interface similar to Google Docs. Despite the Google moniker, it works offline.

Data Wrangler UPDATE: The Stanford/Berkeley Wrangler research project is complete, and the software is no longer actively supported. Instead, we have started a commercial venture, Trifacta. For the most recent version of the tool, see the free Trifacta Wrangler.

Working with data in protovis For the past year or so I have been dabbling with protovis. I don’t have a heavy CS background but protovis is supposedly easy to pick up for people like me, who are vaguely aware that computers can make calculations but who need to check the manual for the most mundane programming instructions. I found that while it’s reasonably easy to modify the most basic examples to make stuff happen, it is much harder to understand or adapt the more complex ones, let alone to create a fairly complex visualization.

Weka 3 - Data Mining with Open Source Machine Learning Software in Java Weka is tried and tested open source machine learning software that can be accessed through a graphical user interface, standard terminal applications, or a Java API. It is widely used for teaching, research, and industrial applications, contains a plethora of built-in tools for standard machine learning tasks, and additionally gives transparent access to well-known toolboxes such as scikit-learn, R, and Deeplearning4j.

Data Visualization Platform, Weave, Now Open Source With more and more civic data becoming available and accessible, the challenge grows for policy makers and citizens to leverage that data for better decision-making. It is often difficult to understand context and perform analysis. “Weave”, however, helps.

Chapter 2: Reading Data from Flash Sites Flash applications often disallow the direct copying of data from them. But we can instead use the raw data files sent to the web browser. Adobe Flash can make data difficult to extract.

Protovis Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)!

Mike Bostock December 27, 2014 Mapping Every Path to the N.F.L. Playoffs December 20, 2014 How Each Team Can Make the N.F.L.

PyNGL and PyNIO Introduction This is the place to start if you are new to PyNGL and PyNIO. PyNGL (pronounced "pingle") is a Python language module used to visualize scientific data, with an emphasis on high quality 2D visualizations. A working knowledge of Python is assumed.

How to: get started in data journalism using Google Fusion Tables An intensity map showing the population density for different ethnic groups in Texas What is it? Google Fusion Tables allows users to create data visualisations such as maps, charts, graphs and timelines.

Chapter 4: Scraping Data from HTML Web-scraping is essentially the task of finding out what input a website expects and understanding the format of its response. For example, takes a user's zip code as input before returning a page showing federal stimulus contracts and grants in the area. This tutorial will teach you how to identify the inputs for a website and how to design a program that automatically sends requests and downloads the resulting web pages. Pfizer disclosed its doctor payments in March as part of a $2.3 billion settlement - the largest health care fraud settlement in U.S. history - of allegations that it illegally promoted its drugs for unapproved uses. Of the disclosing companies so far, Pfizer's disclosures are the most detailed and its site is well-designed for users looking up individual doctors. However, its doctor list is not downloadable, or easily aggregated.
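
The scraping workflow described above (identify the input a site expects, send one request per input, save each response) can be sketched roughly as follows. This is a minimal illustration in Python, not the tutorial's own code: the endpoint `https://example.com/search`, the parameter name `zip`, and the helper names `build_url` and `download_pages` are all hypothetical placeholders, and a real site would need its actual URL and field names substituted in.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter name, used only for illustration.
BASE_URL = "https://example.com/search"


def build_url(zip_code, base_url=BASE_URL):
    """Return the request URL for a single zip code."""
    return base_url + "?" + urlencode({"zip": zip_code})


def fetch_page(url):
    """Download one page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def download_pages(zip_codes, fetch=fetch_page):
    """Send one request per zip code; return {zip_code: page_text}.

    The fetch argument can be swapped out, e.g. to add delays between
    requests or to run the function without network access.
    """
    return {zip_code: fetch(build_url(zip_code)) for zip_code in zip_codes}
```

Calling `download_pages(["10001", "73301"])` would request one page per zip code; the returned text could then be written to disk and parsed in a separate step.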

Google Refine is a great tool for cleaning up (standardizing) large amounts of data at a time. by danielhall66 Jan 21
