
Data Wrangler

UPDATE: The Stanford/Berkeley Wrangler research project is complete, and the software is no longer actively supported. Instead, we have started a commercial venture, Trifacta. For the most recent version of the tool, see the free Trifacta Wrangler. Why wrangle? Too much time is spent manipulating data just to get analysis and visualization tools to read it. Wrangler is designed to accelerate this process: spend less time fighting with your data and more time learning from it.

Protovis Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.

Chapter 1. Using Google Refine to Clean Messy Data Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a “power tool for working with messy data” but could very well be advertised as “remedy for eye fatigue, migraines, depression, and other symptoms of prolonged data-cleaning.” Even journalists with little database expertise should be using Refine to organize and analyze data; it doesn't require much more technical skill than clicking through a webpage. For skilled programmers, and journalists well-versed in Access and Excel, Refine can greatly reduce the time spent doing the most tedious part of data management. Other reasons why you should try Google Refine: It’s free. It works in any browser and uses a point-and-click interface similar to Google Docs. Despite the Google moniker, it works offline. There’s no requirement to send anything across the Internet. There’s a host of convenient features, such as an undo function, and a way to visualize your data’s characteristics.

Setting Data Free With Gapminder Last month Hans Rosling, the Swedish global health professor, statistician and sword swallower, released a desktop version of Gapminder World, his mesmerizing data visualization tool. Named one of Foreign Policy's top 100 global thinkers in 2009, the information design visionary co-founded Gapminder with his son and daughter-in-law, aiming to make the world's most important trends accessible and digestible to global leaders, policy makers and the general public. The software they developed, Trendalyzer (acquired by Google in 2007), translates static numbers into dynamic, interactive bubbles moving through time. The desktop version of Gapminder, which is still in beta, allows you to create and present graphs without an Internet connection. Emily Cunningham is a research intern at ReadWriteWeb and a design and user experience intern at

Independent expert to push forward Catapult network to new heights The Business Secretary made the announcement during the official opening of the Offshore Renewable Energy (ORE) Catapult's offices in Glasgow. This brings the number of Catapult centres to seven. By combining the existing knowledge in the offshore renewable sector with new resources, the ORE Catapult aims to drive innovation and ensure maximum UK benefit from offshore renewable energy resources. A report commissioned by the ORE Catapult estimates this could be worth up to £6.7bn per year to the UK economy by 2020. "I am pleased Hermann Hauser has agreed to undertake this review because he has already proven he has the vision to set Government bold but achievable goals" - Vince Cable

Toxiclibs.js - Open-Source Library for Computational Design There are several areas where toxiclibs.js stands apart to remain more idiomatic and helpful in the JavaScript environment. For a complete description of the conveniences added to toxiclibs.js, read the sugar file in the repository. Some examples of these differences are:

Datasets on Datavisualization Wikileaks US Embassy Cables 29 Nov 2010 Wikileaks began on Sunday, November 28th, publishing 251,287 leaked United States embassy cables, the largest set of confidential documents ever to be released into the public domain.

SXSW 2014: Monetization Opportunities in a Sharing Economy Melanie White | March 9, 2014 ClickZ spoke to Kurt Abrahamson, chief executive of ShareThis, to discuss the new "sharing currency" and the implications that this has for brands and marketers. At a brunch held in Austin, Texas, at this year’s South By Southwest, ShareThis sat down with Nasdaq OMX to discuss the new phenomenon of the "sharing economy."

Data cleansing After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data. The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
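
The difference between strict and fuzzy validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-digit postal code rule, the KNOWN_CITIES list, and the function names strict_validate, fuzzy_correct and cleanse are all hypothetical and do not come from any of the tools mentioned on this page. The fuzzy step uses difflib.get_close_matches from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical "known list of entities" used for fuzzy correction.
KNOWN_CITIES = ["Stockholm", "Glasgow", "Austin", "Berkeley"]


def strict_validate(record):
    """Strict rule (assumed format): the postal code must be exactly five digits."""
    code = record.get("postal_code", "")
    return code.isdigit() and len(code) == 5


def fuzzy_correct(value, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known entity, or the value unchanged if none is close enough."""
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def cleanse(records):
    """Split a batch into cleaned records and records rejected by the strict rule."""
    cleaned, rejected = [], []
    for rec in records:
        if not strict_validate(rec):
            rejected.append(rec)
            continue
        # Copy the record so the input batch is left untouched.
        cleaned.append({**rec, "city": fuzzy_correct(rec.get("city", ""))})
    return cleaned, rejected


batch = [
    {"city": "Glasgo", "postal_code": "12345"},   # typo, valid code
    {"city": "Austin", "postal_code": "ABC"},     # fails the strict rule
]
cleaned, rejected = cleanse(batch)
```

Here the strict rule discards a record outright, while the fuzzy rule repairs a value only when it is close to a known entity; the cutoff of 0.8 is an arbitrary choice that trades missed corrections against wrong ones.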