Coding for Journalists 101 : A four-part series

Photo by Nico Cavallotto on Flickr

Update, January 2012: Everything…yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles, with far more practical and up-to-date examples and projects than what you’ll find here. I’m only keeping this old walkthrough up as a historical reference. I’m sure the code is so ugly that I’m not going to even try re-reading it. So check it out: The Bastards Book of Ruby. -Dan

Update, Dec. 30, 2010: I published a series of data collection and cleaning guides for ProPublica, describing what I did for our Dollars for Docs project.

So a little while ago, I set out to write some tutorials that would guide the non-coding-but-computer-savvy journalist through enough programming fundamentals to write a web scraper that collects data from public websites.

DISCLAIMER: The code, data files, and results are meant for reference and example only.

http://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/

Related:  Data Journalism

Masterclass 20: Getting started in data journalism If you are impatient and just want to quickly do some data journalism, the masterclass lets you jump straight in. If you want to find out what data journalism is, and what it's for, before you get stuck in, then read on, or turn to the video or audio files. Video: Are you confused about what data journalism is, how you do it, and what its purpose is? If so, join the club.

Data Scraping Wikipedia with Google Spreadsheets Prompted in part by a presentation I have to give tomorrow as an OU eLearning community session (I hope some folks turn up – the 90 minute session on Mashing Up the PLE – RSS edition is the only reason I’m going in…), and in part by Scott Leslie’s compelling programme for a similar duration Mashing Up your own PLE session (scene setting here: Hunting the Wily “PLE”), I started having a tinker with using Google spreadsheets for data-table screenscraping. So here’s a quick summary of (part of) what I found I could do. The Google spreadsheet function =importHTML("URL","table",N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 0) as the target table for data scraping.
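The same trick can be approximated outside a spreadsheet. Below is a minimal Python sketch, using only the standard library, of what that formula does: pull the N'th table out of a page as rows of cells. The `import_html` name and the sample page are invented for illustration, and I've kept the post's 0-based index (note that in current Google Sheets, IMPORTHTML's index argument counts from 1).

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collects the cells of every <table> in a page, importHTML-style."""
    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self.row = None    # cells of the <tr> currently open, if any
        self.cell = None   # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def import_html(html, n):
    """Return the n'th table in the page (0-based here) as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    return parser.tables[n]

# A toy page standing in for a real URL's response.
page = ("<html><body><table>"
        "<tr><th>City</th><th>Pop</th></tr>"
        "<tr><td>Oslo</td><td>700000</td></tr>"
        "</table></body></html>")
print(import_html(page, 0))  # [['City', 'Pop'], ['Oslo', '700000']]
```

In real use you would fetch `page` with `urllib.request.urlopen(url).read().decode()` instead of hard-coding it.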

An Introduction to Compassionate Screen Scraping Screen scraping is the art of programmatically extracting data from websites. If you think it's useful: it is. If you think it's difficult: it isn't.

Network Graph - Fusion Tables Help Current limitations include: the visualization will only show relationships from the first 100,000 rows in a table. A filter can include rows from 100,001 or beyond in the table, but the graph will still not display them. Internet Explorer 8 and below are not supported.
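In the spirit of that "compassionate" piece, polite scraping mostly comes down to two habits: honour the site's robots.txt, and pause between requests. A minimal standard-library sketch, where the robots.txt rules, URLs, and user-agent string are all invented for illustration:

```python
from urllib import robotparser

# Parse robots.txt rules. In real use you would point the parser at
# http://example.com/robots.txt with set_url() and call read(); here the
# rules are supplied inline so the sketch runs offline.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
])

def fetch_allowed(url, user_agent="my-scraper"):
    """Return True if robots.txt permits fetching this URL."""
    return rp.can_fetch(user_agent, url)

print(fetch_allowed("http://example.com/data/table.html"))  # True
print(fetch_allowed("http://example.com/private/x.html"))   # False

# Between permitted requests, a compassionate scraper would also
# time.sleep() for the advertised crawl delay (or a sensible default).
delay = rp.crawl_delay("my-scraper") or 1
```

The point is that throttling and rule-checking cost a few lines and keep you from hammering someone's server.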

What could a journalist do with ScraperWiki? A quick guide For non-programmers, a first look at ScraperWiki’s code could be a bit scary, but we want journalists and researchers to make use of the site, so we’ve set up a variety of initiatives to do that. Firstly, we’re setting up a number of Hacks and Hacker Days around the UK, with Liverpool as our first stop outside of London. You can follow this blog or visit our eventbrite page to find out more details.

Beautiful Soup: We called him Tortoise because he taught us. You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it's been saving programmers hours or days of work on quick-turnaround screen scraping projects.

Scraping for Journalism: A Guide for Collecting Data Photo by Dan Nguyen/ProPublica Our Dollars for Docs news application lets readers search pharmaceutical company payments to doctors. We’ve written a series of how-to guides explaining how we collected the data. Most of the techniques are within the ability of the moderately experienced programmer.

Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I A couple of weeks ago, I came across Gephi, a desktop application for visualising networks. And quite by chance, a day or two after I was asked about any tools I knew of that could visualise and help analyse social network activity around an OU course… which I take as a reasonable justification for exploring exactly what Gephi can do :-) So, after a few false starts, here’s what I’ve learned so far… First up, we need to get some graph data – netvizz – facebook to gephi suggests that the netvizz facebook app can be used to grab a copy of your Facebook network in a format that Gephi understands, so I installed the app, downloaded my network file, and then uninstalled the app… (can’t be too careful ;-)
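To give a taste of why Beautiful Soup saves those hours: it turns a page into a searchable tree, so pulling data out is a one-liner. A small sketch, assuming Beautiful Soup 4 is installed (`pip install beautifulsoup4`); the markup and the dollar amounts are invented, loosely echoing the Dollars for Docs theme.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# A toy page standing in for the "awful page" you didn't write.
html = ("<html><body>"
        "<p>Dr. Smith <b>$500</b></p>"
        "<p>Dr. Jones <b>$1,200</b></p>"
        "</body></html>")

soup = BeautifulSoup(html, "html.parser")
# Grab the text of every <b> tag, stripped of surrounding whitespace.
amounts = [b.get_text(strip=True) for b in soup.find_all("b")]
print(amounts)  # ['$500', '$1,200']
```

On a real page you would narrow the search with attributes, e.g. `soup.find_all("td", class_="amount")`, rather than relying on tag names alone.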

An introduction to data scraping with Scraperwiki Last week I spent a day playing with the screen scraping website Scraperwiki with a class of MA Online Journalism students and a local blogger or two, led by Scraperwiki’s own Anna Powell-Smith. I thought I might take the opportunity to try to explain what screen scraping is through the functionality of Scraperwiki, in journalistic terms. It’s pretty good. Why screen scraping is useful for journalists Screen scraping can cover a range of things but for journalists it, initially, boils down to a few things: information from somewhere

Creating a Scraper for Multiple URLs Using Regular Expressions Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application, accessible from the Help menu. You should run these to discover the Hub. NOTE: This tutorial was created using version 0.8.2.

Statistics: Making Sense of Data About the Course We live in a world where data are increasingly available, in ever larger quantities, and are increasingly expected to form the basis for decisions by governments, businesses, and other organizations, as well as by individuals in their daily lives. To cope effectively, every informed citizen must be statistically literate. This course will provide an intuitive introduction to applied statistical reasoning, introducing fundamental statistical skills and acquainting students with the full process of inquiry and evaluation used in investigations in a wide range of fields.
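The multiple-URLs-plus-regex idea can be sketched in a few lines of Python with the standard `re` module: generate each page's URL from a template, then pull out matches with one expression. The URL template, the canned page contents, and the pattern below are all invented stand-ins; a real scraper would fetch each URL with `urllib.request` instead of reading from a dictionary.

```python
import re

# Build the URL for each page of results from a template.
url_template = "http://example.com/reports?page={n}"
urls = [url_template.format(n=n) for n in range(1, 4)]

# Canned HTML standing in for what each URL would return.
pages = {
    urls[0]: '<td class="amount">$1,500</td>',
    urls[1]: '<td class="amount">$250</td>',
    urls[2]: '<td class="amount">$980</td>',
}

# One expression extracts the figure from every page.
amount_re = re.compile(r'class="amount">\$([\d,]+)<')

results = []
for url in urls:
    html = pages[url]  # real use: urllib.request.urlopen(url).read().decode()
    results.extend(amount_re.findall(html))

print(results)  # ['1,500', '250', '980']
```

Regexes are brittle against markup changes, which is why tools like Beautiful Soup or ScraperWiki are usually the sturdier choice; but for uniform, machine-generated pages this pattern is often all you need.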
