
Coding for Journalists 101 : A four-part series

Update, January 2012: Everything…yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles, with far more practical and up-to-date examples and projects than you'll find here. I'm keeping this old walkthrough up only as a historical reference. So check it out: The Bastards Book of Ruby. -Dan

Update, Dec. 30, 2010: I published a series of data collection and cleaning guides for ProPublica, describing what I did for our Dollars for Docs project.

So a little while ago, I set out to write some tutorials that would guide the non-coding-but-computer-savvy journalist through enough programming fundamentals that he or she could write a web scraper to collect data from public websites. Because the tutorials are aimed at people who aren't experienced in programming, the code is pretty verbose, pedantic, and in some cases a little inefficient.

Masterclass 20: Getting started in data journalism

If you are impatient to get started and just want to quickly do some data journalism, click here. If you aren't a subscriber, you'll need to sign up before you can access the rest of this masterclass. If you want to find out what data journalism is, and what it's for, before you get stuck in, then read on, or click on the video or audio files.

Video: Are you confused about what data journalism is, how you do it, and what its purpose is? If so, join the club. There is a mystique surrounding data journalism; it's almost as if it's a dark art and you have to be a wizard to practise it. A very few people are brilliant at it, a number have dabbled in it, loads of journalists think they probably ought to find out about it, but most fear they probably won't be able to master it. All this throws up a smokescreen that I hope to dispel in this masterclass. What data journalism is: I aim to show what data journalism is, what it can do, and how to do it.

What could a journalist do with ScraperWiki? A quick guide | Scraperwiki Data Blog

For non-programmers, a first look at ScraperWiki's code could be a bit scary, but we want journalists and researchers to make use of the site, so we've set up a variety of initiatives to do that. Firstly, we're setting up a number of Hacks and Hackers Days around the UK, with Liverpool as our first stop outside of London. You can follow this blog or visit our Eventbrite page to find out more details. Secondly, our programmers are teaching ScraperWiki workshops and classes around the UK. Anna Powell-Smith took ScraperWiki to the Midlands, and taught Paul Bradshaw's MA students at Birmingham City University the basics. Julian Todd ran a 'Scraping 101' session at the Centre for Investigative Journalism summer school last weekend. You can see his slides here at this link. Julian explained just why ScraperWiki is useful, running through your options for web scraping; number 3 on his list is where ScraperWiki, a place for sharing scrapers, comes in. (Some more general points from the session can be read here.)
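ScraperWiki's core idea, run a scraper and save its rows into a datastore you can query later, can be sketched with nothing beyond Python's built-in sqlite3 module. This is an illustrative sketch, not ScraperWiki's actual API: the table name, fields, and records here are all invented.

```python
import sqlite3

def save_rows(db_path, rows):
    """Save scraped records into a local SQLite datastore,
    loosely mimicking a scraper's 'save' step."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data (council TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [(r["council"], r["amount"]) for r in rows],
    )
    conn.commit()
    return conn

# Two hypothetical scraped records, stored in an in-memory database.
conn = save_rows(":memory:", [
    {"council": "Liverpool", "amount": 1200.0},
    {"council": "Birmingham", "amount": 950.5},
])
total = conn.execute("SELECT SUM(amount) FROM data").fetchone()[0]
```

Once the rows are in SQLite, the usual sort/filter/total questions become one-line SQL queries instead of manual spreadsheet work.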

An introduction to data scraping with Scraperwiki

Last week I spent a day playing with the screen scraping website Scraperwiki with a class of MA Online Journalism students and a local blogger or two, led by Scraperwiki's own Anna Powell-Smith. I thought I might take the opportunity to try to explain, in journalistic terms, what screen scraping is through the functionality of Scraperwiki. It's pretty good. Why screen scraping is useful for journalists: screen scraping can cover a range of things, but for journalists it initially boils down to a few things: getting information from somewhere; storing it somewhere that you can get to it later; and in a format that makes it easy (or easier) to analyse and interrogate. So, for instance, you might use a screen scraper to gather information from a local police authority website, and store it in a lovely spreadsheet that you can then sort through, average, total up, filter and so on, when the alternative may have been to print off 80 PDFs and get out the highlighter pens, Post-Its and back-of-a-fag-packet calculations.
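The police-authority example above can be sketched with nothing but the standard library: pull tabular data out of HTML and write it to a CSV "spreadsheet". The page content here is an invented stand-in; a real scraper would first download the HTML with urllib or a similar library.

```python
import csv
import io
from html.parser import HTMLParser

# A tiny invented stand-in for a police authority web page.
PAGE = """
<table>
  <tr><td>Burglary</td><td>34</td></tr>
  <tr><td>Vehicle crime</td><td>12</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td>, grouped into rows by <tr>."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
        elif tag == "tr" and self.row:
            self.rows.append(self.row)

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.row.append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)

# Store the scraped rows as CSV: the "lovely spreadsheet" step.
out = io.StringIO()
csv.writer(out).writerows([["offence", "count"]] + scraper.rows)
```

From here the CSV can be opened in any spreadsheet program and sorted, totalled and filtered, which is the whole point of the exercise.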

Telling Better Stories by Designing Custom Maps Using TileMill

Plotting information, say survey data in Pakistan's Federally Administered Tribal Areas or election results in Afghanistan, on any kind of map adds critical geo-context to the data. These maps quickly become more powerful when you start adding custom overlays, showing data like where different ethnic groups live, high incidences of corruption, or more complex visuals like the number of deaths per drone strike in Pakistan and which U.S. president ordered it. What is really amazing is how accessible it now is for people to make custom maps to tell more complex stories with data. Specifically, tools like Google Maps, OpenLayers, and Polymaps have made basic web mapping ubiquitous by making it simple to drop a map into a website, and their APIs open the door for everyone to customize maps by adding custom layers. The trick now is to radically reduce the barrier to entry for making these overlays and custom base maps.
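Under the hood, a custom overlay is usually just a layer of geographic features, commonly expressed as GeoJSON, a format that web mapping libraries such as OpenLayers and Polymaps can consume. A minimal sketch of building one overlay from tabular data; the coordinates, death counts and attribution here are entirely invented for illustration.

```python
import json

# Invented records: one point per strike, with the data the
# overlay will display.
strikes = [
    {"lon": 70.1, "lat": 32.9, "deaths": 4},
    {"lon": 69.8, "lat": 33.1, "deaths": 11},
]

# GeoJSON FeatureCollection: one Point feature per record, with
# the display data carried in each feature's "properties".
overlay = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [s["lon"], s["lat"]],  # lon, lat order
            },
            "properties": {"deaths": s["deaths"]},
        }
        for s in strikes
    ],
}

geojson = json.dumps(overlay)
```

A mapping library then styles each feature from its properties (for example, sizing a circle by the deaths count), which is what turns a base map into a story-specific visual.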

#media140 – Carlos Alonso's favourite tools to find stories behind the data | Editors' Blog

Here we understand that data is one of the buzzwords in journalism at the moment; it is why we have built our news:rewired conference around the topic, and its popularity was certainly clear from the packed room at Media140 today, where journalist and online communications specialist Carlos Alonso spoke on the topic. Alonso first discussed why the use of data itself is not new, illustrating this with the use of data in the 1800s to pinpoint cholera deaths geographically, which led to the finding that many occurred close to a specific well, and with the mapping of revolutions in Scotland and England in 1786 to show where conflict was taking place. The golden age of data mining was in the 1700s and 1800s. This talk focuses on the first parts of the journalistic process: sourcing and processing data to find stories. Once you have the data you must first clean it and figure out which data is important; we're looking for the story behind it.

Online Journalism Blog

How many people live in poverty in America? It's only a month since the official estimates showed 46.2m Americans living below the poverty line, which is 15.2% of the population. But today, new figures from the US Census Bureau show that another 3m people are living below the poverty line: one in six people. What's happened? Basically, the Census Bureau has come up with a new way of counting the poor. Crucially, the old measure missed out three things:

• The effect of federal programmes to reduce poverty, such as tax credits or food stamps
• The huge costs of medical care, or the cost of transport in getting to and from work
• The changing make-up of families, with more single parents and divorced households

[Chart: how different groups changed]

The new measure takes account of those key indicators and is important, says Alan Berube, a senior research fellow at Brookings, who last week published a report on the super poor, especially in the light of proposed budget cuts to federal programmes.

Introducing the Bitly Media Map!

Last March, Bitly teamed up with Forbes to produce a data visualization looking at how 15 media properties were being disproportionately consumed online on a state-by-state basis over the month of April. We had various preconceived notions of which states' residents are more likely to consume news from certain newspapers, televised news, news magazines and online-only news properties. For example, we believed that Fox News had a stranglehold on the south of the U.S., though maybe CNN might be able to take Georgia with its hometown advantage. While the visualization and analysis over a month's worth of data answered many of our questions, it inspired us to ask more questions that could not be answered with such a static, bulk-data approach; the previous visualization was based on one large, finite dataset. The post goes on to cover disproportionality versus raw counts, the Forget Table, building the interactive map, and selecting the media properties. And that is how the Bitly Media Map came together.
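"Disproportionality versus raw counts" is the key design choice here: raw click counts mostly reproduce a population map, so instead you compare each property's share of a state's clicks with its share of clicks nationally. A minimal sketch of that ratio with invented click counts (not Bitly's actual data or method):

```python
def disproportionality(state_clicks, national_clicks, prop):
    """Ratio > 1 means the property is over-consumed in this state
    relative to its national share; < 1 means under-consumed."""
    state_share = state_clicks[prop] / sum(state_clicks.values())
    national_share = national_clicks[prop] / sum(national_clicks.values())
    return state_share / national_share

# Invented click counts for two media properties.
georgia = {"CNN": 800, "Fox News": 200}
usa = {"CNN": 5000, "Fox News": 5000}

ratio = disproportionality(georgia, usa, "CNN")
```

Here CNN takes 80% of the invented Georgia clicks against a 50% national share, giving a ratio of 1.6, i.e. strongly over-consumed, the "hometown advantage" the post speculates about.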

The story of getting Twitter data and its "missing middle"

We've tried hard, but sadly we are not able to bring back our Twitter data tools. Simply put, this is because Twitter has no route to market for selling low-volume data for spreadsheet-style individual use. It's happened to similar services in the past, and even to blog post instructions. There's lots of confusion in the market about the exact rules, and why they exist. This blog post tries to explain them clearly!

How can you get Twitter data? There are four broad ways.

1. There are two problems with this route: firstly, for developers it sets the expectation that you can do whatever the API allows; secondly, it is unfair to non-programmers, who can't get access to data that programmers easily can.
2. As soon as it gets serious, they should join the Twitter Certified Program to make sure Twitter approves of the app. These applications can't allow general data analysis and coding by their users; they have to have specific canned dashboards and queries.
3.
4.

Why do Twitter restrict data use?