How to be a data journalist

Data journalism is huge. I don't mean 'huge' as in fashionable - although it has become that in recent months - but 'huge' as in 'incomprehensibly enormous'. It represents the convergence of a number of fields which are significant in their own right - from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful - but also intimidating. The reality is that almost no one is doing all of that, but there are enough different parts of the puzzle for people to easily get involved in, and go from there. 1. 'Finding data' can involve anything from having expert knowledge and contacts to being able to use computer assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you. 2. 3. 4. Tools such as ManyEyes for visualisation, and Yahoo! How to begin? So where does a budding data journalist start? At these moments some programming knowledge comes in handy.

Journalism Needs Data in 21st Century Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on "data" to feed our stories, to the point that "data-driven reporting" becomes second nature to journalists. The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and made use of in a more abstract way, especially by a computer. With this mindset, finding mainstream data-driven stories doesn't take long at all. There is nothing new about pointing out the importance of public data being made available. Thus far, this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. First, there was and President Obama's call for more access to government data.

Data journalism training – some reflections I recently spent 2 days teaching the basics of data journalism to trainee journalists on a broadsheet newspaper. It’s a pretty intensive course that follows a path I’ve explored here previously – from finding data and interrogating it to visualizing it and mashing – and I wanted to record the results. My approach was both practical and conceptual. Conceptually, the trainees need to be able to understand and communicate with people from other disciplines, such as designers putting together an infographic, or programmers, statisticians and researchers. They need to know what semantic data is, what APIs are, the difference between a database and open data, and what is possible with all of the above. They need to know what design techniques make a visualisation clear, and the statistical quirks that need to be considered – or looked for. But they also need to be able to do it. The importance of editorial drive It’s not long before the journalists raise statistical issues – which is reassuring.

Big Data Technology Evaluation Checklist Anyone who’s been following the rapid-fire technology developments in the world that is becoming known as “big data” sees a new capability, product, or company founded literally every week. The ambition of all of these players, established and newcomer, is tremendous, because the potential value to business is enormous. Each new arrival is aimed at addressing the pain that enterprises are experiencing around unrelenting growth in the velocity, volume, and variety of the data their operations generate. What’s being lost, however, in some of this frothy marketing activity, is that it’s still early for big data technologies. There are vexing problems slowing the growth and the practical implementation of big data technologies. When evaluating big data technology, it can be valuable to ask companies about their ability to deliver some of these fundamental capabilities. In this article, we examine some of the big data requirements that are partially defined or in early stages of maturity.

40 Essential Tools and Resources to Visualize Data One of the most frequent questions I get is, "What software do you use to visualize data?" A lot of people are excited to play with their data, but don't know how to go about doing it or even start. Here are the tools I use or have used and resources that I own or found helpful for data visualization – starting with organizing the data, to graphs and charts, and lastly, animation and interaction. Organizing the Data Data are hardly ever in the format that you need them to be in. PHP was the first scripting language I learned that was well-suited for the Web, so I'm pretty comfortable with it. Python Most computer science types - at least the ones I've worked with - scoff at PHP and opt for Python mostly because Python code is often better structured (as a requirement) and has cooler server-side functions. MySQL When I have a lot of data - on the order of tens to hundreds of thousands of records - I use PHP or Python to stick it in a MySQL database. Ah, good old R.
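As a rough illustration of the "stick it in a database" step mentioned above, here is a minimal sketch in Python. It is not the author's script: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the table name `readings`, the column names, and the sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would be a CSV file on disk.
RAW = "city,year,value\nLeeds,2009,12\nYork,2009,7\nLeeds,2010,15\n"


def load_rows(conn, text):
    """Parse CSV text and insert every row into a 'readings' table."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["city"], int(r["year"]), int(r["value"])) for r in reader]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(city TEXT, year INTEGER, value INTEGER)"
    )
    # Parameterised inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def total_by_city(conn):
    """Sum the 'value' column per city, as a plain dict."""
    cur = conn.execute(
        "SELECT city, SUM(value) FROM readings GROUP BY city ORDER BY city"
    )
    return dict(cur.fetchall())


# sqlite3 stands in for MySQL here so the sketch needs no database server.
conn = sqlite3.connect(":memory:")
n = load_rows(conn, RAW)
totals = total_by_city(conn)
print(n, totals)
```

Once the data is in a table, reshaping it for a chart becomes a matter of writing a query rather than rewriting a parsing script. With a real MySQL server the same pattern should apply through a MySQL client library, though the connection call and placeholder syntax would differ.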

Data science We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data? In this post, I examine the many sides of data science: the technologies, the companies and the unique skill sets. The web is full of “data-driven apps.” One of the earlier data products on the Web was the CDDB database. Google is a master at creating data products. Google’s breakthrough was realizing that a search engine could use input other than the text on the page. Flu trends Google was able to spot trends in the Swine Flu epidemic roughly two weeks before the Centers for Disease Control by analyzing searches that people were making in different regions of the country. Google isn’t the only company that knows how to use data. In the last few years, there has been an explosion in the amount of data that’s available.

How to: get to grips with data journalism A graph showing the number of IEDs cleared from the Afghanistan War Logs Only a couple of years ago, the idea that journalists would need to know how to use a spreadsheet would have been laughed out of the newsroom. Now those benighted days are way behind us and extracting stories out of data is part of every journalist's toolkit of skills. Some people say the answer is to become a sort of super hacker, write code and immerse yourself in SQL. Of course, you could just ignore the whole thing, hope it'll go away and you can get back to longing to write colour pieces. 1) Sourcing the data This is a much undervalued skill - with many journalists simply outsourcing it to research departments and work experience students. But broadly, the general approach is to look for the most authoritative place for your data. GDP - from the Office for National Statistics. Carbon emissions from different countries - from the US Energy Information Administration. Adobe PDF files are the enemy of open data.

Growing Your Own Data Scientists | CITO Research CIOs and CTOs must learn to address a challenge, involving the divide between the people who know about the vast amount of new sources of data emanating from machines and other devices (“big data”) and the questions in the enterprise whose answers can be monetized. One group of people knows about the technology for analyzing data (they’re usually in IT). The other group understands the pernicious questions that would lead to an answer that is worth money to an organization (they’re usually on the business side). The role of the data scientist is a hybrid role that can solve this problem. While the definition of the role is compelling, it’s a lot easier to define the role than it is to hire someone to fill it, and even when you do, communication problems may persist. See these articles on for definitions of a data scientist from leading experts in the field: This problem statement addresses the challenge of “growing your own” data scientist. Context and Background

Junk Charts This post is part 2 of an appreciation of the chart project by Google Newslab, advised by Alberto Cairo, on the gender and racial diversity of the newsroom. Part 1 can be read here. In the previous discussion, I left out the following scatter bubble plot. This plot is available in two versions, one for gender and one for race. The key question being asked is whether the leadership in the newsroom is more or less diverse than the rest of the staff. The story appears to be a happy one: in many newsrooms, the leadership roughly reflects the staff in terms of gender distribution (even though both parts of the whole compare disfavorably to the gender ratio in the neighborhoods, as we saw in the previous post.) Unfortunately, there are a few execution problems with this scatter plot. First, take a look at the vertical axis labels on the right side. I find this decision confounding. The horizontal axis? Here is the same chart with improved axis labels: Re-labeling serves up a new issue.

Immersive games beats classroom in maths Well designed study The study tested a hypothesis: that interactive maths games are more effective than classroom instruction. This was a well constructed study: The Effects of Modern Math Computer Games on Learners' Math Achievement and Math Course Motivation in a Public High School Setting, Mansureh Kebritchi, Ph.D., Atsusi Hirumi, Ph.D. and Haiyan Bai, Ph.D. They took 193 algebra students, control groups and then did evaluation through pre- and post-study assessments, surveys, classroom observations and interviews. Over 18 weeks, on average, students in the experimental group made gains of 8.07 points (out of 25), while students in the control group made gains of 3.74 points. They used an immersive video game world that engages students in the instruction and learning of mathematics. Teachers and students report improved maths Teacher and student interviews support the quantitative findings.
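To put the two reported gains side by side, the arithmetic can be sketched in a few lines of Python. The only inputs are the figures quoted above (8.07, 3.74 and the 25-point scale); the variable names are made up for the example, and the calculation says nothing about statistical significance, which would need the full study data.

```python
MAX_SCORE = 25            # size of the assessment scale quoted above
experimental_gain = 8.07  # average gain, game group
control_gain = 3.74       # average gain, control group

# Absolute gap between the two average gains, in points.
difference = round(experimental_gain - control_gain, 2)

# How many times larger the experimental gain is than the control gain.
ratio = round(experimental_gain / control_gain, 2)

# Each gain expressed as a share of the 25-point scale.
exp_pct = round(100 * experimental_gain / MAX_SCORE, 2)
ctl_pct = round(100 * control_gain / MAX_SCORE, 2)

print(difference, ratio, exp_pct, ctl_pct)
```

On these figures the experimental group gained a little over twice as much as the control group, a gap of roughly four points on the 25-point scale.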