
Data


4 free data tools for journalists (and snoops) Note: The following is an excerpt from Pete Warden’s free ebook “Where are the bodies buried on the web? Big data for journalists.” There’s been a revolution in data over the last few years, driven by an astonishing drop in the price of gathering and analyzing massive amounts of information. It only cost me $120 to gather, analyze and visualize 220 million public Facebook profiles, and you can use 80legs to download a million web pages for just $2.20. Those are just two examples. The technology is also getting easier to use. What does this mean for journalists? Many of you will already be familiar with WHOIS, but it’s so useful for research it’s still worth pointing out. You can also enter numerical IP addresses here and get data on the organization or individual that owns that server.
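For readers who would rather script a WHOIS lookup than use a web form, here is a minimal sketch of the underlying protocol (RFC 3912): open a TCP connection to port 43, send the name followed by CRLF, and read until the server closes the connection. The function names `build_query` and `whois` are illustrative, and the default server `whois.iana.org` is an assumption; it typically answers with a referral to the registry that holds the full record, for domains and IP addresses alike.

```python
import socket


def build_query(name: str) -> bytes:
    """Encode a WHOIS query: the name (domain or IP) followed by CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois(name: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text response.

    The default server is an assumption for this sketch; pass the
    registry's own WHOIS server to get the detailed record.
    """
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Requires network access; prints whatever the server returns.
    print(whois("example.com"))
```

The response is free-form text, so any parsing of owner or organization fields would have to be tailored to the registry that answers.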

Blekko The newest search engine in town, Blekko counts the richness of the data it offers among its selling points. The first tab shows other sites that are linking to the current domain, in order of popularity.

Introduction to Information Retrieval. This is the companion website for the following book. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008. You can order this book at CUP, at your local bookstore or on the internet. The best search term to use is the ISBN: 0521865719.

The book aims to provide a modern approach to information retrieval from a computer science perspective. We'd be pleased to get feedback about how this book works out as a textbook, what is missing, or covered in too much detail, or what is simply wrong. Online resources Apart from small differences (mainly concerning copy editing and figures), the online editions should have the same content as the print edition.

The following materials are available online. Information retrieval resources A list of information retrieval resources is also available. Introduction to Information Retrieval: Table of Contents. A computational journalism reading list. [Last updated: 18 April 2011 -- added statistical NLP book link] There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer journalist” and the birth of a community of hacks and hackers.

Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. “Computational journalism” has no textbooks yet. Data journalism Data journalism is obtaining, reporting on, curating and publishing data in the public interest. Tamara Munzner’s chapter on visualization is the essential primer. Data | The World Bank. The 70 Online Databases that Define Our Planet. Back in April, we looked at an ambitious European plan to simulate the entire planet.

The idea is to exploit the huge amounts of data generated by financial markets, health records, social media and climate monitoring to model the planet’s climate, societies and economy. The vision is that a system like this can help to understand and predict crises before they occur so that governments can take appropriate measures in advance. There are numerous challenges here. Nobody yet has the computing power necessary for such a task, nor are there models that can accurately model even much smaller systems. Today, we get a grand tour of this challenge from Dirk Helbing and Stefano Balietti at the Swiss Federal Institute of Technology in Zurich. It turns out that there are already numerous sources of data that could provide the necessary fuel to power Helbing’s Earth Simulator. Wikipedia Wikipedia is the most famous cooperatively edited encyclopedia.

Where’s George? Tools to help bring data to your journalism « Michelle Minkoff. NOTE: This entry was modified on the evening of 11/9/10 to deal with typos and missing words, resulting from posting this too late the previous night. Sleep deprivation isn’t always a good thing, although it allows one to do things more fun than sleep. Like play with data. Note to self: Be more careful in the future. Many of the stories we do every day, across beats, could benefit from a data component. Luckily, a lot of great design and programming folks have created tools to make it easier to organize, clean and display data. So, here’s a roundup of some tools you can use to rapidly produce data pieces without programming knowledge. Prepping tables Tableizer – Copy and paste cells from your Excel spreadsheet into this tool, and it’ll spit back a formatted HTML table that you can copy and paste into a CMS of your choice.
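To make the Tableizer idea concrete, here is a rough sketch of the same kind of conversion in Python. It is not Tableizer's own code. It assumes that cells copied from a spreadsheet arrive as tab-separated text with one row per line, and that the first row is the header; the function name `cells_to_html_table` is made up for this example.

```python
from html import escape


def cells_to_html_table(pasted: str) -> str:
    """Turn tab-separated spreadsheet cells into a simple HTML table.

    Assumes one row per line and treats the first row as the header.
    """
    rows = [line.split("\t") for line in pasted.splitlines() if line.strip()]
    if not rows:
        return "<table></table>"

    header, *body = rows
    parts = ["<table>", "  <thead>"]
    # Escape each cell so characters like < and & display correctly.
    parts.append(
        "    <tr>" + "".join(f"<th>{escape(c.strip())}</th>" for c in header) + "</tr>"
    )
    parts += ["  </thead>", "  <tbody>"]
    for row in body:
        parts.append(
            "    <tr>" + "".join(f"<td>{escape(c.strip())}</td>" for c in row) + "</tr>"
        )
    parts += ["  </tbody>", "</table>"]
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "City\tVotes\nAustin\t1,200\nDallas\t950"
    print(cells_to_html_table(sample))
```

The output can be pasted into a CMS that accepts raw HTML; styling is left to the site's own stylesheet.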

Interactive viz – no programming Static viz Use programming to make custom charts. Real-Time Data And A More Personalized Web - Smashing Magazine. As Web designers, we face a daily struggle to keep pace with advances in technology, new standards and new user expectations. We spend a large part of our working life dipping in and out of recent developments in an attempt to stay both relevant and competitive, and while this is what makes our industry so exciting to be a part of, it often becomes all too easy to get caught up in the finer details. Responsive Web design, improved semantics and rich Web typography have all seen their fair share of the limelight over the last year, but two developments in particular mark true milestones in the maturation of the Web: “real-time data” and a more “personalized Web.” Since the arrival of the new Web, we’ve been enraptured by social media. We share links, we “follow,” we “poke,” we’ve become accustomed to it all.

Through no fault of our own, we’ve become lazy users. Welcome to the new era. Real-Time Data Question: What do Google Analytics and printed newspapers have in common? Open data cook book | Making open data accessible for everyone. Get the Data: Open Data Q&A Forum. Big Data : Making sense at scale. I came back from a recent trip to Silicon Valley (thanks to friends at the Orange Institute) with one conviction: everything we knew about the web is going to change again with the big data phenomenon. It raises once more, on different foundations, almost all the questions tied to the digital transformation. In 2008, humanity poured 480 billion gigabytes onto the Internet. In 2010, it was 800 billion gigabytes, which is, as Eric Schmidt once said, more than everything humanity had written, printed, engraved, filmed or recorded from its birth until 2003.

Not all of this data consists of works. Navigating this new web requires a new science. This week’s news gave us a small illustration of what is happening on a large scale. The web used to be largely transactional. Today, the web produces masses of data, masses of meaning, that completely escape the main players.

Orange - Data Mining Fruitful & Fun. DataMarket - Find and Understand Data — DataMarket. 5 Ways to find, mix and mash your data :: 10,000 Words. One of the most popular trends in online journalism is taking publicly available data and translating it into visualizations or infographics that readers and viewers can quickly and easily understand. A large percentage of the visualizations you see on the web were built from scratch, which can take a considerable amount of time and effort. The following sites allow you to mash your data in record time. Swivel Swivel features more than 15,000 data sets for users to play with in various categories ranging from Economics to Health to Technology. From the data, users have created hundreds of thousands of graphs, charts and infographics, including the one below that visualizes the amount of rainfall in California since 1870. You can get started by copying and pasting your data or uploading an Excel spreadsheet or CSV file to the site.

Socrata Socrata is an online space for data lovers to browse datasets as well as create new visualizations to share with others. Widgenie Verifiable DataMasher.