Data journalism

Thirteenth Training Session: Squeezing Statistical Sites. Medialab Prado, Data Journalism.
19.06.2013, 17:00h - 20:00h. Place: Auditorio (2ª planta/2nd Floor).
Current statistical sites give us access to large amounts of data that can feed new journalistic projects of all kinds. In this new session of the Data Journalism Work Group, we will invite open data and statistics experts to show us how to use these sites in Spain.

Programme:
Alberto González Yanes (ISTAC): How does a statistics institute work, and what can it do for you?
Esther Minguela (LocaliData): Practical tips for accessing a statistics institute's data, and applications that can be built with that data.
Carlos Gil Bellosta: The Labour Force Survey (Encuesta de Población Activa) and its limitations.
Carlos Peña Dorta (Arte Consultores): Stat4You, sharing statistical data.

The reality is that almost no one is doing all of that, but there are enough distinct parts of the puzzle for people to get involved in one easily and go from there. To me, those parts come down to four things:
1. 'Finding data' can involve anything from having expert knowledge and contacts to being able to use computer-assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you.
2.
3.
4. Tools such as ManyEyes for visualisation, and Yahoo!
How to begin?
So where does a budding data journalist start?
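Point 1 mentions using Python to gather data. As a minimal sketch (not the author's method), Python's standard-library html.parser can pull the cells out of an HTML table so they can be saved or analysed. The TableScraper class and the sample table are hypothetical; real pages usually need to be downloaded first and often have messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample page fragment.
scraper = TableScraper()
scraper.feed("<table><tr><th>Region</th><th>Rate</th></tr>"
             "<tr><td>A</td><td> 10 </td></tr></table>")
scraper.close()
print(scraper.rows)
```

The resulting list of rows can then be written out with the csv module or loaded into a database such as MySQL.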

Finding Stories in the Structure of Data
Matt Waite sees structure in unstructured data, and you should too.
We’ve been telling stories since we invented language. You can imagine hunter-gatherers sitting around the fire talking about the hunt, or the direction of the herd, about the coming winter or whatever it was we needed to talk about in order to survive. We’ve become pretty good at telling stories over the millennia.

Journalist, ask yourself what a good Transparency Law can do for you. From day one, the aim of (as well as of Access Info Europe and the Fundación Civio) has been to make it easier for any citizen, not just information professionals, to request information, to spark curiosity about doing so, and to spread the practice. Here is an example. However, we need media that are aware of how important it is to have a good Transparency Law and a fully recognised and guaranteed right of access to information. And, perhaps, media that are less distracted by the tug-of-war and the partisan, self-interested statements accompanying the bill's passage.

In other countries, the right of access to public information, backed by laws that genuinely protect it, is a goldmine of stories for the media. There is no mystery: the clearer, more specific and more ambitious the law, the greater the administration's duty to release the data it keeps to itself, and the stronger the safeguards ensuring that it complies.

There is something extraordinarily rich in the intersection of computer science and journalism.

It feels like a new field is taking shape, tied to the rise of the internet. The last few years have seen calls for a new class of "programmer journalist" and the birth of a community of hacks and hackers. Meanwhile, several schools now offer joint degrees. But we'll need more than competent programmers in newsrooms. What are the key problems of computational journalism? I'd like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. "Computational journalism" has no textbooks yet.

In this week’s class, we discussed clustering algorithms and their application to journalism. As an example, we built a distance metric to measure the similarity of the voting histories of two members of the UK House of Lords, and used it with multi-dimensional scaling to visualize the voting blocs. The data comes from The Public Whip, an independent site that scrapes the British parliamentary proceedings (Hansard) and extracts the voting record into a database. The House of Lords files are available there. They’re tab-delimited, which is not the easiest format for R to read, so I opened them in Excel and re-saved them as CSV. I also removed the descriptive header information from votematrix-lords.txt (which, crucially, explains how the vote data is formatted).
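The class used R, and the source does not say which distance metric was used. The following is a minimal Python sketch of one plausible choice plus classical (Torgerson) multi-dimensional scaling, assuming votes are coded as +1 (aye), -1 (no) and 0 (did not vote). That encoding is an assumption; the real one is described in the header of votematrix-lords.txt. The function names and the sample votes are hypothetical.

```python
import numpy as np

# ASSUMPTION: each member's votes are a vector coded +1 (aye), -1 (no), 0 (did not vote).


def vote_distance(a, b):
    """Fraction of divisions, among those where both members voted, in which they disagreed."""
    a, b = np.asarray(a), np.asarray(b)
    both = (a != 0) & (b != 0)
    if not both.any():
        return 1.0  # hypothetical choice: members who never overlap are treated as maximally distant
    return float(np.mean(a[both] != b[both]))


def distance_matrix(votes):
    """Symmetric matrix of pairwise vote distances."""
    n = len(votes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = vote_distance(votes[i], votes[j])
    return d


def classical_mds(d, k=2):
    """Embed points in k dimensions so Euclidean distances approximate d."""
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * centre @ (d ** 2) @ centre      # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(vals[order], 0, None))  # drop small negative eigenvalues
    return vecs[:, order] * scale


# Hypothetical example: two members who vote alike, and two who always oppose them.
votes = [[1, 1, -1, 1], [1, 1, -1, 0], [-1, -1, 1, -1], [-1, -1, 1, -1]]
coords = classical_mds(distance_matrix(votes))
```

Plotting the two columns of coords gives a map in which members who vote alike sit close together, which is the kind of voting-bloc chart described above.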

The converted data files, plus the scripts I used in class, are up on GitHub. Then start R and enter source("lords-votes.R"). You should see the resulting chart of the voting blocs. And voilà! The chart is, however, very abstract.

What is data journalism? (Data journalism course 1/10)
Starting today and over the coming weeks, we will publish on Irekia a series of tutorials and videos produced during the data journalism course that Mar Cabra and David Cabo taught last June in the three Basque capitals, attended by more than 70 journalists.

These materials, available to the general public, aim to explore this branch of journalism in depth and to promote the use, processing and analysis of public data released as open data (Open Data Euskadi in the Basque case), so that journalists, or any citizen, can produce their own stories.

The risk is that images are easily faked, scraped and manipulated. News organizations and others seeking to source images and information from the crowd therefore have no choice but to push forward with new methods of verification, and to make existing methods quicker and more accurate. So it’s no surprise that we’re seeing initial moves towards automating aspects of the verification process. The Guardian and Scoopshot both recently unveiled new initiatives to bring an element of automation to verification.

Authenticity scoring
Scoopshot is a crowdsourced photography service that enables news organizations to source (and assign) photographs from their community and from users around the world.

The discipline of big data (macrodatos) has been generating a great deal of discussion since late 2012 and early 2013.

Although these disciplines are not yet widespread, their use by large corporations is beginning to generate a lot of interest in the business world. Many are asking what big data is actually useful for in companies. Estimates for the growth of the industry and of big-data applications point to $23.8 billion by 2016, with annual growth of 41% over the coming years.

The fact is that data use is proving to be one of the greatest sources of competitive advantage, helping companies to understand the competitive environment around them, understand their customers better, and offer new products and services.
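To make the 41% annual growth figure concrete: assuming (the estimate does not say so explicitly) that the rate compounds steadily from a starting value V0, the value after n years and the implied doubling time are:

```latex
V_n = V_0 \, (1 + g)^n, \qquad g = 0.41
% Doubling time: solve (1.41)^n = 2 for n
n = \frac{\ln 2}{\ln 1.41} \approx \frac{0.693}{0.344} \approx 2.0 \text{ years}
```

In other words, at that pace the market would roughly double every two years.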

Aside from Microsoft Excel (which I've blogged about before), these are, in my opinion, the most useful tools for data journalism at the moment.

Google Fusion Tables
The gateway drug for most data journalists, Google Fusion Tables is a user-friendly mapping tool that allows users to upload their data, select the columns they would like to map and simply create the map.

It also allows users to pinpoint areas on the map that can be interacted with while exploring it, showing, for example, the name of and data for a particular street.

Treemap
TreeMap provides an easy yet extremely powerful means of creating attractive treemaps for analysis and presentation. It imports data from a wide variety of file formats (including Excel) and can connect to databases (such as MySQL and SQL Server); it is user-friendly and scales to big data.
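A treemap nests rectangles whose areas are proportional to values. The sketch below does not reproduce how TreeMap (the tool) works; it shows the classic slice-and-dice layout, which splits a rectangle along its width at one level of the hierarchy and along its height at the next. The Node class, the slice_and_dice function and the budget figures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: float = 0.0
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        # A leaf's size is its own value; an inner node's size is the sum of its children.
        if not self.children:
            return self.value
        return sum(c.total() for c in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, width, height) for every leaf, areas proportional to value."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side along the width.
            slice_and_dice(child, x + offset, y, w * share, h, depth + 1, out)
            offset += w * share
        else:
            # Odd depth: stack children along the height.
            slice_and_dice(child, x, y + offset, w, h * share, depth + 1, out)
            offset += h * share
    return out


# Hypothetical budget split across a 100 x 100 canvas.
root = Node("budget", children=[
    Node("health", children=[Node("hospitals", 30), Node("clinics", 10)]),
    Node("education", 60),
])
rects = slice_and_dice(root, 0, 0, 100, 100)
```

Many treemap tools prefer squarified layouts, which keep rectangles closer to squares and easier to compare; slice-and-dice is used here only because it is the simplest layout to follow.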

