Data science

We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in “What Is Web 2.0,” Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data? In this post, I examine the many sides of data science: the technologies, the companies and the unique skill sets. The web is full of “data-driven apps.” One of the earlier data products on the Web was the CDDB database. Google is a master at creating data products. Google’s breakthrough was realizing that a search engine could use input other than the text on the page. Flu trends: Google was able to spot trends in the Swine Flu epidemic roughly two weeks before the Centers for Disease Control by analyzing the searches people were making in different regions of the country. Google isn’t the only company that knows how to use data. In the last few years, there has been an explosion in the amount of data that’s available.

Journalism Needs Data in 21st Century
Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will increasingly have to rely on “data” to feed our stories, to the point that “data-driven reporting” becomes second nature to journalists. The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed and used in a more abstract way, especially by a computer. With this mindset, it doesn’t take long at all to find mainstream data-driven stories. There is nothing new about pointing out the importance of making public data available. So far this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. First, there was data.gov and President Obama’s call for more access to government data.

The Joy of Stats with Hans Rosling
The Joy of Stats, a one-hour documentary hosted by none other than the charismatic Hans Rosling, explores the growing importance of statistics: [W]ithout statistics we are cast adrift on an ocean of confusion, but armed with stats we can take control of our lives, hold our rulers to account and see the world as it really is. What's more, Hans concludes, we can now collect and analyse such huge quantities of data and at such speeds that scientific method itself seems to be changing. From the description, it sounds like they'll touch on Crimespotting by Stamen and Google Translate, among other data-driven projects. Below is a four-minute clip of Rosling presenting world development in the context of income versus lifespan. The Joy of Stats airs on the BBC next Tuesday. [BBC via @krees | Thanks, Shawn]

Top 10 on Health Affairs Blog: Implementing Reform and More
June 11th, 2010
In the past two months at Health Affairs Blog, law professor Timothy Jost has posted an ongoing series on implementing health reform, analyzing what’s in the new legislation. Other “most-read” posts have summarized key points from Health Affairs’ major theme issue on “Reinventing Primary Care,” and have highlighted the writings of Don Berwick, President Obama’s pick to head the Centers for Medicare and Medicaid Services. We offer here links to the top 10 most-read posts for April and May.

RESEARCH • Big Brother in the service of the social sciences
All the information we provide on social networks or through our mobile phones makes up databases that researchers who study human behavior could never have hoped for. “Every move you make… I’ll be watching you.” As in the song by The Police, every one of your movements and every one of your writings posted on Twitter (also called tweets) is recorded somewhere. You may not think twice about it, but when you use a social network or a mobile phone, you leave behind a digital trace that describes your behavior, your movements and your preferences, shows who your friends are, and reveals your moods and opinions. Sociologists, for their part, usually have to make do with simple questionnaires or interviews to collect data and test their theories.
Predicting the outcome of a vote
According to M.
Our movements, spied on

Deja VVVu: Others Claiming Gartner’s Construct for Big Data
In the late 1990s, while I was a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many analysts were talking about these fast-growing data stores, many clients were lamenting them, and many vendors were seizing the opportunity they presented, I also realized that something else was going on. Sea changes in the speed at which data was flowing, mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post-Y2K ERP application boom, were as challenging to data management teams as the increasing quantity of data, or more so. In an attempt to help our clients get a handle on how to recognize and, more importantly, deal with these challenges, I first began speaking at industry conferences on this three-dimensional data challenge of increasing data volume, velocity and variety.
Date: 6 February 2001
Author: Doug Laney
Data Volume
Data Velocity
Data Variety

How to be a data journalist
Data journalism is huge. I don't mean 'huge' as in fashionable (although it has become that in recent months) but 'huge' as in 'incomprehensibly enormous'. It represents the convergence of a number of fields which are significant in their own right, from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful, but also intimidating. The reality is that almost no one is doing all of that, but the puzzle has enough different parts for people to easily get involved in one of them and go from there.
1. 'Finding data' can involve anything from having expert knowledge and contacts to being able to use computer assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you.
Tools such as ManyEyes for visualisation, and Yahoo!
How to begin?
So where does a budding data journalist start? At these moments some programming knowledge comes in handy.
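To give a concrete, deliberately small sense of the programming knowledge mentioned above, here is a minimal Python sketch that assumes the data has already been obtained as a CSV extract. The sample rows, the column names and the `total_by` helper are hypothetical, invented for illustration; the code relies only on the standard library's `csv` module to read the rows and total one numeric column by category.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: illustrative only, not real figures.
# In practice this text might come from a file downloaded from a public data portal.
RAW = """region,department,spend
North,Health,120.5
North,Education,80.0
South,Health,95.25
South,Education,110.0
"""


def total_by(rows, key, value):
    """Sum the numeric column `value`, grouped by the column `key` (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# DictReader maps each CSV line to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(RAW)))
spend_by_region = total_by(rows, "region", "spend")
print(spend_by_region)
```

Grouping with a plain dictionary keeps the sketch free of dependencies; for larger datasets, a database such as MySQL (which the text mentions) or a dataframe library would be a more practical choice.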

Statistician: a sexy, low-stress job that will be in demand in 2020 - Statosphère, statistics from the web and elsewhere
If a handful of articles published in recent weeks are to be believed, the statistician's profession is destined for a splendid future. Sergey Brin, co-founder of Google, is living proof: ever since he met Larry Page in 1995, his credo has been data mining, that is, the analysis of statistical data (as his profile published in 1998 on the Stanford University website shows). The work done on the search engine is striking proof of this obsessive attachment to statistical data: Google is today one of the best solutions for extracting relevant information from billions of data points. Fast, comprehensive and easy to access, the new information technologies offer possibilities that were unsuspected until now. Hal Varian, Google's chief economist since 2002, goes even further: for him, “the sexy job in the next ten years will be statisticians.”

Art Historian John Berger on Female Objectification
by Lisa Wade, PhD, Jul 25, 2010, at 10:23 am
Last week I linked to the first episode of the 1972 BBC documentary Ways of Seeing (thanks again to Christina W.). The second episode, partially embedded below, offers an art historian’s perspective on the objectification of women in European art and advertising, starting with paintings of nude women. “To be naked,” he argues, “is to be oneself. To be nude is to be seen naked by others and yet not recognized for oneself. A nude has to be seen as an object in order to be a nude… they are there to feed an appetite, not to have any of their own.” And there’s a very provocative statement about hair and hairlessness (down there) in the middle of it all.
Parts One and Two of Four:

Unmeasurable Science
On Wednesday PLoS BLOGs launched with a splash. We (both PLoS BLOGs as a whole and I individually) got a lot of positive feedback and words of encouragement, so we are off to a good start. As our community manager Brian Mossop and I are both currently in London for the Science Online London Conference, we could celebrate the launch in person, with a good pint of British ale on Thursday evening. Today I want to talk about something that has been stuck in my head since a conversation a few weeks ago with some friends (all esteemed professors in biology or medicine) over another beer. This is all well and good in the sense that researchers should be held accountable for how they are using their funding, often from public sources. We don’t really know how to evaluate science, particularly in numbers that can be used to compare research projects. Evaluating science is taking up more and more of our time, time that is then missing for doing research.
