Data

Software for Data Mining, Analytics, and Knowledge Discovery.

Datamining

Statistician: a sexy, low-stress job that will be in demand in 2020 - Statosphère, les statistiques du web et d'ailleurs. The statistician's profession is headed for a bright future, if a handful of articles published in recent weeks are to be believed. Sergey Brin, co-founder of Google, is living proof: since he met Larry Page in 1995, his credo has always been data mining, that is, the analysis of statistical data (as shown by his profile, published in 1998 on the Stanford University website).

The work done on the search engine is a masterful demonstration of this obsessive attachment to statistical data: Google is today one of the best solutions for extracting relevant information from billions of data points. Fast, comprehensive, and easy to access, new information technologies offer possibilities that were unsuspected until now. Hal Varian, chief economist at Google since 2002, goes even further: for him, "the sexy job of the next ten years will be that of statistician".

The data analysis path is built on curiosity, followed by action. A traditional view of data analysis involves precision, preparation, and methodical examination of defined datasets. Philipp Janert, author of “Data Analysis with Open Source Tools,” has a somewhat different perspective. Those traditional elements are still important, but Janert also thinks simplicity, experimentation, action, and natural curiosity all shape effective data work.

He expands on these ideas in the following interview.

Is data analysis inherently complicated?

Philipp Janert: I observe a tendency to do something complicated and fancy; to bring in a statistical concept and other “sophisticated” stuff. Why not just look at the data set?

Why do analysts shy from simplicity?

PJ: I often perceive a great sense of insecurity in my co-workers when it comes to math. The classic case for me is that usually within the first three minutes of a conversation, people start talking about standard deviations.

What tool or method offers the best starting point for data analysis?

The Joy of Stats with Hans Rosling. The Joy of Stats, a one-hour documentary, hosted by none other than the charismatic Hans Rosling, explores the growing importance of statistics: [W]ithout statistics we are cast adrift on an ocean of confusion, but armed with stats we can take control of our lives, hold our rulers to account and see the world as it really is. What's more, Hans concludes, we can now collect and analyse such huge quantities of data and at such speeds that scientific method itself seems to be changing. From the description, it sounds like they'll touch on Crimespotting by Stamen, Google Translation, among other data-driven projects. Whatever they cover, it's bound to be interesting with Rosling at the front. Below is a four-minute clip of Rosling presenting world development in the context of income versus lifespan.

The material is more or less the same as his TED talk, but this time around, the motion chart isn't projected on a screen. The Joy of Stats airs on the BBC next Tuesday.

Humanities Scholars Embrace Digital Technology.

Open Public Data, a New Resource for Innovation - a High-tech and Science video.

What is data science? We’ve all heard it: according to Hal Varian, statistics is the next sexy job.

Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data? In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets. The web is full of “data-driven apps.” Almost any e-commerce application is a data-driven application. There’s a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). One of the earlier data products on the Web was the CDDB database.

Google is a master at creating data products. Google’s breakthrough was realizing that a search engine could use input other than the text on the page.

[Figure: Flu trends]

Google isn’t the only company that knows how to use data.

Where data comes from

[Photo: 1956 disk drive. Photo: Mike Loukides.]

Data journalism