Big Data

Google Flu Trends: The Limits of Big Data. Google Flu Trends, once a poster child for the power of big-data analysis, seems to be under attack.

Google Flu Trends: The Limits of Big Data

This month, in a Science magazine article, four quantitatively adept social scientists reported that Google’s flu-tracking service not only wildly overestimated the number of flu cases in the United States in the 2012-13 flu season — a well-known miss — but has also consistently overshot in the last few years. Google Flu Trends’ estimate for the 2011-12 flu season was more than 50 percent higher than the cases reported by the Centers for Disease Control and Prevention.

And, they wrote, for a period of more than two years ending in September 2013, the Google estimates were high in 100 out of 108 weeks. The article, “The Parable of Google Flu: Traps in Big Data Analysis,” declared that Google was guilty of “big data hubris,” which the authors defined as the implicit assumption that big data sets trump traditional data collection and analysis. “It gives you that near real-time signal,” Mr.

MapReduce. An article from Wikipedia, the free encyclopedia.

MapReduce

The terms “map” and “reduce”, and the underlying concepts, are borrowed from the functional programming languages used to build them (map and reduce in functional programming and in array programming languages). MapReduce makes it possible to handle large volumes of data by distributing them across a cluster of machines for processing. The model is very popular with companies that run large data centers, such as Amazon and Facebook. It is also starting to be used in cloud computing.

Hadoop. An article from Wikipedia, the free encyclopedia.
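As a rough illustration of the map and reduce steps described in the MapReduce entry, here is a minimal single-machine sketch in Python that counts words. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this example and do not come from Hadoop or any MapReduce library. On a real cluster the map and reduce calls would run on different machines, which this sketch does not attempt.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: fold all values for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    # Each document could be mapped independently (here: one after another).
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    # Each key could be reduced independently as well.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster of machines"]))
```

Because each document is mapped on its own and each key is reduced on its own, both steps can in principle be spread over many machines, which is the property the entry attributes to the model.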

Hadoop

Hadoop was created by Doug Cutting and has been one of the Apache Software Foundation's projects since 2009.

Brazilian Students Dig for Corruption.

Sizing Up Big Data, Broadening Beyond the Internet. HOW WE FEEL: A visual representation of recent sentiment, as expressed on the Internet.

Sizing Up Big Data, Broadening Beyond the Internet

Good feelings are brighter; negative ones are darker. In his young career, Jeffrey Hammerbacher has been a scout on the frontiers of the data economy.

Looking at Facebook's Friend and Relationship Status Through Big Data.

Données le vertige. Streams of bytes, an ocean of data, a deluge of knowledge… As the Internet spins its web, the volume of digitized information keeps exploding.

Données le vertige

Within eight years, this dizzying mass of data will be 50 times larger than it is today, predicts the research firm IDC. And it will take ten times as many servers to have any hope of handling the surge. Not for fear of being swamped, but rather to be able to find, extract and exploit this new windfall. Twenty years ago, we still stored our files on hard drives of a few megabytes (1 MB equals 1,000,000 bytes, or 10^6 bytes, with 1 byte equal to 8 bits; the bit is the basic unit of computing, namely a 0 or a 1).

Photo: Emmanuel Pierrot. Vu for Libération. “Sensors.”

Big Data Is Great, but Don’t Forget Intuition.

Vertigineux "big data"

The Human Face of Big Data - Home.

Rethinking Privacy in an Era of Big Data.

After Facebook fails. At the heart of the Internet business is one of the great business fallacies of our time: that the Web, with all its targeting abilities, can be a more efficient, and hence more profitable, advertising medium than traditional media.

After Facebook fails

Facebook, with its 900 million users, valuation of around $100 billion, and the bulk of its business in traditional display advertising, is now at the heart of the heart of the fallacy. The daily and stubborn reality for everybody building businesses on the strength of Web advertising is that the value of digital ads decreases every quarter, a consequence of their simultaneous ineffectiveness and efficiency.

The nature of people’s behavior on the Web and of how they interact with advertising, as well as the character of those ads themselves and their inability to command real attention, has meant a marked decline in advertising’s impact. One might think all this personalized advertising must be pretty good, or it wouldn’t be such a hot new business category.

Big Data’s Parallel Universe Brings Fears, and a Thrill.

Le Monde.fr : Souriez, vous êtes achetés.

Data Collectors

Words by the Millions, Sorted by Software.

Factual’s Gil Elbaz Wants to Gather the Data Universe.

How Big Data Gets Real. The business of Big Data, which involves collecting large amounts of data and then searching it for patterns and new revelations, is the result of cheap storage, abundant sensors and new software.

How Big Data Gets Real

It has become a multibillion-dollar industry in less than a decade. Growing at speed like that, it is easy to miss how much remains to do before the industry has proven standards. Until then, lots of customers are probably wasting much of their money. There is essential work to be done training a core of people in very hard problems, like advanced statistics and software that ensures data quality and operational efficiency. Broad-based literacy in the uses of data should probably happen too, along with new kinds of management, better tools for reading the information, and privacy safeguards for corporate and personal information.