
Big data


17360_HBR_Cognizant_Report_webview. 17568_HBR_SAS Report_webview. 17441_HBR_Dell_Report_webview.

When to Act on a Correlation, and When Not To - David Ritter. By David Ritter | 11:00 AM March 19, 2014

“Petabytes allow us to say: ‘Correlation is enough.’” - Chris Anderson, Wired Magazine, June 23, 2008

The sentiment expressed by Chris Anderson in 2008 is a popular meme in the Big Data community. “Causality is dead,” say the priests of analytics and machine learning. But inquiring whether correlation is enough is asking the wrong question. Confidence that the correlation will reliably recur in the future. The first factor, the confidence that the correlation will recur, is in turn a function of two things: the frequency with which the correlation has historically occurred (the more often events occur together in real life, the more likely it is that they are connected) and the understanding around what is causing that statistical finding.

Understanding the interplay between the confidence level and the risk/reward tradeoff enables sound decisions on what action, if any, makes sense in light of a particular statistical finding.

RigourAndOpen | Rigour and Openness in 21st Century Science. The Altmetric API - Altmetric. A manifesto – altmetrics.org. WP-HBR-Pulse-Survey-EN. Big Data Analytics at Thomson Reuters. Interview with Jochen L. Leidner | ODBMS Industry Watch.

Big data. Big Data: Dead By Definition, Alive In Practice. There's a gap between what big data means on paper and what it really means to a business. Big data is at a crossroads. On one hand, big data is dead, the term having been used so often that it's been stripped of tangible value. On the other hand, big data has never been so alive, as more companies than ever are trying to improve so-called big data analytics.

How can such a dichotomy exist? The answer can be found in the enormous gap between what big data means by definition and what it really means in the important practice of data management.

Big data by definition

The term big data, by the most commonly used definition, refers to data sets that are too large and complex to manage within traditional systems. This data is generally unstructured or semi-structured and requires investment in new tools, technologies, skillsets, and team members to manage it. In reality, big data is generally not a technical challenge.

Big data is dead, long live big data: Thoughts heading to Strata. A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.” You don’t have to read much industry news to get the sense that “big data” is sliding into the trough of Gartner’s hype curve. That’s natural. Big data is not a term I’m particularly fond of. Whether or not Moore’s Law continues indefinitely, the real importance of the amazing increase in computing power over the last six decades isn’t that things have gotten faster; it’s that the size of the problems we can solve has gotten much, much larger. In the next year, we’ll slog through the cynicism that’s a natural outcome of the hype cycle.