Six Provocations for Big Data by Danah Boyd, Kate Crawford

The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing information from Twitter, Google, Verizon, 23andMe, Facebook, Wikipedia, and every space where large groups of people leave digital traces and deposit data. Significant questions emerge. This essay offers six provocations that we hope can spark conversations about the issues of Big Data. (This paper was presented at Oxford Internet Institute’s “A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society” on September 21, 2011.)

Big data: The next frontier for innovation, competition, and productivity (McKinsey Global Institute, May 2011) The amount of data in our world has been exploding, and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future. MGI studied big data in five domains—healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each.

The evolution of data products In “What is Data Science?,” I started to talk about the nature of data products. Since then, we’ve seen a lot of exciting new products, most of which involve data analysis to an extent that we couldn’t have imagined a few years ago. We’ve become accustomed to virtual products, but it’s only appropriate to start by appreciating the extent to which data products have replaced physical products. Yet while we’re accustomed to that displacement, the question of how we take the next step — where data recedes into the background — is surprisingly tough. It’s an old problem: the geeky engineer wants something cool with lots of knobs, dials, and fancy displays, but a list may be an appropriate way to deliver potential contacts, and a spreadsheet may be an appropriate way to edit music metadata. These projects suggest the next step in the evolution toward data products that deliver results rather than data. We can push even further.

Et tu, Citi? Bank Raises Balance Requirements and Fees Bank of America wasn’t the only big national financial institution to announce some changes that might hit customers in the wallet. Citi was quick to bash Bank of America when it rolled out its hugely unpopular debit card fee, but it just announced an overhaul of its checking account options, along with increases in minimum-balance requirements and monthly maintenance fees that kick in Dec. 9. One big change affects the bank’s mid-level checking option. The bank is phasing out its EZ Checking account, which hasn’t been offered to new customers for over a year. Customers who have this account now can keep it, but there are some new rules. The mid-tier checking package the bank now offers is called the Citibank Account.

EMC throws lots of hardware at Hadoop — Cloud Computing News

The New Big Data Top scientists from companies such as Google and Yahoo are gathered alongside leading academics at the 17th Association for Computing Machinery (ACM) conference on Knowledge Discovery and Data Mining (KDD) in San Diego this week. They will present the latest techniques for wresting insights from the deluge of data produced nowadays, and for making sense of information that comes in a wider variety of forms than ever before. Twenty years ago, the only people who cared about so-called “big data”—the only ones who had enormous data sets and the motivation to try to process them—were members of the scientific community, says Usama Fayyad, executive chair of ACM’s Special Interest Group on Knowledge Discovery and Data Mining and former chief data officer at Yahoo. The explosive growth of the Internet, however, changed everything. These days, Internet giants make their money from the information they collect about users and the insights they gain from mining it.

Rankur Offers Free Marketing Analysis on The Web - Technorati Advertising Trying to decide how to promote your new product? Wondering which features people like on your competitors’ gadgets? Having issues with your current ad campaign? Technologies exist that help you answer these questions and ease the search for a solution, and now some of them are free. Consumers’ opinions have always been an important piece of information in the decision-making process. A new EU-based startup called Rankur combines technologies like opinion mining and text analytics into a cutting-edge product that answers the questions “What do other people think?” and “What are other people talking about?” A marketing or PR professional can stay current with what is being said about a topic, find out what the related discussions are about and where they come from, detect negative or positive text, and filter opinions by language, source, or trend. Another application of these recent technologies is automated monitoring of your brand’s reputation.

Revolution speeds stats on Hadoop clusters Revolution Analytics, the company that is extending R, the open source statistical programming language, with proprietary extensions, is making available a free set of extensions that allow its R engine to run atop Hadoop clusters. Statisticians familiar with R can now do analysis on unstructured data stored in the Hadoop Distributed File System (HDFS), the data store used for the MapReduce method of chewing on unstructured data, which was pioneered by Google for its search engine and then mimicked and open sourced by rival Yahoo! as the Apache Hadoop project. R can now also run against HBase, the non-relational, column-oriented distributed data store that mimics Google's BigTable and essentially serves as Hadoop's database for structured data. Like Hadoop, HBase is an open source project distributed by the Apache Software Foundation. You can download the R connector for Hadoop from GitHub.
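The MapReduce method the excerpt mentions can be sketched without a cluster at all. The following is a minimal, in-memory Python illustration of the map, shuffle, and reduce phases as a word count; it is a teaching sketch of the programming model, not Revolution Analytics' actual R connector or the Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "data stores hold data"]
word_counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

On a real cluster the map and reduce phases run in parallel across many machines and the shuffle moves data over the network; the data flow, however, is exactly this.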

Why We Should Learn the Language of Data “How can global warming be real when there’s so much snow?” Hearing that question — repeatedly — this past February drove Joseph Romm nuts. A massive snowstorm had buried Washington, DC, and all across the capital, politicians and pundits who dispute the existence of climate change were cackling. The family of Oklahoma senator Jim Inhofe built an igloo near the Capitol and put up a sign reading “Al Gore’s New Home.” The planet can’t be warming, they said; look at all this white stuff! Romm — a physicist and climate expert with the Center for American Progress — spent a week explaining to reporters why this line of reasoning is so wrong. Statistics is hard. Consider the economy: Is it improving or not? The problem is that, to calculate such stats, economists remove stores that have closed from their sample, a recipe for survivorship bias. Or take the raging debate over childhood vaccination, where well-intentioned parents have drawn disastrous conclusions from anecdotal information.
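The flaw in dropping closed stores from the sample is easy to see with toy numbers. The figures below are entirely hypothetical, chosen only to show how excluding failures flips the sign of the answer.

```python
# Hypothetical revenue figures (in $M) for five stores over two years.
# Two stores closed in year 2, so their year-2 revenue is zero.
year1 = [10, 12, 8, 9, 11]   # all five stores open
year2 = [11, 13, 9, 0, 0]    # last two stores have closed

# Survivorship-biased view: compare only stores still open in year 2.
open_y1 = [y1 for y1, y2 in zip(year1, year2) if y2 > 0]
open_y2 = [y2 for y2 in year2 if y2 > 0]
biased_growth = sum(open_y2) / sum(open_y1) - 1   # surviving stores grew

# Full picture: include the closed stores' lost revenue.
true_growth = sum(year2) / sum(year1) - 1         # the sector shrank
```

The surviving stores show 10 percent growth while total revenue actually fell 34 percent: exactly the kind of misleading stat the article warns about.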

Millions of tweets reveal global mood trends | Health Tech It may not be terribly surprising that many of us find our moods dipping over the course of the day, and that by nightfall we light up again. Or that our moods are perkiest on weekends, regardless of which days our weekends fall on (i.e., Fridays and Saturdays in the United Arab Emirates). What's of note, according to an analysis of 2.4 million tweets in 84 countries by researchers out of Cornell, is that these mood trends hold steady across cultures and borders, hinting at some sort of deeper trend whose basis is in being human, not in belonging to a particular people or place. "We saw the influence of something that's biological- or sleep-based; regardless of the day of the week, the shape of the mood rhythm is the same," Scott Golder, a doctoral student of sociology, said in a news release. "The difference between weekdays and weekends has to do with the average mood, which is higher on the weekends than the weekdays."
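The core computation behind a "mood rhythm" like the one described is just an average of sentiment scores grouped by hour of day. Here is a minimal sketch with made-up scores; the Cornell study scored 2.4 million real tweets with affect lexicons, which this toy data merely stands in for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hour_of_day, sentiment_score) pairs standing in for
# scored tweets; higher scores mean a more positive mood.
scored_tweets = [
    (7, 0.6), (7, 0.7), (12, 0.3), (12, 0.4),
    (18, 0.2), (18, 0.3), (22, 0.5), (22, 0.6),
]

# Average mood by hour of day -- the shape of the "mood rhythm".
by_hour = defaultdict(list)
for hour, score in scored_tweets:
    by_hour[hour].append(score)
mood_rhythm = {hour: mean(scores) for hour, scores in sorted(by_hour.items())}
```

Comparing such curves across countries, as the researchers did, amounts to computing one per country and checking that their shapes coincide.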

Hadoop app specialist Karmasphere scores $6M — Cloud Computing News

How much information is there in the world? Think you're overloaded with information? Not even close. A study appearing on Feb. 10 in Science Express, an electronic journal that provides select Science articles ahead of print, calculates the world's total technological capacity -- how much information humankind is able to store, communicate and compute. "We live in a world where economies, political freedom and cultural growth increasingly depend on our technological capabilities," said lead author Martin Hilbert of the USC Annenberg School for Communication & Journalism. So how much information is there in the world? Prepare for some big numbers: Looking at both digital memory and analog devices, the researchers calculate that humankind is able to store at least 295 exabytes of information. Telecommunications grew 28 percent annually, and storage capacity grew 23 percent a year. "These numbers are impressive, but still minuscule compared to the order of magnitude at which nature handles information," Hilbert said.
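The growth rates quoted above imply startlingly short doubling times, which a line of arithmetic makes concrete. This sketch simply applies the standard compound-growth formula to the study's 28 and 23 percent figures.

```python
import math

def doubling_time(annual_growth_rate):
    """Years for a quantity growing at a constant annual rate to double."""
    return math.log(2) / math.log(1 + annual_growth_rate)

telecom_doubling = doubling_time(0.28)   # ~2.8 years at 28% per year
storage_doubling = doubling_time(0.23)   # ~3.3 years at 23% per year
```

At those rates, world telecommunications capacity doubles roughly every two years and ten months, and storage capacity roughly every three years and four months.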