Data

Data Visualization. Experimental isarithmic maps visualise electoral data. David B. Sparks, a fifth-year PhD candidate in the Department of Political Science at Duke University, has today published a fascinating set of experiments using ‘Isarithmic’ maps to visualise US party identification. Isarithmic maps are essentially topographic/contour maps and offer an alternative approach to plotting geo-spatial data using choropleth maps. This is a particularly interesting approach for the US with its extreme population patterns. David uses hue to depict the strength of party identification using data from the 2008 Cooperative Congressional Election Study. Strong reds indicate a dominance of support for Republicans, strong blues indicate stronger support for Democrats, with purples reflecting an independence or general parity in opinion.
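The hue encoding described here can be illustrated with a minimal sketch. This is not Sparks's actual method: the function name `party_color`, the 0-to-1 input scales, and the choice to fade sparse areas towards white are all assumptions made for illustration. It maps a party-identification score to a hue running from red through purple to blue, and a respondent-density score to lightness.

```python
import colorsys


def party_color(dem_lean: float, density: float) -> tuple:
    """Return an (R, G, B) tuple of 0-255 integers.

    dem_lean: 0.0 = strongly Republican, 1.0 = strongly Democratic (assumed scale).
    density:  0.0 = no respondents, 1.0 = densest region (assumed scale).
    """
    dem_lean = min(max(dem_lean, 0.0), 1.0)
    density = min(max(density, 0.0), 1.0)
    # Hue 1.0 is red and hue 2/3 is blue; the midpoint 5/6 is a purple/magenta.
    hue = 1.0 - dem_lean / 3.0
    # Assumption: sparse regions fade to white, dense regions get the full colour.
    lightness = 1.0 - 0.5 * density
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))


print(party_color(0.0, 1.0))  # dense, strongly Republican region
print(party_color(1.0, 1.0))  # dense, strongly Democratic region
print(party_color(0.5, 0.2))  # sparse region with near parity
```

Keeping opinion and density in separate colour channels means a sparsely populated region cannot dominate the map visually just because it covers a large area.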

This layer is depicted by the ‘lightness’ of a region. I think this is an excellent solution to overcome the challenge of effectively presenting a combined view of opinion data with the spread of respondents.

Digital Humanities Spotlight: 7 Important Digitization Projects. By Maria Popova. From Darwin’s marginalia to Voltaire’s correspondence, or what Dalí’s controversial World’s Fair pavilion has to do with digital myopia.

Despite our remarkable technological progress in the past century and the growth of digital culture in the past decade, a large portion of humanity’s richest cultural heritage remains buried in analog archives. Bridging the disconnect is a fledgling discipline known as the Digital Humanities, bringing online historical materials and using technologies like infrared scans, geolocation mapping, and optical character recognition to enrich these resources with related information or make entirely new discoveries about them. As Europe’s digital libraries open up their APIs, techno-dystopian pundits lament that these efforts diminish “the mystery of history,” but such views are myopic and plagued by unnecessary nostalgia for a time when knowledge was confined to the privileged cultural elite.

Big Data Now: Current Perspectives from O'Reilly Radar - O'Reilly Media.

How to create sustainable open data projects with purpose. There has been much hand-wringing of late about whether the explosion of government-run app contests over the last couple of years has generated any real value for the public. With only one of the Apps for Democracy projects still running, it’s easy to see the entire movement being written off as an overly optimistic fad. The organisation that I’m lucky enough to lead, mySociety, didn’t come from the world of app contests, but it does build the kind of open-source, open-data-grounded civic apps that such contests are supposed to produce.

I believe that mySociety’s story shows that it’s possible to build meaningful, impactful civic and democratic web apps, to grow them to a scale where they’re unambiguously a good use of time and money, then sustain them for years at a time. You have to be just as focused on user needs as any company (and perhaps more so): People have needs. Sometimes they need to eat, sometimes they need to sleep. Data is your servant, not your master: I love open data.

Visualization deconstructed: Why animated geospatial data works. In this, my first Visualization Deconstructed post, I’m expanding the scope to examine one of the most popular contemporary visualization techniques: animation of geospatial data over time. The beauty of photo versus the wonder of film: In a previous post, Sebastien Pierre provided some excellent analysis about the illuminating visualization produced by Paul Butler, which examined the relationships between Facebook users around the world.

Here, we saw the intricate beauty that comes from a designer who finds the sweet spot of insightful effectiveness and aesthetic elegance. This accomplishment is all the more impressive when demonstrated through a static visualization. Sebastien shared a great quote, attributed to Paul Butler, which read: “Visualizing data is like photography. ...” If the static visualization is a photograph, an interactive visualization, by contrast, can be considered a movie. One of the most powerful examples of interactive visualization is the animation of geospatial data.

Data Syndrome.

High Scalability - High Scalability.

The Three Ages of Google - Batch, Warehouse, Instant.

The world has changed. And some things that should not have been forgotten, were lost. I found these words from the Lord of the Rings echoing in my head as I listened to a fascinating presentation by Luiz André Barroso, Distinguished Engineer at Google, concerning Google's legendary past, golden present, and apocryphal future. His talk, Warehouse-Scale Computing: Entering the Teenage Decade, was given at the Federated Computing Research Conference. Luiz clearly knows his stuff and was early at Google, so he has a deep and penetrating perspective on the technology. There's much to learn from, think about, and build. Lord of the Rings applies at two levels. What is completely new, however, is the combining of Warehouse + Instant, and that's where the opportunities and the future are to be found: the Fourth Age.

The First Age - The Age of Batch: The time is 2003. Google is batch oriented. Google was still unsophisticated in its hardware. The Second Age - The Age of the Warehouse.

Bigdata (bigdata)

Leaflet - a modern, lightweight JavaScript library for interactive maps by CloudMade.

Sunlight Labs: Blog - The Coming Government Data Flood.

Government is releasing data at a breakneck pace, and it is just getting started. One interesting side effect of our National Data Catalog is that we're regularly parsing all of the data on data.gov, and we're able to do interesting things with the aggregate metadata. By parsing out the release date for each dataset on data.gov and grouping the releases by quarter, it's easy to see that since the second quarter of 2009, when Data.gov was released, the federal government has released more raw datasets than it ever has in the past. Take a look at what's happened after Data.gov launched. Now, granted, like all government data, it's a little messy. These are bulk, aggregate conclusions and haven't been reviewed, but they point to a trend regardless of their accuracy. Keep in mind here that what we're using is the original release date of the data.
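The grouping step described above (taking each dataset's release date and counting releases per quarter) might look something like the following sketch. The function names and the sample dates are hypothetical; this is not Sunlight Labs' code, and the dates are made up rather than taken from data.gov.

```python
from collections import Counter
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '2009-Q2' for the calendar quarter containing d."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def releases_per_quarter(release_dates):
    """Count datasets per calendar quarter, ordered chronologically."""
    counts = Counter(quarter_label(d) for d in release_dates)
    # 'YYYY-Qn' labels sort chronologically when sorted as plain strings.
    return dict(sorted(counts.items()))


# Made-up release dates, for illustration only (not real data.gov metadata).
sample = [date(2009, 5, 21), date(2009, 6, 30), date(2009, 7, 1), date(2010, 2, 14)]
print(releases_per_quarter(sample))
```

Because the counts key on the original release date, a dataset that is later updated would still be counted in the quarter it first appeared.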

As of today, about halfway through the first quarter, government is already on pace to beat its Q3 2009 record of 308 datasets.

Why you can't really anonymize your data. One of the joys of the last few years has been the flood of real-world datasets being released by all sorts of organizations. These usually involve some record of individuals’ activities, so to assuage privacy fears, the distributors will claim that any personally-identifying information (PII) has been stripped. The idea is that this makes it impossible to match any record with the person it’s recording. Something that my friend Arvind Narayanan has taught me, both with theoretical papers and repeated practical demonstrations, is that this anonymization process is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone’s actions has a good chance of matching identifiable public records.
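To make the cross-referencing risk concrete, here is a minimal sketch of a linkage by exact match on shared attributes. It is a deliberately simple stand-in, not the technique from Narayanan's papers; the field names (`zip`, `birth_year`, `gender`), the function `link_records`, and every record below are invented for illustration.

```python
def link_records(anonymized, public, keys):
    """Map record_id -> name wherever the shared attributes match exactly one person."""
    # Index the public dataset by the combination of shared attributes.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = {}
    for rec in anonymized:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[rec["record_id"]] = candidates[0]
    return matches


# Invented records: names are stripped from the first set but not the second.
anonymized = [
    {"record_id": "r1", "zip": "27708", "birth_year": 1980, "gender": "F"},
    {"record_id": "r2", "zip": "27708", "birth_year": 1975, "gender": "M"},
]
public = [
    {"name": "Alice Example", "zip": "27708", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "zip": "27708", "birth_year": 1975, "gender": "M"},
    {"name": "Carl Example", "zip": "27708", "birth_year": 1975, "gender": "M"},
]
print(link_records(anonymized, public, ["zip", "birth_year", "gender"]))
```

In this toy data only `r1` is re-identified, because two public records share the attributes of `r2`. Each extra attribute left in a released record shrinks those candidate groups and makes unique matches more likely.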

So, what should we do? Keep the anonymization: Just because it’s not totally reliable, don’t stop stripping out PII. Acknowledge there’s a risk of de-anonymization. Limit the detail.

Bracing for the Data Deluge. From Facebook to the Department of Motor Vehicles, the world is catalogued in databases. No one knows it better than MIT adjunct professor and entrepreneur Michael Stonebraker, who has spent the last 25 years developing the technology that made it so. He got his big break by inventing and commercializing technology that underlies most of the databases, known as relational databases, that rule today. But Stonebraker now happily calls his earlier inventions largely obsolete. He’s working on a new generation of database technology that can handle the flood of digital data that is starting to overwhelm established methods. “Relational databases are omnipresent as the solution for enterprise data. ...”

One is a database system called C-Store.

A special report on managing information: Data, data everywhere. When the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information.

A successor, the Large Synoptic Survey Telescope, due to come on stream in Chile in 2016, will acquire that quantity of data every five days. Such astronomical amounts of information can be found closer to Earth too. Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America's Library of Congress (see article for an explanation of how data are quantified). Facebook, a social-networking website, is home to 40 billion photos.
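The figures quoted above can be turned into rates with a little arithmetic. The sketch below uses only numbers from the text; the one added assumption is decimal units (1 petabyte = 1000 terabytes), and the variable names are illustrative.

```python
# Figures quoted in the text; decimal units (1 PB = 1000 TB) are an assumption.
SLOAN_ARCHIVE_TB = 140           # Sloan archive after a decade
LSST_DAYS_PER_ARCHIVE = 5        # LSST acquires that much every five days
WALMART_TX_PER_HOUR = 1_000_000  # "more than 1m", so this is a lower bound
WALMART_DB_PB = 2.5              # "more than 2.5 petabytes"
LOC_MULTIPLE = 167               # databases = 167 x Library of Congress books

lsst_tb_per_day = SLOAN_ARCHIVE_TB / LSST_DAYS_PER_ARCHIVE
walmart_tx_per_second = WALMART_TX_PER_HOUR / 3600
loc_books_tb = WALMART_DB_PB * 1000 / LOC_MULTIPLE

print(f"LSST: about {lsst_tb_per_day:.0f} TB per day")
print(f"Wal-Mart: at least {walmart_tx_per_second:.0f} transactions per second")
print(f"Implied Library of Congress book collection: about {loc_books_tb:.0f} TB")
```

On these numbers the successor telescope gathers roughly 28 terabytes a day, Wal-Mart records at least 278 transactions a second, and the comparison implies about 15 terabytes for the Library of Congress's books.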

All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. Dross into gold.

The Rise of Interactive Data Visualization (Metamarkets Blog). The visualization below highlights something only recently possible on the web: a dynamic, interactive canvas.

Titled “Disaster Strikes: A World In Sight”, it visualizes a century of floods, fires, droughts, and earthquakes around the globe. (Below is a snapshot of 1996, an apparently costly year for disasters). It’s not a passively animated graphic, but one that users can actively engage with, freezing or pivoting dimensions to reveal new views of the data. It’s a harbinger of a new class of documents, which digital publishers are beginning to embrace, to provide a richer information experience for readers. Meet the Interactive Frameworks: That the above graphic could be built in a single weekend (it was part of a larger hackathon called Data In Sight that Barret Schloerke and his team participated in) is testament to the maturity of tools available. In the last few years, there has been a blossoming of frameworks for creating rich, dynamic infographics.

Progress, In Sight.

Enterprise Resilience Management Blog: Big Data Supply Chains. Supply chains come in all shapes and sizes. A supply chain's complexity increases as it becomes larger, more geographically extended, or more data intensive. Lora Cecere, a partner at Altimeter Group, recently wrote a post focused on "the big data supply chain." ["User in the Era: Big Data Supply Chains," Supply Chain Shaman, 1 June 2011]. Since "big data" may be a new term for some readers, Cecere begins her post by explaining what she means by "big data." She writes: "The concept is simple. A recent study by McKinsey & Company asserts, "The scale and scope of the changes that such 'big data' are bringing about have reached an inflection point."

"This is far different than the world of five years ago when data was shared less often; and when it was ... usually monthly data monthly or weekly data weekly. ... The age of big data began a number of years ago." "Memory-centric data management and OLAP (online analytical processing) technologies and tools are the answer."

Machine Learning: What are some introductory resources for learning about large scale machine learning?

The next, next big thing. In my old age, at least for the computing industry, I’m getting more irritated by smart young things that preach today’s big thing, or tomorrow’s next big thing, as the best and only solution to my computing problems.

Those that fail to learn from history are doomed to repeat it, and the smart young things need to pay more attention. Because the trends underlying today’s computing should be evident to anyone with a sufficiently good grasp of computing history. Depending on the state of technology, the computer industry oscillates between thin- and thick-client architectures. Either the bulk of our compute power and storage is hidden away in racks of (sometimes distant) servers, or it is pushed out into a mass of distributed systems closer to home. Thinking that just couldn’t happen? Yesterday’s next big thing: Yesterday’s “next big thing” was the World Wide Web. The next big thing? The machines we grew up with are yesterday’s news. The next, next big thing: As for the next, next big thing?

With M2M, the machines do all the talking. The shift from transporting voice to delivering data has transformed the business of mobile carriers, but there’s yet another upheaval on the horizon: machine to machine communications (M2M).

In M2M, devices and sensors communicate with each other or a central server rather than with human beings. These devices often use an embedded SIM card for communication over the mobile network. Applications include automotive, smartgrid, healthcare and environmental usages. M2M traffic differs from human-generated voice and data traffic. Mobile carriers are adapting by creating entirely new companies for M2M, such as Telenor’s M2M carrier Telenor Connexion, and m2o city, Orange’s joint venture with water giant Veolia.

Why did Telenor start Telenor Connexion? Göran Brandt: Telenor Connexion was founded in 2008. Why did Orange launch a mobile service operator specifically for water metering data? To be clear, m2o city is not a “mobile operator.” The data also varies between applications.

Lessons of the Victorian data revolution. Ken Cukier recently wrote about how useful analogies from the past are in explaining the potential of the current data revolution. Science as we know it was consciously created in the 19th century, and in many ways the current wave of data techniques feels like an echo of that first flood of innovations. It’s fascinating to read histories of the era like “The Philosophical Breakfast Club” and spot the parallels.

Take tides for example. You’ve probably never worried about the timing or height of the sea, but for Victorian sailors figuring out the tides was a life or death problem. The harbor masters were data producers with a business model that excluded many potential users because the transaction costs were too high to be worthwhile. The Victorian solution was another familiar face — crowdsourcing.

Maps like these, along with more detailed tables, allowed navigators to make their journeys without being ambushed by the tides. The limits of data. The way forward.

Why the term "data science" is flawed but useful. Mention “data science” to a lot of the high-profile people you might think practice it and you’re likely to see rolling eyes and shaking heads. It has taken me a while, but I’ve learned to love the term, despite my doubts.

The key reason is that the rest of the world understands roughly what I mean when I use it. After years of stumbling through long-winded explanations about what I do, I can now say “I’m a data scientist” and move on. It is still an incredibly hazy definition, but my former descriptions left people confused as well, so this approach is no worse and at least saves time. With that in mind, here are the arguments I’ve heard against the term, and why I don’t think they should stop its adoption. It’s not a real science: I just finished reading “The Philosophical Breakfast Club,” the story of four Victorian friends who created the modern structure of science, as well as inventing the word “scientist.” It’s an unnecessary label. The name doesn’t even make sense.

Who Will Own Local Data? Search Engines, Yellow Pages, Aggregators Or Social Media?

Amazon’s $23,698,655.93 book about flies.

IT Looks for New Tools to Exploit 'Big Data'.

Data hand tools.

Big data: Global good or zero-sum arms race?

Factual Home - Factual.