


Data Anonymization, need for every site in production

On one of my previous projects, we wrote a JMeter performance test suite that ran periodically against our performance environment.

Once the application was in production, we enhanced the performance test suite based on actual user behaviour taken from Apache access logs and Omniture analytics. That gave us a great deal of confidence that what we were building would scale. The next step was to get the production dataset, so that our performance tests would closely resemble production peak load. We also had a few bugs that showed up only in production, and we could not reproduce them in our local environment because our dataset was different.

Sunitparekh/data-anonymization

“Anonymized” data really isn’t—and here’s why not

The Massachusetts Group Insurance Commission had a bright idea back in the mid-1990s: it decided to release "anonymized" data on state employees that showed every single hospital visit.

The goal was to help researchers, and the state spent time removing all obvious identifiers such as name, address, and Social Security number. But a graduate student in computer science saw a chance to make a point about the limits of anonymization. Latanya Sweeney requested a copy of the data and went to work on her "reidentification" quest. It didn't prove difficult.

Anonimatron
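Both data-anonymization and Anonimatron are tools for anonymizing production data before it is used in other environments. The sketch below is not the API of either tool. It is a minimal, hypothetical Python illustration of one common strategy, deterministic pseudonymization. A keyed HMAC maps the same real value to the same fake value every time. As a result, joins across tables still line up, and the data keeps its production shape for performance tests and bug reproduction. The key `SECRET_KEY`, the name pool `FIRST_NAMES`, and the helpers `pseudo_email`, `pseudo_first_name` and `anonymize_row` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical key: keep it out of the anonymized dump, otherwise anyone
# holding the dump could recompute pseudonyms for guessed real values.
SECRET_KEY = b"example-key-stored-outside-the-test-environment"

# Small, hypothetical pool of realistic replacement names.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank", "Grace", "Heidi"]


def _digest(field: str, value: str) -> bytes:
    """Keyed SHA-256 HMAC of a value, namespaced by column name."""
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def pseudo_email(value: str) -> str:
    """Map a real e-mail address to a stable, well-formed fake one."""
    return f"user_{_digest('email', value).hex()[:12]}@example.com"


def pseudo_first_name(value: str) -> str:
    """Deterministically pick a replacement name from FIRST_NAMES."""
    index = int.from_bytes(_digest("first_name", value)[:4], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def anonymize_row(row: dict) -> dict:
    """Return a copy of a user row with direct identifiers replaced."""
    anonymized = dict(row)  # non-identifying columns are copied unchanged
    anonymized["email"] = pseudo_email(row["email"])
    anonymized["first_name"] = pseudo_first_name(row["first_name"])
    return anonymized


if __name__ == "__main__":
    users = [{"id": 1, "first_name": "Jane", "email": "jane@corp.test", "plan": "gold"}]
    orders = [{"order_id": 10, "email": "jane@corp.test", "total": 42.5}]

    anon_users = [anonymize_row(u) for u in users]
    anon_orders = [{**o, "email": pseudo_email(o["email"])} for o in orders]

    # The same customer still joins across tables after anonymization.
    assert anon_users[0]["email"] == anon_orders[0]["email"]
    print(anon_users[0], anon_orders[0])
```

The sketch only replaces direct identifiers. The Massachusetts release did much the same by removing names, addresses and Social Security numbers, and the data could still be re-identified. The columns this kind of masking leaves untouched may likewise be linkable to outside data.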