The Datawrangling blog was put on the back burner last May while I focused on my startup. Now that I have some bandwidth again, I am getting back to work on several pet projects (including the Amazon EC2 Cluster). I'm giving an EC2 talk at PyCon in March, so I'm really on the hook to wrap up that series of posts now. The event that prompted this long-overdue blog post was another pet project: collecting public datasets. I keep an eye on topics of interest using del.icio.us tag subscriptions, and yesterday my feed was flooded with links to theinfo.org. Theinfo is a new community site/wiki for people working with large datasets, started by reddit cofounder Aaron Swartz.