


Rankings - Doing Business - The World Bank Group

Twitter Census: Publishing the First of Many Datasets

As useful as the Twitter API is, developers, designers, and researchers have long clamored for more than the trickle of data the service currently allows.

We agree: some of the sexiest uses of data require processing not just what exists now but the vast historical record. Twitter isn't the only use case for this, but until now its historical bulk data has been hard to find. Today we are publishing a few items collected from our large scrape of Twitter's API. The data was collected, cleaned, and packaged over twelve months and contains almost the entire history of Twitter: 35 million users, one billion relationships, and half a billion Tweets, reaching back to March 2006. The initial datasets are part of our Twitter Census collection. The first dataset, a Token Count, counts the number of tokens (hashtags, smileys, and URLs) that have been tweeted. These datasets are only views into the massive collection we have been growing over the last year.
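The post describes the Token Count only at a high level. As a rough illustration of what counting hashtags, smileys, and URLs across tweet text involves, here is a minimal Python sketch. The regular expressions, the tiny smiley set, and the `count_tokens` helper are hypothetical assumptions for illustration, not the method actually used to build the Twitter Census dataset.

```python
import re
from collections import Counter

# Hypothetical token patterns; the real Token Count tokenization rules are not described in the post.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
# A deliberately small, assumed smiley set such as :) ;-) =D :P
SMILEY_RE = re.compile(r"(?::|;|=)-?[)(DPp]")


def count_tokens(tweets):
    """Count (kind, token) pairs for URLs, hashtags, and smileys in an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        for url in URL_RE.findall(text):
            counts[("url", url)] += 1
        # Strip URLs first so fragments inside links (e.g. "#anchor") are not miscounted.
        rest = URL_RE.sub(" ", text)
        for tag in HASHTAG_RE.findall(rest):
            # Hashtags are treated as case-insensitive here (an assumption).
            counts[("hashtag", tag.lower())] += 1
        for smiley in SMILEY_RE.findall(rest):
            counts[("smiley", smiley)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Loving the #OpenData release :) http://example.com/data",
        "#opendata rocks ;-)",
    ]
    for (kind, token), n in count_tokens(sample).most_common():
        print(kind, token, n)
```

Keying the counter by `(kind, token)` keeps the three token types in one pass while still letting you filter by type afterwards; at the scale of half a billion Tweets the same per-tweet logic would typically run as the map step of a distributed job rather than in a single loop.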

World Bank public data, now in search

When we first launched public data, we wanted to make statistics easier to find and to encourage debate based on facts rather than intuition. The day after we launched, a friend who worked at the World Bank called me, her voice filled with enthusiasm: "Did you know that the World Bank also just released an API for their data?" Excited, I checked it out and found an amazing treasure trove of statistics for most economies in the world. After some hard work and analysis, today we're happy to announce that 17 World Development Indicators (list below*) are now conveniently available to you in Google search. With today's update, you can quickly access more data with a broad range of queries. Search should be intuitive, so we've done the work to think through the queries for which public data will be most relevant to you. Clicking on the result will bring you to an interactive chart where you can compare the United States with other regions around the world.

DataSF - Liberating City Data

Factual

Public Datasets « Elastic Web Mining

This is a page where we list public datasets that we've used or come across. Comments, corrections, and additional data sources are welcome! We use datasets for consulting projects, and when we need some juicy data for labs that are part of our big data training courses. There's also some slightly out-of-date information from an ACM event that you can find here. We've also started a separate list of commercial datasets. The information below is organized by type of data. Some of this information comes from other lists we've found.

Data Files
Wikipedia – complete data dump for the site, in MediaWiki data files.

APIs
Note that for many of these, there are restrictions on the number of requests per day and on usage of the data.
Delicious – social network site for link sharing.

Databases
Freebase – open database of people, places, and things.
FLOSSMole – database of open source projects.
ImageNet – an image database organized according to the WordNet hierarchy.

Web Pages
FedStats