Public Data Sets
A corpus of web crawl data composed of over 5 billion web pages. This data set is freely available on Amazon S3 and is released under the Common Crawl Terms of Use. Last Modified: Mar 17, 2014 17:51 GMT
Three NASA NEX datasets are now available, including climate projections and satellite images of Earth. Last Modified: Nov 12, 2013 13:27 GMT
The Ensembl project produces genome databases for humans as well as over 50 other species, and makes this information freely available. Last Modified: Oct 8, 2013 14:38 GMT
Human Microbiome Project Data Set. Last Modified: Sep 26, 2013 17:58 GMT
The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available. Last Modified: Jul 18, 2012 16:34 GMT
Open Data
Finding Data on the Internet
A Community Site for R, sponsored by Revolution Analytics. By RevoJoe on October 6, 2011.
The following list of data sources was last updated on 3/18/14. An (R) after a source means that the data are already in R format or that R commands exist for importing the data directly into R.
Economics
American Economic Association
Data Science Practice
This section contains data sets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil (O'Reilly 2014).
Datasets on the book site
Enron Email Dataset
GetGlue (time-stamped events: users rating TV shows)
Titanic Survival Data Set
Half a million Hubway rides
Finance
Government
Health Care
Gapminder
Machine Learning
Networks
Science
Data: Where can I get large datasets open to the public?
How to access 100M time series in R in under 60 seconds
DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) With the new rdatamarket package, it's now trivially easy to import those time series into R for charting, analysis, or anything else. Here's what you need to do:
1. Register an account on DataMarket.com (it's free).
2. Install the rdatamarket package in R with install.packages("rdatamarket").
3. Browse DataMarket.com for a time series of interest (I found this series on unemployment).
4. Copy the URL of the page you're on (the short URL works too; I used "…").
5. Pass the URL to the dmseries function to extract the time series as a zoo object.
With this package, you can go from finding interesting data on DataMarket to working with it in R in less than a minute.
Take the tour: Go beyond
Embed tables and charts: Embed any chart or table from DataMarket.com on your own websites, in your blog posts or in news articles. Chart appearance can be configured to match your branding and other requirements.
Create dashboards: Build a destination for your target audience with dashboards that live on your own website.
Your own branded data market: Built on the same platform that runs DataMarket.com, our Enterprise plan provides you with a hosted e-commerce and publishing platform for your data.
Full data access via API: Write your own applications and integrate DataMarket data into your own websites using our flexible RESTful API.
Accessing data through R: Access data from DataMarket directly from the statistical software R using the rdatamarket package.
Tariffs: Comprehensive tariff data on the WTO website
What are you looking for?
Sophisticated, detailed and interactive analysis? Go to the new Tariff Analysis Online (explanation and user guide available to browse online or as PDF or Word files).
Simpler, standardized tariff statistics, mainly for downloading? Go to the WTO Tariff Download Facility (explanation and user guide available to browse online or as PDF or Word files).
With both of these services, users can obtain and compare two sets of customs tariffs. The first is the legally bound commitments on customs duty rates, known as “bound rates”, which act as ceilings on the tariffs that member governments can set. The second is the rates that governments actually charge on imports, known as “applied rates”, which can be lower than the bound rates and have a direct impact on trade.
Tariff Analysis Online is the more versatile and detailed of the two services. It offers a number of options for looking up and analysing data online, including tariffs, tariff quotas, imports and countries’ commitments on agricultural subsidies.
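To make the bound/applied distinction concrete, here is a minimal Python sketch. It uses made-up rates for two hypothetical product lines, in percent; these are not WTO data and do not reflect the layout of any WTO download file. The sketch checks that each applied rate stays at or below its bound ceiling and reports the gap between the two. The names TariffLine, gap and within_ceiling are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TariffLine:
    """One product line with its two tariff rates, in percent (hypothetical values)."""
    product: str
    bound_rate: float    # legally bound ceiling committed at the WTO
    applied_rate: float  # rate actually charged on imports


def gap(line: TariffLine) -> float:
    """Return how many percentage points the applied rate sits below the bound ceiling."""
    return line.bound_rate - line.applied_rate


def within_ceiling(line: TariffLine) -> bool:
    """An applied rate respects the commitment if it does not exceed the bound rate."""
    return line.applied_rate <= line.bound_rate


# Hypothetical example lines, for illustration only (not real tariff data).
lines = [
    TariffLine("product A", bound_rate=25.0, applied_rate=10.0),
    TariffLine("product B", bound_rate=15.0, applied_rate=15.0),
]

for line in lines:
    print(f"{line.product}: gap {gap(line):.1f} points, within ceiling: {within_ceiling(line)}")
```

Run as-is, this prints "product A: gap 15.0 points, within ceiling: True" and "product B: gap 0.0 points, within ceiling: True". A line whose applied rate exceeded its bound rate would fail within_ceiling, because the bound rate is the ceiling.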