Where can I find large datasets open to the public?

Related: Big Data / Analytics

Publicly Available Big Data Sets :: Hadoop Illuminated
Public data sets on Amazon AWS: Amazon provides the following data sets: ENSEMBL annotated genome data, US Census data, UniGene, and a Freebase dump. Data transfer is 'free' within the Amazon ecosystem (within the same zone). Link: AWS data sets.
InfoChimps: InfoChimps has a data marketplace with a wide variety of data sets. Link: InfoChimps marketplace.
Comprehensive Knowledge Archive Network: an open source data portal platform; data sets are available on datahub.io from ckan.org.
Stanford network data collection.
Open Flights: crowd-sourced flight data.
Flight arrival data.

Tell me, Dad, what is open data? Many believe that the "open data" movement will, like the appearance of the alphabet, the internet, or the explosion of social networks, have major repercussions in our societies. Known for its non-free software, Microsoft had the very good idea of asking Regards sur le numérique (RSLN, run by Spintank), its "online laboratory for ideas, reflection, and experimentation", to look into the notion of open data, that is, the sharing of public data in open formats, in order to free the data collected or produced by public authorities and return it, free of charge where possible, to society: its citizens, associations, private companies, and public administrations. On the menu, which is very thorough, digestible, and instructive: an investigation and some thirty articles, available on its site as well as in the special issue of its magazine, followed by a conference entitled L'Open data, et nous, et nous, et nous ? ("Open data, and us, and us, and us?").

Email any web page to any one / EmailTheWeb.com
Easy Java Simulations Wiki | Main / Home Page
About Easy Java/Javascript Simulations: Easy Java/Javascript Simulations, also known as EjsS (and, formerly, EJS or Ejs), is a free authoring tool written in Java that helps non-programmers create interactive simulations in Java or Javascript, mainly for teaching or learning purposes. EjsS has been created by Francisco Esquembre and is part of the Open Source Physics project. A brief historical and naming remark: before release 5.0, EjsS could only create Java simulations. Science SPORE Prize, November 2011.

Databib | Research Data Repositories
Finding Data on the Internet (A Community Site for R, sponsored by Revolution Analytics). By RevoJoe on October 6, 2011. The following list of data sources has been modified as of 3/18/14. If an (R) appears after a source, this means that the data are already in R format or that there exist R commands for directly importing the data from R.
Economics: American Economic Ass.
Data Science Practice: This section contains data sets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil (O'Reilly 2014). Datasets on the book site: Enron Email Dataset; GetGlue (time-stamped events: users rating TV shows); Titanic Survival Data Set; half a million Hubway rides.
Other sections: Finance, Government, Health Care (Gapminder), Machine Learning, Networks, Science.

Solvent
Why do I need screen scrapers? Piggy Bank needs web pages to embed information in a format that it can understand. In short, screen scrapers allow you to turn a regular web page into a regular web page plus semantic data, and thus free the data from the page or site that contains it.
How do I use it? Watch a screencast of Solvent scraping the locations of Starbucks coffee shops in Cambridge, MA, and then using Piggy Bank to show the scraped data on a map. Also read the Piggy Bank screen scraping howto, which uses Solvent to write a screen scraper for Piggy Bank. There is another tutorial about using Solvent to scrape web pages containing data about baseball players.
What are the main features of Solvent? Writing screen scrapers can be hard and tedious; that's why you need a tool to help you.
Where do I find other scrapers to learn from? See the list of available Piggy Bank scrapers.
How can I help/complain/thank? There are several ways you can help.
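To make the idea of a screen scraper concrete, here is a minimal sketch in Python. It is not how Solvent or Piggy Bank work internally; it only illustrates the general technique of turning markup into structured records. The sample HTML, the class names (`store`, `name`, `address`), and the `StoreScraper` and `scrape` names are all assumptions made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the markup and class names are invented for
# illustration and do not come from any real site.
SAMPLE_HTML = """
<ul>
  <li class="store"><span class="name">Example Store A</span>
      <span class="address">1 Example Street, Cambridge, MA</span></li>
  <li class="store"><span class="name">Example Store B</span>
      <span class="address">2 Example Avenue, Cambridge, MA</span></li>
</ul>
"""


class StoreScraper(HTMLParser):
    """Collects one dict per <li class="store"> element."""

    def __init__(self):
        super().__init__()
        self.records = []      # finished records
        self._current = None   # record being built, or None outside an <li>
        self._field = None     # field being read, or None outside a <span>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "store":
            self._current = {}
        elif (tag == "span" and self._current is not None
              and css_class in ("name", "address")):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside a field we care about.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def scrape(html):
    """Return the list of store records found in the given HTML text."""
    parser = StoreScraper()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for record in scrape(SAMPLE_HTML):
        print(record["name"], "|", record["address"])
```

A real scraper has to cope with markup that changes without notice, which is part of why the text above calls the work hard and tedious.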

SWF Charts > Buy
Free License: XML/SWF Charts is free to download and use. The free, unregistered version contains all the features, except that clicking a chart takes the user to the XML/SWF Charts web site. Developing and maintaining XML/SWF Charts takes a lot of effort. Web site developers may use unregistered copies of XML/SWF Charts in client web sites. Software developers may redistribute unregistered copies of XML/SWF Charts within other software products, with the copyright attached.
$29 - Single License: The single license is for one domain name, all its sub-domains (www.yourdomain.com, sales.yourdomain.com, www.sales.yourdomain.com, tech.yourdomain.com, etc.), all its ports (yourdomain.com, yourdomain.com:8000, etc.), and for "localhost". Make a payment with PayPal, and get a registration code at the end of the payment process. Credit card transactions are processed immediately.
$399 - Bulk License

Wall of Films! | Films For Action Just imagine what could become possible if an entire city had seen just one of the documentaries above. Just imagine what would be possible if everyone in the country was aware of how unhealthy the mainstream media was for our future and started turning to independent sources in droves. Creating a better world really does start with an informed citizenry, and there's lots of subject matter to cover. From all the documentaries above, it's evident that our society needs a new story to belong to. The old story of empire and dominion over the earth has to be looked at in the full light of day - all of our ambient cultural stories and values that we take for granted and which remain invisible must become visible. But most of all, we need to see the promise of the alternatives - we need to be able to imagine new exciting ways that people could live, better than anything that the old paradigm could ever dream of providing. So take this library of films and use it.

Public Data Sets on Amazon Web Services (AWS)
Here are some examples of popular Public Data Sets:
NASA NEX: A collection of Earth science data sets maintained by NASA, including climate change projections and satellite images of the Earth's surface
Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages
1000 Genomes Project: A detailed map of human genetic variation
Google Books Ngrams: A data set containing Google Books n-gram corpuses
US Census Data: US demographic data from the 1980, 1990, and 2000 US Censuses
Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics
The data sets are hosted in two possible formats: Amazon Elastic Block Store (Amazon EBS) snapshots and/or Amazon Simple Storage Service (Amazon S3) buckets. If you have any questions or want to participate in our Public Data Sets community, please visit our Public Data Sets forum.
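For data sets hosted in S3 buckets, an object in a bucket that allows anonymous reads can usually be fetched over plain HTTPS. The sketch below builds such a URL in the virtual-hosted style and reads the first bytes of the object. The bucket name and key in the usage example are placeholders, not real data set locations, and the helper names `public_s3_url` and `read_prefix` are made up for this illustration. Whether a given data set permits anonymous access is an assumption to check against its own documentation.

```python
from urllib.parse import quote
from urllib.request import urlopen


def public_s3_url(bucket, key, region=None):
    """Build a virtual-hosted-style HTTPS URL for an S3 object.

    With no region, the legacy global endpoint form is used.
    """
    if region is None:
        host = f"{bucket}.s3.amazonaws.com"
    else:
        host = f"{bucket}.s3.{region}.amazonaws.com"
    # quote() percent-encodes unsafe characters but leaves "/" intact,
    # so key prefixes keep their path-like shape.
    return f"https://{host}/{quote(key)}"


def read_prefix(url, num_bytes=1024):
    """Read the first num_bytes of a publicly readable object.

    This raises an HTTPError if the bucket does not allow anonymous reads.
    """
    with urlopen(url) as response:
        return response.read(num_bytes)


if __name__ == "__main__":
    # Placeholder bucket and key, for illustration only.
    print(public_s3_url("example-public-bucket", "path/to/data file.csv"))
```

For large data sets, reading only a prefix first is a cheap way to inspect the file format before committing to a full download.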

IT Operations Analytics
In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to large datasets where IT operations can extract unique business insights.[1][2] In its Hype Cycle Report, Gartner rated the business impact of ITOA as being 'high', meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities.[3] By 2017, Gartner predicts that 15% of enterprises will use IT operations analytics technologies to deliver intelligence for both business execution and IT operations.[2]
History: Due to the mainstream embrace of cloud computing and the increasing desire for businesses to adopt more Big Data practices, the ITOA industry has grown significantly since 2010.

The Big Clean
Sandia's Computational Software Site
Civil War Battles & Civil War Casualties Interactive Map