
The 70 Online Databases that Define Our Planet

Back in April, we looked at an ambitious European plan to simulate the entire planet. The idea is to exploit the huge amounts of data generated by financial markets, health records, social media and climate monitoring to model the planet's climate, societies and economy. The vision is that such a system could help to understand and predict crises before they occur, so that governments can take appropriate measures in advance. There are numerous challenges here. Nobody yet has the computing power necessary for such a task, nor are there models that can accurately simulate even much smaller systems. But before any of that is possible, researchers must gather the economic, social and technological data needed to feed this machine. Today, we get a grand tour of this challenge from Dirk Helbing and Stefano Balietti at the Swiss Federal Institute of Technology in Zurich. These and other pursuits are now producing massive amounts of data, much of which is freely available on the web.

re3data.org

Public Data Sets
- Common Crawl: a corpus of web crawl data composed of over 5 billion web pages. This data set is freely available on Amazon S3 and is released under the Common Crawl Terms of Use.
- NASA NEX: three NASA NEX datasets are now available, including climate projections and satellite images of Earth.
- Ensembl: the Ensembl project produces genome databases for human as well as over 50 other species, and makes this information freely available.
- Human Microbiome Project Data Set.
- 1000 Genomes Project: initiated in 2008, an international public-private consortium that aims to build the most detailed map of human genetic variation available.

UML Activity Diagrams: Detailing User Interface Navigation

This is the third and final article in my series for The Rational Edge on using Unified Modeling Language (UML) Activity Diagrams. In the first and second articles, I showed how Activity Diagrams can be used effectively to capture use-case flows of execution and to detail the dynamic aspects of the Process View. In both cases, this involved altering and enhancing the basic Activity Diagram semantics with new stereotypes. These stereotypes better capture details of the problem domain for use-case execution flows and the dynamic nature of system processing for the Process View. In this article, which illustrates how to use UML Activity Diagrams to capture and communicate the details of user interface navigation and functionality, I reintroduce three of these stereotypes: presentation, exception, and connector. Most modern software is designed to interact, at one time or another, with sapient, bipedal life forms (a.k.a. users).

Capturing Visual, Control, and Navigation Elements

XQuery Inside SQL Server 2005: Performing XSLT Transforms on XML Data Stored in SQL Server 2005

A common task when dealing with XML data is to apply an XSLT stylesheet to the raw XML data in order to display it better. In a previous post to this blog, I showed how to append a processing instruction to your XML data in order to get IE to do the transformation. That approach required that you know the location of the XSLT transform file, and also that you were looking at your data through a file rather than extracting it directly from the server. First, we need to write the CLR function that will apply the XSLT transformation; notice the return type of this function, as well as the types of the various input parameters, i.e. the type SqlXml. In order to compile this code into an acceptable CLR assembly, make sure you use the version of the CLR that shipped with the SQL Server 2005 version you have installed, then register it with CREATE ASSEMBLY XsltTransform FROM 'C:\XsltTransform.dll'. At this point, you are ready to perform your XSLT transformation!
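The CLR function itself is not reproduced in this excerpt, but the client-side alternative the post mentions — appending an xml-stylesheet processing instruction so the browser applies the transform — can be illustrated in a language-neutral way. A minimal Python sketch (the function name and file names are illustrative, not from the original post):

```python
# Sketch: prepend an xml-stylesheet processing instruction so a browser
# (e.g. IE) applies the XSLT transform client-side. This is the approach
# the earlier post describes, not the server-side SqlXml CLR function.
def add_stylesheet_pi(xml_text: str, xslt_href: str) -> str:
    pi = f'<?xml-stylesheet type="text/xsl" href="{xslt_href}"?>'
    # Insert the PI right after the XML declaration if present,
    # otherwise at the very top of the document.
    if xml_text.startswith("<?xml"):
        decl_end = xml_text.index("?>") + 2
        return xml_text[:decl_end] + "\n" + pi + xml_text[decl_end:]
    return pi + "\n" + xml_text

doc = '<?xml version="1.0"?><books><book>Beautiful Code</book></books>'
print(add_stylesheet_pi(doc, "books.xslt"))
```

Opening the result in a browser makes it fetch and apply the referenced XSLT file client-side — which is exactly why the server-side approach is preferable when you cannot control file locations.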

TreeBASE

TreeBASE is a repository of phylogenetic information, specifically user-submitted phylogenetic trees and the data used to generate them. TreeBASE accepts all types of phylogenetic data (e.g., trees of species, trees of populations, trees of genes) representing all biotic taxa. Data in TreeBASE are exposed to the public if they are used in a publication that is in press or published in a peer-reviewed scientific journal, book, conference proceedings, or thesis. The current release includes a host of new features and improvements over the previous TreeBASE prototype. As of December 2011, TreeBASE contains 2,946 publications written by 6,106 different authors.

Finding Data on the Internet

A Community Site for R – Sponsored by Revolution Analytics. By RevoJoe on October 6, 2011.

The following list of data sources has been modified as of 3/18/14. If an (R) appears after a source, this means that the data are already in R format or there exist R commands for directly importing the data from R.

Economics: American Economic Association data.
Data Science Practice: data sets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil (O'Reilly 2014), available on the book site — the Enron Email Dataset, GetGlue (time-stamped events: users rating TV shows), the Titanic Survival Data Set, and half a million Hubway rides.
Other categories: Finance, Government, Health Care (Gapminder), Machine Learning, Networks, Science.

How to access 100M time series in R in under 60 seconds

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now, with the new package rdatamarket, it's trivially easy to import those time series into R for charting, analysis, or anything else. Here's what you need to do:

1. Register an account on DataMarket.com (it's free).
2. Install the rdatamarket package in R with install.packages("rdatamarket").
3. Browse DataMarket.com for a time series of interest (I found this series on unemployment).
4. Copy the URL of the page you're on (the short URL works too).
5. Call the dmseries function with the URL to extract the time series as a zoo object.

With this package, you can go from finding interesting data on DataMarket to working with it in R in less than a minute.

Take the tour: Go beyond

- Embed tables and charts: Embed any chart or table from DataMarket.com on your own web sites, in your blog posts or news articles. Chart appearance can be configured to match your branding and other requirements.
- Create dashboards: Build a destination for your target audience with dashboards that live on your own web site.
- Your own branded data market: Using the same platform that runs DataMarket.com, our Enterprise plan provides you with a hosted eCommerce and publishing platform for your data.
- Full data access via API: Write your own applications and integrate DataMarket data into your own websites using our flexible, RESTful API.
- Accessing data through R: Access data from DataMarket directly from the statistical software R using the rdatamarket package.
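The excerpt does not document the API's response format. As a rough sketch only — assuming the service can return a time series as CSV with date,value rows, which is an assumption rather than documented behavior — client code in any language would then reduce to parsing that text:

```python
import csv
import io

# Hypothetical: parse a DataMarket-style CSV time-series response into a
# date -> value mapping. The response format is assumed, not documented
# here; real code would first fetch the text over HTTP with an API key.
def parse_series(csv_text: str) -> dict:
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {date: float(value) for date, value in reader}

sample = "Date,Unemployment\n2010-01,9.8\n2010-02,9.7\n"
series = parse_series(sample)
print(series["2010-01"])  # → 9.8
```

This is the kind of plumbing the rdatamarket package hides from R users, returning a ready-made zoo object instead.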
