
Data sources and exploration


Features

Gephi is a tool for data analysts and scientists keen to explore and understand graphs. Like Photoshop™ but for graph data, it lets the user interact with the representation and manipulate the structures, shapes and colors to reveal hidden patterns. The goal is to help data analysts form hypotheses, intuitively discover patterns, and isolate structural singularities or faults during data sourcing. It is a complementary tool to traditional statistics, as visual thinking with interactive interfaces is now recognized to facilitate reasoning. This is software for Exploratory Data Analysis, a paradigm that emerged in the Visual Analytics field of research.

Real-time visualization: profit from the fastest graph visualization engine to speed up understanding and pattern discovery in large graphs.
Layout: layout algorithms give the graph its shape.
Metrics: the statistics and metrics framework offers the most common metrics for social network analysis (SNA) and scale-free networks.
Networks over time
Input/Output
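Gephi computes these metrics through its interface. To make one of them concrete, here is a minimal Python sketch of normalized degree centrality: each node's number of distinct neighbours divided by n - 1, where n is the number of nodes. It is an illustration only, not Gephi's implementation; the function name and the sample edge list are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration; not Gephi's code.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops are skipped, so a node with only a self-loop is not counted
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


edges = [("Ann", "Bob"), ("Ann", "Cy"), ("Ann", "Dee"), ("Bob", "Cy")]
print(degree_centrality(edges))
```

Using sets of neighbours means a duplicated edge is counted once, which matches the usual definition for simple graphs.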

Dipity - Find, Create, and Embed Interactive Timelines
Infogr.am
Datawrapper
Pocketbook2012.pdf
World Databank
Open Data & Public Data
Public Data / Open Data
FRANCE - Life Expectancy by Département
Public Data Explorer
Tableaux de l'Économie Française - Édition 2012 (Tables of the French Economy, 2012 edition)
International Programs - Region Summary

Frequently Asked Questions (FAQ)

1. What is the International Data Base?
The International Data Base (IDB) offers a variety of demographic indicators for countries and areas of the world with a population of 5,000 or more. The IDB has provided access to demographic data for over 25 years to governments, academics, other organizations, and the public. It is funded by organizations that sponsor the research of the Census Bureau's International Programs Center for Demographic and Economic Studies.

2.
The IDB provides many types of demographic data, including:
· Estimates and projections of:
    o Birth, death, and growth rates, migration rates, infant mortality, and life expectancy
    o Fertility rates
    o Total population and population by age and sex

3.
The following ZIP file contains the complete data set that has been released so far and is used by the International Data Base tool.

5.
The Data Access page allows you to get information on these topics by country or region.

Tableau Public
Google-refine

Creating A Database Table From A Summary Table
Many users are familiar with Excel's pivot table feature, which creates a summary table from a database table. But what if you want to perform the opposite operation? This document describes how to create a database table from a simple two-variable summary table.

The worksheet below demonstrates this.

How to do it
The solution to creating this "reverse pivot table" is to use a pivot table!

Part 1: Creating a pivot table
Activate any cell in your summary table.
Choose Data - PivotTable and PivotChart Report (the menu command may vary, depending on the version of Excel).

Part 2: Finishing up
At this point, you will have a small pivot table that shows only the sum of all values. Double-click the cell that contains the total (outlined in yellow, above).

A VBA Macro to do it
If you do this sort of thing on a regular basis, you may prefer to use a VBA macro.
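The same "reverse pivot" can also be done outside Excel. Here is a minimal Python sketch, assuming the summary table has one row of column labels and one label per data row; the function name and the sample region/month figures are hypothetical. It emits one (row label, column label, value) record per cell of the summary table, which is the shape of a database table.

```python
def unpivot(header, rows):
    """Turn a two-variable summary table into (row, column, value) records.

    header: column labels, e.g. ["Jan", "Feb", "Mar"]
    rows:   (row_label, [values...]) pairs, values aligned with header
    """
    records = []
    for row_label, values in rows:
        if len(values) != len(header):
            raise ValueError(
                f"row {row_label!r} has {len(values)} values, expected {len(header)}"
            )
        # One output record per cell of the summary table.
        for col_label, value in zip(header, values):
            records.append((row_label, col_label, value))
    return records


header = ["Jan", "Feb", "Mar"]
rows = [("North", [10, 12, 9]), ("South", [7, 8, 11])]
for record in unpivot(header, rows):
    print(record)
```

Raising on a length mismatch rather than silently truncating (which is what `zip` alone would do) keeps a malformed row from quietly losing data.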

Data exploration tutorial with Google Refine
Recently, Hugh Stimson published a great article: Data-Mining My Old Radio Playlists. His post mixes tutorials on Python scripting, data cleaning with Google Refine and data analysis with PostgreSQL. This post, written in reply, demonstrates that the same data analysis is fully doable in Google Refine using only basic functions (I use a GREL function only once, for the long-tail analysis). I think this post is also a good illustration of my previous post on data exploration with Google Refine.

Count number of plays by episode and title
To determine the number of unique episodes in Google Refine, facet the episodename column: Facet > Text facet.
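A text facet is essentially a count of each distinct value in a column. As a rough equivalent outside Google Refine, here is a Python sketch using `collections.Counter` on a hypothetical list of (episodename, title) rows; the sample data is made up, and a blank title is represented as `None`, roughly the rows that "Facet by blank" isolates.

```python
from collections import Counter

# Hypothetical playlist rows: (episodename, title); None marks a blank title.
plays = [
    ("Show 1", "Track A"),
    ("Show 1", "Track B"),
    ("Show 2", "Track A"),
    ("Show 2", None),
    ("Show 3", "Track C"),
]

# Like a text facet on episodename: plays per distinct episode.
episode_counts = Counter(episode for episode, _ in plays)
# Like a text facet on title, skipping blanks.
title_counts = Counter(title for _, title in plays if title)
# Like "Facet by blank" on title: how many rows have no title.
blank_titles = sum(1 for _, title in plays if not title)

print(len(episode_counts), "unique episodes")
print(episode_counts.most_common())
print(title_counts.most_common(1), blank_titles)
```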

For titles, I used two different facets on the title column. The first is Facet > Customized facets > Facet by blank (at the bottom of the drop-down list). The second is the same text facet as for episodename, described previously. Got it?

Most played track, artist and album
Select the title column and apply a text facet. Total: 10 different titles.

The Long Tail

To go further

Data-Mining My Old Radio Playlists
In which I scrape all my old radio playlists off the web, cook them with Python, Google Refine and PostgreSQL, and discover that I played one heck of a long tail of songs. And J.J. Cale.

A friend from the Ann Arbor years was in town for a conference, and it put me in mind of my radio days. After every show I used to post a link to the playlist automatically generated by the WCBN server. Like this, for example. I was wondering that, and I also happen to have a lot of idle time at night while I’m on sleeping-baby-monitoring duty.

Scraping the Webpages with Python
The first step was to actually get all that data from the web, preferably in some more data-like format than raw HTML.
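To illustrate getting from raw HTML to rows without picking through tags by hand, here is a sketch using only Python's standard-library `html.parser`. It assumes, hypothetically, that a playlist page lays out each played track as a table row with one `<td>` per field; the class name and sample markup are made up, and the real WCBN pages may look different. It parses a string, so fetching the page (for example with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser


class PlaylistParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace and text outside <td> cells
            self._cell.append(data)


# Hypothetical page fragment; real playlist markup may differ.
html = """
<table>
  <tr><th>Artist</th><th>Title</th><th>Album</th></tr>
  <tr><td>Artist A</td><td>Track A</td><td>Album A</td></tr>
  <tr><td>Artist B</td><td><a href="#">Track B</a></td><td>Album B</td></tr>
</table>
"""

parser = PlaylistParser()
parser.feed(html)
for row in parser.rows:
    print(row)
```

Because text is accumulated per cell, a title wrapped in a link still comes out as plain text.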

Python has built-in capabilities for connecting to a website and getting the HTML, but navigating the raw HTML tags for useful info sounds like a terrible idea. Here’s the Python code I wrote for the web scraping job.

De-Duping in Google Refine
Here’s the output:

Analysis in PostgreSQL
Damn.
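As a rough stand-in for the de-duplication and long-tail steps, here is a Python sketch on hypothetical artist names. A crude `normalize()` (far simpler than Google Refine's clustering) collapses spelling variants, `Counter` tallies plays per artist, and the long tail is measured as the share of artists played exactly once. The function name, sample data and this definition of "long tail" are assumptions; in PostgreSQL the counting step would be a `GROUP BY` with `count(*)`.

```python
from collections import Counter


def normalize(name):
    """Crude, hypothetical normalization: lowercase, '&' -> 'and', collapse whitespace."""
    return " ".join(name.lower().replace("&", "and").split())


# Hypothetical scraped artist names with inconsistent case and spacing.
artists = ["Artist A", "artist a", "Artist  A", "Artist B", "Artist C", "artist c", "Artist D"]

# De-dupe spelling variants, then count plays per artist.
plays_per_artist = Counter(normalize(a) for a in artists)

# Long tail: artists that appear exactly once.
played_once = [a for a, n in plays_per_artist.items() if n == 1]
share_once = len(played_once) / len(plays_per_artist)

print(plays_per_artist.most_common())
print(f"{share_once:.0%} of artists were played only once")
```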