MINE: Maximal Information-based Nonparametric Exploration

Introduction

Many modern data sets, even those considered modestly sized, contain hundreds of thousands or even millions of variable pairs, far too many to examine manually. If you do not already know what kinds of relationships to search for, how do you efficiently identify the important ones?

MIC and the MINE family

The maximal information coefficient (MIC) is a measure of two-variable dependence designed specifically for rapid exploration of many-dimensional data sets. MIC is part of a larger family of maximal information-based nonparametric exploration (MINE) statistics, which can be used not only to identify important relationships in data sets but also to characterize them.
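To make the idea behind MIC concrete, here is a rough Python sketch. It is a simplification, not the MINE implementation: MIC takes, for each grid size x by y, the best mutual information achievable by any grid of that size, normalizes it by log min(x, y), and keeps the maximum over grid sizes. This sketch only tries equal-frequency (quantile) grids, so it can underestimate MIC. The grid budget B(n) = n^0.6 is an assumption (it is the default commonly used with MIC), and `approx_mic` and its helpers are hypothetical names.

```python
import bisect
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Map each value to one of k quantile bins; tied values share a bin."""
    ranked = sorted(values)
    n = len(values)
    # bisect_left counts how many values are strictly smaller, so ties get the same rank.
    return [bisect.bisect_left(ranked, v) * k // n for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), with the 1/n factors folded together
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Crude MIC approximation: best normalized MI over equal-frequency grids.

    Assumption: only quantile grids are tried, so this can underestimate MIC.
    Grid sizes are limited to kx * ky < n**alpha.
    """
    n = len(x)
    budget = n ** alpha
    best = 0.0
    for kx in range(2, int(budget) + 1):
        for ky in range(2, int(budget) + 1):
            if kx * ky >= budget:
                break  # a larger ky only makes the grid bigger
            mi = mutual_information(equal_frequency_bins(x, kx),
                                    equal_frequency_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best
```

For example, `approx_mic(xs, [(v - 50) ** 2 for v in xs])` with `xs = list(range(100))` scores a noiseless parabola above 0.9, even though its Pearson correlation is close to zero. This illustrates why a grid-based measure can surface nonlinear relationships that a linear coefficient misses.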

Pavel Risenberg

Nooblast is a project inspired by the old Noösphere concept. The visualization pulls real-time data from public APIs and calculates the overall signal strength (recent network buzz) for two given keywords. Some of the collected events carry geolocation information, so they are mapped onto the globe at their exact locations. The overall strength is visualized around the globe as a "noo"-cloud whose size reflects the event streams and whose shape is defined by the geotagged data, building light, abstract visual structures (snapshots in space) for each term. The project explores the abstract visual side of crowd-sourced information streams as a visual connection that attaches you to the pulse of the planet.

Ontology Tools Survey, Revisited

July 14, 2004. A new survey of ontology editors was conducted as a follow-up to an initial survey conducted in 2002. The results of the survey are summarized in this article. The results of the original survey may be found at

Ontologies are a way of specifying the structure of domain knowledge in a formal logic designed for machine processing.

How to Use Pivot Tables to Mine Your Data

To succeed at Six Sigma or any process improvement effort, you'll often have to analyze and summarize text data. Most companies have lots of transaction data from "flat files" like the one shown below, but because the data consists of text and raw numbers, they sometimes have a hard time figuring out what to do with it. These examples use Excel along with QI Macros for Excel:

To summarize and analyze this data, you will want to learn how to use Excel's PivotTable tool. In past incarnations it was known as Crosstab (for cross tabulation). With Pivot Tables and the file above you could:

How to become a data visualization ninja with 3 free tools for non-programmers

We have noted many times on this blog that data visualization is in vogue and that the trend keeps growing. That's good news, guys! It's fun and it's … success! But as more and more people join this wild bunch, we have to take care of those who are not yet as skilled as we are. There are many people out there who love data visualization but think this business is not for them because they are not able to code.

Mapping with Anthracite and OmniGraffle

Several people have asked me how I produced the visualizations that I used in my talk on

While those visualizations were built with a rather eclectic mix of homebrewed code, assorted applications, and a good bit of elbow grease, I decided to put together a tutorial for people who might be interested in this type of visual exploration but are not inclined to write custom Perl code to do it. To that end, I have selected two very nice applications for Mac OS X, Anthracite and OmniGraffle, to produce similar visualizations. Neither application is free, but both have free trial periods, should you want to check them out. As an alternative to OmniGraffle, the excellent open source program Graphviz may be used, although it is less user-friendly than OmniGraffle. The sequence of steps we will go through is as follows:

Dynamite plots: unmitigated evil? - Ecological Models and Data

There seems to be a general opinion among statistical graphics nerds (examples here, here, and here) that the traditional way of plotting grouped continuous data (e.g. growth rates across fertilizer treatments) as a bar plot with a unidirectional "whisker" marking the upper limit of the 95% confidence interval is bad. People who don't like them call them "dynamite plots". (Googling for 'dynamite plot' brings up web pages about statistical graphics, the Napoleon Dynamite movie, and terrorism [Basque and 19th-century].) I will review the criticisms of dynamite plots, which I generally agree with, but then I want to put forward a couple of their advantages and suggest that the generally favored Tukey box-and-whisker plot is not a universal solution to graphical problems.

Disadvantages of dynamite plots:

Crossfilter

Fast Multidimensional Filtering for Coordinated Views

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records; we built it to power analytics for Square Register, allowing merchants to slice and dice their payment history fluidly. Since most interactions only involve a single dimension, and then only small adjustments are made to the filter values, incremental filtering and reducing is significantly faster than starting from scratch. Crossfilter uses sorted indexes (and a few bit-twiddling hacks) to make this possible, dramatically increasing the performance of live histograms and top-K lists.
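The incremental idea can be illustrated with a small Python sketch. The real library is JavaScript, and none of the names below are Crossfilter's API: `TinyCrossfilter`, `add_dimension`, and `filter_range` are hypothetical. Each dimension keeps record indices sorted by value, so a range filter becomes two binary searches, and when the range moves only the index positions between the old and new boundaries are revisited. The per-record bitmask (one bit per dimension) is an assumption about what such bit-twiddling might look like, and a running sum stands in for an incrementally maintained reduce.

```python
import bisect


class TinyCrossfilter:
    """Toy model of incremental range filtering (hypothetical; not Crossfilter's API)."""

    def __init__(self, records, value_field):
        self.records = records
        self.value_field = value_field        # field summed by the running "reduce"
        self.mask = [0] * len(records)        # bit d set => record excluded by dimension d
        self.total = sum(r[value_field] for r in records)  # sum over selected records
        self.dims = {}                        # field -> (bit, order, keys, [lo, hi])

    def add_dimension(self, field):
        """Build a sorted index over one field; every record starts selected."""
        bit = 1 << len(self.dims)
        order = sorted(range(len(self.records)), key=lambda i: self.records[i][field])
        keys = [self.records[i][field] for i in order]
        self.dims[field] = (bit, order, keys, [0, len(order)])

    def filter_range(self, field, low, high):
        """Select low <= field < high; return how many index positions were inspected."""
        bit, order, keys, sel = self.dims[field]
        old_lo, old_hi = sel
        new_lo = bisect.bisect_left(keys, low)      # binary search, O(log n)
        new_hi = max(new_lo, bisect.bisect_left(keys, high))
        # Only positions between the old and new slice boundaries can change state.
        touched = set(range(min(old_lo, new_lo), max(old_lo, new_lo)))
        touched |= set(range(min(old_hi, new_hi), max(old_hi, new_hi)))
        for pos in touched:
            was_in = old_lo <= pos < old_hi
            now_in = new_lo <= pos < new_hi
            if was_in == now_in:
                continue
            i = order[pos]
            selected_before = self.mask[i] == 0
            if now_in:
                self.mask[i] &= ~bit    # clear this dimension's exclusion bit
            else:
                self.mask[i] |= bit     # set this dimension's exclusion bit
            selected_after = self.mask[i] == 0
            # Incremental reduce: adjust the sum only for records that enter or leave.
            if selected_before and not selected_after:
                self.total -= self.records[i][self.value_field]
            elif selected_after and not selected_before:
                self.total += self.records[i][self.value_field]
        sel[0], sel[1] = new_lo, new_hi
        return len(touched)

    def selected(self):
        """Records that pass every dimension's filter."""
        return [r for r, m in zip(self.records, self.mask) if m == 0]
```

With five records having hours 9, 10, 11, 14, and 16, moving the `hour` filter from "everything" to `[10, 15)` inspects only the two positions at the ends of the sorted index (the call returns 2) instead of rescanning every record. That saving is what makes small filter adjustments cheap.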

Partiview

Partiview is free, open-source software from the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign. It is an industrial-strength, interactive, mono- or stereoscopic viewer for 4-dimensional datasets.

Thinkmap Visual Thesaurus

We offer a number of programs that make it easy for institutions to purchase the Visual Thesaurus and provide access to their population at a discounted rate. We also offer special discounts to accredited academic institutions. Site Licenses to the Visual Thesaurus: With a Site License, your entire organization has instant access to the Visual Thesaurus, VocabGrabber, the Visual Thesaurus Spelling Bee, and the online magazine.

Real-Time Data: You're Doing It Wrong

When it comes to predicting the future, Chartbeat's CEO Tony Haile thinks you're awful. At the Mashable Media Summit, Haile spoke about the importance of real-time data and what your business should be doing with that information. "The more we think we know, the more expert we believe ourselves to be," says Haile, "and the more likely we are to trust our judgment when we shouldn't and get things wrong."

Strategies for critical thinking in learning, Part I

Thinking and recall series

Strategies for critical thinking in learning and project management

Critical thinking examines a topic or problem with an open mind. This exercise outlines the first stage of applying a critical thinking approach to developing and understanding a topic. You will:

- Develop a statement of the topic
- List what you understand, what you've been told, and what opinions you hold about it
- Identify resources available for research
- Define timelines and due dates and how they affect the development of your study
- Print the list as your reference

Emergent Futures Mapping with Futurescaper

Futurescaper is an online tool for making sense of the drivers, trends and forces that will shape the future. As a user interface system, it still needs development. As a tool for analyzing and understanding complex systems, it works very well and does something I have not yet seen any other tool do. Several people asked me about it after my last post, so here is some more detail. Following the logic of collective intelligence (as part of my PhD), I broke the scenario thinking process up into discrete chunks, came up with a system for analyzing and relating them to each other, and then distilled them into key outputs that help the scenario development process.

Emergent Thematic Maps