Data Mining: Text Mining, Visualization and Social Media

Information is Beautiful is a thought-provoking labour of love by one of the first true data journalists, David McCandless. It is a simply structured collection of graphical interpretations of a variety of interesting statistics, factoids and opinions. It is compelling in its ability to provoke exclamations of surprise at the relationships between facts (e.g. the financial crisis costing us almost four times more than the expected total cost of the west's adventurism in Iraq and Afghanistan) as well as generating respect for the creativity and design that has gone into presenting the information. That being said, the book also illustrates the very tricky position of a data journalist (or whatever we eventually call those individuals who render 'information' visually). Visualizing data as graphics, and expressing facts, opinions, processes, etc. as information visualizations, is essentially a new language.

Junk Charts Via Twitter, Bart S (@BartSchuijt) sent me to this TechCrunch article, which contains several uninspiring charts. The most disturbing one is this: There is a classic Tufte class here: only five numbers and yet the chart is so confusing. And yes, they reversed the axis. Lower means higher "app abandonment" and higher means lower "app abandonment".

Data Mining: What is Data Mining? Overview Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data.

Visual Thinking + Synthesis Photo by Ken Yeung I really enjoy taking complex subjects, processes or business problems and boiling them down to their core essence. This is becoming known as the process of "Visual Thinking". I use visual metaphors and storytelling to do this. My style of visual thinking is immediately recognizable and has helped me build a strong following of influential professionals who use my visuals in their own presentations and documents.

Matthew Ericson – The winners of the 34th Edition of the Best of News Design contest were released today, so I’ve updated my interactive crosstab of SND winners that lets you see at a glance which publications won awards in which categories. One particularly interesting thing to me: There were only 19 awards given in the information graphics categories (17 for individual works and 2 for portfolios). That’s down from 97 just three years ago. I’d be curious to know how much of the decline comes from fewer print graphics being produced in general in newspapers (and probably also fewer entries in the contest) and how much is from a different, and much tougher, set of judges than in past years. Just pushed out an update to the Adobe Illustrator MultiExporter script that lets you specify whether to export PNGs and JPGs at a different scale factor, so that you can generate versions of the images at double resolution for iPhone retina displays.

Course Materials (Supports de cours) This page lists the materials used in my Machine Learning, Data Mining and Data Science courses in the Département Informatique et Statistique (DIS) at Université Lyon 2, mainly in the Master 2 Statistique et Informatique (SISE), a program in statistics and computer science focused on the statistical processing of data and on extracting value from big data. I pay close attention to the strong synergy between computer science and statistics in this degree; these are the essential pillars of the data scientist's profession. Note that most of the materials are slides printed to PDF and are therefore not very formal; they emphasize above all the main thread of the topic studied and list the important points. This page is of course open to all statisticians, data miners and data scientists, students or not, from Université Lyon 2 or elsewhere.

Latest As I mentioned in my previous post, our collaboration with the Sabeti Lab is aimed at creating new visual exploration tools to help researchers, doctors, and clinicians discover patterns and associations in large health and epidemiological datasets. These tools will be the first step in a hypothesis-generation process, combining intuition from expert users with visualization techniques and automated algorithms, allowing users to quickly test hypotheses that are “suggested” by the data itself. Researchers and doctors have a deep familiarity with their data and often can tell immediately whether a new pattern is potentially interesting or simply the result of noise.

5 of the Best Free and Open Source Data Mining Software The process of extracting patterns from data is called data mining. It is recognized as an essential tool by modern business since it is able to convert data into business intelligence thus giving an informational edge. At present, it is widely used in profiling practices, like surveillance, marketing, scientific discovery, and fraud detection.

feltron Mapping Neighborhoods in Boston, San Francisco and New York. Hand-drawn animation of 43 years of the Sun’s weather. (via kottke) William Stone Branching Drawings (identified by wowgreat)

chartsnthings 19 Sketches of Quarterback Timelines On Sunday Eli Manning started his 150th consecutive game for the Giants, the highest active streak in the NFL and the third-longest streak in NFL history. (One of the other two people above him is his brother, Peyton.) The graphics department published an interactive graphic that put Eli’s streak in the context of about 2,000 streaks from about 500 pro quarterbacks. The graphic lets you browse the quarterbacks, search for any one of them, or explore a team to go down memory lane.