Data mining

Interactive Visualizations or Small Multiples? Any time you find yourself with a data set with more than a few columns (for example, the 2012 Major League Baseball regular season stats sheet), you have a number of options when it comes to data exploration and discovery.

Interactive Visualizations or Small Multiples?

The data set has all “qualifying” players, their team and position, and their number of games played, at bats, hits, home runs, doubles, RBIs, etc. for the 2012 regular season. You can find the table online here.

Option #1: Table Madness

You could just sort and filter the table, and maybe even color-code the cells with some fancy conditional formatting in Excel (Home > Conditional Formatting > Color Scales > Red – White – Blue Color Scale). Interesting approach, but this doesn’t exploit the human brain’s superior capacity to use position or size to compare values. I’ll save it for all those dense quarterly financial spreadsheets. A fairly quick scan shows that home runs and RBIs are fairly closely and positively correlated (as are at bats and runs).
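If you would rather check that "quick scan" with a number than by eye, a Pearson correlation coefficient is one way to do it. The snippet below is a minimal sketch, not the method used in the original post: the `pearson` helper and the `players` rows are made up for illustration, and the numbers are placeholders rather than real 2012 stats. Swap in the HR and RBI columns from the actual table to get a meaningful result.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER rows for illustration only; these are NOT real 2012 MLB stats.
players = [
    {"name": "Player A", "HR": 10, "RBI": 45},
    {"name": "Player B", "HR": 25, "RBI": 80},
    {"name": "Player C", "HR": 18, "RBI": 70},
    {"name": "Player D", "HR": 32, "RBI": 95},
]

home_runs = [p["HR"] for p in players]
rbis = [p["RBI"] for p in players]

# A value near +1 means the two columns rise together; near 0 means no linear relationship.
print(f"HR vs RBI correlation: {pearson(home_runs, rbis):.2f}")
```

The same helper works for any pair of numeric columns, such as at bats and runs. It only measures linear association, so it complements a chart rather than replacing one.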

Text Analysis of 2012 Presidential Debates. Obama more certain and positive, Romney more negative and direct. Lately there’s been a craze for analyzing 140-character tweets to make all sorts of inferences about everything from brand affinity to political opinion.

Text Analysis of 2012 Presidential Debates

While I’m generally of the position that the best return on investment of text analytics is on large volumes of comments, I fear we often overlook other interesting data sources in favor of what a small percentage (about 8%) of the population says in tweets or blogs. When the speakers are the current and possibly next president of the US, looking at what, if anything, can be gained by applying text analytics to even very small data sets starts to become more interesting. Therefore, ahead of the final presidential debate between Obama and Romney, we uploaded the last two presidential debates into our text analytics software, OdinText, to see what, if anything, political pundits and strategists might find useful.

The Devil in the Detail @TomHCAnderson @OdinText PS.

'When you find someone passionate about data, keep hold of them' Simon Kaffel, head of data and analysis at Zurich, has been in his role for a year after a long stint at Sky.

'When you find someone passionate about data, keep hold of them'

Lucy Handley talks to him about how he is building a data strategy for the business.

Marketing Week (MW): When you start a new job with a blank sheet of paper, what do you do first?

Simon Kaffel (SK): It is a big shift from where I was previously at Sky. Zurich is an interesting organisation. It is very established and it has a huge history. One of the things that really appealed to me with this role was that they said: ‘Here you go Simon, here is a blank sheet of paper, make the most of what we’ve got.’ I spent 11 years at Sky and the thought of taking a new job was quite daunting. At Sky I was focused on the UK and Republic of Ireland and now I’ve got a lot of internal clients around the globe with very different internal cultures. The focus at Zurich has not previously been on using data to improve our knowledge of you as a customer. So how can we manage that across the globe?

Engineers are hard to come by! Here’s some ‘big data’ software to the rescue.

Sit down with execs from any tech company in America and ask them about their biggest obstacle to growth. 99 times out of 100, they’ll tell you about their ongoing struggle to find technical talent.

Engineers are hard to come by! Here’s some ‘big data’ software to the rescue

Software engineers are more elusive than girls at a Silicon Valley tech conference. As with finding a date, recruiters bidding to attract the most sought-after engineers typically turn to the Internet. Entelo, a startup that is announcing its public beta today, has developed an algorithm that connects tech recruiters with likely candidates based on their online activity. It has indexed over 300 million social profiles; the “big data” technology can determine where tech-savvy types like to go online and can pick up on their interests and expertise. Most software engineers will exchange ideas and their most sophisticated code on sites like GitHub or Stack Overflow.

When they are unfulfilled in a current job, chances are they’ll share content on these sites with renewed vigor.

Another Word For It.

Km.aifb.kit.edu/ws/semsearch10/Files/umass.pdf.

Cran.r-project.org/doc/contrib/usingR.pdf.