A Programmer's Guide to Data Mining

A guide to practical data mining, collective intelligence, and building recommendation systems by Ron Zacharski.

About This Book

Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining and, as a result, may seem notoriously difficult to understand. Don’t get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining, you might be interested in a beginner’s hands-on guide as a first step.

http://guidetodatamining.com/

What’s the “problem” with MOOCs? « EdTechDev In case the quotes didn’t clue you in, this post doesn’t argue against massive open online courses (MOOCs) such as the ones offered by Udacity, Coursera, and edX. I think they are very worthy ventures and will serve to progress our system of higher education. I do, however, agree with some criticisms of these courses, and that there is room for much more progress. I propose an alternative model for such massive open online learning experiences, or MOOLEs, that focuses on solving “problems,” but first, here’s a sampling of some of the criticisms of MOOCs. Criticisms of MOOCs

Data Mining Algorithms In R In general terms, data mining comprises techniques and algorithms for determining interesting patterns from large datasets. There are currently hundreds (or even more) algorithms that perform tasks such as frequent pattern mining, clustering, and classification, among others. Understanding how these algorithms work and how to use them effectively is a continuous challenge faced by data mining analysts, researchers, and practitioners, in particular because an algorithm's behavior and the patterns it finds may change significantly as a function of its parameters. In practice, most of the data mining literature is too abstract regarding the actual use of the algorithms, and parameter tuning is usually a frustrating task. On the other hand, there is a large number of implementations available, such as those in the R project, but their documentation focuses mainly on implementation details without providing a good discussion of the parameter-related trade-offs associated with each of them.
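The parameter sensitivity described above is easy to see even in a toy setting. The sketch below is in Python rather than R and is not taken from any R package: it is a deliberately naive brute-force frequent-itemset counter (not Apriori or FP-Growth), and the four shopping-basket transactions are made up for illustration. Only the `min_support` threshold changes between runs, yet the number of "interesting" patterns returned changes a lot.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset mining (illustrative only).

    Counts every itemset of size 1..max_size contained in each transaction
    and keeps those whose support (fraction of transactions containing the
    itemset) is at least min_support. The cost grows exponentially with
    transaction length, so this only suits tiny examples.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy data, not from any real dataset.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

for min_support in (0.25, 0.5, 0.75):
    found = frequent_itemsets(transactions, min_support)
    print(min_support, len(found))
# 0.25 9
# 0.5 5
# 0.75 2
```

Raising the threshold from 0.25 to 0.75 shrinks the result from nine itemsets to just the two single items that appear in three of the four baskets, which is exactly the kind of trade-off the text says implementation docs rarely discuss.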

What Americans Keep Ignoring About Finland's School Success - Anu Partanen The Scandinavian country is an education superpower because it values equality more than excellence. Everyone agrees the United States needs to improve its education system dramatically, but how? One of the hottest trends in education reform lately is looking at the stunning success of the West's reigning education superpower, Finland. Trouble is, when it comes to the lessons that Finnish schools have to offer, most of the discussion seems to be missing the point. The small Nordic country of Finland used to be known, if it was known for anything at all, as the home of Nokia, the mobile phone giant.

Python: Inverted Index for dummies An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in documents, and is generally used to allow fast full-text searches. The first step of inverted index creation is document processing. In our case this is word_index(), which consists of word_split(), normalization, and the deletion of stop words ("the", "then", "that"...).

```python
def word_split(text):
    """Split text into (start_offset, word) tuples of alphanumeric runs."""
    word_list = []
    wcurrent = []  # characters of the word currently being built
    windex = None  # index of the last character appended to wcurrent
    for i, c in enumerate(text):
        if c.isalnum():
            wcurrent.append(c)
            windex = i
        elif wcurrent:
            # A non-alphanumeric character ends the current word.
            word = ''.join(wcurrent)
            word_list.append((windex - len(word) + 1, word))
            wcurrent = []
    if wcurrent:
        # Flush a word that runs to the very end of the text.
        word = ''.join(wcurrent)
        word_list.append((windex - len(word) + 1, word))
    return word_list
```

word_split() is quite a long function that does a really simple job: splitting words. You can rewrite it with just one line using something like re.split(r'\W+', text), though that version discards the character offsets word_split() records. Cleanup and normalization are just two filter functions applied after word_split().
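Putting the steps together, here is a minimal sketch of word_index() and the inverted index it feeds. Everything beyond word_split() is an assumption for illustration, not the original post's code: the helpers normalize(), cleanup(), build_inverted_index() and search(), the small STOP_WORDS set, lowercasing as the only normalization, and AND semantics for multi-word queries. The splitter here uses re.finditer instead of re.split so that it keeps the character offsets.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the post only names "the", "then", "that".
STOP_WORDS = {"the", "then", "that", "a", "an", "and", "of", "to", "is"}


def word_split(text):
    # Regex version of word_split(): returns (offset, word) pairs.
    # Roughly equivalent to the isalnum() loop, except that the regex
    # word class also matches the underscore character.
    return [(m.start(), m.group()) for m in re.finditer(r"\w+", text)]


def normalize(words):
    """Hypothetical normalization filter: lowercase every word."""
    return [(offset, word.lower()) for offset, word in words]


def cleanup(words):
    """Hypothetical cleanup filter: drop stop words (expects lowercase)."""
    return [(offset, word) for offset, word in words if word not in STOP_WORDS]


def word_index(text):
    """Document processing: split, normalize, then remove stop words."""
    return cleanup(normalize(word_split(text)))


def build_inverted_index(docs):
    """Map each word to {doc_id: [offsets]} across all documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for offset, word in word_index(text):
            index[word][doc_id].append(offset)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = [word for _, word in word_index(query)]
    if not terms:
        return []
    # .get() does not trigger the defaultdict factory for unknown terms.
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return sorted(result)


docs = {
    1: "The cat sat on the mat.",
    2: "The dog chased the cat.",
    3: "Then the dog slept.",
}
index = build_inverted_index(docs)
print(dict(index["cat"]))        # {1: [4], 2: [19]}
print(search(index, "the dog"))  # [2, 3]
print(search(index, "cat mat"))  # [1]
```

Storing offsets per document, rather than just document ids, is what makes phrase or proximity queries possible later; a plain AND query like search() only needs the keys.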

8 Books For a Higher Existence Books are magical inventions. By carrying meaning, they give us glimpses of experience and knowledge from a different world. Phonetic language, being cut off from time and place, the Now, helps both to encapsulate the ego more and to offer guidance for making it poriferous, letting Eros free.

www.dcs.gla.ac.uk/Keith/Preface.html A book by C. J. van Rijsbergen B.Sc., Dip. NAAC, Ph.D., M.B.C.S., F.I.E.E., C.Eng., F.R.S.E., Information Retrieval Group, University of Glasgow. Preface to the second edition (London: Butterworths, 1979).

A Year in Reading 2012 By C. Max Magee, posted at 6:00 am on December 3, 2012. The end of another year is here (so soon? Ah, I’m getting old), and with it a flood of valedictory lists and wrap-ups, accountings and scorecards. Each year, as these lists spill out across the landscape, the onslaught becomes difficult to parse and begins to feel suspiciously (to us, anyway) like a marketing boondoggle to support the promotional-book-cover-sticker-and-blurb industry.

Bruce Eckel's MindView, Inc: Thinking in Python You can download the current version of Thinking in Python here. This includes the BackTalk comment collection system that I built in Zope. The page describing this project is here. The current version of the book is 0.1.

Programs - IDEA IDEA has several programs involving research and development of real-world projects that align with our organizational mission: promoting scientific, artistic and cultural literacy.
• WordFlare advances interest in and knowledge of the English language via a mobile app that presents interlinked concepts using SpicyNodes technology and gamification. Coming Summer 2013.