Blogs

Introduction to Machine Learning with Python and Scikit-Learn. Hello, %username%!

Introduction to Machine Learning with Python and Scikit-Learn

My name is Alex. I work on machine learning and web graph analysis (mostly in theory), and I also develop Big Data products for one of the mobile operators in Russia. This is the first time I've written a post, so please don't judge me too harshly. Nowadays a lot of people want to develop efficient algorithms and take part in machine learning competitions; for many of them, that is their introduction to Data Science. The most common tools for a Data Scientist today are R and Python. Note that in this article we will focus on machine learning algorithms.
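As a taste of the workflow the article goes on to cover with scikit-learn, here is a minimal, self-contained sketch; the built-in Iris dataset and logistic regression are chosen purely for illustration and are not taken from the article itself.

```python
# Minimal scikit-learn workflow: load data, split, fit a model, evaluate.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (150 iris flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the evaluation is not done on training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a simple classifier and measure accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```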

Neglected machine learning ideas. This post is inspired by the "metacademy" suggestions for "leveling up your machine learning."

Neglected machine learning ideas

They make some halfway decent suggestions for beginners. The problem is that these suggestions won't give you a view of machine learning as a field; they'll only teach you about the subjects of interest to authors of machine learning books, which is different. The level-3 and level-4 suggestions they make are not especially useful either: they just reflect the tastes of the author. The machine learning literature is vast, and its techniques are bewilderingly diverse, multidisciplinary, and seemingly unrelated, so it is extremely difficult to know what is important and useful. The things I think are egregiously neglected in books and in academia are listed below, unranked and loosely clustered. One example is online learning: not the "Khan Academy" kind, but the "exposing your learners to data, one piece at a time, the way the human brain works" kind (see the short code sketch below).

Learning From Data - Online Course (MOOC). A real Caltech course, not a watered-down version on YouTube & iTunes:
- Free, introductory machine learning online course (MOOC)
- Taught by Caltech Professor Yaser Abu-Mostafa
- Lectures recorded from a live broadcast, including Q&A
- Prerequisites: basic probability, matrices, and calculus
- 8 homework sets and a final exam
- Discussion forum for participants
- Topic-by-topic video library for easy review
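Returning to the "online learning" item above, here is a minimal sketch of what incremental training looks like in code, using scikit-learn's SGDClassifier.partial_fit on a synthetic stream; the data and the model choice are illustrative assumptions, not something the post prescribes.

```python
# Online (incremental) learning: the model sees the data one small batch at a
# time instead of requiring the whole training set up front.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()                  # a linear classifier trained by SGD
classes = np.array([0, 1])               # all classes must be declared up front

for step in range(100):
    # Pretend each iteration is a fresh batch arriving from a stream.
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

# The model is usable at any point during the stream.
print(model.predict(rng.normal(size=(3, 5))))
```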

Learning From Data - Online Course (MOOC)

An idiot learns Bayesian analysis: Part 1. I've done a dreadful job of reading The Theory That Would Not Die, but several weeks ago I somehow managed to read the appendix.

An idiot learns Bayesian analysis: Part 1

Here the author gives a short explanation of Bayes' theorem using statistics related to breast cancer and mammogram results. This is the same real-world example (one of several) used by Nate Silver. It's profound in its simplicity and, for an idiot like me, a powerful gateway drug. Possibly related to this is my recent epiphany that when we're talking about Bayesian analysis, we're really talking about multivariate probability; the breast cancer/mammogram example is the simplest form of multivariate analysis available. The Theory That Would Not Die is sitting on my desk at work, so I'm going to refer to the figures quoted by Nate Silver on page 246. DataTau. Learning the meaning behind words.
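To spell out the kind of calculation the mammogram example involves, here is Bayes' theorem applied to a generic screening test with made-up round numbers; these are for illustration only and are not the figures quoted by Nate Silver or in the book.

```python
# Bayes' theorem on a screening-test example (illustrative numbers only).
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)

p_disease = 0.01              # prior: 1% of those screened have the disease
p_pos_given_disease = 0.80    # sensitivity (true positive rate)
p_pos_given_healthy = 0.10    # false positive rate

# Total probability of a positive result, summed over both groups.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")   # about 0.075
```

Even with a reasonably accurate test, the low prior keeps the posterior small, which is exactly the counterintuitive point the example is usually used to make.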

Today computers aren't very good at understanding human language, and that forces people to do a lot of the heavy lifting—for example, speaking "searchese" to find information online, or slogging through lengthy forms to book a trip.

Learning the meaning behind words

Computers should understand natural language better, so people can interact with them more easily and get on with the interesting parts of life. While state-of-the-art technology is still a ways from this goal, we’re making significant progress using the latest machine learning and natural language processing techniques. Deep learning has markedly improved speech recognition and image classification.

For example, we've shown that computers can learn to recognize cats (and many other objects) just by observing large numbers of images, without being trained explicitly on what a cat looks like. Now we are applying neural networks to understanding words by having them "read" vast quantities of text on the web. Active learning, almost black magic. I've written Duke, an engine for figuring out which records represent the same thing.
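The idea sketched above, learning vector representations of words from large amounts of raw text, is the one behind word-embedding tools such as word2vec. As an illustration of the concept rather than of Google's actual pipeline, here is a minimal sketch using the gensim library's Word2Vec implementation; the toy corpus and parameters are invented for this example.

```python
# Learning word vectors from text: words that appear in similar contexts end up
# with similar vectors, so "meaning" can be compared numerically.
from gensim.models import Word2Vec

# A toy corpus: each document is a list of tokens. Real models are trained on
# billions of words, which is what makes the learned vectors genuinely useful.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
    ["paris", "is", "the", "capital", "of", "france"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the vectors ("size" in gensim < 4.0)
    window=3,         # context window around each word
    min_count=1,      # keep every word, since the toy corpus is tiny
    epochs=50,
)

# With enough data, nearest neighbours in vector space are semantically related.
print(model.wv.most_similar("cat", topn=3))
```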

Active learning, almost black magic
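Duke itself is a Java engine, and the following is not its interface; it is only a generic Python sketch of the active-learning idea the title alludes to: score candidate record pairs with the current model, then ask a human about the pairs the model is least certain about, so that each answer is as informative as possible.

```python
# Uncertainty sampling for record matching: label the pairs the model is least
# sure about first, retraining after each round of human answers.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(pair_features, ask_human, seed_idx, rounds=10, per_round=5):
    """pair_features: (n_pairs, n_features) similarity scores for candidate pairs.
    ask_human(i) -> 1 if pair i is a true match, else 0.
    seed_idx: a few hand-labeled pairs (should include a match and a non-match)."""
    X = np.asarray(pair_features)
    labeled = list(seed_idx)
    labels = [ask_human(i) for i in labeled]
    model = LogisticRegression()

    for _ in range(rounds):
        model.fit(X[labeled], labels)
        proba = model.predict_proba(X)[:, 1]
        uncertainty = np.abs(proba - 0.5)      # 0.5 means the model has no idea
        queries = [i for i in np.argsort(uncertainty) if i not in labeled]
        for i in queries[:per_round]:
            labeled.append(i)
            labels.append(ask_human(i))        # the human settles the hard cases
    return model
```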

Machine Learning, Neural and Statistical Classification. This book (originally published in 1994 by Ellis Horwood) is now out of print.

Machine Learning, Neural and Statistical Classification

The copyright now resides with the editors, who have decided to make the material freely available on the web. From the back cover: this book is based on the EC (ESPRIT) project StatLog, which compared and evaluated a range of classification techniques, with an assessment of their merits, disadvantages, and range of application (a miniature version of such a comparison is sketched below). Blog: Launching a Democratization of Data Science. February 9, 2012. It's a sad but true fact that most data that's generated or collected, even with considerable effort, never gets any kind of serious analysis.
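StatLog's central exercise, running several classification techniques against the same data under a common protocol, is easy to reproduce in miniature with today's tools. The sketch below is only an illustration of that kind of comparison; the dataset and the three classifiers are arbitrary choices, not StatLog's benchmark suite.

```python
# A miniature StatLog-style comparison: several classifiers, one dataset,
# and the same cross-validation protocol for each.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name:22s} mean={scores.mean():.3f} std={scores.std():.3f}")
```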

Blog: Launching a Democratization of Data Science

But in a sense that's not surprising, because doing data science has always been hard, and even expert data scientists usually have to spend lots of time wrangling code and data to do any particular analysis. I myself have been using computers to work with data for more than a third of a century. In the past, when I'd really been motivated, I'd take some data here or there, read it into Mathematica, and use some of its powerful tools to do one analysis or another. The key idea now is automation, and what's amazing to me is that it actually works. The basic idea is very much in line with the whole core mission of Wolfram|Alpha: to take expert-level knowledge and create a system that can apply it automatically whenever and wherever it's needed.

There are several pieces to the whole problem, and there are always tricky issues. Machine learning for cyber security (public version, 11 Oct 11). Good Machine Learning Blogs.