Hao WU

Overview of the most interesting materials on data analysis and machine learning, No. 19 (October 20–26, 2014). So you wanna try Deep Learning? - Exchangeable random experiments. I’m keeping this post quick and dirty, but at least it’s out there.

So you wanna try Deep Learning? - Exchangeable random experiments

The gist of this post is that I put out a one-file gist that does all the basics, so that you can play around with it yourself. First of all, I would say that deep learning is simply kernel machines whose kernel we learn. That’s a gross simplification, but it’s not totally false. Second, there is nothing magical about deep learning, just that we can efficiently train (GPUs, clusters) large models (millions of weights, billions if you want to make a Wired headline) on large datasets (millions of images, thousands of hours of speech, more if you’re GOOG/FB/AAPL/MSFT/NSA).

I think a good part of the success of deep learning comes from the fact that practitioners are not afraid to go around beautiful mathematical principles to make their models work on whatever dataset and whatever task. What is a deep neural network? A series of matrix multiplications and non-linearities. Stuff you’ll learn: generic unsupervised pre-training, dropout. The Unreasonable Effectiveness of Recurrent Neural Networks. There’s something magical about Recurrent Neural Networks (RNNs).
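The "matrix multiplications and non-linearities" definition can be made concrete in a few lines. This is a minimal sketch, not the post's gist: the layer sizes, the ReLU choice, and the `forward` helper are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # non-linearity applied element-wise between layers
    return np.maximum(0, x)

def forward(x, weights):
    """Forward pass of a plain feed-forward net:
    alternating matrix multiplications and non-linearities."""
    for W in weights[:-1]:
        x = relu(x @ W)
    return x @ weights[-1]  # final layer kept linear

rng = np.random.default_rng(0)
# hypothetical layer sizes: 4 -> 8 -> 8 -> 2
weights = [rng.standard_normal((a, b)) * 0.1
           for a, b in [(4, 8), (8, 8), (8, 2)]]
out = forward(rng.standard_normal((3, 4)), weights)
print(out.shape)  # (3, 2): 3 input rows, 2 outputs each
```

Everything "deep" in a deep network is repetitions of exactly this pattern; training is just adjusting the entries of `weights`.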

The Unreasonable Effectiveness of Recurrent Neural Networks

I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times.

What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. We’ll train RNNs to generate text character by character and ponder the question “how is that even possible?” Deep learning Reading List. Tutorial: How to detect spurious correlations, and how to find the real ones. Specifically designed in the context of big data in our research lab, the new and simple strong correlation synthetic metric proposed in this article should be used, whenever you want to check if there is a real association between two variables, especially in large-scale automated data science or machine learning projects.
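Character-by-character generation with an RNN can be sketched as a single recurrence fed back into itself. This is not the post's code; the vanilla-RNN equations are standard, but the sizes, names, and random weights here are illustrative assumptions (an untrained net, so the samples are noise).

```python
import numpy as np

def rnn_step(x, h, Wxh, Whh, Why, bh, by):
    """One step of a vanilla RNN: read one one-hot character x,
    update the hidden state h, emit logits over the next character."""
    h = np.tanh(x @ Wxh + h @ Whh + bh)
    logits = h @ Why + by
    return logits, h

rng = np.random.default_rng(0)
V, H = 27, 16  # hypothetical vocabulary and hidden-state sizes
Wxh = rng.standard_normal((V, H)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((H, V)) * 0.1
bh, by = np.zeros(H), np.zeros(V)

# sample a few characters, feeding each one back in as the next input
h, idx, sampled = np.zeros(H), 0, []
for _ in range(5):
    x = np.eye(V)[idx]
    logits, h = rnn_step(x, h, Wxh, Whh, Why, bh, by)
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax
    idx = rng.choice(V, p=p)
    sampled.append(int(idx))
print(sampled)
```

Training replaces the random weights with learned ones, and the same sampling loop then produces the "magical" text the post describes.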

Tutorial: How to detect spurious correlations, and how to find the real ones

Use this new metric now, to avoid being accused of reckless data science and even being sued for wrongful analytic practice. In this paper, the traditional correlation is referred to as the weak correlation, as it captures only a small part of the association between two variables: weak correlation results in capturing spurious correlations and predictive modeling deficiencies, even with as few as 100 variables. In short, even nowadays, what makes two variables X and Y seem related in most scientific articles and pretty much all articles written by journalists is based on ordinary (weak) regression. The article then gives a formal definition of strong correlation.
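The article's "strong correlation" metric is not reproduced here, but the failure mode it targets is easy to demonstrate: with ordinary (Pearson, i.e. "weak") correlation and as few as 100 variables, some pairs will look correlated purely by chance. A small sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 100          # 100 observations, 100 INDEPENDENT variables
X = rng.standard_normal((n, p))

# Pearson correlation of every other variable with the first one;
# the true association is zero for all of them
corrs = [np.corrcoef(X[:, 0], X[:, j])[0, 1] for j in range(1, p)]
print(max(abs(c) for c in corrs))  # the largest "discovered" correlation
```

In runs like this the maximum absolute correlation routinely exceeds 0.2 despite zero real association, which is exactly the spurious-correlation trap in large-scale automated projects.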

Understanding Convolution in Deep Learning. Convolution is probably the most important concept in deep learning right now.

Understanding Convolution in Deep Learning

It was convolution and convolutional nets that catapulted deep learning to the forefront of almost any machine learning task there is. But what makes convolution so powerful? How does it work? In this blog post I will explain convolution and relate it to other concepts that will help you to understand convolution thoroughly. There are already some blog posts about convolution in deep learning, but I found all of them highly confusing, with unnecessary mathematical details that do not further the understanding in any meaningful way.
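Stripped of the mathematical details, 2-D convolution is just a kernel slid over an image with a weighted sum at each position. A naive sketch (the edge-detection kernel and test image below are illustrative, not from the post):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2-D convolution: slide the kernel over the
    image and take a weighted sum at every position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    k = kernel[::-1, ::-1]  # flip: convolution vs cross-correlation
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

edge = np.array([[1., 0., -1.]] * 3)   # simple vertical-edge kernel
img = np.zeros((5, 5)); img[:, 3:] = 1.0  # dark left half, bright right
out = conv2d(img, edge)
print(out.shape)  # (3, 3)
```

A convolutional net learns the kernel entries instead of hand-picking them, and stacks many such layers; real implementations replace the Python loops with highly optimized GPU kernels.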

Synaptic - the JavaScript neural network library. Take Control By Creating Targeted Lists of Machine Learning Algorithms. Any book on machine learning will list and describe dozens of machine learning algorithms.

Take Control By Creating Targeted Lists of Machine Learning Algorithms

Once you start using tools and libraries you will discover dozens more. This can really wear you down, if you think you need to know about every possible algorithm out there. A simple trick to tackle this feeling and take some control back is to make lists of machine learning algorithms. This ridiculously simple tactic can give you a lot of power. You can use it to give you a list of methods to try when tackling a whole new class of problem. In this post you will discover the benefits of creating lists of machine learning algorithms, how to do it, how to do it well and why you should start creating your first list of algorithms today.

Create a List of Machine Learning Algorithms (photo by Joel Montes de Oca, some rights reserved). Dealing with So Many Algorithms: there are hundreds of machine learning algorithms, and I see this leading to two problems. Favorites are dangerous.