Deep Reinforcement Learning: Pong from Pixels

This is a long overdue blog post on Reinforcement Learning (RL). RL is hot! You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to perform complex manipulation tasks that defy explicit programming. It's interesting to reflect on the nature of recent progress in RL: compute (the obvious one: Moore's Law, GPUs, ASICs) and data in a nice form (not just out there somewhere on the internet) have mattered enormously. Similar to what happened in Computer Vision, the progress in RL is not driven as much as you might reasonably assume by new amazing ideas. Now back to RL: Pong from pixels. The game of Pong is an excellent example of a simple RL task. Policy network: our policy network is a 2-layer fully-connected net, where W1 and W2 are two weight matrices that we initialize randomly.
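The snippet the excerpt refers to is cut off. As a rough sketch of the 2-layer policy network the post describes (the ReLU hidden layer, sigmoid output, and 200-unit / 80x80 sizes follow Karpathy's defaults, so treat them as assumptions rather than a quote of his code):

import numpy as np

H, D = 200, 80 * 80  # hidden units; input is a flattened 80x80 difference frame
W1 = np.random.randn(H, D) / np.sqrt(D)  # random, "Xavier"-scaled initialization
W2 = np.random.randn(H) / np.sqrt(H)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squash to (0, 1)

def policy_forward(x):
    h = np.dot(W1, x)   # hidden layer pre-activations
    h[h < 0] = 0        # ReLU nonlinearity
    logp = np.dot(W2, h)
    p = sigmoid(logp)   # probability of moving the paddle UP
    return p, h         # hidden state h is kept around for backprop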

A (Long) Peek into Reinforcement Learning. In this post, we are going to briefly go over the field of Reinforcement Learning (RL), from fundamental concepts to classic algorithms. Hopefully, this review is helpful enough so that newbies would not get lost in specialized terms and jargon while starting. [WARNING] This is a long read. A couple of exciting breakthroughs in Artificial Intelligence (AI) have happened in recent years. What is Reinforcement Learning? Say we have an agent in an unknown environment, and this agent can obtain some rewards by interacting with the environment. The goal of Reinforcement Learning (RL) is to learn a good strategy for the agent from experimental trials and the relatively simple feedback received. Key concepts: let's formally define a set of key concepts in RL. The agent acts in an environment, and the model defines the reward function and transition probabilities. If we know the model, we can plan with perfect information and do model-based RL. (Fig. 2: S_T is the terminal state.) Model: transition and reward.
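To make "the model defines the reward function and transition probabilities" concrete, here is a tiny hypothetical MDP sketch; every state, action, and number below is invented purely for illustration:

import random

# The model: P(s' | s, a) and R(s, a). "sT" plays the role of the terminal state S_T.
transitions = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s1": 1.0},
    ("s1", "go"):   {"sT": 1.0},
}
rewards = {("s0", "stay"): 0.0, ("s0", "go"): 1.0, ("s1", "go"): 10.0}

def step(state, action, rng):
    # Sample the next state from the transition probabilities, pay out the reward.
    probs = transitions[(state, action)]
    next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, rewards[(state, action)]

rng = random.Random(0)
print(step("s0", "go", rng))  # ('s1', 1.0)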

The Truth About Deep Learning - Quantified. Come on, people: let's get our shit together about deep learning. I've been studying and writing about DL for close to two years now, and it still amazes me how much misinformation surrounds this relatively complex learning algorithm. This post is not about how Deep Learning is or is not over-hyped, as that is a well-documented debate. Rather, it's a jumping-off point for a (hopefully) fresh, concise understanding of deep learning and its implications. The Problem: even the most academic among us mistakenly merge two very different schools of thought in our discussions of deep learning: (1) the benefits of neural networks over other learning algorithms, and (2) the benefits of a "deep" neural network architecture over a "shallow" architecture. Much of the debate going on is surprisingly still concerned with the first point instead of the second. The idea I'd like you to take away here is that we are not asking the right question for the answer we desire. The Answer (?)

Hello, TensorFlow! The TensorFlow project is bigger than you might realize. The fact that it's a library for deep learning, and its connection to Google, has helped TensorFlow attract a lot of attention. But beyond the hype, there are unique elements to the project that are worthy of closer inspection: the core library is suited to a broad family of machine learning techniques, not "just" deep learning, and linear algebra and other internals are prominently exposed. Cool stuff, but, especially for someone hoping to explore machine learning for the first time, TensorFlow can be a lot to take in. Names and execution in Python and TensorFlow: the way TensorFlow manages computation is not totally different from the way Python usually does. The variable names in Python code aren't what they represent; they're just pointing at objects.

>>> foo = []
>>> bar = foo
>>> foo == bar
True
>>> foo is bar
True

You can also see that id(foo) and id(bar) are the same.

>>> foo.append(bar)
>>> foo
[[...]]

The simplest TensorFlow graph starts from a single constant input_value.
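The excerpt stops right where that graph gets built. A minimal sketch in the TensorFlow 1.x API current when the article was written (the constant 1.0 and the 0.8 weight follow the article's neuron example as best I recall; treat the details as assumptions):

import tensorflow as tf  # TensorFlow 1.x, as in the article

input_value = tf.constant(1.0)       # a constant node in the default graph
weight = tf.Variable(0.8)            # a variable node the article later trains
output_value = weight * input_value  # a multiplication node; nothing runs yet

sess = tf.Session()                           # execution is separate from definition
sess.run(tf.global_variables_initializer())   # variables need explicit initialization
print(sess.run(output_value))                 # 0.8

The detour through names and objects is exactly this split: the Python names above point at graph nodes, and computation only happens when the session runs them.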

Write an AI to win at Pong from scratch with Reinforcement Learning. We're now going to follow the code in me_pong.py. Please keep it open and read along! The code starts here: def main(). Initialization: first, let's use OpenAI Gym to make a game environment and get our very first image of the game. Next, we set a bunch of parameters based on Andrej's blog post: batch_size (how many rounds we play before updating the weights of our network) and gamma (the discount factor we use to discount the effect of old actions on the final result). Then, we set counters, initial values, and the initial weights in our neural network, as sketched below. Weights are stored in matrices; layer 2 is a 200 x 1 matrix representing the weights of the output of the hidden layer on our final output. We initialize each layer's weights with random numbers for now. Next, we set up the initial parameters for RMSProp (a method for updating weights that we will discuss later). OK, we're all done with the setup! Phew. Figuring out how to move: as you can see, it's not many steps at all! Learning: awesome!
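A sketch of the initialization the excerpt walks through (the hyperparameter values are the Karpathy defaults the post borrows; the exact variable names in me_pong.py may differ):

import gym
import numpy as np

env = gym.make("Pong-v0")   # assumption: the classic Gym Pong environment
observation = env.reset()   # our very first image of the game

batch_size = 10      # rounds played before each weight update
gamma = 0.99         # discount factor for old actions
decay_rate = 0.99    # RMSProp decay
learning_rate = 1e-4
num_hidden = 200
input_dim = 80 * 80  # preprocessed 80x80 frame, flattened

weights = {
    "1": np.random.randn(num_hidden, input_dim) / np.sqrt(input_dim),
    "2": np.random.randn(num_hidden) / np.sqrt(num_hidden),  # the 200 x 1 layer 2
}
# RMSProp bookkeeping: running average of squared gradients, plus a gradient buffer
expectation_g_squared = {k: np.zeros_like(v) for k, v in weights.items()}
g_dict = {k: np.zeros_like(v) for k, v in weights.items()}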

Sutton & Barto Book: Reinforcement Learning: An Introduction Second Edition (see herefor the first edition) MIT Press, Cambridge, MA, 2018 Buy from Amazon ErrataFull Pdf pdf without margins (good for ipad)New Code Old Code Solutions -- send in your solutions for a chapter, get the official ones back (currently incomplete)Teaching AidsLiterature sources cited in the book Latex Notation -- Want to use the book's notation in your own work? Download this .sty file and this example of its use Help out! If you enjoyed the book, why not give back to the community? I am collecting a public directory with pdf files of the original sources cited in the book.

Guest Post (Part I): Demystifying Deep Reinforcement Learning - Nervana. Two years ago, a small company in London called DeepMind uploaded their pioneering paper "Playing Atari with Deep Reinforcement Learning" to arXiv. In this paper they demonstrated how a computer learned to play Atari 2600 video games by observing just the screen pixels and receiving a reward when the game score increased. The result was remarkable, because the games and the goals in every game were very different and designed to be challenging for humans. It has been hailed since then as the first step towards general artificial intelligence: an AI that can survive in a variety of environments, instead of being confined to strict realms such as playing chess. Still, while deep models for supervised and unsupervised learning have seen widespread adoption in the community, deep reinforcement learning has remained a bit of a mystery. The roadmap ahead: what are the main challenges in reinforcement learning? Consider the game Breakout. (Figure 1: the Atari Breakout game.)
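The article builds from this setup toward Q-learning. As a reference point for where it is headed, the tabular update it derives from the Bellman equation looks like this (a generic sketch, not the article's code):

from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)] -> estimated discounted return

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # One Q-learning step: nudge Q[s, a] toward r + gamma * max_a' Q[s', a'].
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

A Deep Q-Network keeps this update but replaces the table with a neural network that maps raw screen pixels to Q-values for each action.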

Introducing FBLearner Flow: Facebook's AI backbone. Many of the experiences and interactions people have on Facebook today are made possible with AI. When you log in to Facebook, we use the power of machine learning to provide you with unique, personalized experiences. Machine learning models are part of ranking and personalizing News Feed stories, filtering out offensive content, highlighting trending topics, ranking search results, and much more. In some of our earliest work to leverage AI and ML, such as delivering the most relevant content to each person, we noticed that the largest improvements in accuracy often came from quick experiments, feature engineering, and model tuning rather than from applying fundamentally different algorithms. To address these points, we decided to build a brand-new platform, FBLearner Flow, capable of easily reusing algorithms in different products, scaling to run thousands of simultaneous custom experiments, and managing experiments with ease.

A Neural Network in 11 lines of Python (Part 1) - i am trask. Summary: I learn best with toy code that I can play with. This tutorial teaches backpropagation via a very simple toy example, a short Python implementation. Edit: Some folks have asked about a followup article, and I'm planning to write one. Just Give Me The Code:

import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])    # input dataset
y = np.array([[0,1,1,0]]).T                        # target outputs
syn0 = 2*np.random.random((3,4)) - 1               # weights: input -> hidden
syn1 = 2*np.random.random((4,1)) - 1               # weights: hidden -> output
for j in range(60000):
    l1 = 1/(1+np.exp(-(np.dot(X,syn0))))           # hidden layer (sigmoid)
    l2 = 1/(1+np.exp(-(np.dot(l1,syn1))))          # output layer (sigmoid)
    l2_delta = (y - l2)*(l2*(1-l2))                # output error * sigmoid slope
    l1_delta = l2_delta.dot(syn1.T) * (l1*(1-l1))  # backpropagate to hidden layer
    syn1 += l1.T.dot(l2_delta)                     # update weights
    syn0 += X.T.dot(l1_delta)

Other languages: D, C++. However, this is a bit terse... let's break it apart into a few simple parts. Part 1: A Tiny Toy Network. A neural network trained with backpropagation is attempting to use input to predict output. Consider trying to predict the output column given the three input columns. 2 Layer Neural Network:

Rendering OpenAI Gym in Google Colaboratory (StarAi Applied Research Blog). By: Paul Steven Conyngham. Early this year (2018) Google introduced free GPUs to their machine learning tool "Colaboratory", making it the perfect platform for doing machine learning work or research. If you are looking at getting started with Reinforcement Learning, however, you may have also heard of a tool released by OpenAI in 2016, called "OpenAI Gym". Unfortunately, whether you are learning reinforcement learning or performing research, it has until now been impossible to see your agent's results "live" in your Colaboratory browser. From September to November 2018, StarAi ran a Deep Reinforcement Learning course at the Microsoft Reactor in central Sydney. Developed by William Xu, our rendering solution makes use of the PyVirtualDisplay, python-opengl, xvfb, and ffmpeg encoder libraries. With all that being said, let's get started. OpenAI Gym Colaboratory rendering code: first we need to install the relevant libraries to make rendering possible. Note that the "!" prefix tells the Colab notebook to run the line as a shell command rather than as Python.
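A sketch of the kind of setup cells the post describes, using the libraries it names (the exact commands in the post may differ; Colab's apt/pip environment is assumed):

# Shell installs; the "!" prefix hands the line to the notebook's shell
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!pip install gym pyvirtualdisplay > /dev/null 2>&1

# Start a virtual display so env.render() has a screen to draw on
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1400, 900))
display.start()

import gym
env = gym.make("Pong-v0")
env.reset()
frame = env.render(mode="rgb_array")  # a NumPy array you can display inline or encode with ffmpeg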

Deep Learning Visualisation of Global Cargo Ships | By Kiln and UCL

Machine Learning Exercises In Python, Part 5. This post is part of a series covering the exercises from Andrew Ng's machine learning class on Coursera. The original code, exercise text, and data files for this post are available here. Part 1 - Simple Linear Regression; Part 2 - Multivariate Linear Regression; Part 3 - Logistic Regression; Part 4 - Multivariate Logistic Regression; Part 5 - Neural Networks; Part 6 - Support Vector Machines; Part 7 - K-Means Clustering & PCA; Part 8 - Anomaly Detection & Recommendation. In part four we wrapped up our implementation of logistic regression by extending our solution to handle multi-class classification and testing it on the hand-written digits data set. I'll note up front that the math (and code) in this exercise gets a bit hairy. Since the data set is the same one we used in the last exercise, we'll re-use the code from last time to load the data. Since we're going to need these later (and will use them often), let's create some useful variables up front, starting with the sigmoid function:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
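In the neural-networks exercise this sigmoid feeds straight into forward propagation. A hedged sketch of that next step (the 400-input / 25-hidden / 10-class shapes follow the usual setup of this exercise, but treat the specifics as assumptions):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagate(X, theta1, theta2):
    # input -> hidden (sigmoid) -> output (sigmoid), adding a bias column per layer
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])
    return sigmoid(a2 @ theta2.T)  # one probability per class

theta1 = np.random.randn(25, 401) * 0.1  # illustrative shapes
theta2 = np.random.randn(10, 26) * 0.1
probs = forward_propagate(np.random.rand(5, 400), theta1, theta2)  # shape (5, 10)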
