 # Math

Differentiation: composite, implicit, and inverse functions.

Optimization inequalities cheatsheet. Most proofs in optimization consist in using inequalities for a particular function class in some creative way. This is a cheatsheet with the inequalities I use most often. It covers functions that are convex, strongly convex, and L-smooth. Setting: f is a function ℝᵖ → ℝ. Below is a set of inequalities that hold when f belongs to a particular class of functions and x, y ∈ ℝᵖ are arbitrary elements of its domain. For simplicity I assume the functions are differentiable, but most of these remain true if the gradient is replaced with a subgradient. If f is L-smooth:

- ∥∇f(y) − ∇f(x)∥ ≤ L∥x − y∥
- |f(y) − f(x) − ⟨∇f(x), y − x⟩| ≤ (L/2)∥y − x∥²
- ∇²f(x) ⪯ L·I (assuming f is twice differentiable)
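The L-smooth inequalities above can be checked numerically. A minimal sketch, assuming f(x) = ½·xᵀAx with A symmetric positive semidefinite, for which L equals the largest eigenvalue of A (all variable names here are illustrative):

```python
import numpy as np

# f(x) = 0.5 * x^T A x is L-smooth with L = largest eigenvalue of A.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T                      # symmetric positive semidefinite
L = np.linalg.eigvalsh(A)[-1]    # eigvalsh returns ascending eigenvalues

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x, y = rng.standard_normal(5), rng.standard_normal(5)

# First inequality: the gradient is L-Lipschitz.
assert np.linalg.norm(grad(y) - grad(x)) <= L * np.linalg.norm(y - x) + 1e-9

# Second inequality: quadratic bound on the linearization error.
gap = abs(f(y) - f(x) - grad(x) @ (y - x))
assert gap <= (L / 2) * np.linalg.norm(y - x) ** 2 + 1e-9
```

For this quadratic f the second bound is tight exactly when y − x is aligned with the top eigenvector of A.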

Introduction to Linear Algebra for Applied Machine Learning with Python. 26 May 2020. Linear algebra is to machine learning as flour to bakery: every machine learning model is based on linear algebra, as every cake is based on flour. It is not the only ingredient, of course. Machine learning models need vector calculus, probability, and optimization, as cakes need sugar, eggs, and butter. Applied machine learning, like bakery, is essentially about combining these mathematical ingredients in clever ways to create useful (tasty?) products.

Calcul matriciel. Last updated: 9 September 2008. I. Definitions. An n × m matrix is a table of numbers with n rows and m columns; n and m are the dimensions of the matrix. A matrix is denoted by a letter in bold type, for example A. We write [Aij] for the matrix whose general element is Aij. If m = 1, the matrix is called a vector (more precisely, a column vector). N.B.: in this chapter we use uppercase letters for matrices and lowercase letters for vectors, but this is not mandatory.

Lei Mao's Log Book – Principal Component Analysis. Introduction. Principal component analysis (PCA) is one of a family of techniques for taking high-dimensional data and using the dependencies between the variables to represent it in a more tractable, lower-dimensional form, without losing too much information. It has been widely used for data compression and de-noising. However, its entire mathematical process is sometimes ambiguous to the user. In this article, I would like to discuss the entire process of PCA mathematically, including PCA projection and reconstruction, with most of the derivations and proofs provided. At the end of the article, I implement PCA projection and reconstruction from scratch. Prerequisites: Orthogonal Matrix.

An Intuitive Guide to Linear Algebra. Despite two linear algebra classes, my knowledge consisted of "Matrices, determinants, eigen something something". Why? Well, let's try this course format:

- Name the course Linear Algebra but focus on things called matrices and vectors
- Teach concepts like Row/Column order with mnemonics instead of explaining the reasoning
- Favor abstract examples (2d vectors! 3d vectors!) and avoid real-world topics until the final week

Companion webpage to the book "Mathematics for Machine Learning". Copyright 2020 by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Published by Cambridge University Press.

Calculus Explained with pics and gifs - 0a.io - What is (Gaussian) curvature? A previous article already introduced manifolds and some of their properties.

The matrix calculus you need for deep learning. Terence Parr and Jeremy Howard. (We teach in University of San Francisco's MS in Data Science program and have other nefarious projects underway. You might know Terence as the creator of the ANTLR parser generator. For more material, see Jeremy's fast.ai courses and University of San Francisco's Data Institute in-person version of the deep learning course.) Abstract: This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks.
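The kind of matrix-calculus identity the paper derives can be sanity-checked numerically. A minimal sketch, assuming f(x) = ∥Wx∥², whose gradient is 2WᵀWx, verified against central finite differences (W, f, and the tolerances are illustrative choices, not from the paper):

```python
import numpy as np

# f(x) = ||W x||^2 has gradient 2 W^T W x; check with central differences.
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

f = lambda v: np.sum((W @ v) ** 2)
analytic = 2 * W.T @ W @ x

eps = 1e-6
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)  # one partial per basis vector
    for e in np.eye(3)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```

Since f is quadratic, the central difference is exact up to floating-point rounding, so a tight tolerance works here.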

Introduction. Bayesian Methods for Hackers. An intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view. Prologue: The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, and only then enters what Bayesian inference is.

Markov Chain Monte Carlo Without all the Bullshit. I have a little secret: I don't like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods.

Counterintuitive Properties of High Dimensional Space - Lost in Spacetime.

Markov Chains explained visually. Explained Visually by Victor Powell with text by Lewis Lehe. Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. For example, if you made a Markov chain model of a baby's behavior, you might include "playing", "eating", "sleeping", and "crying" as states, which together with other behaviors could form a "state space": a list of all possible states.

Introduction To Calculus With Derivatives. Written February 18, 2018. Suppose you need to calculate 101² but you don't have a calculator handy. How would you estimate it? 100² is pretty easy: 100 · 100 = 10000. But 101² seems tougher. Or suppose you had to estimate 4.1².
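The estimation trick hinted at above is the linear approximation f(x + h) ≈ f(x) + f′(x)·h. A minimal sketch for f(x) = x², where f′(x) = 2x (the helper name is illustrative):

```python
def approx_square(base, h):
    """Estimate (base + h)^2 from the value and slope of x^2 at base."""
    return base ** 2 + 2 * base * h  # f(base) + f'(base) * h

# 101^2: start from 100^2 = 10000 and follow the slope 2 * 100.
print(approx_square(100, 1))   # 10200, versus the exact 101^2 = 10201

# 4.1^2: start from 4^2 = 16 and follow the slope 2 * 4.
print(approx_square(4, 0.1))   # close to the exact 4.1^2 = 16.81
```

The error of this estimate is exactly h², which is why it shrinks so quickly as h gets small.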