# ML

Concepts - Metacademy. David M. Blei. Topic modeling. Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts. Below, you will find links to introductory materials, corpus browsers based on topic models, and open source software (from my research group) for topic modeling.

Introductory materials: I wrote a general introduction to topic modeling. John Lafferty and I wrote a more technical review paper about this field. Here are slides from some recent tutorials about topic modeling: Here is a video from a talk on dynamic and correlated topic models applied to the journal Science.

Corpus browsers based on topic models: The structure uncovered by topic models can be used to explore an otherwise unorganized collection. A 100-topic browser of the dynamic topic model fit to Science (1882-2001). Topic modeling software.

On-line compression modelling. The Shape of Data | Exploring the geometry behind machine learning, data mining, etc. General regression and over fitting | The Shape of Data.

In the last post, I discussed the statistical tool called linear regression for different dimensions/numbers of variables and described how it boils down to looking for a distribution concentrated near a hyperplane of dimension one less than the total number of variables (co-dimension one). For two variables this hyperplane is just a line, which is what you may usually think of regression as. In this post, I’ll discuss a more flexible version of regression, in which we allow the line or hyperplane to be curved. First, we need to look at regression from a slightly different perspective.
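For the two-variable case, where the hyperplane is just a line, the fit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses ordinary least squares, which predicts y from x and is not necessarily the fitting method used in the post, and the function name `fit_line` and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = c*x + b. Returns (c, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c = sxy / sxx              # slope
    b = mean_y - c * mean_x    # intercept
    return c, b


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
c, b = fit_line(xs, ys)
print(c, b)
```

With noisy data the recovered slope and intercept would only approximate the underlying line; the sketch also assumes the x values are not all identical.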

When we originally fit a line to our data set, we treated the x and y coordinates interchangeably. We can also think of a line as a function that takes a value x and outputs a value y = cx + b for parameters c and b that are determined by the regression algorithm. In higher dimensions, we can describe a hyperplane by a function y = c_1x_1 + ... + c_nx_n + b, where c_1, ..., c_n and b are parameters that are calculated by the regression algorithm.

About the Curse of Dimensionality. Introduction: In this article, we will discuss the so-called 'Curse of Dimensionality', and explain why it is important when designing a classifier.

In the following sections I will provide an intuitive explanation of this concept, illustrated by a clear example of overfitting due to the curse of dimensionality. Consider an example in which we have a set of images, each of which depicts either a cat or a dog. We would like to create a classifier that is able to distinguish dogs from cats automatically. To do so, we first need to think about a descriptor for each object class that can be expressed by numbers, such that a mathematical algorithm, i.e. a classifier, can use these numbers to recognize the object. We could for instance argue that cats and dogs generally differ in color.

`if 0.5*red + 0.3*green + 0.2*blue > 0.6: return cat; else return dog;`

However, these three color-describing numbers, called features, will obviously not suffice to obtain a perfect classification.
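The color rule can be written as a small runnable sketch. The weights and the 0.6 threshold come from the rule itself; everything else is an assumption made for illustration: that each feature is the average of one color channel scaled to the range 0 to 1, and the helper names `mean_color` and `classify`.

```python
def mean_color(pixels):
    """Average (red, green, blue) of an image, scaled to [0, 1].

    `pixels` is assumed to be a list of (r, g, b) tuples with values 0-255.
    """
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / (255.0 * n) for i in range(3))


def classify(red, green, blue):
    """Apply the linear color rule: a weighted sum compared to a threshold."""
    score = 0.5 * red + 0.3 * green + 0.2 * blue
    return "cat" if score > 0.6 else "dog"


# Two tiny hypothetical "images": one bright, one dark.
bright = [(255, 200, 150), (230, 180, 120)]
dark = [(40, 30, 20), (60, 50, 40)]
print(classify(*mean_color(bright)))
print(classify(*mean_color(dark)))
```

Three numbers per image discard shape and texture entirely, which is one way to see why such a rule cannot classify perfectly.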

## Data Set Repository

CSE 291. Topics in High Dimensional Data Analysis.

Subject: An advanced graduate seminar in statistical methods for dimensionality reduction, feature selection, matrix factorization, Bayesian regularization, and distance metric learning. There will be some formal lectures, but most class meetings will be devoted to discussion of recent papers or student projects.

Prerequisites: Enrollment by permission of the instructor. Students should have completed previous graduate coursework in machine learning.

Administrivia: Professor: Lawrence Saul. Meetings: Tue/Thu 11-12:20, EBU3B 2154. Grading: 50% class participation (oral), 50% course project (written). Project due date: Fri June 15. Tentative reading list. Syllabus.

9 Free Books for Learning Data Mining and Data Analysis. Whether you are learning data science for the first time, refreshing your memory, or catching up on the latest trends, these free books will help you excel through self-study. By Alex Ivanovs, CodeCondo, Apr 29, 2014. Data mining and data analysis are two terms that very often give the impression of being very hard to understand, complex, and of requiring the highest level of education to grasp. I can only disagree: as with anything in this wonderful life of ours, we only need to spend a certain amount of time learning something and practicing it before we realize that it’s not really all that hard. No doubt there are very smart people in this world working for large corporations such as Google, Apple, Microsoft and plenty more (including security agencies), but if we continue to look up to them, we will always think it’s hard, because we have never given ourselves the chance to look at real examples and facts.

Data Mining Algorithms In R. Josephmisiti/machine-learning-module.

Machine Learning. Semester 2, 2007. Lecturer: Professor M. A. Girolami. Lectures: Monday, 1.00pm, (204); Thursday, 1.00pm, 515 (4B). Laboratory & Tutorial: Friday, 1.00pm, Boyd Orr, Level 4 Laboratory. Module Descriptor. Student Ratings of Module 2006: I feel I learned a lot from the course and it pushed me!

Altogether I really enjoyed the course and I learned a lot of useful techniques. I really liked your lectures, especially the tutorials and scripts - Clement Rodegast (Exchange Student CS, 2006)

Matlab Tutorials

Suggested Books:

- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, Tibshirani & Friedman (Book Website)
- Pattern Classification, 2nd Edition, Duda, Hart & Stork (Book Website)
- Pattern Recognition & Machine Learning, Bishop (Book Website)

Data Repositories:

- UCI Data Repository: All the standard data collections used to illustrate various Machine Learning methods are found here
- DELVE: Datasets for Evaluating and Comparing Learning Methods

Useful Reference Material. Independent Component Analysis for Dummies.

Principal Components Analysis. Suppose you have samples located in environmental space or in species space (See Similarity, Difference and Distance). If you could simultaneously envision all environmental variables or all species, then there would be little need for ordination methods.
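PCA re-expresses a cloud of points along the directions of greatest variance and reports how much of the total variance each new axis captures. A minimal sketch for the two-variable case is given here, using the closed-form eigenvalues of a 2x2 covariance matrix so that no libraries are needed. The function name `pca_2d` and the sample points are hypothetical, and real ordination data would have many more variables, where a linear algebra library would be the practical choice.

```python
import math


def pca_2d(points):
    """PCA for 2-D points.

    Returns (fractions, angle): the share of total variance along each
    principal axis, and the angle in radians of the first axis.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    total = sxx + syy
    fractions = [ev / total for ev in eigenvalues]
    # Direction of the first principal axis (largest variance).
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return fractions, angle


# Hypothetical points scattered roughly along a diagonal.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
fractions, angle = pca_2d(points)
print(fractions, math.degrees(angle))
```

The two fractions sum to one, and they play the same role as the per-axis percentages that PCA software reports. The sketch assumes the points are not all identical, since the total variance would then be zero.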

However, with more than three dimensions, we usually need a little help. What PCA does is take your cloud of data points and rotate it such that the maximum variability is visible. Another way of saying this is that it identifies your most important gradients. Let us take a hypothetical example where you have measured three different species, X1, X2, and X3. In this example, it is possible (though it might be difficult) to tell that X1 and X2 are related to each other, and it is less clear whether X3 is related to either X1 or X2. (Note that X2 has negative values, something that will not happen with real species.) We have only plotted two PCA Axes. PCA Axis 1: 63%, PCA Axis 2: 33%, PCA Axis 3: 4%.

Official VideoLectures.NET Blog » 100 most popular Machine Learning talks at VideoLectures.Net. Enjoy this week's list!

- 26971 views, 1:00:45, Gaussian Process Basics, David MacKay, 8 comments
- 7799 views, 3:08:32, Introduction to Machine Learning, Iain Murray
- 16092 views, 1:28:05, Introduction to Support Vector Machines, Colin Campbell, 22 comments
- 5755 views, 2:53:54, Probability and Mathematical Needs, Sandrine Anthoine, 2 comments
- 7960 views, 3:06:47, A tutorial on Deep Learning, Geoffrey E. Hinton
- 3858 views, 2:45:25, Introduction to Machine Learning, John Quinn, 1 comment
- 13758 views, 5:40:10, Statistical Learning Theory, John Shawe-Taylor, 3 comments
- 12226 views, 1:01:20, Semisupervised Learning Approaches, Tom Mitchell, 8 comments
- 1596 views, 1:04:23, Why Bayesian nonparametrics?, Zoubin Ghahramani, 1 comment
- 11390 views, 3:52:22, Markov Chain Monte Carlo Methods, Christian P. Robert, 5 comments
- 3153 views, 2:15:00, Data mining and Machine learning algorithms, José L.

Good Machine Learning Blogs. Resources | Representation Learning.

I would like to know the answer to question 4 of file finalH12en.pdf, especially for cases (b) and (c). Here are my answers, using the same format (1-with, 1-w/o, 2-with, 2-w/o):

- (a) 1-w/o: decrease; 2-w/o: u-shaped curve; 1-with: decrease; 2-with: u-shaped (I expect the test error to increase much later than in the previous case, after a very large number of iterations).
- (b) 1-w/o: increase; 2-w/o: decrease; 1-with: increase; 2-with: decrease.
- (c) 1-w/o: no change (I expect the auto-encoder to reconstruct the same corrupted input without applying the denoising criterion); 2-w/o: increase; 1-with: increase; 2-with: u-shaped (I expect the test error to decrease as the corruption level increases and then to increase once it passes some threshold, that is, when the corruption level is so high that a corrupted input may land near the values of other training data).

## Neural Nets

DTREG -- Predictive Modeling Software.

UFLDL Tutorial - Ufldl. Description: This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, get to see them work for yourself, and learn how to apply/adapt these ideas to new problems. This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent). If you are not familiar with these ideas, we suggest you go to this Machine Learning course and complete sections II, III, IV (up to Logistic Regression) first.

- Sparse Autoencoder
- Vectorized implementation
- Preprocessing: PCA and Whitening
- Softmax Regression
- Self-Taught Learning and Unsupervised Feature Learning
- Building Deep Networks for Classification
- Linear Decoders with Autoencoders
- Working with Large Images

Note: The sections above this line are stable.

- Miscellaneous
- Miscellaneous Topics
- Advanced Topics: Sparse Coding

FastML.

CS 229: Machine Learning (Course handouts)

- Lecture notes 1 (ps) (pdf): Supervised Learning, Discriminative Algorithms
- Lecture notes 2 (ps) (pdf): Generative Algorithms
- Lecture notes 3 (ps) (pdf): Support Vector Machines
- Lecture notes 4 (ps) (pdf): Learning Theory
- Lecture notes 5 (ps) (pdf): Regularization and Model Selection
- Lecture notes 6 (ps) (pdf): Online Learning and the Perceptron Algorithm (optional reading)
- Lecture notes 7a (ps) (pdf): Unsupervised Learning, k-means clustering
- Lecture notes 7b (ps) (pdf): Mixture of Gaussians
- Lecture notes 8 (ps) (pdf): The EM Algorithm
- Lecture notes 9 (ps) (pdf): Factor Analysis
- Lecture notes 10 (ps) (pdf): Principal Components Analysis
- Lecture notes 11 (ps) (pdf): Independent Components Analysis
- Lecture notes 12 (ps) (pdf): Reinforcement Learning and Control
- Supplemental notes 1 (pdf): Binary classification with +/-1 labels
- Supplemental notes 2 (pdf): Boosting algorithms and weak learning

CSE 473: Artificial Intelligence I -- Schedule of Lectures and Readings.

## Tools

Journal of Machine Learning Research Homepage.

CVPR 2011 Tutorial on Human Activity Recognition. J. K. Aggarwal, Michael S. Ryoo, and Kris Kitani. Date: June 20th, Monday. Human activity recognition is an important area of computer vision research and applications. We briefly review the early history of human activity recognition, and discuss methodologies designed for recognition of activities of individual persons. This tutorial is partly based on the following survey paper: J.

Syllabus. RL Book.

Random forests - classification description. Contents: Introduction; Overview; Features of random forests; Remarks; How Random Forests work; The oob error estimate; Variable importance; Gini importance; Interactions; Proximities; Scaling; Prototypes; Missing values for the training set; Missing values for the test set; Mislabeled cases; Outliers; Unsupervised learning; Balancing prediction error; Detecting novelties; A case study - microarray data; Classification mode; Variable importance; Using important variables; Variable interactions; Scaling the data; Prototypes; Outliers; A case study - dna data; Missing values in the training set; Missing values in the test set; Mislabeled cases; Case Studies for unsupervised learning; Clustering microarray data; Clustering dna data; Clustering glass data; Clustering spectral data; References.

Introduction: This section gives a brief overview of random forests and some comments about the features of the method.

Overview: We assume that the user knows about the construction of single classification trees. Remarks. Gini importance.

My Intro to Multiple Classification with Random Forests, Conditional Inference Trees, and Linear Discriminant Analysis. After the work I did for my last post, I wanted to practice doing multiple classification. I first thought of using the famous iris dataset, but felt that was a little boring.

Ideally, I wanted to look for a practice dataset where I could successfully classify data using both categorical and numeric predictors. Unfortunately it was tough for me to find such a dataset that was easy enough for me to understand. The dataset I use in this post comes from a textbook called Analyzing Categorical Data by Jeffrey S Simonoff, and lends itself to basically the same kind of analysis done by blogger “Wingfeet” in his post predicting authorship of Wheel of Time books. In this case, the dataset contains counts of stop words (function words in English, such as “as”, “also”, “even”, etc.) in chapters, or scenes, from books or plays written by Jane Austen, Jack London (I’m not sure if “London” in the dataset might actually refer to another author), John Milton, and William Shakespeare.
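To give a sense of how a passage becomes a row of such a dataset, here is a minimal sketch that counts stop words in a piece of text. It is an illustration only: the three words in `STOP_WORDS` are the examples named in the post, whereas the textbook dataset uses its own, longer list, and the function name and sample sentence are made up.

```python
import re
from collections import Counter

# Only the three examples mentioned in the post; the real dataset's
# word list is longer and is not reproduced here.
STOP_WORDS = ("as", "also", "even")


def stop_word_counts(text, stop_words=STOP_WORDS):
    """Return one count per stop word, in the order of `stop_words`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in stop_words)
    return [counts[w] for w in stop_words]


# Hypothetical passage standing in for a chapter or scene.
passage = "As he spoke, she also smiled, even as the rain fell."
print(stop_word_counts(passage))
```

Each chapter or scene would yield one such vector of counts, and those vectors, labelled by author, are what a classifier such as a random forest would be trained on.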

Machine Learning and Data Mining: 11 Decision Trees.