
Machine Learning Cheat Sheet (for scikit-learn)

jakevdp/sklearn_scipy2013

enthought/pyql

GUESS: The Graph Exploration System

Visualizing the stock market structure

This example employs several unsupervised learning techniques to extract the stock market structure from variations in historical quotes. The quantity that we use is the daily variation in quote price: quotes that are linked tend to fluctuate together during a day.

Learning a graph structure

We use sparse inverse covariance estimation to find which quotes are correlated conditionally on the others. Specifically, sparse inverse covariance gives us a graph, that is, a list of connections. For each symbol, the symbols it is connected to are those useful for explaining its fluctuations.

Clustering

We use clustering to group together quotes that behave similarly. Note that this gives us a different indication than the graph: the graph reflects conditional relations between variables, while the clustering reflects marginal properties. Variables clustered together can be considered as having a similar impact at the level of the full stock market.

The example then covers embedding in 2D space and visualization.
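The graph-learning and clustering steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the original example script: the two-factor data model, the variable names, and the `1e-6` edge threshold are assumptions made here, and real quote data would replace the simulated matrix `X`.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n_days, n_symbols = 300, 6

# Hypothetical daily variations: symbols 0-2 share one hidden factor,
# symbols 3-5 share another, plus independent noise per symbol.
factors = rng.normal(size=(n_days, 2))
loadings = np.zeros((2, n_symbols))
loadings[0, :3] = 1.0
loadings[1, 3:] = 1.0
X = factors @ loadings + 0.5 * rng.normal(size=(n_days, n_symbols))
X /= X.std(axis=0)  # standardize each symbol's variation

# Sparse inverse covariance: nonzero off-diagonal entries of the
# precision matrix are read as edges (conditional dependence).
model = GraphicalLassoCV().fit(X)
precision = model.precision_
edges = [
    (i, j)
    for i in range(n_symbols)
    for j in range(i + 1, n_symbols)
    if abs(precision[i, j]) > 1e-6  # assumed threshold for "nonzero"
]

# Clustering on the estimated covariance, used as a similarity matrix
# (marginal behavior rather than conditional structure).
clustering = AffinityPropagation(affinity="precomputed", random_state=0)
labels = clustering.fit(model.covariance_).labels_

print("edges:", edges)
print("cluster labels:", labels)
```

The edge list comes from the precision matrix while the labels come from the covariance matrix, which mirrors the conditional versus marginal distinction drawn in the text.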

jakevdp/sklearn_pycon2013

a free/open-source library for quantitative finance

GitHub - airbnb/caravel: Caravel is a data exploration platform designed to be visual, intuitive, and interactive

python - Correcting matplotlib colorbar ticks

Multi-armed bandit

Resource problem in machine learning

In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K-[1] or N-armed bandit problem[2]) is a problem in which a fixed, limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice.[3][4] This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma.

In the problem, each machine (one "arm", or choice) provides a random reward from a probability distribution specific to that machine, which is not known a priori.

Herbert Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in "Some aspects of the sequential design of experiments".[6] A theorem, the Gittins index, first published by John C. .
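The exploration–exploitation tradeoff described above can be illustrated with an epsilon-greedy strategy. This is a simple, commonly used heuristic chosen here for brevity; it is not the Gittins index or Robbins's construction. The function name, the Bernoulli reward model, and the arm probabilities are all assumptions made for the sketch.

```python
import random


def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    true_means holds each arm's (hidden) probability of paying out 1.
    Returns per-arm pull counts, estimated means, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The reward is drawn from the chosen arm's own distribution.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 3) for e in estimates])
print("total reward:", total_reward)
```

A larger `epsilon` spends more rounds learning about every arm, while a smaller one commits sooner to whichever arm currently looks best.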

StatsModels: Statistics in Python (statsmodels 0.6.0.dev-455510c documentation)

statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license.

Since version 0.5.0 of statsmodels, you can use R-style formulas together with pandas data frames to fit your models.

    import numpy as np
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Load data
    dat = sm.datasets.get_rdataset("Guerry", "HistData").data

    # Fit regression model (using the natural log of one of the regressors)
    results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

    # Inspect the results
    print(results.summary())

You can also use numpy arrays instead of formulas:
