
About: ELKI is a framework for implementing data-mining algorithms with support for index structures; it includes a wide variety of clustering and outlier-detection methods. Changes since ELKI 0.5.5: the slower naive variants of hierarchical clustering were added and the code was refactored, and partitions can now be extracted from hierarchical clusterings under different linkage strategies.
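ELKI itself is Java, but the operation that changelog describes - cutting a hierarchical clustering built under some linkage strategy into a flat partition - is easy to sketch in Python with SciPy (a stand-in for illustration, not ELKI's API):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(50, 2)                        # toy 2-D data
    # Build the dendrogram; 'single', 'complete', 'average', 'ward'
    # are different linkage strategies.
    Z = linkage(X, method='average')
    # Extract a flat 3-cluster partition from the hierarchy.
    labels = fcluster(Z, t=3, criterion='maxclust')
    print(labels)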

http://mloss.org/software/


Probably Approximately Correct — a Formal Theory of Learning In tackling machine learning (and computer science in general) we face some deep philosophical questions, like "What does it mean to learn?", "Can a computer learn?", and "How do you define simplicity?"

Pattern Pattern is a web-mining module for the Python programming language. It has tools for data mining (Google, Twitter, and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis, and <canvas> visualization. The module is free, well-documented, and bundled with 50+ examples and 350+ unit tests.
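A taste of Pattern's API, adapted from the examples bundled with the module (the web services it wraps have changed over the years, so treat this as a sketch of the interface rather than guaranteed-working code):

    from pattern.web import Twitter, plaintext
    from pattern.en import sentiment

    # Search recent tweets and score each one for sentiment.
    twitter = Twitter(language='en')
    for tweet in twitter.search('machine learning', count=10):
        text = plaintext(tweet.text)              # strip HTML from the result
        polarity, subjectivity = sentiment(text)  # polarity in [-1, 1], subjectivity in [0, 1]
        print(polarity, subjectivity, text)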

machine learning in Python — scikit-learn v0.9 documentation "We use scikit-learn to support leading-edge basic research [...]" "I think it's the most well-designed ML package I've seen so far." "scikit-learn's ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...]."

Mining of Massive Datasets The book has a new Web site, www.mmds.org; the old page is no longer maintained.

Occam's Razor and PAC-learning So far our discussion of learning theory has consisted of seeing the definition of PAC-learning, tinkering with it, and working through simple examples of learnable concept classes. We've said that our real interest is in proving big theorems about which big classes of problems can and can't be learned. One major tool for doing this with PAC is the concept of VC dimension, but to set the stage we're going to prove a simpler theorem that gives a nice picture of PAC-learning when your hypothesis class is small. In short, the theorem says that if you have a finite set of hypotheses to work with, and you can always find a hypothesis that's consistent with the data you've seen, then you can learn efficiently. That much is obvious; the point is to quantify exactly how much data you need to ensure low error.
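For reference, the quantification that theorem gives is the standard Occam bound for finite hypothesis classes: if a hypothesis h drawn from a finite class H is consistent with m i.i.d. labeled examples, then with probability at least 1 − δ its true error is at most ε whenever

    m ≥ (1/ε) · (ln|H| + ln(1/δ))

so the amount of data needed grows only logarithmically in the size of the hypothesis class.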

The IATI Standard Please note that the Datastore is currently in its first (alpha) release, so data queries may sometimes return unexpected results.

About: BayesOpt is an efficient C++ implementation of the Bayesian-optimization methodology for nonlinear optimization, experimental design, and stochastic bandits. In the literature it is also called Sequential Kriging Optimization (SKO) or Efficient Global Optimization (EGO). There are also interfaces for C, Matlab/Octave, and Python.

oluolu - Project Hosting on Google Code Oluolu is an open-source query-log mining tool that works on Hadoop and provides resources for adding new features to search engines. Concretely, Oluolu supports automatic dictionary creation from query-log data, such as spelling corrections, context queries, or frequent query n-grams. The dictionaries are applied to search engines to provide features such as 'did you mean' or 'related keyword suggestion'.
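The sequential loop that BayesOpt implements (fit a surrogate to the points seen so far, pick the next point by maximizing an acquisition function, evaluate, repeat) can be sketched generically. Below is a minimal Python illustration using scikit-learn's Gaussian process as the surrogate and expected improvement as the acquisition; this is a sketch of the methodology, not BayesOpt's own API, and the objective f is a made-up stand-in:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def f(x):                                    # hypothetical expensive 1-D objective
        return np.sin(3 * x) + x ** 2 - 0.7 * x

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 2.0, size=(5, 1))      # small initial design
    y = f(X).ravel()
    grid = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)

    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(20):                          # sequential design loop
        gp.fit(X, y)
        mu, sigma = gp.predict(grid, return_std=True)
        best = y.min()
        with np.errstate(divide='ignore', invalid='ignore'):
            z = (best - mu) / sigma
            ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        ei[sigma == 0.0] = 0.0                   # no improvement at already-sampled points
        x_next = grid[np.argmax(ei)].reshape(1, -1)
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next).ravel())

    print("best x:", X[y.argmin(), 0], "f(x):", y.min())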

Machine Learning in 7 Pictures Basic machine learning concepts - the bias vs. variance tradeoff, avoiding overfitting, Bayesian inference and Occam's razor, feature combination, non-linear basis functions, and more - explained via pictures. By Deniz Yuret, Feb 2014. I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating. 1. Bias vs. variance tradeoff - test and training error: why lower training error is not always a good thing: ESL Figure 2.11.
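That first picture's point - training error keeps falling as model complexity grows, while test error eventually turns back up - can be reproduced numerically in a few lines (a toy illustration of the idea behind ESL Figure 2.11, not the figure itself):

    import numpy as np
    rng = np.random.default_rng(0)

    def target(x):
        return np.sin(2 * np.pi * x)

    # Small noisy training set, large test set from the same distribution.
    x_train = rng.uniform(0, 1, 20)
    y_train = target(x_train) + rng.normal(0, 0.3, x_train.size)
    x_test = rng.uniform(0, 1, 1000)
    y_test = target(x_test) + rng.normal(0, 0.3, x_test.size)

    # Higher-degree polynomials drive training error down,
    # but past some point test error rises again (overfitting).
    for degree in (1, 4, 15):
        p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        train_mse = np.mean((p(x_train) - y_train) ** 2)
        test_mse = np.mean((p(x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")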
