machinelearning
This video presentation was shown at the ICML Workshop for Open Source ML Software on June 25, 2010. It explains some of the features and algorithms of PyBrain and gives tutorials on how to install and use PyBrain for different tasks. A second video shows some of PyBrain's learning features in action.
LASVM is an approximate SVM solver that uses online approximation. It reaches accuracies similar to those of a full SVM solver after a single sequential pass through the training examples. Further gains can be achieved by using selective sampling techniques to choose which example to consider next. As shown in the graph, LASVM requires considerably less memory than a regular SVM solver.
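LASVM's own update rules (it is a kernel solver that maintains a set of support vectors) are not described here, so the sketch below swaps in a simpler, plainly different technique to illustrate the single-pass idea: Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss of a linear SVM, visiting each training example exactly once. The function name `pegasos_single_pass`, the regularization constant `lam`, and the absence of a bias term (append a constant feature if one is needed) are assumptions for illustration; selective sampling is not modelled.

```python
import numpy as np

def pegasos_single_pass(X, y, lam=0.01, seed=0):
    """One sequential pass of Pegasos-style SGD for a linear SVM (no bias term).

    X: (n, d) feature array; y: labels in {-1, +1}. Returns the weight vector w.
    This is NOT LASVM; it only illustrates learning in a single pass over the data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Visit every example exactly once, in random order.
    for t, i in enumerate(rng.permutation(n), start=1):
        eta = 1.0 / (lam * t)              # standard Pegasos step size
        margin = y[i] * (X[i] @ w)
        w *= 1.0 - eta * lam               # gradient step on the L2 regularizer
        if margin < 1.0:                   # hinge loss active: move toward y_i * x_i
            w += eta * y[i] * X[i]
    return w
```

Predictions are `np.sign(X @ w)`. In this linear sketch the model is just the O(d) weight vector; LASVM, being a kernel method, keeps support vectors instead, which is where its memory advantage over a regular solver comes from.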
Serial algorithms can no longer rely on improvements in CPU speed to scale. The performance of classical Support Vector Machine (SVM) implementations has reached its limit, and the arrival of the multi-core era requires these algorithms to adapt to a parallel setting. Graphics Processing Units (GPUs) have emerged as high-performance platforms for data-parallel algorithms. This project describes how a naïve implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism onto the GPU programming model and make efficient use of the GPU's computational throughput. Empirical results show that the training and classification times of the algorithm can be reduced by an order of magnitude compared to a classical solver, LIBSVM, while guaranteeing the same accuracy.
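The paragraph does not say which parts of the solver were moved to the GPU. As an assumption, the sketch below shows two sources of data parallelism commonly exploited in SVM training: evaluating a whole block of RBF kernel values at once (dominated by a single matrix product, the pattern GPUs accelerate well) and the independent binary problems produced by a one-vs-one multiclass decomposition. NumPy on the CPU stands in for GPU code; the function names and the choice of the RBF kernel are illustrative, and a loop-based serial version is included for comparison.

```python
from itertools import combinations

import numpy as np

def rbf_kernel_matrix(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed without Python loops.

    Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so most of the work is one
    matrix product: every entry is independent, i.e. data-parallel.
    """
    sq_a = np.sum(A * A, axis=1)[:, None]
    sq_b = np.sum(B * B, axis=1)[None, :]
    sq_dist = np.maximum(sq_a + sq_b - 2.0 * (A @ B.T), 0.0)  # clip rounding noise
    return np.exp(-gamma * sq_dist)

def rbf_kernel_matrix_loop(A, B, gamma):
    """Serial reference: one kernel evaluation at a time."""
    K = np.empty((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            diff = A[i] - B[j]
            K[i, j] = np.exp(-gamma * (diff @ diff))
    return K

def one_vs_one_pairs(classes):
    """Class pairs for a one-vs-one decomposition; each pair is an independent binary SVM."""
    return list(combinations(sorted(classes), 2))
```

With k classes, one-vs-one yields k(k-1)/2 binary problems (6 for 4 classes) that can be trained concurrently, on top of the parallelism inside each kernel evaluation.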
BibTeX @ARTICLE{Shilton05incrementaltraining, author = {A. Shilton and M. Palaniswami and D.
The problem of identifying approximately duplicate records between databases is known by several names, including duplicate detection and record linkage. Typical approaches use either rules or a weighted aggregation of distances between the individual attributes of potential duplicates. However, choosing appropriate rules, distance functions, weights, and thresholds requires a deep understanding of the application domain, or a good representative training set for supervised learning approaches.
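A minimal sketch of the weighted-aggregation approach, assuming string-valued attributes, a normalized distance derived from the ratio of Python's `difflib.SequenceMatcher`, and hand-picked weights and threshold. The attribute names, weights, and threshold used below are hypothetical; choosing them well is exactly the domain-knowledge problem described above.

```python
from difflib import SequenceMatcher

def attribute_distance(a, b):
    """Normalized string distance in [0, 1]; 0 means identical (case-insensitive)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_duplicate(rec1, rec2, weights, threshold):
    """Weighted mean of per-attribute distances, compared against a threshold.

    weights maps attribute name -> weight (hypothetical, chosen by hand).
    Returns (is_duplicate, score).
    """
    total = sum(weights.values())
    score = sum(w * attribute_distance(rec1[attr], rec2[attr])
                for attr, w in weights.items()) / total
    return score <= threshold, score
```

For example, with weights `{"name": 0.6, "city": 0.4}` and a threshold of 0.2, the records `{"name": "Jon Smith", "city": "Boston"}` and `{"name": "John Smith", "city": "boston"}` score about 0.03 and are flagged as duplicates.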
This paper presents a novel approach for detecting duplicate records in the context of digital gazetteers, using state-of-the-art machine learning techniques. It reports a thorough evaluation of alternative machine learning approaches for classifying pairs of gazetteer records as duplicates or not, built with support vector machines or alternating decision trees and using different combinations of similarity scores in the feature vectors. Experimental results show that feature vectors combining multiple similarity scores, derived from place names, semantic relationships, place types, and geospatial footprints, lead to higher accuracy.
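The paper's exact features are not given here, so the following is a hypothetical sketch of how several similarity scores could be combined into one feature vector per record pair: a name similarity, a place-type match, a shared-parent flag standing in for a semantic relationship, and a geospatial score that decays with the great-circle (haversine) distance between footprints. The record fields (`name`, `type`, `parent`, `lat`, `lon`), the simplification of footprints to single points, and the 10 km decay scale are all assumptions.

```python
import math
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity in [0, 1] between two place names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def pair_features(r1, r2, scale_km=10.0):
    """Hypothetical feature vector for a candidate pair of gazetteer records."""
    dist = haversine_km(r1["lat"], r1["lon"], r2["lat"], r2["lon"])
    return [
        name_similarity(r1["name"], r2["name"]),        # place-name score
        1.0 if r1["type"] == r2["type"] else 0.0,       # place-type match
        1.0 if r1["parent"] == r2["parent"] else 0.0,   # shared parent (semantic proxy)
        math.exp(-dist / scale_km),                     # geospatial proximity in (0, 1]
    ]
```

Vectors like these, labelled as duplicate or not, would then be fed to a classifier such as an SVM or an alternating decision tree; training is not shown.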
Milk is a machine learning toolkit in Python. Its focus is supervised classification, with several classifiers available: SVMs (based on libsvm), k-NN, random forests, and decision trees. It also performs feature selection.
tfclassify 0.1.2: TFClassify is an implementation of a document classification algorithm.
Machine Learning Group at National Taiwan University. We recently released LibShortText, a library for short-text classification and analysis built on LIBLINEAR. Version 1.93 was released on January 27, 2013, and fixes some minor issues. An experimental version using 64-bit integers is available in LIBSVM tools.
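LibShortText performs its own preprocessing, which is not reproduced here. As a hedged illustration of what LIBLINEAR consumes, the sketch below turns short texts into the sparse text format read by LIBLINEAR and LIBSVM (`label index:value ...`, with 1-based feature indices in ascending order), using binary bag-of-words features. Whitespace tokenization and the helper names `build_vocab` and `to_sparse_line` are assumptions.

```python
def build_vocab(texts):
    """Map each token to a 1-based feature index (LIBSVM/LIBLINEAR indices start at 1)."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def to_sparse_line(label, text, vocab):
    """Encode one short text as a sparse line with binary (presence) features.

    Unknown tokens are skipped; indices are deduplicated and sorted ascending.
    """
    idx = sorted({vocab[t] for t in text.lower().split() if t in vocab})
    return " ".join([str(label)] + [f"{i}:1" for i in idx])
```

With the vocabulary built from `["cheap flights now", "meeting moved to monday"]`, the text `"now cheap cheap"` with label 1 is encoded as `1 1:1 3:1`; one such line per example forms a training file.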
pcSVM pre 1.0: pcSVM is a framework for support vector machines. Support Vector Machines are a new generation of learning algorithms based on recent advances in statistical learning theory, and have been applied to a large number of real-world applications, such as text categorization and handwritten character recognition.
LIBSVM -- A Library for Support Vector Machines. Chih-Chung Chang and Chih-Jen Lin. Version 3.16 released on January 27, 2013.
em is a package for creating Gaussian Mixture Models (both diagonal and full covariance matrices are supported), sampling from them, and estimating them from data with the Expectation Maximization algorithm. It can also draw confidence ellipsoids for multivariate models and compute the Bayesian Information Criterion to assess the number of clusters in the data. In the near future, I hope to add so-called online EM (i.e., recursive EM) and a variational Bayes implementation. em is implemented in Python and uses the excellent NumPy and SciPy packages. NumPy is a Python package that gives Python fast multi-dimensional array capabilities (à la MATLAB and the like); SciPy builds on NumPy to provide common scientific functionality for signal processing, linear algebra, statistics, etc.
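The em package's own API is not described in this entry, so the following is an independent sketch of the same technique for the diagonal-covariance case: EM for a Gaussian mixture, plus the BIC score used to compare numbers of clusters. The function names, the farthest-first initialization, the fixed iteration count, and the small variance floor `reg` are assumptions; full covariances, sampling, confidence ellipsoids, and online EM are not covered.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x | mean_k, diag(var_k)) for every (sample, component) pair, shape (n, k)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                    # (n, k, d)
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def em_gmm_diag(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM by EM. Returns weights, means, variances, log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-first initialization of the means (an assumption, not em's method).
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in log space for numerical stability.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: closed-form updates of mixing weights, means, and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    # Log-likelihood of the data under the final parameters.
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_lik = float(np.logaddexp.reduce(log_p, axis=1).sum())
    return weights, means, variances, log_lik

def bic(log_lik, n, k, d):
    """BIC = -2 log L + p log n with p = (k - 1) + k*d + k*d free parameters; lower is better."""
    n_params = (k - 1) + 2 * k * d
    return -2.0 * log_lik + n_params * np.log(n)
```

To assess the number of clusters, fit the model for several values of `k` and keep the one with the lowest `bic(log_lik, n, k, d)`.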