Machine Learning stats and optimization
The Weka workbench contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. The main strengths of Weka are that it is freely available under the GNU General Public License, very portable because it is fully implemented in the Java programming language and thus runs on almost any computing platform, contains a comprehensive collection of data preprocessing and modeling techniques, and is easy to use by a novice due to the graphical user interfaces it contains. Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection.
WEKA is a comprehensive toolbench for machine learning and data mining. Its main strengths lie in the classification area, where all current ML approaches, and quite a few older ones, have been implemented within a clean, object-oriented Java class hierarchy. Regression, association rule, and clustering algorithms have also been implemented. However, WEKA is also quite complex to handle, as amply demonstrated by the many questions on the WEKA mailing list. For the graphical user interface, the WEKA development group offers documentation for the Explorer and the Experimenter. There is, however, little documentation on using WEKA's command line interface, although it is essential for realistic learning tasks.
Here we will work with the Spambase dataset from HW02, testing your implementations using Fold 1 as described in HW02. Precondition your data. Gradient descent will often perform much better on data that has been "normalized" so that the individual features are on a comparable scale. One commonly used normalization is the z-score, sometimes called the standard score. To compute the z-score corresponding to a feature value, one must first compute the mean and standard deviation of that feature; the z-score is then the feature value minus the mean, divided by the standard deviation. You should compute these values yourself, in code, but you can check your results against the Spambase page describing various simple statistics over those features.
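As a rough illustration of the z-score step, here is a minimal Python sketch. It is not taken from the assignment: the function names `column_stats` and `zscore`, the toy `train` and `test` rows, and the use of the population standard deviation (dividing by n rather than n - 1) are all assumptions made for the example. It also assumes that the mean and standard deviation are computed on the training rows only and then reused for the held-out fold, which is a common convention but not something the text above specifies.

```python
import math

def column_stats(rows):
    """Return per-feature (means, stds) computed over the given rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        # Population variance (divide by n); an assumption for this sketch.
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var))
    return means, stds

def zscore(rows, means, stds):
    """Map each value x to (x - mean) / std; constant features map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(r, means, stds)]
        for r in rows
    ]

# Hypothetical toy data standing in for the real feature matrix.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

means, stds = column_stats(train)
train_z = zscore(train, means, stds)
test_z = zscore(test, means, stds)  # reuse the training statistics
```

After this transformation each training feature has mean 0 and standard deviation 1, so no single feature dominates the gradient merely because of its units.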
I have written a few small tutorial notes on various topics that were of interest to me. You can get them below.
Summary: This package contains the most recent version of various Matlab codes I released during my PhD work. I would recommend downloading and using this package if you plan on using more than one of my Matlab codes.