background preloader

Data Mining Tools

Facebook Twitter

Top 10 data mining algorithms in plain English. Today, I’m going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper.

Top 10 data mining algorithms in plain English

Once you know what they are, how they work, what they do and where you can find them, my hope is you’ll have this blog post as a springboard to learn even more about data mining. What are we waiting for? Let’s get started! Update 16-May-2015: Thanks to Yuval Merhav and Oliver Keyes for their suggestions which I’ve incorporated into the post. Update 28-May-2015: Thanks to Dan Steinberg (yes, the CART expert!) What does it do? Wait, what’s a classifier? What’s an example of this? Now: Given these attributes, we want to predict whether the patient will get cancer. And here’s the deal: Using a set of patient attributes and the patient’s corresponding class, C4.5 constructs a decision tree that can predict the class for new patients based on their attributes.

Cool, so what’s a decision tree? The bottomline is: Opennn 2.2. Artelnics. OpenNN is an open source class library written in C++ programming language which implements neural networks, a main area of deep learning research.

Artelnics

It is intended for advanced users, with high C++ and machine learning skills. The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties. On the other hand, it allows multiprocessing programming by means of OpenMP, in order to increase the computer performance. OpenNN contains data mining algorithms as a bundle of classes. OpenNN screenshot The package comes with unit testing, many examples and extensive documentation. Commercial support Artelnics offers commercial support for the development of applications using OpenNN.

Create your own predicitive analytics solution using the most powerful deep learning technology: Tutorial · JohnLangford/vowpal_wabbit Wiki. We did a new version 7.8 tutorial which includes: New tutorials associated with Version 7.4.

Tutorial · JohnLangford/vowpal_wabbit Wiki

This includes: A new version 7.0 tutorial is available. It covers the basics and most common options, how to use VW and the data format for different types of problems, such as Binary Classification, Regression, Multiclass Classification, Cost-Sensitive Multiclass Classification, "Offline" Contextual Bandit and Sequence Predictions.

Many more advanced options in terms of flags and the data format are not covered. The version 6.1 tutorial and various pieces below covers some topics not covered in the version 7 tutorial, as most of these haven't change in the latest version: Older stuff. Tutorial · JohnLangford/vowpal_wabbit Wiki. Vowpal Wabbit (Fast Learning) Vowpal Wabbit (Fast Learning) This is a project started at Yahoo!

Vowpal Wabbit (Fast Learning)

Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms. We primarily use the wiki off github. DownloadCommand lineTutorialExamplesInput ValidatorDiscussionsMailing list. Getting Started with RapidMiner Studio - RapidMiner Documentation. So here you are, a business analyst or developer or jack-of-all-trades.

Getting Started with RapidMiner Studio - RapidMiner Documentation

Things are going well, but it's time to fine-tune your business. You've probably collected a myriad of data points about your customer base and would like to determine which customers will remain loyal and which ones will likely churn. Attracting new customers comes at a price, so you'd really like to maintain your current customer base. How do you predict who may leave? What target audiences do you spend marketing dollars on to entice them to stay? Using RapidMiner's modern enterprise platform, you can quickly and easily create analytic workflows called processes to determine who to target. This series of five Getting Started tutorials will help familiarize you with some basic features and functionality of RapidMiner Studio.

Below are some additional resources available to help you get up and running quickly with RapidMiner Studio. Comparing Editions - RapidMiner. Neo4j Resources. Orange Data Mining. Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Weka is a collection of machine learning algorithms for data mining tasks.

Weka 3 - Data Mining with Open Source Machine Learning Software in Java

The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature. The name is pronounced like this, and the bird sounds like this. Weka is open source software issued under the GNU General Public License. Yes, it is possible to apply Weka to big data!