5 of the Best Free and Open Source Data Mining Software

The process of extracting patterns from data is called data mining. Modern businesses recognize it as an essential tool because it converts data into business intelligence, giving them an informational edge. At present, it is widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection.

Data mining normally involves four kinds of tasks:

* Classification - generalizing known structure to apply to new data.
* Clustering - finding groups and structures in the data that are in some way similar, without using known structures in the data.
* Association rule learning - searching for relationships between variables.
* Regression - finding a function that models the data with the least error.

For those of you who are looking for data mining tools, here are five of the best open-source data mining software packages that you can get for free:
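To make the regression task described above more concrete, here is a minimal sketch in plain Python that fits a straight line by ordinary least squares. The function name `fit_line` and the sample data are made up for illustration; real data mining packages offer far more general models.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the fitted line no longer passes through every point, but it is still the line with the smallest sum of squared errors.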

Tutorial
More than a HOWTO, this document is a HOW-DO-I use Python to do my image processing tasks. Image processing means many things to many people, so I will use a couple of examples from my research to illustrate.

Python Library
From OSGeo Wiki

Motivation
Several OSGeo software projects support Python. However, a global abstraction layer that would help with "OSGeo Python programming" is lacking. We envision well-documented bindings to the various software projects, handled as plugins (or whatever is appropriate).

Orange (software)
Orange is supported on various versions of Linux, Apple's Mac OS X, and Microsoft Windows.

Scikit-learn -- a Python-based toolkit for machine learning

Protovis
Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.

Data Mining
Image: Detail of a sliced visualization of thirty video samples of Downfall remixes. See actual visualization below.

As part of my postdoctoral research for The Department of Information Science and Media Studies at the University of Bergen, Norway, I am using cultural analytics techniques to analyze YouTube video remixes. My research is done in collaboration with the Software Studies Lab at the University of California, San Diego.

Python Bindings to the Point Cloud Library
This is a small Python binding to the Point Cloud Library. Currently, the following parts of the API are wrapped (all methods operate on PointXYZ point types):

* I/O and integration; saving and loading PCD files
* segmentation
* SAC
* smoothing
* filtering

The code tries to follow the Point Cloud API, and also provides helper functions for interacting with numpy. For example (from tests/):

    import pcl
    import numpy as np

    # Build a point cloud from an N x 3 float32 array
    p = pcl.PointCloud()
    p.from_array(np.array([[1, 2, 3], [3, 4, 5]], dtype=np.float32))

    # Segment the cloud by fitting a plane model with RANSAC
    seg = p.make_segmenter()
    seg.set_model_type(pcl.SACMODEL_PLANE)
    seg.set_method_type(pcl.SAC_RANSAC)
    indices, model = seg.segment()

PyQL: a new set of Python wrappers for QuantLib | Things and thoughts
Hi folks, We are happy to announce the release of PyQL [1], a new set of Python wrappers for QuantLib. The project is available here:

* URL:
* License: BSD license.
* Authors: Didrik Pinte, Enthought and Patrick Henaff, IAE Paris.

RapidMiner
RapidMiner is a software platform developed by the company of the same name that provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. It is used for business and industrial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the data mining process, including results visualization, validation and optimization.[1] RapidMiner is developed on a business source model, which means only the previous version of the software is available under an OSI-certified open source license on Sourceforge.[2] A Starter Edition is available for free download, a Personal Edition is offered for US$999, a Professional Edition is US$2,999, and pricing for the Enterprise Edition is available from the developer.[3]

gource - software version control visualization
Gource is a software version control visualization tool.

Eureqa
Eureqa is a breakthrough technology that uncovers the intrinsic relationships hidden within complex data. Traditional machine learning techniques like neural networks and regression trees are capable tools for prediction, but become impractical when "solving the problem" involves understanding how you arrive at the answer. Eureqa uses a breakthrough machine learning technique called Symbolic Regression to unravel the intrinsic relationships in data and explain them as simple math. Using Symbolic Regression, Eureqa can create incredibly accurate predictions that are easily explained and shared with others. Over 35,000 people have relied on Eureqa to answer their most challenging questions, in industries ranging from Oil & Gas through Life Sciences and Big Box Retail.
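The general idea of symbolic regression, searching over candidate formulas instead of tuning the parameters of one fixed model, can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the building blocks in `UNARY`, the small integer coefficient ranges, and the function names are hypothetical, and the exhaustive search is a stand-in technique. It is not Eureqa's algorithm; practical tools explore far larger expression spaces.

```python
import itertools
import math

# Hypothetical building blocks for the toy search (not Eureqa's operator set)
UNARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}


def candidates():
    """Yield (formula_text, function) pairs of the form a*f(x) + b."""
    for name, f in UNARY.items():
        for a, b in itertools.product([1, 2, 3], [0, 1, 2]):
            # Default arguments freeze f, a, b for each generated function
            yield f"{a}*{name} + {b}", (lambda x, f=f, a=a, b=b: a * f(x) + b)


def squared_error(func, xs, ys):
    """Sum of squared differences between func(x) and the observed y."""
    return sum((func(x) - y) ** 2 for x, y in zip(xs, ys))


def search(xs, ys):
    """Return the candidate (formula_text, function) with the lowest error."""
    return min(candidates(), key=lambda c: squared_error(c[1], xs, ys))


# Illustrative data generated from y = 2*x^2 + 1
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x * x + 1 for x in xs]
best_formula, best_func = search(xs, ys)
print(best_formula)
```

The result is a readable formula rather than a table of opaque model weights, which is the property the description above emphasizes.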

