R and Data Mining

MSISS ST4003: Data Mining 2010-11 - Louis Aslett

2009-2010 ST4003 Data Mining lab material

This is the labs page for the fourth year undergraduate course in data mining for MSISS and mathematics students, lectured by Dr Myra O'Reagan.

Useful Links
* Introduction to R
* R reference card
* RSeek, a Google-powered search engine of R resources

Labs
* Lab 1 - Examining Data
* Lab 2 - A Basic Tree Classifier
* Lab 3 - More Trees
* Lab 4 - More Programming Concepts and Model Evaluation
* Lab 5 - Introduction to Neural Networks
* Lab 6 - Random Forests
* Lab 7 - Introduction to Support Vector Machines

Data Sets
* Telecom Customer Churn Data (small version)
* Titanic Survivor Data
* Cheese Taste Data
* ESL SVM simulated data

Visualizing Tables with plot.table

The plot.table function in the Systematic Investor Toolbox is a flexible table drawing routine. plot.table has a simple interface and takes the following parameters:
* plot.matrix: matrix with the data you want to plot
* smain: text to draw in the (top, left) cell; the default value is a blank string
* highlight: either TRUE/FALSE to indicate whether to color each cell based on its numeric value, or a matrix with colors for each cell
* colorbar: TRUE/FALSE flag to indicate whether to draw a colorbar

Here are a few examples of how you can use the plot.table function to create summary reports. The examples cover loading the Systematic Investor Toolbox, creating a basic plot.table, creating a plot.table with a colorbar, and a more practical use of the function. I will show more examples of plot.table in future posts.

To view the complete source code for this example, please have a look at the plot.table.test() function in plot.table.r at github.

5 of the Best Free and Open Source Data Mining Software

The process of extracting patterns from data is called data mining. Modern businesses recognize it as an essential tool because it converts data into business intelligence, giving an informational edge. At present, it is widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection.

Four kinds of tasks are normally involved in data mining:
* Classification - the task of generalizing known structure to apply to new data.
* Clustering - the task of finding groups and structures in the data that are in some way similar, without using known structures in the data.
* Association rule learning - looks for relationships between variables.
* Regression - aims to find a function that models the data with the least error.

For those of you who are looking for data mining tools, here are five of the best open-source data mining software packages that you can get for free:
* Orange
* RapidMiner
* Weka
* JHepWork
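Three of the task types listed above can be tried directly in R. The following is a minimal sketch using only base R and its built-in data sets (iris, cars, mtcars); the choice of data sets and models is an assumption made for illustration and does not come from the original article. Association rule learning is left out because it needs an add-on package.

```r
set.seed(42)  # make the k-means result reproducible

# Clustering: group iris flowers by their four measurements,
# without using the species labels
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
print(table(cluster = clusters$cluster, species = iris$Species))

# Regression: model stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Classification: logistic regression predicting transmission type
# (am: 0 = automatic, 1 = manual) from car weight
clf <- glm(am ~ wt, data = mtcars, family = binomial)
predicted <- as.integer(predict(clf, type = "response") > 0.5)
accuracy <- mean(predicted == mtcars$am)
print(accuracy)
```

The accuracy here is measured on the training data, so it is optimistic; a held-out test set would give a fairer estimate.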

R

library(stringr)

[1] "1 Introduction"
[3] "Climate projections of the Intergovernmental Panel on Climate Change (IPCC) forecast a general increase of seasonal temperatures in the present century across the temperate zone, aggravated by decreasing amounts of summer rainfall in certain regions at lower latitudes (Christensen et al. 2007).
[5] "In this study, we aim to (1) identify the limiting macroclimatic factors and to (2) predict the future boundaries of beech (Fagus sylvatica L.) and sessile oak (Quercus petraea (Mattuschka) Liebl.) forests in a region highly vulnerable to climatic extremes.
[7] "Beech and sessile oak forests of Hungary are to a large extent “trailing edge” populations (Hampe and Petit 2005), which should be preferably modelled using specific modelling strategies (Thuiller et al. 2008).

# Extract every parenthesized group (non-greedy match) from the text vector txt
extr1 <- unlist(str_extract_all(txt, pattern = "\\(.*?\\)"))
# Keep only the groups that contain a four-digit year
extr2 <- extr1[grep("[0-9]{4}", extr1)]
# Trim each group to run from its first capital letter to its last digit
(str_extract(extr2, "[A-Z].*[0-9]"))

[1] "Christensen et al. 2007"
[2] "Fischlin et al. 2007"
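The same three steps can be reproduced without stringr. The following is a sketch in base R, using regmatches() in place of str_extract_all() and str_extract(); the vector txt here is a shortened, illustrative stand-in for the article text, not the original data.

```r
# Illustrative stand-in for the text vector (shortened sentences from the excerpt)
txt <- c(
  "1 Introduction",
  "Projections forecast a general increase of seasonal temperatures (Christensen et al. 2007).",
  "We predict the future boundaries of beech (Fagus sylvatica L.) forests.",
  "These are trailing edge populations (Hampe and Petit 2005), modelled using specific strategies (Thuiller et al. 2008)."
)

# Step 1: collect every parenthesized group, matching non-greedily
parens <- unlist(regmatches(txt, gregexpr("\\(.*?\\)", txt, perl = TRUE)))

# Step 2: keep only groups that contain a four-digit year
cited <- parens[grepl("[0-9]{4}", parens)]

# Step 3: trim each group to run from its first capital letter to its last digit
citations <- regmatches(cited, regexpr("[A-Z].*[0-9]", cited))
print(citations)
```

The year filter in step 2 is what drops non-citation parentheses such as the species name. Numbered list markers like "(1)" are dropped the same way, but a parenthesis containing any four-digit number would pass the filter.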

Polygon Overlay Analysis

Project Requirement: Polygon overlay operations determine the spatial coincidence (if any) of two polygon data layers, or of a polygon layer and a point layer, usually creating a new data layer in the process. Three useful (and widely used) polygon overlay operations are:
* Intersection (logical AND): The common or shared area between two overlapping polygons.
* Union (logical OR): The combined areas of two possibly overlapping polygons.
* Point-in-Polygon (logical AND): Between a point and polygon layer, the subset of points located within the polygon boundary.

Here, we demonstrate overlay operations using a collection of point and polygon species range data sets collected in South America, and methods from the PBSmapping package.

1) What is the area of each Species Range?

Input Data / Format:
* Point File: Mammalian Species Sightings (ESRI Point Shape File) from NatureServe data set.
* Base Map: DIVA-GIS Global Administrative Boundaries.
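To make the area and point-in-polygon operations concrete, here is a minimal base-R sketch. It does not use PBSmapping or the species data: it swaps in the shoelace formula for area and ray casting for the point-in-polygon test, and the rectangle and sighting coordinates are hypothetical. It also assumes planar coordinates, so it would not give true areas for unprojected longitude/latitude data.

```r
# Hypothetical rectangular "species range", vertices listed in order
poly_x <- c(0, 4, 4, 0)
poly_y <- c(0, 0, 3, 3)

# Shoelace formula: planar area of a simple (non-self-intersecting) polygon
polygon_area <- function(x, y) {
  j <- c(2:length(x), 1)  # index of the next vertex, wrapping around
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Ray casting: a point is inside if a horizontal ray cast to the right
# from it crosses the polygon boundary an odd number of times
point_in_polygon <- function(px, py, x, y) {
  j <- c(2:length(x), 1)
  # Edges whose endpoints lie on opposite sides of the line y = py
  straddles <- (y > py) != (y[j] > py)
  # x coordinate where each edge meets that line (unused for horizontal edges)
  x_cross <- x + (py - y) * (x[j] - x) / (y[j] - y)
  sum(straddles & (px < x_cross)) %% 2 == 1
}

# Hypothetical sighting locations
sightings <- data.frame(x = c(1, 5, 3.5), y = c(1, 1, 2.9))

range_area <- polygon_area(poly_x, poly_y)
inside <- vapply(
  seq_len(nrow(sightings)),
  function(i) point_in_polygon(sightings$x[i], sightings$y[i], poly_x, poly_y),
  logical(1)
)

print(range_area)
print(sightings[inside, ])  # the subset of points located within the polygon
```

Points that fall exactly on the boundary are not handled specially in this sketch, so their classification should not be relied on.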