R and Data Mining

Statistics with R

Warning

Here are the notes I took while discovering and using the statistical environment R. However, I do not claim any competence in the domains I tackle: I hope you will find those notes useful, but keep your eyes open -- errors and bad advice are still lurking in those pages... Should you want it, I have prepared a quick-and-dirty PDF version of this document. The old, French version is still available, in HTML or as a single file. You may also want all the code in this document.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Summer 2010 - R: ggplot2 Intro

Intro

When it comes to producing graphics in R, there are basically three options for your average user.

* base graphics

I've written up a pretty comprehensive description of the use of base graphics here, and don't intend to extend beyond that. The other two options both make creating plots of multivariate data easier. The website for ggplot2 is here:

Basics

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence gg-plot. Plots convey information through various aspects of their aesthetics.

* x position
* y position
* size of elements
* shape of elements
* color of elements

The elements in a plot are geometric shapes, like

* points
* lines
* line segments
* bars
* text

Some of these geometries have their own particular aesthetics.

* points: point shape, point size
* lines: line type, line weight
* bars: y minimum, y maximum, fill color, outline color
* text: label value

The values represented in the plot are the product of various statistics.

Layer by Layer

Displaying Statistics
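The idea in the ggplot2 notes above, that plotted values are the product of a statistic and are then mapped onto the aesthetics of a geometry, can be sketched in a few lines. This is only an illustration in plain Python, not ggplot2's API: the function names `stat_count` and `bar_layer`, the dictionary keys, and the data are all hypothetical.

```python
from collections import Counter


def stat_count(values):
    """Hypothetical 'count' statistic: one row per distinct value, sorted."""
    counts = Counter(values)
    return [{"x": key, "count": counts[key]} for key in sorted(counts)]


def bar_layer(rows, fill="grey"):
    """Map each statistic row onto bar aesthetics: x, y minimum, y maximum, fill."""
    return [
        {"geom": "bar", "x": row["x"], "ymin": 0, "ymax": row["count"], "fill": fill}
        for row in rows
    ]


# Made-up data: a categorical variable with repeated values.
cylinders = [4, 6, 8, 4, 4, 6, 8, 8, 8]

# One layer = data -> statistic -> geometry with aesthetics.
layer = bar_layer(stat_count(cylinders), fill="steelblue")
for bar in layer:
    print(bar)
```

Each printed dictionary describes one bar. The raw data never reaches the geometry directly; only the counts do, which is the sense in which the plotted values are "the product of a statistic".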

MSISS ST4003: Data Mining 2010-11 - Louis Aslett

2009-2010

ST4003 Data Mining lab material

This is the labs page for the fourth-year undergraduate course in data mining for MSISS and mathematics students, lectured by Dr Myra O'Reagan.

Useful Links

* Introduction to R
* R reference card
* RSeek, Google-powered search engine of R resources

Labs

* Lab 1 - Examining Data
* Lab 2 - A Basic Tree Classifier
* Lab 3 - More Trees
* Lab 4 - More Programming Concepts and Model Evaluation
* Lab 5 - Introduction to Neural Networks
* Lab 6 - Random Forests
* Lab 7 - Introduction to Support Vector Machines

Data Sets

* Telecom Customer Churn Data (small version)
* Titanic Survivor Data
* Cheese Taste Data
* ESL SVM simulated data

Physics

The R Project

Gapminder: Unveiling the beauty of statistics for a fact based world view.

Learning R

5 of the Best Free and Open Source Data Mining Software

The process of extracting patterns from data is called data mining. Modern business recognizes it as an essential tool, since it can convert data into business intelligence and so give an informational edge. At present, it is widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection. Four kinds of tasks are normally involved in data mining:

* Classification - the task of generalizing known structure so that it can be applied to new data.
* Clustering - the task of finding groups and structures in the data that are in some way similar, without using known structures in the data.
* Association rule learning - looks for relationships between variables.
* Regression - aims to find a function that models the data with the least error.

For those of you who are looking for data mining tools, here are five of the best open-source data mining software packages that you can get for free:

* Orange
* RapidMiner
* Weka
* JHepWork
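Of the four tasks listed above, regression is the easiest to show in a few lines. The sketch below assumes that "least error" means the sum of squared residuals (ordinary least squares) and that the function is a straight line; both are assumptions for illustration, and the name `fit_line` and the data are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (assumed error measure)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals: the quantity the fit minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data lying close to a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, squared_error(xs, ys, slope, intercept))
```

Any other line through the same points gives a squared error at least as large, which is what "models the data with the least error" means under this assumption.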

R reference card

Curriculum Vitae (updated: 25/01/2011)

Pierre Lafaye de Micheaux

Born 27 March 1973 in Paris. Married with two children. Nationalities: Canadian, French, Swiss.

* (Short) stays in other university research laboratories
* Invited talks: Sydney, Australia (2014).
* Short courses and tutorials
* Invitations of researchers
* Scholarships and grants
* Academic distinction
* Elected positions
* Administrative responsibilities: Université de Montréal, Département de Mathématiques et de Statistique; Université Pierre Mendès France, Département STID de l'IUT
* Refereeing of journal articles: Bernoulli, Canadian Journal of Statistics, Cognitive Computation, Computational Statistics, Computational Statistics and Data Analysis, Frontiers Neuroscience, Journal of Multivariate Analysis, Journal of Statistical Planning and Inference, Journal of Statistical Software, Mathematical Reviews, Medical Image Computing and Computer Assisted Intervention (MICCAI) Proceedings, Statistical Methodology.
* Editorial boards
* Main research themes

2014?

developers:projects:gsoc2012:ropensci

Summary: Dynamic access and visualization of scientific data repositories

Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. Visit the active development hub of each project on GitHub, where you can see and download the source code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R's powerful manipulation. See a complete list of our R packages currently in development. The student could choose to work on a package for a particular data repository of interest, or develop tools for visualization and exploration that could function across the existing packages.

Apache Mahout: Scalable machine learning and data mining

Related:  R-Programming