R and Data Mining


The PValues Data Table The PValues data table contains a row for each pair of Y and X variables. If you specified a column for Group, the PValues data table contains a first column called Group. A row appears for each level of the Group column and for each pair of Y and X variables. The PValues data table also contains a table variable called Original Data that gives the name of the data table that was used for the analysis. If you specified a By variable, JMP creates a PValues table for each level of the By variable, and the Original Data variable gives the By variable and its level.

When To Use Supervised And Unsupervised Data Mining Data mining techniques come in two main forms: supervised (also known as predictive or directed) and unsupervised (also known as descriptive or undirected). Both categories encompass functions capable of finding different hidden patterns in large data sets.
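The difference between the two forms can be sketched on a toy data set. The block below is only an illustration under stated assumptions: the values, the labels, and the helper names `nearest_neighbor_predict` and `two_means` are all hypothetical and do not come from the excerpted article. The supervised function uses known labels to predict a label for a new value; the unsupervised function sees only the values and groups them on its own.

```python
# Toy one-dimensional data (hypothetical): two well-separated groups of values.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
# Labels are available only in the supervised setting.
labels = ["low", "low", "low", "high", "high", "high"]


def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: predict the label of x from the closest labeled example."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def two_means(data, iterations=10):
    """Unsupervised: split data into two clusters without using any labels."""
    centers = [min(data), max(data)]  # simple deterministic starting centers
    assignment = [0] * len(data)
    for _ in range(iterations):
        # Assign each point to the nearer of the two centers.
        assignment = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in data
        ]
        # Move each center to the mean of the points assigned to it.
        for k in (0, 1):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment, centers


predicted = nearest_neighbor_predict(values, labels, 4.5)
clusters, centers = two_means(values)
print("supervised prediction for 4.5:", predicted)
print("unsupervised cluster assignment:", clusters)
```

The supervised call needs `labels` to work at all, while `two_means` never touches them. It recovers a grouping here, but it cannot name the groups "low" and "high"; that interpretation is left to the analyst.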

Statistics with R Warning Here are the notes I took while discovering and using the statistical environment R. However, I do not claim any competence in the domains I tackle: I hope you will find those notes useful, but keep your eyes open -- errors and bad advice are still lurking in those pages... Should you want it, I have prepared a quick-and-dirty PDF version of this document.

Summer 2010 — R: ggplot2 Intro Contents Intro When it comes to producing graphics in R, there are basically three options for your average user. base graphics I've written up a pretty comprehensive description for use of base graphics here, and don't intend to extend beyond that. Base graphics are attractive, and flexible, but when it comes to creating more complex plots, like this one, the code to create them becomes more cumbersome.

Blog Revolution - statistics by Terry M. Therneau Ph.D., Faculty, Mayo Clinic About a year ago there was a query about how to do "type 3" tests for a Cox model on the R help list, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand the issue more deeply. It took far longer than I expected but has been illuminating. First off, what exactly is this 'type 3' computation of which SAS is so deeply enamored?

Books I like If you’re serious about learning, you probably need to read a book at some point. These days if you want to learn applied statistics and data science tools, you have amazing options in the form of blogs, Q&A sites, massive open online courses, and even videos on YouTube. Wikipedia is also an amazing reference resource on statistics. I use all those things to learn new techniques and understand old ones better, but I also love reading books. No, I’m not one of those sentimental people who go on about the texture of paper; while I do like the look and feel of a “real” book and love having them around, it’s a pain the way they take up space, and the big majority of books I buy these days are on an e-reader. So when I say I like “books”, it’s something about the depth and the focus that comes with a good book going deeply into a matter.

Research Blog: Text summarization with TensorFlow Original Text: Alice and Bob took the train to visit the zoo. They saw a baby giraffe, a lion, and a flock of colorful tropical birds. Extractive Summary: Alice and Bob visit the zoo. saw a flock of birds. Above we extract the words bolded in the original text and concatenate them to form a summary.

developers:projects:gsoc2012:ropensci Summary: Dynamic access and visualization of scientific data repositories Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. Visit the active development hub of each project on GitHub, where you can see and download source code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R’s powerful manipulation.

Two meanings of priors, part I: The plausibility of models by Angelika Stefan & Felix Schönbrodt When reading about Bayesian statistics, you regularly come across terms like “objective priors”, “prior odds”, “prior distribution”, and “normal prior”. However, it may not be intuitively clear that the meaning of “prior” differs in these terms. In fact, there are two meanings of “prior” in the context of Bayesian statistics: (a) prior plausibilities of models, and (b) the quantification of uncertainty about model parameters. As this often leads to confusion for novices in Bayesian statistics, we want to explain these two meanings of priors in the next two blog posts*. The current blog post covers the first meaning of priors.
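The first meaning, the prior plausibility of models, can be illustrated with the standard relation posterior odds = Bayes factor × prior odds. The sketch below is not taken from the blog post: the function name `posterior_model_probability` and all numbers are hypothetical, and it assumes exactly two competing models M1 and M2 whose prior probabilities sum to one.

```python
def posterior_model_probability(prior_prob_m1, bayes_factor_12):
    """Update the plausibility of model M1 against a single rival M2.

    prior_prob_m1   -- prior probability of M1 (M2 gets the remainder)
    bayes_factor_12 -- p(data | M1) / p(data | M2), the evidence in the data
    Returns (posterior odds of M1 over M2, posterior probability of M1).
    """
    if not 0.0 < prior_prob_m1 < 1.0:
        raise ValueError("prior_prob_m1 must lie strictly between 0 and 1")
    if bayes_factor_12 <= 0.0:
        raise ValueError("bayes_factor_12 must be positive")
    prior_odds = prior_prob_m1 / (1.0 - prior_prob_m1)
    posterior_odds = bayes_factor_12 * prior_odds
    posterior_prob_m1 = posterior_odds / (1.0 + posterior_odds)
    return posterior_odds, posterior_prob_m1


# Hypothetical numbers: the same evidence (Bayes factor 3 in favor of M1)
# combined with two different prior plausibilities of M1.
odds_even, prob_even = posterior_model_probability(0.5, 3.0)
odds_skeptic, prob_skeptic = posterior_model_probability(0.2, 3.0)
print(odds_even, prob_even)
print(odds_skeptic, prob_skeptic)
```

With equal prior plausibility the posterior odds equal the Bayes factor. A reader who starts out skeptical of M1 ends up with lower posterior odds from the very same data, which is what "prior odds" refers to in this first sense.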

Social Network Analysis Brief Description: "Social network analysis is the mapping and measuring of relationships and flows between people, groups, organisations, computers or other information/knowledge processing entities." (Valdis Krebs, 2002).

RStudio Server Amazon Machine Image (AMI) - Louis Aslett Current AMI Quick Reference (27nd Jun 2015) What’s new recently? Easy Dropbox setup to make syncing files on/off server easy, including selective folder sync. Preinstalled RStudioAMI R package for server control.

Model visualisation. This page lists my published software for model visualisation. This work forms the basis for the third chapter of my thesis. classifly: Explore classification boundaries in high dimensions.
