
R and Data Mining


The Meeting Point Locator Hi Hillary, it’s Donald. Would you like to have a beer with me at La Cabra Brewing in Berwyn, Pennsylvania? (Hypothetical use of The Meeting Point Locator) Finding a place to have a drink with someone can be a difficult task. It is quite common that one of the two does not want to travel to the other’s territory. I am sure you have faced this situation many times.

When To Use Supervised And Unsupervised Data Mining » Data mining techniques come in two main forms: supervised (also known as predictive or directed) and unsupervised (also known as descriptive or undirected). Both categories encompass functions capable of finding different hidden patterns in large data sets.
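
As a rough illustration of the two forms, the R sketch below fits one supervised model (a known outcome is predicted from a predictor) and one unsupervised model (observations are grouped without any outcome being supplied). The built-in iris data set, lm(), and kmeans() are choices made here for illustration only; none of them come from the article.

```r
# Illustrative sketch only: the data set and both methods are assumptions,
# not taken from the article excerpted above.
data(iris)

# Supervised (predictive): the target Petal.Width is known for every row,
# and the model learns to predict it from Petal.Length.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
predicted <- predict(fit, newdata = data.frame(Petal.Length = c(1.5, 4.5)))

# Unsupervised (descriptive): no target is given; k-means looks for
# structure in the four measurements alone.
set.seed(42)  # k-means starts from random centers
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# The species labels are used only afterwards, to inspect the clusters.
print(round(predicted, 2))
print(table(cluster = clusters$cluster, species = iris$Species))
```

The species column never enters kmeans(); comparing it with the cluster assignments afterwards is just a convenient way to see what the unsupervised step found.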

Blog Revolution - statistics by Terry M. Therneau Ph.D., Faculty, Mayo Clinic About a year ago there was a query on the R help list about how to do "type 3" tests for a Cox model, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand the issue more deeply. It took far longer than I expected but has been illuminating. First off, what exactly is this 'type 3' computation of which SAS is so deeply enamored?

Statistics with R Warning Here are the notes I took while discovering and using the statistical environment R. However, I do not claim any competence in the domains I tackle: I hope you will find those notes useful, but keep your eyes open -- errors and bad advice are still lurking in those pages... Should you want it, I have prepared a quick-and-dirty PDF version of this document.

Summer 2010 - R: ggplot2 Intro When it comes to producing graphics in R, there are basically three options for your average user. Base graphics: I've written up a pretty comprehensive description of the use of base graphics here, and don't intend to extend beyond that. Base graphics are attractive and flexible, but when it comes to creating more complex plots, like this one, the code to create them becomes more cumbersome.

ggedit 0.0.2: a GUI for advanced editing of ggplot2 objects Guest post by Jonathan Sidi, Metrum Research Group Last week the updated version of ggedit was presented at RStudio::conf2017. First, a BIG thank you to the whole RStudio team for a great conference and for being so awesome about answering the insane number of questions I had (sorry!). For a quick intro to the package see the previous post. To install the package: devtools::install_github("metrumresearchgroup/ggedit",subdir="ggedit")

Research Blog: Text summarization with TensorFlow Original Text: Alice and Bob took the train to visit the zoo. They saw a baby giraffe, a lion, and a flock of colorful tropical birds. Extractive Summary: Alice and Bob visit the zoo. saw a flock of birds. Above we extract the words bolded in the original text and concatenate them to form a summary.

developers:projects:gsoc2012:ropensci Summary: Dynamic access and visualization of scientific data repositories Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. Visit the active development hub of each project on GitHub, where you can see and download source code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R’s powerful manipulation.

ggedit – interactive ggplot aesthetic and theme editor Guest post by Jonathan Sidi, Metrum Research Group ggplot2 has become the standard of plotting in R for many users. New users, however, may find the learning curve steep at first, and more experienced users may find it challenging to keep track of all the options (especially in the theme!). ggedit is a package that helps users bridge the gap between making a plot and getting all of those pesky plot aesthetics just right, all while keeping everything portable for further research and collaboration.

Notes 7a Chi-square Goodness-of-fit Chi-Square (χ²) Goodness-of-Fit The Z-test, t-test, and Pearson's r all assume that at least one of the variables (usually the dependent variable) is measured on the interval or ratio scale. When variables of interest are nominal or categorical, these statistical tests could be inappropriate and produce misleading information. A chi-square statistic, however, provides a more appropriate assessment for such data. Two types of chi-square tests are covered in this course: goodness-of-fit and the contingency table chi-square (or test of association). Goodness-of-fit tests are designed to assess the distribution of one variable, and contingency table chi-square tests are appropriate for assessing the association between two categorical variables.
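
A goodness-of-fit test of the kind described above can be sketched in R as follows. The counts are hypothetical (120 rolls of a die, made up for this example, not taken from the notes), and the null hypothesis assumed here is that all six faces are equally likely. The statistic is computed by hand and then checked against base R's chisq.test().

```r
# Hypothetical data: observed counts for the six faces of a die (120 rolls).
observed <- c(18, 22, 16, 25, 19, 20)

# Null hypothesis (an assumption of this sketch): each face has probability 1/6.
null_p <- rep(1 / 6, 6)
expected <- sum(observed) * null_p

# Chi-square statistic: sum over categories of (O - E)^2 / E.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one.
dof <- length(observed) - 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# The same test with the built-in function.
result <- chisq.test(observed, p = null_p)

print(chi_sq)
print(p_value)
print(result)
```

With these made-up counts every expected count is 20, so the statistic works out to 50 / 20 = 2.5 on 5 degrees of freedom. For the contingency table version, chisq.test() can be given a two-way table of counts instead of a single vector.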

RStudio Server Amazon Machine Image (AMI) - Louis Aslett Current AMI Quick Reference (27nd Jun 2015) Amazon instance type reference Click to launch through AWS web interface: What’s new recently? Easy Dropbox setup to make syncing files on/off server easy, including selective folder sync. Preinstalled RStudioAMI R package for server control.

Model visualisation. This page lists my published software for model visualisation. This work forms the basis for the third chapter of my thesis. classifly: Explore classification boundaries in high dimensions.
