
Revolution Analytics - Commercial Software & Support for the R Statistics Language

Related: R; R info for SAS, SPSS, and Stata Users; r - What is the difference between gc() and rm()

Product Review – Revolution R 5.0 So I got the email from Revolution R. Version 5.0 is ready for download, and unlike the half-hearted attempts of many software companies, they make it easy for academics and researchers to get their free copy. Free as in speech and free as in beer. Some thoughts: 1) R's memory problem is now an issue of marketing and branding. The primary advantage 64-bit architectures bring to R is an increase in the amount of memory available to a given R process. The first benefit of that increase is an increase in the size of data objects you can create. 2) The user interface is best shown as below or at - (but I am still hoping for the GUI Revolution Analytics promised us for Christmas). 3) The partnership with Microsoft HPC is quite awesome given Microsoft's track record in enterprise software penetration, but I am also interested in knowing more about the Oracle version of R and what it will do there.
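The memory point in the review, and the gc() versus rm() question in the related links, can be explored from the R console. The following is a minimal sketch using only base R and utils; the object name `x` and its size are arbitrary choices for illustration and do not come from the review.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
pointer_bytes <- .Machine$sizeof.pointer
print(pointer_bytes)

# Allocate a numeric vector of one million doubles (8 bytes each, plus a header).
x <- numeric(1e6)
x_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "MB")

# rm() only removes the name 'x' from the current environment.
rm(x)
x_exists <- exists("x", inherits = FALSE)

# gc() runs the garbage collector and returns a matrix of memory usage.
mem <- gc()
print(mem)
```

Roughly speaking, rm() makes the object unreachable, and the memory is actually reclaimed when the garbage collector runs, whether that happens automatically or through an explicit gc() call.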

SparkR by amplab-extras SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. SparkR exposes the Spark API through the RDD class and allows users to run jobs interactively from the R shell on a cluster. NOTE: As of April 2015, SparkR has been officially merged into Apache Spark and is shipping in an upcoming release (1.4) due early summer 2015. You can contribute and follow SparkR developments on the Apache Spark mailing lists and issue tracker. NOTE: The API in the upcoming Spark release (1.4) will differ from the API described here. Features SparkR exposes the RDD API of Spark as distributed lists in R.

    sc <- sparkR.init("local")
    lines <- textFile(sc, "...")  # file path elided in the source
    wordsPerLine <- lapply(lines, function(line) { length(unlist(strsplit(line, " "))) })

In addition to lapply, SparkR also allows closures to be applied on every partition using lapplyWithPartition. SparkR automatically serializes the necessary variables to execute a function on the cluster.
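The closure passed to lapply in the SparkR snippet is ordinary R, so its logic can be tried without a cluster. The sketch below is a base R analogue, not SparkR: it assumes a local character vector (with made-up sample text) in place of the RDD returned by textFile, and uses base lapply rather than the SparkR method.

```r
# A local character vector standing in for the lines of a text file.
lines <- c("the quick brown fox", "jumps over", "the lazy dog")

# Same closure as in the snippet: split each line on spaces and count the pieces.
wordsPerLine <- lapply(lines, function(line) {
  length(unlist(strsplit(line, " ")))
})

# lapply returns a list; flatten it to an integer vector for display.
print(unlist(wordsPerLine))
```

In SparkR the same function would be shipped to the workers and applied to each element of the distributed list; here it simply runs in the local session.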

PSPP GNU PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions. The most important of these exceptions is that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Nor are there any artificial limits on the number of cases or variables you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package. PSPP is a stable and reliable application. A brief list of some of PSPP's features follows. Support for over 1 billion cases. PSPP is particularly aimed at statisticians, social scientists and students requiring fast, convenient analysis of sampled data. Downloading PSPP There are some additional ways you can download or otherwise obtain PSPP.

Example. An example of nested downloads using RCurl. This example uses RCurl to download an HTML document and then collect the name of each link within that document. The purpose of the example is to illustrate how we can use the RCurl package to download a document and feed it directly to the XML (or HTML) parser without holding the entire content of the document in memory. We start the download and pass a function to the xmlEventParse() function for processing. Whenever the XML parser needs more input, it fetches more data from the HTTP response stream. This is useful for handling very large data returned from Web queries. To do this, we need to use the multi interface for libcurl in order to have asynchronous, non-blocking downloading of the document. The remaining part is how we combine these pieces with the RCurl and XML packages to do the parsing in this asynchronous, interleaved manner. The steps in the code are explained as follows. perform = FALSE . library(RCurl) library(XML)
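The parser side of this idea can be sketched on its own. The code below is not the original RCurl example and does no downloading: it assumes a small, well-formed document held in an in-memory string (invented for illustration) and shows only how an event handler passed to xmlEventParse() can collect link targets as elements are encountered. The variable names `doc`, `links` and `handlers` are hypothetical.

```r
library(XML)

# A small well-formed document held in memory; it stands in for the HTTP response.
doc <- '<html><body><a href="a.html">A</a><p>text</p><a href="b.html">B</a></body></html>'

# Accumulator for the link targets found so far.
links <- character()

# Event handler called at the start of every element.
# Elements without attributes arrive with attrs = NULL, so check names first.
handlers <- list(
  startElement = function(name, attrs, ...) {
    if (name == "a" && "href" %in% names(attrs)) {
      links <<- c(links, attrs[["href"]])
    }
  }
)

# asText = TRUE tells the parser that 'doc' is content, not a file name.
invisible(xmlEventParse(doc, handlers = handlers, asText = TRUE))
print(links)
```

In the streaming setup described above, the string would be replaced by a function that supplies the next chunk of the HTTP response on demand, so the handler would see links before the whole document has arrived.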

Step up your R capabilities with new tools for increased productivity I guess a lot of us use many tools to accomplish various things in everyday life. There is the (not that uncommon) case where you have to build something that others will use in their everyday business life to get insights and information and/or make decisions. The basic implementation scenario here would be to build an Excel workbook where you feed in the data and have an overview sheet named Dashboard. If things are on your side, you could set up a connection to a database (an existing one or one you will create for the data in question) and pull data from there. You can build powerful and visually elegant things using this approach. A cool resource to generate tears of joy among colleagues is OK, we all love R. But what about interactive results? Unfortunately you will soon realize that building a highly interactive dashboard has limited added value for complex questions, like the ones that predictive analytics would bomb at your inbox.

Testing Packages with Experimental R Devel Build for Windows · rwinlib/r-base Wiki Rtools 3.3 is in the process of being updated to use a new compiler toolchain produced by Jeroen Ooms based on GCC 4.9.3 and Mingw-W64 V3. The upcoming R 3.3 release plans to adopt this new toolchain. Package authors using compiled code should test their packages with the new toolchain to ensure compatibility. This document includes instructions for downloading the requisite versions of R, Rtools, and (optionally) RStudio to perform this testing. Step 1: Install R-devel-experimental for Windows There is an experimental build of R-devel that uses the new Rtools 3.3 toolchain available on CRAN. Step 2: Install the latest Rtools 3.3 There is an updated build of Rtools 3.3 that includes the new toolchain available on CRAN. Note that to be compatible with the instructions below you should choose to install Rtools 3.3 to the default location (c:\Rtools). Rterm / RGUI
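After the two installation steps, a quick sanity check from the R console can confirm which build is running and whether build tools are visible. This is a hedged sketch using base R only; it is not part of the wiki's instructions, and the choice of `gcc` and `make` as the tools to look for is an assumption about what the toolchain puts on the PATH.

```r
# Report which R build is running (the instructions target an R-devel build).
r_version <- R.version.string
print(r_version)

# Look for the compiler and make on the PATH; Sys.which() returns "" when not found.
tools <- Sys.which(c("gcc", "make"))
print(tools)

# Default Rtools install location named in the instructions (Windows only).
rtools_dir <- "c:/Rtools"
rtools_present <- dir.exists(rtools_dir)
print(rtools_present)
```

An empty string for either tool suggests the Rtools directories are not on the PATH of the current session.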

Learn R Toolkit | Climate Charts & Graphs As a former Excel chart user, I want to help current Excel users make the transition to more advanced charting in R with as little difficulty as possible. This post introduces my LearnR Toolkit to help Excel users move up to R in a systematic, step-by-step fashion. Introduction As an Excel chart user, I wanted to produce panel charts like this: After using VBA to build Excel panel charts (link), I knew I had to use a more advanced charting tool to continue my global warming, citizen climate science studies. LearnR Toolkit I’ve put together a series of instructional PowerPoint and video modules with supporting R scripts and data files to help Excel users learn R. Here’s a list of the modules with links to the Zip and PPT files. When viewing a PPT file, be sure to put PowerPoint in slide show mode to be able to see the embedded videos. Installing Zip Files The full Learn R Toolkit includes an Introduction and 5 modules, with 80 files. You will need to extract the Zip file to your hard drive.

r - How do I scrape multiple pages with XML and ReadHTMLTable

Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X Motivation I was inspired by Revolution’s blog and the step-by-step tutorial from Jeffrey Breen on setting up a local virtual instance of Hadoop with R. However, that tutorial describes the implementation using VMware’s application. One downside to using VMware is that it’s not free. I know most people, including me, like to hear the words open-source and free, especially when it is a smooth ride. Description Hadoop Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. R and Hadoop The most common way to link R and Hadoop is to use HDFS (potentially managed by Hive or HBase) as the long-term store for all data, and use MapReduce jobs (potentially submitted from Hive, Pig, or Oozie) to encode, enrich, and sample data sets from HDFS into R. Cloudera Hadoop Demo VM CDH is Cloudera’s 100% open source distribution of Hadoop and related projects, built specifically to meet enterprise demands. Steps…

- development environment for R package developers