
Big Data Predictive Analytics with Revolution R Enterprise - Revolution Analytics

Related:  R

SparkR by amplab-extras. SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. SparkR exposes the Spark API through the RDD class and allows users to run jobs interactively from the R shell on a cluster. NOTE: As of April 2015, SparkR has been officially merged into Apache Spark and is shipping in an upcoming release (1.4) due early summer 2015. NOTE: The API in the upcoming Spark release (1.4) will not be the same as the API described here. Features: SparkR exposes the RDD API of Spark as distributed lists in R. sc <- sparkR.init("local") lines <- textFile(sc, " wordsPerLine <- lapply(lines, function(line) { length(unlist(strsplit(line, " "))) }) In addition to lapply, SparkR also allows closures to be applied on every partition using lapplyWithPartition. SparkR automatically serializes the necessary variables to execute a function on the cluster. SparkR also allows easy use of existing R packages inside closures. Installing SparkR . Running sparkR . . .
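The closure in the SparkR excerpt can be tried without a cluster. The following is a minimal local sketch in base R, not SparkR itself: it assumes a plain character vector as a stand-in for the distributed collection, and the sample lines are hypothetical.

```r
# Local stand-in for the distributed collection: a plain character vector.
# The sample lines are hypothetical; in SparkR they would come from textFile().
lines <- c("the quick brown fox", "jumps over", "the lazy dog")

# Same closure as in the excerpt: split each line on single spaces
# and count the resulting pieces.
wordsPerLine <- lapply(lines, function(line) {
  length(unlist(strsplit(line, " ")))
})

# lapply returns a list; flatten it to an integer vector for printing.
print(unlist(wordsPerLine))
```

Base R's lapply runs the function on each element in the current session, whereas SparkR's version, as described in the excerpt, ships the closure to the cluster.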

Gephi, an open source graph visualization and manipulation software

Impatient R. Translations: français: Translated by Kate Bondareva. Serbo-Croatian: Translated by Jovana Milutinovich from Geeks Education. Preface: This is a tutorial (previously known as “Some hints for the R beginner”) for beginning to learn the R programming language. You are probably impatient to learn R; most people are. This page has several sections, which fall into four categories: General, Objects, Actions, Help. General: Introduction; Blank screen syndrome; Misconceptions because of a previous language; Helpful computer environments; R vocabulary; Epilogue. Objects: Key objects; Reading data into R; Seeing objects; Saving objects; Magic functions, magic objects; Some file types; Packages. Actions: What happens at R startup; Key actions; Errors and such; Graphics; Vectorization; Make mistakes on purpose. Help. Introduction: I asked R users what their biggest stumbling blocks were in learning R. > search()

Testing Packages with Experimental R Devel Build for Windows · rwinlib/r-base Wiki. Rtools 3.3 is in the process of being updated to use a new compiler toolchain produced by Jeroen Ooms, based on GCC 4.9.3 and Mingw-W64 V3. The upcoming R 3.3 release plans to adopt this new toolchain. Package authors using compiled code should test their packages with the new toolchain to ensure compatibility. This document includes instructions for downloading the requisite versions of R, Rtools, and (optionally) RStudio to perform this testing. Step 1: Install R-devel-experimental for Windows. There is an experimental build of R-devel that uses the new Rtools 3.3 toolchain available on CRAN. Step 2: Install the latest Rtools 3.3. There is an updated build of Rtools 3.3 that includes the new toolchain available on CRAN. Note that to be compatible with the instructions below you should choose to install Rtools 3.3 to the default location (c:\Rtools). Rterm / RGUI

The Endeavour | John D. Cook I help people make decisions in the face of uncertainty. Sounds interesting. I’m a data scientist. Not sure what that means, but it sounds cool. I study machine learning. I’m into big data. Even though each of these descriptions makes a different impression, they’re all essentially the same thing. There are distinctions. “Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Despite the hype around the term data science, it’s growing on me. Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. “Big data” is a big can of worms. Bayesian statistics is much older than what is now sometimes called “classical” statistics.

Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X. Motivation: I was inspired by Revolution’s blog and Jeffrey Breen’s step-by-step tutorial on setting up a local virtual instance of Hadoop with R. However, that tutorial describes the implementation using VMware’s application. Description. Hadoop: Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. R and Hadoop: The most common way to link R and Hadoop is to use HDFS (potentially managed by Hive or HBase) as the long-term store for all data, and use MapReduce jobs (potentially submitted from Hive, Pig, or Oozie) to encode, enrich, and sample data sets from HDFS into R. Cloudera Hadoop Demo VM: CDH is Cloudera’s 100% open source distribution of Hadoop and related projects, built specifically to meet enterprise demands. Steps. Platforms used in this tutorial: 1. Give a name to the VM. Pick type: Linux. Choose version: Linux 2.6 (64 bit). Click ‘Continue’. Click ‘Create’. 6. Click on ‘System’ category. 7.

The R programming language for programmers coming from other programming languages. Introduction, Assignment and underscore, Variable name gotchas, Vectors, Sequences, Types, Boolean operators, Lists, Matrices, Missing values and NaNs, Comments, Functions, Scope, Misc., Other resources. Ukrainian translation. Introduction: I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R. R is more than a programming language. This document is a work in progress. Assignment and underscore: The assignment operator in R is <- as in e <- m*c^2. It is also possible, though uncommon, to reverse the arrow and put the receiving variable on the right, as in m*c^2 -> e. It is sometimes possible to use = for assignment, though I don't understand when this is and is not allowed. However, when supplying default function arguments or calling functions with named arguments, you must use the = operator and cannot use the arrow. At some time in the past R, or its ancestor S, used underscore as assignment. Vectors Sequences
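The assignment forms described in this excerpt can be sketched as follows. The variable names and values are illustrative assumptions only; the speed variable is called c_light here (a hypothetical name) to avoid shadowing R's built-in c function.

```r
# Left-arrow assignment, the usual form.
m <- 2
c_light <- 3          # hypothetical toy value, not a physical constant
e <- m * c_light^2

# Right-arrow assignment: the receiving variable goes on the right.
m * c_light^2 -> e2

# Named arguments in a function call are supplied with =.
s <- seq(from = 1, to = 10, by = 3)

# Default arguments in a function definition are also written with =.
energy <- function(mass, speed = 3) mass * speed^2

print(e)
print(e2)
print(s)
print(energy(2))
```

Both arrows bind the same value, so e and e2 are equal; energy(2) falls back on the default speed.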

Microsoft’s New Data Science Virtual Machine. Earlier this week, Andrie showed you how to set up and provision your own virtual machine (VM) to run R and RStudio in Azure. Another option is to use the new Microsoft Data Science Virtual Machine, a pre-configured instance that includes a suite of tools useful to data scientists, including: Revolution R Open (performance-enhanced R); Anaconda Python; Visual Studio Community Edition; Power BI Desktop (with R capabilities); SQL Server Express (with R integration); Azure SDK (including the ability to run R experiments). There's no software charge associated with using this VM; you'll pay only the standard Azure infrastructure fees (starting at about 2 cents an hour for basic instances; more for more powerful instances). By the way, if you're not familiar with these tools in the Data Science VM, Jan Mulkens provides a backgrounder on Data science with Microsoft, including an overview of the Microsoft components. Microsoft Azure Blog: Provision the Microsoft Data Science Virtual Machine

Data Sorcery with Clojure

What is R? R is data analysis software: data scientists, statisticians, analysts, quants, and others who need to make sense of data use R for statistical analysis, data visualization, and predictive modeling. R is a programming language: you do data analysis in R by writing scripts and functions in the R programming language. R is a complete, interactive, object-oriented language: designed by statisticians, for statisticians. R is an environment for statistical analysis: Available in the R language are functions for virtually every data manipulation, statistical model, or chart that the data analyst could ever need. R is an open-source software project. R is a community. Next: Why use R? This article has been translated to Serbo-Croatian by Jovana Milutinovich from

Why is R so useful | Lee Hawthorn. As an Excel power user (someone called me a guru recently!) I know Excel can be used to do pretty much anything. I’ve even seen Excel being used to play the Game of Life. If this is the case, why do we need R? In this post I’ll tell you why and then show you. Reproducible: We can write an R script once to do any of the following: acquire data, clean, transform, analyse, model, report, publish. If the R is written in the correct way it’s reproducible by default. Flexible: Excel is flexible as mentioned above. How about R? At the time of writing there are over 6000 packages available for use. Scalability and Availability: You can run R in lots of different places: laptop, tablet, cloud, in databases, on really big machines. It’s important to know that R runs in-memory. Publish: With R we can send the output to lots of different places: slides, document, web page, application. Okay, enough theory. As you can see in the Excel screenshot below, I’ve pulled in data from a database and grouped customers based on demographics.
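A grouping step of the kind this excerpt mentions might look like the sketch below in base R. This is not the author's code: the data frame, its column names (age_band, spend), and its values are hypothetical toy data standing in for the database extract.

```r
# Hypothetical toy data standing in for customer records pulled from a database.
customers <- data.frame(
  age_band = c("18-34", "18-34", "35-54", "35-54", "55+"),
  spend    = c(120, 80, 200, 100, 50)
)

# Group customers by demographic band and total their spend.
# aggregate() returns one row per distinct age_band.
by_band <- aggregate(spend ~ age_band, data = customers, FUN = sum)

print(by_band)
```

Because the whole step is a script, rerunning it on refreshed data repeats the same grouping, which is the reproducibility point the excerpt makes.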

R in Insurance