
Example: An example of nested downloads using RCurl. This example uses RCurl to download an HTML document and then collect the name of each link within that document.

An example of nested downloads using RCurl

The purpose of the example is to illustrate how we can use the RCurl package to download a document and feed it directly into the XML (or HTML) parser without holding the entire content of the document in memory. We start the download and pass a function to xmlEventParse() for processing. As the XML parser needs more input, it fetches more data from the HTTP response stream. This is useful for handling very large data returned from Web queries. To do this, we need to use the multi interface for libcurl in order to have asynchronous, non-blocking downloading of the document. The remaining part is how we combine these pieces from the RCurl and XML packages to do the parsing in this asynchronous, interleaved manner.
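As a minimal sketch of the SAX side of this idea (the fully interleaved version would drive the download through RCurl's multi interface; here the document is a literal string, and the element and attribute names are purely illustrative):

```r
library(XML)

# Collect the href of every <a> element via SAX-style event parsing.
# The document is a literal string for illustration; a real run would
# feed bytes from an RCurl download instead.
doc <- '<page><a href="http://one.example/">one</a><a href="http://two.example/">two</a></page>'

links <- character()
startElement <- function(name, attrs, ...) {
  if (name == "a" && "href" %in% names(attrs))
    links <<- c(links, attrs[["href"]])
}
invisible(xmlEventParse(doc, handlers = list(startElement = startElement), asText = TRUE))
links
```

Because the handlers see one element at a time, memory use stays flat no matter how large the document is, which is exactly what makes the streaming combination attractive.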

The steps in the code are explained as follows: load the packages with library(RCurl) and library(XML), and set perform = FALSE so that the request is created but not carried out immediately, leaving the download to be driven as the parser demands input.

Algorithmic Trading with IBrokers

Kyle Matoba is a Finance PhD student at the UCLA Anderson School of Management. He gave a presentation on Algorithmic Trading with R and IBrokers at a recent meeting of the Los Angeles R User Group. The discussion of IBrokers begins near the 12-minute mark. To leave a comment for the author, please follow the link and comment on his blog: FOSS Trading. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping), statistics (regression, PCA, time series, trading) and more...

Models Collecting Dust? How to Transform Your Results from Interesting to Impactful

Leading expert James Taylor, author of Decision Management Systems: A Practical Guide to Business Rules and Predictive Analytics, has developed a practical approach you can use to improve adoption and elevate your organization. In this webinar, James will show you a proven framework for putting predictive analytics to work: how to begin model-building with the decision in mind to establish consensus with business process owners; proven ways to tie decisions to organizations, metrics, systems and business processes; and pitfalls that prevent success and how to avoid them. Join this webinar to increase your team's value to the organization and come away with an approach that ensures buy-in from the beginning of the process through the implementation of recommendations.

Pretty R syntax highlighter. R - What is the difference between gc() and rm()?

Memory Available for Data Storage

Description: How R manages its workspace. Details: R has a variable-sized workspace. There are (rarely used) command-line options to control its minimum size, but no longer any to control the maximum size. R maintains separate areas for fixed- and variable-sized objects. The default values are (currently) an initial setting of 350k cons cells and 6Mb of vector heap. How much time R spends in the garbage collector will depend on these initial settings and on the trade-off the memory manager makes, when memory fills up, between collecting garbage to free up unused memory and growing these areas. You can find out the current memory consumption (the heap and cons cells used, as numbers and megabytes) by typing gc() at the R prompt.
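A small sketch of the difference between the two: rm() only removes the binding, while gc() actually triggers a collection and reports the current consumption (the exact numbers vary by session):

```r
x <- numeric(1e6)   # allocate roughly 8 MB of vector heap
rm(x)               # removes the binding; the memory is reclaimed lazily
stats <- gc()       # forces a collection and returns a summary matrix
stats               # rows: Ncells (cons cells) and Vcells (vector heap)
```

In practice you rarely need to call gc() yourself; R runs the collector automatically when the heap fills, and rm() alone is enough to make an object collectable.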

The command-line option --max-ppsize controls the maximum size of the pointer protection stack. See also: An Introduction to R for more command-line options, and Memory-limits for the design limitations. Models Collecting Dust? How to Transform Your Results from Interesting to Impactful. Revolution R Enterprise 5.0 now available for free academic download. Revolution Analytics - Commercial Software & Support for the R Statistics Language. "Credit to whom credit is due" - blog analyses with Google and R « LIBREAS. Library Ideas. The Comprehensive Perl Archive Network - www.cpan.org.

Extracting comments from a Blogger.com blog post with R. Note #1: Check out this very useful post by Najko Jahn describing how to extract links to blogs via Google Blog Search.

Extracting comments from a Blogger.com blog post with R

Note #2: I'll update the code below once I find the time, using Najko's cleaner XPath-based solution. Recently I've been working with comments as part of the project on science blogging we're doing at the Junior Researchers Group "Science and the Internet". I wrote the script below to quickly extract comments from Atom feeds, such as those generated by Blogger.com. The code isn't exactly pretty, mostly because I didn't use an XML parser to properly read the data, instead resorting to brute-force pattern matching, but it gets the job done. Two easier (and cleaner) routes would have been to a) get the data directly from the Google Data API (this doesn't work as far as I can tell, since there seems to be no implementation for R*) or b) parse the data specifically as Atom (this doesn't work either since, annoyingly, there is no specific parsing support for Atom in R). htmlToText(): Extracting Text from HTML via XPath. Converting HTML to plain text usually involves stripping out the HTML tags whilst preserving the most basic of formatting.
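For what it's worth, a cleaner route than pattern matching is to treat the Atom feed as plain XML and query it with a namespace-aware XPath expression. A minimal sketch, where the feed string is a made-up stand-in for a real Blogger.com comments feed:

```r
library(XML)

# A tiny stand-in for a Blogger.com Atom comments feed
atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><content>First comment</content></entry>
  <entry><content>Second comment</content></entry>
</feed>'

doc <- xmlParse(atom, asText = TRUE)
# "a" is a local alias bound to the Atom namespace URI
comments <- xpathSApply(doc, "//a:entry/a:content", xmlValue,
                        namespaces = c(a = "http://www.w3.org/2005/Atom"))
```

The namespace binding is the step that trips people up: without it, the XPath query silently matches nothing because Atom elements live in a default namespace.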

htmlToText(): Extracting Text from HTML via XPath

I wrote a function to do this which works as follows (code can be found on GitHub). The function uses an XPath approach to achieve its goal. Another approach would be to use a regular expression; these two approaches are briefly discussed below. Regular expressions: one approach is to use a smart regular expression which matches anything between "<" and ">" if it looks like a tag, and rips it out. I got the regular expression in "pattern" in the code above from a quick Google search, which gave a webpage from 2004. I'm still learning regex and I must confess to finding this one slightly intimidating. This approach would require building more and more sophisticated regular expressions, or filtering through a series of different regular expressions, to get the desired result when taking these diversions into account.

XPath: it returned only three lines. GScholarXScraper: Hacking the GScholarScraper function with XPath. Kay Cichini recently wrote a word-cloud R function called GScholarScraper on his blog which, when given a search string, will scrape the associated search results returned by Google Scholar, across pages, and then produce a word-cloud visualisation.
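The two approaches to text extraction can be contrasted on a toy document (a sketch only; the htmlToText() implementation itself lives in the linked GitHub code):

```r
library(XML)

html <- "<html><body><p>Hello <b>world</b>!</p></body></html>"

# XPath approach: parse the document, then take the text value of <body>
doc <- htmlParse(html, asText = TRUE)
xpath_text <- xpathSApply(doc, "//body", xmlValue)

# Regex approach: rip out anything that looks like a tag; this is
# fragile on comments, scripts, and attributes containing ">"
regex_text <- gsub("<[^>]+>", "", html)
```

On well-formed input both give the same answer, but the XPath route keeps working on the messy markup real pages contain, which is why the function takes that road.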

GScholarXScraper: Hacking the GScholarScraper function with XPath

This was of interest to me because around the same time I posted an independent Google Scholar scraper function, get_google_scholar_df(), which does a similar job to the scraping part of Kay's function using XPath (whereas he had used regular expressions). My function works as follows: given a Google Scholar URL, it extracts as much information as it can from each search result on the page into different columns of a data frame. In the comments of his blog post I figured it'd be fun to hack his function to provide an XPath alternative, GScholarXScraper.

I think that's pretty much everything I added. Anyway, here's how it works (link to full code at end of post). Not bad. Code: JGR « Fells Stats. A GUI for R - Downloading And Installing Deducer. A Spatial Data Analysis GUI for R « Fells Stats. Eclipse IDE for R. Background: Eclipse is an open source Integrated Development Environment (IDE).

Eclipse IDE for R

As with Microsoft's Visual Studio product, Eclipse is programming-language-agnostic and supports any language having a suitable plugin for the IDE platform. For Eclipse, the R language plugin is StatET. Figure 1 (above): Eclipse, StatET with R, and the R debugger (bottom window) at work. The R debugger is an R package library and has its own graphical output window separate from Eclipse. The following three-part procedure installs Eclipse onto a Windows platform (XP or Windows 7) and adds StatET (R) language support. Part I, Install Eclipse: download the latest stable Eclipse release (I use Eclipse Classic, which is at version 3.5.2 (163 MB) as of 10-April-2010). If you are running 64-bit Windows and you still do not see the 64-bit offerings on java.com, then switch to another (64-bit) browser. Part II, Install StatET: now Eclipse will install StatET, but it will take a few minutes. Part III, Testing: -Mark Qu. RForge.net - development environment for R package developers.

Free Development software downloads. R-Extension. Web scraping - Extract Links from Webpage using R. R - extracting node information. Pretty R syntax highlighter. Questions containing '[r] xml xpath'. R - How do I scrape multiple pages with XML and ReadHTMLTable. Xml - Web scraping with R over real estate ads. R preferred by Kaggle competitors. Kaggle, the predictive-analytics competition site, has analyzed the preferences of the 2,500 data scientists who participate in its competitions, and R was the most-preferred software of the competitors at 22.5%.

R preferred by Kaggle competitors

The next-nearest alternative was Matlab, at 16%. On a related note, the Premier of the Australian state of New South Wales has just launched a competition on Kaggle to predict the traffic on Sydney's M4 motorway. It's great to see government promoting the use of data analysis to solve (or at least better understand) civic problems, and this competition comes with some serious prize money: AUD$10,000 (about the same in $USD). Might be worth your time spending the Thanksgiving break doing a little modeling in R... No Free Hunch: Profiling Kaggle's user base. To leave a comment for the author, please follow the link and comment on his blog: Revolutions. Blog-Reference-Functions/R at master · tonybreyal/Blog-Reference-Functions.

Blog-Reference-Functions/R/googleScholarXScraper/googleScholarXScraper.R at master · tonybreyal/Blog-Reference-Functions. Facebook Graph API Explorer with R (on Windows) « Consistently Infrequent.

library(RCurl)
library(RJSONIO)

Facebook_Graph_API_Explorer <- function() {
  get_json_df <- function(data) {
    l <- list(
      post.id = lapply(data, function(post) post$id),
      from.name = lapply(data, function(post) post$from$name),
      from.id = lapply(data, function(post) post$from$id),
      to.name = lapply(data, function(post) post$to$data[[1]]$name),
      to.id = lapply(data, function(post) post$to$data[[1]]$id),
      to.category = lapply(data, function(post) post$to$data[[1]]$category),
      created.time = lapply(data, function(post) as.character(as.POSIXct(post$created_time, origin = "1970-01-01", tz = "GMT"))),
      message = lapply(data, function(post) post$message),
      type = lapply(data, function(post) post$type),
      likes.count = lapply(data, function(post) post$likes$count),
      comments.count = lapply(data, function(post) post$comments$count),
      sample.comments = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$message), collapse = " [next>>] ")))
    return(df)
  }
  ID <- gsub(".
  df.list <- list()
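The heart of get_json_df above is mapping over a list of parsed JSON posts and flattening them into columns. A self-contained sketch of the same idea, where the post structure is invented for illustration (a real run would build `posts` with RJSONIO::fromJSON on a Graph API response):

```r
# Each element mimics one post as returned by fromJSON()
posts <- list(
  list(id = "1", message = "hello", likes = list(count = 3)),
  list(id = "2", message = "world", likes = list(count = 5))
)

# Flatten each field across posts into one column of a data frame
df <- data.frame(
  post.id     = sapply(posts, function(p) p$id),
  message     = sapply(posts, function(p) p$message),
  likes.count = sapply(posts, function(p) p$likes$count),
  stringsAsFactors = FALSE
)
```

The pattern generalizes: one sapply/lapply per column, each reaching into the nested list with the path that field lives at.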

Facebook Graph API Explorer with R (on Windows) « Consistently Infrequent

Good GUI for R suitable for a beginner wanting to learn programming in R? - Statistical Analysis - Stack Exchange. A Spatial Data Analysis GUI for R. I am excited to announce the addition of DeducerSpatial to the Deducer plug-in ecosystem. DeducerSpatial is a graphical user interface for the visualization and analysis of spatial data, built on Deducer's plug-in platform. In a previous post I illustrated how to use DeducerSpatial from the command line to add Open Street Map images to your R plots. In the video below, I provide a quick tour of the GUI. To try it out for yourself: install Deducer (instructions), open JGR, and enter install.packages("DeducerSpatial") into the console. Once DeducerSpatial is loaded (library(DeducerSpatial)), you can type data(states) or data(LA_places) to bring in some data to play around with. video link. To leave a comment for the author, please follow the link and comment on his blog: Fells Stats » R.

[R] Downloading data from the internet. Web scraping. Web scraping: you are encouraged to solve this task according to the task description, using any language you may know. Create a program that downloads the time from this URL: and then prints the current UTC time by extracting just the UTC time from the web page's HTML. If possible, only use libraries that come at no extra monetary cost with the programming language and that are widely available and popular, such as CPAN for Perl or Boost for C++.

[edit] Ada

[edit] AutoHotkey

UrlDownloadToFile, time.html
FileRead, timefile, time.html
pos := InStr(timefile, "UTC")
msgbox % time := SubStr(timefile, pos - 9, 8)

[edit] AWK

This is inspired by the GETURL example in the manual for gawk. #!

[edit] ALGOL 68

STRING domain="tycho.usno.navy.mil", page="cgi-bin/timer.pl";
STRING # search for the needle in the haystack #
  needle = "UTC", hay stack = "
re success="^HTTP/[0-9.]* 200",
re result description="^HTTP/[0-9.]* [0-9]+ [a-zA-Z ]*",
re doctype ="\s\s<!
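An R take on the same task, sketched against a saved copy of the page text rather than a live download (a live version would fetch the page with RCurl's getURL(); the page string below mimics the timer page's HTML):

```r
# Stand-in for the downloaded page text
page <- "<BR>Sep. 26, 21:51:17 UTC Universal Time"

# Pull out the hh:mm:ss immediately preceding "UTC"
utc <- regmatches(page, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2} UTC", page))
```

regexpr/regmatches come with base R, satisfying the task's no-extra-cost-library constraint.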

Sample output: <BR>Sep. 26, 21:51:17 UTC Universal Time. Sorenmacbeth/googleanalytics4r. R - How to transform XML data into a data.frame. Web Scraping Google Scholar (Partial Success) « Consistently Infrequent.

library(XML)
library(RCurl)

get_google_scholar_df <- function(u, omit.citation = TRUE) {
  html <- getURL(u)
  doc <- htmlParse(html)
  df <- data.frame(
    title = xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3", xmlValue),
    url = xpathSApply(doc, "//html//body//div[@class='gs_r']//h3", function(x) ifelse(is.null(xmlChildren(x)$a), NA, xmlAttrs(xmlChildren(x)$a, 'href'))),
    publication = xpathSApply(doc, "//html//body//div[@class='gs_r']//font//span[@class='gs_a']", xmlValue),
    description = xpathSApply(doc, "//html//body//div[@class='gs_r']//font", xmlValue),
    type = xpathSApply(doc, "//html//body//div[@class='gs_r']//h3", function(x) xmlValue(xmlChildren(x)$span)),
    footer = xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_fl']", xmlValue),
    stringsAsFactors = FALSE)
  df$title <- sub(".*\\] ", "", df$title)
  df$description <- sapply(1:dim(df)[1], function(i) gsub(df$publication[i], "", df$description[i], fixed = TRUE))
  df$type <- gsub("\\]", "", gsub("\\[", "", df$type))
  return(df)
}

Web Scraping Google Scholar: Part 2 (Complete Success) « Consistently Infrequent.

library(RCurl)
library(XML)

get_google_scholar_df <- function(u) {
  html <- getURL(u)
  doc <- htmlParse(html)
  GS_xpathSApply <- function(doc, path, FUN) {
    path.base <- "/html/body/div[@class='gs_r']"
    nodes.len <- length(xpathSApply(doc, path.base))
    paths <- sapply(1:nodes.len, function(i) gsub(path.base,
      paste(path.base, "[", i, "]", sep = ""), path, fixed = TRUE))
    xx <- sapply(paths, function(xpath) xpathSApply(doc, xpath, FUN), USE.NAMES = FALSE)
    xx[sapply(xx, length) < 1] <- NA
    xx <- as.vector(unlist(xx))
    return(xx)
  }
  df <- data.frame(
    footer = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_fl']", xmlValue),
    title = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3", xmlValue),
    type = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3/span", xmlValue),
    publication = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_a']", xmlValue),
    stringsAsFactors = FALSE)
  df <- df[, -1]
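The trick GS_xpathSApply implements, indexing the base path per result so that missing nodes become NA instead of silently shortening the column, can be seen on a toy document (the class names here are arbitrary, not Google Scholar's):

```r
library(XML)

# Two results; the second has no <h3> title node
doc <- htmlParse("<div class='r'><h3>First</h3></div><div class='r'></div>",
                 asText = TRUE)

# Query each result node individually so absent matches yield NA
n <- length(getNodeSet(doc, "//div[@class='r']"))
titles <- sapply(seq_len(n), function(i) {
  x <- xpathSApply(doc, sprintf("//div[@class='r'][%d]/h3", i), xmlValue)
  if (length(x) < 1) NA else x
})
```

A single "//div[@class='r']/h3" query would return one value and leave no record of which result lacked a title; the per-index version keeps every column the same length, which is what makes the data.frame assembly in the function safe.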

How do you transform XML data into a data.frame? [BioC] PostForm() with KEGG. Blog-Reference-Functions/R/googlePlusXScraper/googlePlusXScraper.R at master · tonybreyal/Blog-Reference-Functions. Re: [R] Need help extracting info from XML file using XML package. Wacek Kusnierczyk wrote:
> Don MacQueen wrote:
>> I have an XML file that has within it the coordinates of some polygons
>> that I would like to extract and use in R. The polygons are nested
>> rather deeply. For example, I found by trial and error that I can
>> extract the coordinates of one of them using functions from the XML
>> package:
>>
>> doc <- xmlInternalTreeParse('doc.kml')
>> docroot <- xmlRoot(doc)
>> pgon <-
>
> try
>
> lapply(
>   xpathSApply(doc, '//Polygon',
>     xpathSApply, '//coordinates', function(node)
>       strsplit(xmlValue(node), split=',|\\s+')),
>   as.numeric)

Just for the record, the xpath expression in the second xpathSApply would need to be ".//coordinates" to start searching from the previously matched Polygon node. Otherwise, the search starts from the top of the document again. However, it would seem that xpathSApply(doc, "//Polygon//coordinates", function(node) strsplit(.....)) would be more direct, i.e. it fetches the coordinates nodes in a single XPath expression. XML package help. library(XML) url <- " On research, visualization and productivity.
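The point about the leading dot can be demonstrated directly (the KML snippet is a made-up stand-in for doc.kml):

```r
library(XML)

kml <- "<kml><Placemark><Polygon><coordinates>1,2 3,4</coordinates></Polygon></Placemark></kml>"
doc <- xmlInternalTreeParse(kml, asText = TRUE)
polys <- getNodeSet(doc, "//Polygon")

# ".//coordinates" searches below the matched Polygon node;
# "//coordinates" would restart the search from the document root
coords <- sapply(polys, function(p) xpathSApply(p, ".//coordinates", xmlValue))
nums <- as.numeric(unlist(strsplit(coords, split = ",|\\s+")))
```

With a single Polygon the two spellings happen to agree, but with many polygons the root-anchored form would return every document's coordinates for each node, which is exactly the bug the thread is correcting.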

Web Scraping Google Scholar (Partial Success). Web Scraping Google Scholar: Part 2 (Complete Success). When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R. A Short Introduction to the XML package for R. Memory Management in the XML Package. The XML package. It's crantastic! Grabbing Tables in Webpages Using the XML Package. The Omega Project for Statistical Computing. RCurl. RStudio. Romain Francois, Professional R Enthusiast. R: Web Scraping R-bloggers Facebook Page « Consistently Infrequent. R: A Quick Scrape of Top Grossing Films from boxofficemojo.com « Consistently Infrequent.

[R] Need help extracting info from XML file using XML package, from Don MacQueen on 2009-03-02 (R help archive). Package XML. CRAN - Package somplot.