
R codeline_2


Example: nested downloads using RCurl. This example uses RCurl to download an HTML document and collect the name of each link in it. It illustrates how to combine RCurl with the XML (or HTML) parser so that the document is parsed as it downloads, without ever holding its entire content in memory. We start the download and pass a function to xmlEventParse() for processing.

As the XML parser needs more input, it fetches more data from the HTTP response stream. This is useful for handling very large data returned from Web queries. To do this, we need libcurl's multi interface, which provides asynchronous (non-blocking) downloading of the document. The idea is quite simple; what remains is how to combine these pieces from the RCurl and XML packages so that parsing happens in this asynchronous, interleaved manner.
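The event-driven half of this idea can be sketched on its own. The snippet below is a minimal illustration and not the RCurl example's own code: it feeds xmlEventParse() an in-memory string (an assumption, so that the sketch runs without a network connection) and collects each link's href in a startElement handler. The sample markup and the names page, state and handlers are hypothetical.

```r
library(XML)

# Hypothetical, well-formed sample page; a real run would stream this
# from an HTTP response instead of holding it in a string.
page <- paste0(
  '<html><body>',
  '<a href="first.html">First</a>',
  '<p>No link here</p>',
  '<a href="second.html">Second</a>',
  '</body></html>'
)

# An environment gives the handler mutable state without global variables.
state <- new.env()
state$links <- character()

handlers <- list(
  # Called once per opening tag; attrs is a named character vector (or NULL).
  startElement = function(name, attrs, ...) {
    if (name == "a" && "href" %in% names(attrs)) {
      state$links <- c(state$links, attrs[["href"]])
    }
  }
)

# SAX-style parse: no tree is built, only the handler is invoked.
invisible(xmlEventParse(page, handlers = handlers, asText = TRUE))

links <- state$links
print(links)
```

Because the handler only ever sees one element at a time, memory use stays flat however large the input is, which is the property the streaming download relies on.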

The steps in the example's code are explained as follows. perform = FALSE. library(RCurl) library(XML)

Algorithmic Trading with IBrokers. Models Collecting Dust? How to Transform Your Results from Interesting to Impactful.

Data science, machine learning, and analytics have redefined how we look at the world. The R community plays a vital role in that transformation, and the R language continues to be the de facto choice for statistical computing, data analysis, and many machine learning scenarios. The SQL Server team first recognized the importance of R in 2016 with the launch of SQL ML Services and R Server. We added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services in Azure SQL Managed Instance. SparkR, sparklyr, and PySpark are also available as part of SQL Server Big Data Clusters.

With that said, much has changed in the world of data science and analytics since 2016. Today we are making the following announcements to clearly state our direction and intent for R within Azure SQL and SQL Server. Looking to the future with the R community.

Pretty R syntax highlighter. R - What is the difference between gc() and rm(). Memory Available for Data Storage. Revolution R Enterprise 5.0 now available for free academic download. Revolution Analytics - Commercial Software & Support for the R Statistics Language. “Credit to whom credit is due” – Bloganalysen mit Google und R « LIBREAS.Library Ideas.

Prompted by the growing interest in quantitative studies of the impact of blog content, most recently in the post "Blogs als Quellen in der bibliothekarischen Fachkommunikation" (blogs as sources in professional library communication), the linking between blogs can also be explored more closely. To get at candidate data quickly, querying Google looks promising. Thanks to R, the data for further statistical analysis of blog links to the LIBREAS blog can be obtained quickly, even without programming skills:

library(XML)
google <- "
para <- "&num=100&hl=de&lr=&safe=off&output=atom"
blog <- "libreas.wordpress.com" # blog url
url <- paste(google, blog, para, sep = "") # query; sep = "" keeps spaces out of the URL
doc <- xmlTreeParse(url, useInternalNodes = TRUE)
name <- xpathApply(doc, "//r:uri", xmlValue, namespaces = c(r="
name <- as.character(unlist(name))
lib <- as.character(rep(blog, times = length(name)))
df <- as.data.frame(cbind(lib, name))
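The namespace handling in that query is the part most likely to trip people up, so here is a minimal sketch of it on an in-memory feed. The feed below is hypothetical and uses the standard Atom namespace; the feed in the original post used a different namespace (bound to the prefix r), whose URI is not preserved here.

```r
library(XML)

# Hypothetical Atom feed standing in for the search result.
feed <- paste0(
  '<feed xmlns="http://www.w3.org/2005/Atom">',
  '<entry><author><name>A</name><uri>http://example.org/blog-a</uri></author></entry>',
  '<entry><author><name>B</name><uri>http://example.org/blog-b</uri></author></entry>',
  '</feed>'
)
doc <- xmlParse(feed, asText = TRUE)

# Elements in a default namespace still need a prefix in XPath;
# "a" is an arbitrary label bound to the namespace URI.
ns <- c(a = "http://www.w3.org/2005/Atom")
name <- xpathSApply(doc, "//a:uri", xmlValue, namespaces = ns)

# One row per linking blog, with the target blog repeated alongside.
blog <- "libreas.wordpress.com"
df <- data.frame(lib = blog, name = name, stringsAsFactors = FALSE)
print(df)
```

Without the namespaces argument, the XPath expression //uri would match nothing, because the elements live in the feed's default namespace.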

The Comprehensive Perl Archive Network - www.cpan.org.

Extracting comments from a Blogger.com blog post with R. Note #1: Check out this very useful post by Najko Jahn describing how to extract links to blogs via Google Blog Search. Note #2: I’ll update the code below, using Najko’s cleaner XPath-based solution, once I find the time. Recently I’ve been working with comments as part of the project on science blogging we’re doing at the Junior Researchers Group “Science and the Internet”. I wrote the script below to quickly extract comments from Atom feeds, such as those generated by Blogger.com.

The code isn’t exactly pretty, mostly because I didn’t use an XML parser to properly read the data, instead resorting to brute-force pattern matching, but it gets the job done. Two easier (and cleaner) routes would have been to a) get the data directly from the Google Data API (doesn’t work as far as I can tell, since there seems to be no implementation for R*) or b) parse the data specifically as Atom (doesn’t work as, annoyingly, there is no specific parsing support for Atom in R).

htmlToText(): Extracting Text from HTML via XPath. Converting HTML to plain text usually involves stripping out the HTML tags whilst preserving the most basic of formatting. I wrote a function to do this which works as follows (code can be found on github): The above uses an XPath approach to achieve its goal.
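As a rough illustration of the XPath idea (this is a minimal sketch, not the htmlToText() function itself), the snippet below keeps only the text nodes under the body element that are not inside script or style elements, and sets a naive tag-stripping regular expression beside it for comparison. The sample page and variable names are hypothetical.

```r
library(XML)

# Hypothetical sample page with markup that should not end up in the text.
html <- paste0(
  '<html><head><title>T</title><style>p {color: red;}</style></head>',
  '<body><h1>Heading</h1><p>Some <b>bold</b> text.</p>',
  '<script>var x = "<p>";</script></body></html>'
)

# XPath route: keep text nodes under <body>, skipping script and style content.
doc <- htmlParse(html, asText = TRUE)
pieces <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script)][not(ancestor::style)]",
  xmlValue
)
pieces <- trimws(pieces)
text_xpath <- paste(pieces[nzchar(pieces)], collapse = " ")

# Regex route: delete anything that looks like a tag; CSS and JavaScript survive.
text_regex <- gsub("<[^>]+>", "", html)

print(text_xpath)
print(text_regex)
```

The parser knows which text belongs to a script or a stylesheet; the regular expression only sees angle brackets, so the style rule and the JavaScript leak into its output.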

Another approach would be to use a regular expression. These two approaches are briefly discussed below.

Regular Expressions. One approach is to use a smart regular expression which matches anything between “<” and “>” if it looks like a tag and rips it out, e.g., I got the regular expression in “pattern” in the code above from a quick Google search which gave this webpage from 2004. I’m still learning regex and I must confess to finding this one slightly intimidating. This approach would require building more and more sophisticated regular expressions, or filtering through a series of different regular expressions, to get the desired result when taking into account these diversions.

XPath.

GScholarXScraper: Hacking the GScholarScraper function with XPath. Kay Cichini recently wrote a word-cloud R function called GScholarScraper on his blog which, when given a search string, will scrape the associated search results returned by Google Scholar, across pages, and then produce a word-cloud visualisation.

This was of interest to me because around the same time I posted an independent Google Scholar scraper function, get_google_scholar_df(), which does much the same job as the scraping part of Kay’s function but uses XPath (whereas he had used regular expressions). My function worked as follows: given a Google Scholar URL, it extracts as much information as it can from each search result on that page into separate columns of a data frame.

In the comments of his blog post I figured it’d be fun to hack his function to provide an XPath alternative, GScholarXScraper. I think that’s pretty much everything I added. Anyway, here’s how it works (link to full code at end of post): Not bad. Code:

JGR « Fells Stats. A GUI for R - Downloading And Installing Deducer. A Spatial Data Analysis GUI for R « Fells Stats.

Eclipse IDE for R. Background: Eclipse is an open source Integrated Development Environment (IDE). As with Microsoft's Visual Studio product, Eclipse is programming language-agnostic and supports any language having a suitable plugin for the IDE platform. For Eclipse, the R language plugin is StatET. Figure 1 (above): Eclipse, StatET with R, and the R debugger (bottom window) at work. The R debugger is an R package library and has its own graphical output window separate from Eclipse. The following three (3) part procedure installs Eclipse onto a Windows platform (XP or Windows 7) and adds StatET (R) language support.

The three parts of the procedure are (1) Install Eclipse, (2) Install StatET, (3) Configure Eclipse / R. We then follow this procedure with tests to confirm the installation and configuration.

Part-I Install Eclipse: Download the latest stable Eclipse release (I use Eclipse Classic, which is at version 3.5.2 (163 MB) as of 10-April-2010).

Part-II Install StatET.

Part-III Install R, if you need to.

RForge.net - development environment for R package developers. Tinn-R | Free Development software downloads. R-Extension. Web scraping - Extract Links from Webpage using R. R - extracting node information. Questions containing '[r] xml xpath'. R - How do I scrape multiple pages with XML and ReadHTMLTable. Xml - Web scraping with R over real estate ads.

R preferred by Kaggle competitors. Blog-Reference-Functions/R at master · tonybreyal/Blog-Reference-Functions. Blog-Reference-Functions/R/googleScholarXScraper/googleScholarXScraper.R at master · tonybreyal/Blog-Reference-Functions.

Facebook Graph API Explorer with R (on Windows) « Consistently Infrequent.

library(RCurl)
library(RJSONIO)

Facebook_Graph_API_Explorer <- function() {
  # flatten a list of Graph API posts into one list of columns
  get_json_df <- function(data) {
    l <- list(
      post.id = lapply(data, function(post) post$id),
      from.name = lapply(data, function(post) post$from$name),
      from.id = lapply(data, function(post) post$from$id),
      to.name = lapply(data, function(post) post$to$data[[1]]$name),
      to.id = lapply(data, function(post) post$to$data[[1]]$id),
      to.category = lapply(data, function(post) post$to$data[[1]]$category),
      created.time = lapply(data, function(post) as.character(as.POSIXct(post$created_time, origin="1970-01-01", tz="GMT"))),
      message = lapply(data, function(post) post$message),
      type = lapply(data, function(post) post$type),
      likes.count = lapply(data, function(post) post$likes$count),
      comments.count = lapply(data, function(post) post$comments$count),
      sample.comments = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$message), collapse = " [next>>] ")),
    return(df)
  ID <- gsub(".
  df.list <- list()

Good GUI for R suitable for a beginner wanting to learn programming in R? - Statistical Analysis - Stack Exchange. A Spatial Data Analysis GUI for R. [R] Downloading data from internet.

Web scraping. You are encouraged to solve this task according to the task description, using any language you may know. Create a program that downloads the time from this URL: and then prints the current UTC time by extracting just the UTC time from the web page's HTML. If possible, only use libraries that come at no extra monetary cost with the programming language and that are widely available and popular, such as CPAN for Perl or Boost for C++.

Ada

AutoHotkey

UrlDownloadToFile, time.html
FileRead, timefile, time.html
pos := InStr(timefile, "UTC")
msgbox % time := SubStr(timefile, pos - 9, 8)

AWK. This is inspired by the GETURL example in the manual for gawk. #!

ALGOL 68. Sample output: <BR>Sep. 26, 21:51:17 UTC Universal Time

App Inventor. App Inventor has a Web component that contains code blocks which simplify Web scraping.
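R. A minimal sketch of the extraction step in base R, assuming the page contains a line shaped like the sample output quoted above. The download itself is left out because the task's URL is not preserved here; in a real run the lines would come from readLines() on that URL. The second sample line and the function name extract_utc are hypothetical.

```r
# Sample lines: the first is the sample output quoted in the task,
# the second is a hypothetical neighbour that must not be picked up.
page <- c(
  "<BR>Sep. 26, 21:51:17 UTC Universal Time",
  "<BR>Sep. 26, 05:51:17 PM EDT Eastern Time"
)

extract_utc <- function(lines) {
  # Keep the first line that mentions UTC, then pull out HH:MM:SS from it.
  utc_line <- grep("UTC", lines, value = TRUE, fixed = TRUE)[1]
  if (is.na(utc_line)) return(NA_character_)
  regmatches(utc_line, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", utc_line))
}

utc_time <- extract_utc(page)
print(utc_time)
```

Selecting the line first and matching the time second keeps the pattern simple and avoids picking up times from other time zones on the same page.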

BBC BASIC

C

Sorenmacbeth/googleanalytics4r. R - How to transform XML data into a data.frame.

Web Scraping Google Scholar (Partial Success) « Consistently Infrequent.

library(XML)
library(RCurl)

get_google_scholar_df <- function(u, omit.citation = TRUE) {
  html <- getURL(u)
  doc <- htmlParse(html)
  # one XPath query per column, over the whole page
  df <- data.frame(
    title = xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3", xmlValue),
    url = xpathSApply(doc, "//html//body//div[@class='gs_r']//h3", function(x) ifelse(is.null(xmlChildren(x)$a), NA, xmlAttrs(xmlChildren(x)$a, 'href'))),
    publication = xpathSApply(doc, "//html//body//div[@class='gs_r']//font//span[@class='gs_a']", xmlValue),
    description = xpathSApply(doc, "//html//body//div[@class='gs_r']//font", xmlValue),
    type = xpathSApply(doc, "//html//body//div[@class='gs_r']//h3", function(x) xmlValue(xmlChildren(x)$span)),
    footer = xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_fl']", xmlValue),
    stringsAsFactors = FALSE)
  # strip a leading "[TYPE] " marker from the title
  df$title <- sub(".*\\] ", "", df$title)
  df$description <- sapply(1:dim(df)[1], function(i) gsub(df$publication[i], "", df$description[i], fixed = TRUE))
  df$type <- gsub("\\]", "", gsub("\\[", "", df$type))

Web Scraping Google Scholar: Part 2 (Complete Success) « Consistently Infrequent.

library(RCurl)
library(XML)

get_google_scholar_df <- function(u) {
  html <- getURL(u)
  doc <- htmlParse(html)
  # run the XPath once per search result so that a missing node yields NA
  GS_xpathSApply <- function(doc, path, FUN) {
    path.base <- "/html/body/div[@class='gs_r']"
    nodes.len <- length(xpathSApply(doc, "/html/body/div[@class='gs_r']"))
    paths <- sapply(1:nodes.len, function(i) gsub(
      "/html/body/div[@class='gs_r']",
      paste("/html/body/div[@class='gs_r'][", i, "]", sep = ""),
      path, fixed = TRUE))
    xx <- sapply(paths, function(xpath) xpathSApply(doc, xpath, FUN), USE.NAMES = FALSE)
    xx[sapply(xx, length) < 1] <- NA
    xx <- as.vector(unlist(xx))
    return(xx)
  }
  df <- data.frame(
    footer = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_fl']", xmlValue),
    title = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3", xmlValue),
    type = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/div[@class='gs_rt']/h3/span", xmlValue),
    publication = GS_xpathSApply(doc, "/html/body/div[@class='gs_r']/font/span[@class='gs_a']", xmlValue),
    stringsAsFactors = FALSE)
  df <- df[,-1]
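The point of querying one search result at a time is that a result with a missing element produces NA instead of shifting the whole column. A minimal sketch of the same idea on an in-memory page follows; it is a variation that uses relative XPath on each result node rather than rewriting the path string. The sample markup only imitates the gs_r / gs_rt class names used above, and the helper first_or_na is hypothetical.

```r
library(XML)

# Hypothetical page: the second result has no <span> type marker.
html <- paste0(
  '<html><body>',
  '<div class="gs_r"><div class="gs_rt"><h3><span>[BOOK]</span> ',
  '<a href="http://example.org/1">First title</a></h3></div></div>',
  '<div class="gs_r"><div class="gs_rt"><h3>',
  '<a href="http://example.org/2">Second title</a></h3></div></div>',
  '</body></html>'
)
doc <- htmlParse(html, asText = TRUE)
results <- getNodeSet(doc, "/html/body/div[@class='gs_r']")

# Evaluate a relative XPath inside one result; NA when nothing matches.
first_or_na <- function(node, path, extract) {
  found <- xpathSApply(node, path, extract)
  if (length(found) < 1) NA_character_ else as.character(found[[1]])
}

df <- data.frame(
  title = vapply(results, first_or_na, character(1),
                 path = "./div[@class='gs_rt']/h3/a", extract = xmlValue),
  type  = vapply(results, first_or_na, character(1),
                 path = "./div[@class='gs_rt']/h3/span", extract = xmlValue),
  url   = vapply(results, first_or_na, character(1),
                 path = "./div[@class='gs_rt']/h3/a",
                 extract = function(a) xmlGetAttr(a, "href")),
  stringsAsFactors = FALSE
)
print(df)
```

Every column is built from exactly one value per result, so the columns always have the same length and data.frame() never has to recycle or fail.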

How to transform XML data into a data.frame? | TecHerald.com. I'm trying to learn the R XML package. I'm trying to create a data.frame from the sample XML data file books.xml. This is what I get:

library(XML)
books <- "
doc <- xmlTreeParse(books, useInternalNodes = TRUE)
doc
xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x))))
xpathSApply(doc, "//book", function(x) strsplit(xmlValue(x), " "))
xpathSApply(doc, "//book/child::*", xmlValue)

None of these xpathSApply attempts comes even close to what I intend. How should I go about producing a well-formed data.frame?

Member Shane said: In general, I would suggest trying the xmlToDataFrame() function, but I think that will be fairly difficult because the data is not well structured to begin with. I recommend working with this function: xmlToList(books). One of the problems is that there are several authors per book, so you have to decide how to handle that when structuring your data frame.
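One way to settle the several-authors question is to collapse each book's authors into a single string. The sketch below is one possible approach under that assumption, not the answer given in the thread; the inline catalog is hypothetical and only imitates the shape of a books.xml file.

```r
library(XML)

# Hypothetical catalog: the first book has two authors.
books <- paste0(
  '<catalog>',
  '<book id="bk101"><author>A. One</author><author>B. Two</author>',
  '<title>First</title><price>10.5</price></book>',
  '<book id="bk102"><author>C. Three</author>',
  '<title>Second</title><price>7</price></book>',
  '</catalog>'
)
doc <- xmlParse(books, asText = TRUE)
book_nodes <- getNodeSet(doc, "//book")

# One row per book; multiple authors are joined with "; ".
df <- data.frame(
  id = sapply(book_nodes, xmlGetAttr, "id"),
  title = sapply(book_nodes, function(b) xmlValue(b[["title"]])),
  authors = sapply(book_nodes, function(b)
    paste(xpathSApply(b, "./author", xmlValue), collapse = "; ")),
  price = as.numeric(sapply(book_nodes, function(b) xmlValue(b[["price"]]))),
  stringsAsFactors = FALSE
)
print(df)
```

If authors need to be analysed individually, the alternative is a second, long-format data frame with one row per book and author pair, keyed by the book id.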

[BioC] postForm() with KEGG. Blog-Reference-Functions/R/googlePlusXScraper/googlePlusXScraper.R at master · tonybreyal/Blog-Reference-Functions. Re: [R] Need help extracting info from XML file using XML package. XML package help. library(XML) url <- " Solomon Messing | On research, visualization and productivity.


When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R.

Abstract. The idea here is to provide simple examples of how to get started with processing XML in R, using some reasonably straightforward "flat" XML files and not worrying about efficiency. Here is an example of a simple XML file containing grades for students on three different tests.

<?xml version="1.0" ?>
<TABLE>
  <GRADES>
    <STUDENT> Fred </STUDENT>
    <TEST1> 66 </TEST1>
    <TEST2> 80 </TEST2>
    <FINAL> 70 </FINAL>
  </GRADES>
  <GRADES>
    <STUDENT> Wilma </STUDENT>
    <TEST1> 97 </TEST1>
    <TEST2> 91 </TEST2>
    <FINAL> 98 </FINAL>
  </GRADES>
</TABLE>

We might want to turn this into a data frame in R with a row for each student and four variables: the name and the scores on the three tests. Since this is a small file, let's not worry about efficiency in any way.

doc = xmlRoot(xmlTreeParse("generic_file.xml"))

We use xmlRoot() to get the top-level node of the tree rather than holding onto the general document information, since we won't need it.

function(node) xmlSApply(node, xmlValue)
xmlSApply(doc[[1]], xmlValue)
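Putting those fragments together, a minimal end-to-end sketch might look like this. It parses the grades from a string rather than from generic_file.xml (an assumption, so that the snippet is self-contained), and the column names of the resulting data frame are hypothetical.

```r
library(XML)

# The grades document from above, held in a string instead of a file.
grades_xml <- paste0(
  '<?xml version="1.0" ?>',
  '<TABLE>',
  '<GRADES><STUDENT> Fred </STUDENT><TEST1> 66 </TEST1>',
  '<TEST2> 80 </TEST2><FINAL> 70 </FINAL></GRADES>',
  '<GRADES><STUDENT> Wilma </STUDENT><TEST1> 97 </TEST1>',
  '<TEST2> 91 </TEST2><FINAL> 98 </FINAL></GRADES>',
  '</TABLE>'
)
doc <- xmlRoot(xmlTreeParse(grades_xml, asText = TRUE))

# Inner xmlSApply: the values of one <GRADES> record.
# Outer xmlSApply: one column per record, so transpose to get one row each.
m <- xmlSApply(doc, function(node) xmlSApply(node, xmlValue))
scores <- t(m)

df <- data.frame(
  student = trimws(unname(scores[, "STUDENT"])),
  test1 = as.numeric(scores[, "TEST1"]),
  test2 = as.numeric(scores[, "TEST2"]),
  final = as.numeric(scores[, "FINAL"]),
  stringsAsFactors = FALSE
)
print(df)
```

The explicit trimws() and as.numeric() calls matter because every value arrives as text, possibly padded with the spaces that surround it in the file.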

A Short Introduction to the XML package for R.

To parse an XML document, you can use xmlInternalTreeParse() or xmlTreeParse() (with useInternalNodes specified as TRUE or FALSE) or xmlEventParse(). If you are dealing with HTML content, which is frequently malformed (e.g. nodes not terminated, attributes not quoted), you can use htmlTreeParse(). You can give these functions the name of a file, a URL (HTTP or FTP) or XML text that you have previously created or read from a file.

If you are working with small to moderately sized XML files, it is easiest to use xmlInternalTreeParse() to first read the XML tree into memory.

doc = xmlInternalTreeParse("Install/Web/index.html.in")

Then you can traverse the tree looking for the information you want and putting it into different forms. There are two ways to do this iteration. Many people find recursion confusing, and when coupled with the need for non-local variables and mutable state, a different approach can be welcome.
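To make the contrast concrete, here is a minimal sketch that collects every title in a small document twice: once by recursing over the tree by hand, and once with a single XPath query. Reading the two approaches as recursion versus XPath is an assumption about the introduction's intent; the sample document and the function collect_titles are hypothetical.

```r
library(XML)

# Hypothetical document with a nested section.
txt <- paste0(
  '<doc>',
  '<section><title>One</title>',
  '<section><title>Nested</title></section></section>',
  '<section><title>Two</title></section>',
  '</doc>'
)
doc <- xmlParse(txt, asText = TRUE)

# Recursive traversal: visit a node, then each of its children in order.
collect_titles <- function(node) {
  own <- if (xmlName(node) == "title") xmlValue(node) else character()
  kids <- xmlChildren(node)
  c(own, unlist(lapply(kids, collect_titles), use.names = FALSE))
}
titles_recursive <- collect_titles(xmlRoot(doc))

# XPath: the same result from one declarative query.
titles_xpath <- xpathSApply(doc, "//title", xmlValue)

print(titles_recursive)
print(titles_xpath)
```

Both visit the nodes in document order, so the results agree; the XPath version simply leaves the traversal and the bookkeeping to the parser.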

Or while(!

Memory Management in the XML Package. The XML package. It's crantastic! Grabbing Tables in Webpages Using the XML Package. The Omega Project for Statistical Computing. RCurl. RStudio. Romain Francois, Professional R Enthusiast. R: Web Scraping R-bloggers Facebook Page « Consistently Infrequent. R: A Quick Scrape of Top Grossing Films from boxofficemojo.com « Consistently Infrequent. [R] Need help extracting info from XML file using XML package from Don MacQueen on 2009-03-02 (R help archive). Package XML. CRAN - Package somplot.