Game of Friendship Paradox. People on average have fewer friends than their friends.

    download.file("
    GoT=read.csv("got.csv")
    library(networkD3)
    simpleNetwork(GoT[,1:2])

Because it is difficult for me to incorporate some d3js script in the blog, I will illustrate with a more basic graph. Consider a vertex v∈V in the undirected graph G=(V,E) (with classical graph notations), and let d(v) denote the number of edges touching it (i.e. v has d(v) friends).

    # stack the edge list with its mirror image so that every edge appears in both directions
    M=(rbind(as.matrix(GoT[,1:2]),as.matrix(GoT[,2:1])))
    nodes=unique(M[,1])

and for each of them, we can get the list of friends, and the number of friends

    friends = function(x) as.character(M[which(M[,1]==x),2])
    nb_friends = Vectorize(function(x) length(friends(x)))

as well as the number of friends that friends have, and the average of that number

    friends_of_friends = function(y) (Vectorize(function(x) length(friends(x)))(friends(y)))
    nb_friends_of_friends = Vectorize(function(x) mean(friends_of_friends(x)))
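To see the paradox on something small, here is a minimal sketch that applies the same idea to a toy graph. The edge list and the node names A to D are made up for illustration and are not the GoT data; `vapply` is used in place of `Vectorize` only to keep the return types explicit.

```r
# Hypothetical toy edge list (not the GoT file): A-B, A-C, A-D, B-C
edges <- data.frame(
  from = c("A", "A", "A", "B"),
  to   = c("B", "C", "D", "C"),
  stringsAsFactors = FALSE
)

# Symmetrise: each undirected edge is listed in both directions
M <- rbind(as.matrix(edges[, 1:2]), as.matrix(edges[, 2:1]))
nodes <- unique(M[, 1])

# Friends of a node, and number of friends of every node
friends <- function(x) as.character(M[M[, 1] == x, 2])
nb_friends <- vapply(nodes, function(x) length(friends(x)), integer(1))

# For every node, the average number of friends its friends have
nb_friends_of_friends <- vapply(
  nodes,
  function(x) mean(vapply(friends(x), function(f) length(friends(f)), integer(1))),
  numeric(1)
)

mean(nb_friends)             # average number of friends
mean(nb_friends_of_friends)  # average number of friends of friends
```

On this toy graph the second average is larger than the first, which is the paradox in miniature: the well-connected node A is counted once as a node but three times as somebody's friend.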
Flow charts in R | Insights of a PhD. Flow charts are an important part of a clinical trial report. Making them can be a pain though. One good way to do it seems to be with the grid and Gmisc packages in R. X and Y coordinates can be designated based on the center of the boxes in normalized device coordinates (proportions of the device space, where 0.5 is the middle), which saves a lot of messing around with corners of boxes and arrows. A very basic flow chart, based very roughly on the CONSORT version, can be generated as follows… Sections of code to make the boxes are wrapped in brackets to print them immediately. For detailed info, see the Gmisc vignette.

Networks with R. In order to practice with network data with R, we have been playing with the Padgett (1994) Florentine’s wedding dataset (discussed in the lecture).
The dataset is available from the network package:

    > library(network)
    > data(flo)
    > nflo=network(flo,directed=FALSE)
    > plot(nflo, displaylabels = TRUE, boxed.labels = FALSE)

The next step was to move from the network package to igraph. Since we have the adjacency matrix, we can use it

    > library(igraph)
    > iflo=graph_from_adjacency_matrix(flo, mode = "undirected")
    > plot(iflo)

The good thing is that a lot of functions are available; for instance, we can get the shortest paths between two specific nodes.
    > AP=all_shortest_paths(iflo, from="Peruzzi", to="Ginori")
    > L=AP$res[[1]]
    > V(iflo)$color="yellow"
    > V(iflo)$color[L[2:4]]="light blue"
    > V(iflo)$color[L[c(1,5)]]="blue"
    > plot(iflo)

We can also visualize edges, but I found it slightly more complicated (to extract edges from the output). But it works.

    > library(networkD3)
    > simpleNetwork(df)

Then the next question was to add a vertex to the network.

Ggnetwork: Network geometries for ggplot2.

In-depth analysis of Twitter activity and sentiment, with R. Astronomer and budding data scientist Julia Silge has been using R for less than a year but, judging by the posts on her blog, has already become very proficient at using it to analyze some interesting data sets. She has posted detailed analyses of water consumption data and health care indicators from the Utah Open Data Catalog, religious affiliation data from the Association of Statisticians of American Religious Bodies, and demographic data from the American Community Survey (that's the same dataset we mentioned on Monday).
In a two-part series, Julia analyzed another interesting dataset: her own archive of 10,000 tweets. (Julia provides all the R code for her analyses, so you can download your own Twitter archive and follow along.) In part one, Julia uses just a few lines of R to import her Twitter archive; in fact, the import itself takes just one line of R code:

    tweets <- read.csv("./tweets.csv", stringsAsFactors = FALSE)

The sentiment scores then come from a call to get_nrc_sentiment (from the syuzhet package):

    mySentiment <- get_nrc_sentiment(tweets$text)

Static and dynamic network visualization with R. [June 2017 update] This tutorial is continuously updated and expanded.
The latest version includes additional information (more on multiplex graphs, interactive JS networks, geographic data, etc). If you want to see earlier versions, they are still available here: 2015 and 2016. You can also get the new tutorial PDF and code here or on GitHub. If you find the tutorial useful, please cite it in your work – this helps me make the case that open publishing of digital materials like this is a meaningful academic contribution: Ognyanova, K. (2017) Network visualization with R.
Retrieved from www.kateto.net/network-visualization.

Visualizing Twitter history with streamgraphs in R. I was exploring ways to visualize my Twitter history, and ended up creating this interactive streamgraph of my 20 most used hashtags in Twitter. The graph shows how my Twitter activity has varied a lot. The top three hashtags are #datascience, #rstats and #opendata (no surprises there). There are also event-related hashtags that show up only once, such as #tomorrow2015 and #iccss2015, and annually repeating ones, such as #apps4finland. Twitter has quite a strict policy for obtaining data, but they do allow one to download the full personal Twitter history, i.e. all tweets as a convenient csv file (instructions here), so that’s what I did.
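As a rough illustration of the counting step behind such a graph, the base-R sketch below extracts hashtags from tweet text and tabulates them by month. The small data frame stands in for the downloaded csv; the column names `timestamp` and `text`, and the tweets themselves, are assumptions made for this example, and the streamgraph drawing itself is not shown.

```r
# Hypothetical stand-in for the downloaded archive (column names are assumed)
tweets <- data.frame(
  timestamp = c("2015-01-10", "2015-01-20", "2015-02-03", "2015-02-15"),
  text = c("Learning #rstats today",
           "More #rstats and #opendata",
           "#datascience meetup",
           "Plotting with #rstats"),
  stringsAsFactors = FALSE
)

# Extract every hashtag from each tweet (a list with one character vector per tweet)
tags <- regmatches(tweets$text, gregexpr("#[[:alnum:]_]+", tweets$text))

# One row per (month, hashtag) occurrence
long <- data.frame(
  month   = rep(substr(tweets$timestamp, 1, 7), lengths(tags)),
  hashtag = tolower(unlist(tags)),
  stringsAsFactors = FALSE
)

# Month-by-hashtag counts: the kind of table a streamgraph is drawn from
counts <- table(long$month, long$hashtag)

# Most used hashtags overall, most frequent first
top <- sort(colSums(counts), decreasing = TRUE)
```

Keeping only the first 20 entries of `top` would give the "20 most used hashtags" selection described above.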
Timely Portfolio: visNetwork, Currencies, and Minimum Spanning Trees.

    # get MST using code from this post
    currencies <- na.omit(currencies)
    colnames(currencies) <- c("Korea", "Malaysia", "Singapore", "Taiwan", "China", "Japan",
                              "Thailand", "Brazil", "Mexico", "India", "USDOther", "USDBroad")
    # get daily percent changes
    currencies <- currencies/lag(currencies) - 1
    currencies[1,] <- 0
    cor.distance <- cor(currencies)
    corrplot::corrplot(cor.distance)

    library(igraph)
    g1 <- graph.adjacency(cor.distance, weighted = TRUE, mode = "undirected", add.colnames = "label")
    mst <- minimum.spanning.tree(g1)
    plot(mst)

    library(visNetwork)
    mst_df <- get.data.frame(mst, what = "both")
    visNetwork(
      data.frame(id = 1:nrow(mst_df$vertices), label = mst_df$vertices),
      mst_df$edges
    ) %>%
      visOptions(highlightNearest = TRUE, navigation = TRUE)
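To make the minimum spanning tree step less of a black box, here is a small base-R sketch of Prim's algorithm on a toy matrix. It is not the code from the post, which uses igraph and passes the correlations directly as edge weights. The three series A, B and C, the helper `prim_mst`, and the conversion of correlation to distance via sqrt(2(1 - rho)) are all assumptions made for this illustration.

```r
# Toy correlation matrix for three hypothetical return series
rho <- matrix(c(1.0, 0.8, 0.2,
                0.8, 1.0, 0.5,
                0.2, 0.5, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Assumed conversion: highly correlated series are "close" to each other
d <- sqrt(2 * (1 - rho))

# Prim's algorithm: grow the tree from the first node, each time adding the
# shortest edge that links a node inside the tree to a node outside it
prim_mst <- function(d) {
  in_tree <- c(TRUE, rep(FALSE, nrow(d) - 1))
  edges <- data.frame(from = character(0), to = character(0),
                      dist = numeric(0), stringsAsFactors = FALSE)
  while (!all(in_tree)) {
    sub <- d[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    new_node <- colnames(sub)[idx[2]]
    edges <- rbind(edges,
                   data.frame(from = rownames(sub)[idx[1]], to = new_node,
                              dist = sub[idx[1], idx[2]],
                              stringsAsFactors = FALSE))
    in_tree[match(new_node, rownames(d))] <- TRUE
  }
  edges
}

mst_edges <- prim_mst(d)
mst_edges
```

With n nodes the result always has n - 1 edges; on the toy matrix the tree links A to B and B to C, the two most strongly correlated pairs.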
Mapping Flows in R. Last year I published the above graphic, which then got converted into the below for the book London: The Information Capital. I have had many requests for the code I used to create the plot so here it is! The data shown is the Office for National Statistics flow data. See here for the latest version. The file I used for the above can be downloaded here (it is over 109 MB uncompressed, so you need a decent computer to load and plot it all at once in R). You will also need this file of area (MSOA) codes and their co-ordinates. The code used is pasted below with comments above each segment. Load the flow data required; origin and destination points are needed. The UK Census file above didn't have coordinates, just area codes.
Now for plotting with ggplot2. This first step removes the axes in the resulting plot.

    xquiet <- scale_x_continuous("", breaks=NULL)
    yquiet <- scale_y_continuous("", breaks=NULL)
    quiet <- list(xquiet, yquiet)

Let's build the plot.

Nnet – R is my friend. I’ve made quite a few blog posts about neural networks and some of the diagnostic tools that can be used to ‘demystify’ the information contained in these models. Frankly, I’m kind of sick of writing about neural networks but I wanted to share one last tool I’ve implemented in R.
I’m a strong believer that supervised neural networks can be used for much more than prediction, as is the common assumption by most researchers. I hope that my collection of posts, including this one, has shown the versatility of these models to develop inference into causation. To date, I’ve authored posts on visualizing neural networks, animating neural networks, and determining importance of model inputs. This post will describe a function for a sensitivity analysis of a neural network. Specifically, I will describe an approach to evaluate the form of the relationship of a response variable with the explanatory variables used in the model.
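One simple way to picture such a sensitivity analysis is sketched below. It is not the function from the post: the simulated data, the helper `sens_profile`, and the choice to hold the other explanatory variable at its median are assumptions made here for illustration. The idea is to vary one explanatory variable across its observed range, keep the other fixed, and look at how the predicted response changes.

```r
library(nnet)

set.seed(1)

# Hypothetical data: y depends non-linearly on x1 and linearly on x2
n   <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) / 4 + 0.5 * dat$x2 + rnorm(n, sd = 0.05)

# Small single-hidden-layer network with a linear output unit
mod <- nnet(y ~ x1 + x2, data = dat, size = 4, linout = TRUE, trace = FALSE)

# Vary one explanatory variable over its range while holding the other
# at a chosen quantile, then record the model's predictions
sens_profile <- function(model, data, vary, hold, q = 0.5, steps = 50) {
  newdat <- data.frame(seq(min(data[[vary]]), max(data[[vary]]),
                           length.out = steps))
  names(newdat) <- vary
  newdat[[hold]] <- rep(unname(quantile(data[[hold]], q)), steps)
  newdat$pred <- as.numeric(predict(model, newdata = newdat))
  newdat
}

prof <- sens_profile(mod, dat, vary = "x1", hold = "x2")

# Shape of the fitted relationship between the response and x1
plot(prof$x1, prof$pred, type = "l",
     xlab = "x1", ylab = "predicted y (x2 held at its median)")
```

Repeating the call with other values of `q` shows whether the shape of the response to `x1` depends on the level at which `x2` is held.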
Beautiful network diagrams with ggplot2.

Visualizing neural networks from the nnet package – R is my friend. Neural networks have received a lot of attention for their abilities to ‘learn’ relationships among variables. They represent an innovative technique for model fitting that doesn’t rely on conventional assumptions necessary for standard models and they can also quite effectively handle multivariate response data. A neural network model is very similar to a non-linear regression model, with the exception that the former can handle an incredibly large number of model parameters. For this reason, neural network models are said to have the ability to approximate any continuous function.
I’ve been dabbling with neural network models for my ‘research’ over the last few months. I’ll admit that I was drawn to the approach given the incredible amount of hype and statistical voodoo that is attributed to these models. R has a few packages for creating neural network models (neuralnet, nnet, RSNNS). In this blog I present a function for plotting neural networks from the nnet package.

Organizational Network visualization in R with the igraph package | Rules of Reason. In this post I showed a visualization of the organizational network of my department. Since several people asked for details how the plot has been produced, I will provide the code and some extensions below. The plot has been done entirely in R (2.14.01) with the help of the igraph package.
It is a great package but I found the documentation somewhat difficult to use, so hopefully this post can be a helpful introduction to network visualization with R. Here we go:

    # Load the igraph package (install if needed)
    require(igraph)
    # Data format. The data is in 'edges' format meaning that each row records
    # a relationship (edge) between two people (vertices).
    # Additional attributes can be included.

Here is the result: Not very informative indeed.

    # Subset the data.

Still not perfect, but much more informative and aesthetically pleasing. Additional information can be found on this guide to igraph which is in development, the examples here, and the official CRAN documentation of the package.
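For readers who want to try the 'edges' format right away, here is a minimal sketch of the workflow described above. The names, the `weight` attribute and the cut-off of 2 are invented for illustration and have nothing to do with the department data used in the post.

```r
library(igraph)

# Hypothetical edge list: each row is one relationship between two people,
# with an additional attribute (weight) describing its strength
edges <- data.frame(
  from   = c("Ann", "Ann", "Bob", "Cat", "Dan"),
  to     = c("Bob", "Cat", "Cat", "Dan", "Eve"),
  weight = c(3, 1, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Build an undirected graph; extra columns become edge attributes
g <- graph_from_data_frame(edges, directed = FALSE)

# Subset the data: keep only the stronger ties before plotting
strong   <- edges[edges$weight >= 2, ]
g_strong <- graph_from_data_frame(strong, directed = FALSE)

# Scale vertex size by degree and edge width by weight
plot(g_strong,
     vertex.size        = 10 + 5 * degree(g_strong),
     edge.width         = E(g_strong)$weight,
     vertex.label.color = "black")
```

Subsetting the data frame before building the graph, rather than deleting edges afterwards, keeps the filtering step in plain R and drops any vertex that no longer has a tie.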