
Twitter and social data


Bookmarked references on Twitter influence and social network analysis: Papadopoulos et al. on community detection (DAMI 2012); "TwitterRank: Finding Topic-Sensitive Influential Twitterers"; Cha et al., "The Million Follower Fallacy"; "A Survey of Data Mining Techniques for Social Network Analysis" (highly recommended); a slide deck on Social Network Analysis; and "Data Mining in Social Networks".

Twitter Data + Python (and JS): Geolocation. Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information we are ready to create some nice visualisations of our data, in the form of interactive maps. This article briefly introduces the GeoJSON format and Leaflet.js, a nice JavaScript library for interactive maps, and discusses their integration with the Twitter data collected in the previous parts of this tutorial (see Part 4 for details on the rugby data set).

Tutorial table of contents: GeoJSON; From Tweets to GeoJSON; Summary. GeoJSON is a format for encoding geographic data structures. It supports a variety of geometric types that can be used to visualise the desired shapes on a map. In GeoJSON we can also represent higher-level objects such as a Feature or a FeatureCollection: a Feature pairs a geometry (for a tweet, a Point whose coordinates are given as [longitude, latitude]) with arbitrary properties, and a FeatureCollection is an array of Features. The tutorial then converts the collected tweets into exactly this structure; a sketch of that step is given below.
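The tutorial itself does this in Python, but the same tweets-to-GeoJSON step can be sketched in R (the language used elsewhere on this page) with the jsonlite package; the tweets data frame below is an illustrative stand-in for the rugby data set, not the tutorial's actual data.

    # Hedged sketch: build a GeoJSON FeatureCollection from tweet coordinates.
    library(jsonlite)

    tweets <- data.frame(
      lon  = c(-0.1276, 2.3522),
      lat  = c(51.5072, 48.8566),
      text = c("example tweet from London", "example tweet from Paris"),
      stringsAsFactors = FALSE
    )

    features <- lapply(seq_len(nrow(tweets)), function(i) {
      list(
        type = "Feature",
        geometry = list(
          type = "Point",
          coordinates = c(tweets$lon[i], tweets$lat[i])   # GeoJSON order: [longitude, latitude]
        ),
        properties = list(text = tweets$text[i])
      )
    })

    geojson <- list(type = "FeatureCollection", features = features)
    write(toJSON(geojson, auto_unbox = TRUE, pretty = TRUE), "geo_data.json")
    # The resulting file can be handed to Leaflet.js to draw one marker per tweet.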

Data Twitter: a slide deck covering topic modelling and clustering. Mining the Social Web, 2E: a good introduction.

This chapter kicks off our journey of mining the social web with Twitter, a rich source of social data that is a great starting point for social web mining because of its inherent openness for public consumption, clean and well-documented API, rich developer tooling, and broad appeal to users from every walk of life. Twitter data is particularly interesting because tweets happen at the "speed of thought" and are available for consumption as they happen in near real time, represent the broadest cross-section of society at an international level, and are so inherently multifaceted. Tweets and Twitter's "following" mechanism link people in a variety of ways, ranging from short (but often meaningful) conversational dialogues to interest graphs that connect people and the things that they care about. Since this is the first chapter, we'll take our time acclimating to our journey in social web mining. However, given that Twitter data is so accessible and open to public scrutiny, […]

Slides from my R tutorial on Twitter text mining #rstats | Things I tend to forget. Update: an expanded version of this tutorial will appear in the new Elsevier book Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications by Gary Miner et al., which is now available for pre-order from Amazon. In conjunction with the book, I have cleaned up the tutorial code and published it on GitHub. Last month I presented this introduction to R at the Boston Predictive Analytics MeetUp on Twitter sentiment.

The goal of the presentation was to expose a first-time (but technically savvy) audience to working in R. The scenario we work through is estimating the sentiment expressed in tweets about major U.S. airlines. Even with a tiny sample and a very crude algorithm (simply counting the number of positive vs. negative words), we find a believable result. Jeff Gentry's twitteR package makes it easy to fetch the tweets. Here is the slimmed-down version of the slides, and here's a PDF version to download.
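A minimal sketch of that crude approach, assuming the twitteR package (since retired from CRAN) and local copies of a positive/negative word list such as the Hu & Liu opinion lexicon; the query string and file names are placeholders:

    # Score each tweet as (# positive words) - (# negative words).
    library(twitteR)
    library(stringr)

    score_sentiment <- function(texts, pos_words, neg_words) {
      sapply(texts, function(text) {
        # lower-case and split on anything that is not a letter or digit
        words <- unlist(str_split(str_to_lower(text), "[^a-z0-9]+"))
        sum(words %in% pos_words) - sum(words %in% neg_words)
      })
    }

    pos_words <- readLines("positive-words.txt")   # assumed local copies of the lexicon
    neg_words <- readLines("negative-words.txt")

    # Requires prior OAuth setup with setup_twitter_oauth().
    tweets <- searchTwitter("@delta", n = 100)     # hypothetical airline query
    scores <- score_sentiment(twListToDF(tweets)$text, pos_words, neg_words)
    table(sign(scores))                            # crude negative / neutral / positive split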

Retrieving tweets, building a graph from them, and then analysing it (introduction). You don't need to be a computer scientist to generate something like this… but a minimum of patience is still advisable :-) Note: the colours don't represent anything; they are just there to look nice. Later in the analysis, however… This graph is made up of the exchanges between roughly noon on Wednesday 4 April and the same time on Thursday 5 April. The tweets collected (174 of them, from 37 contributors; it is the day before the holidays) all share the hashtag "#EnLD", used by the programme "En Ligne Directe" on the radio station RTS La Première. The debate can be listened to here. This programme, broadcast live from 8:00 to 8:30 every weekday morning, is a debate show featuring guests (chosen by the editorial team), listeners' calls taken live, and pre-recorded listener messages (sent via a smartphone app or by email). In the posts that follow this one, I will detail how it was done; a rough sketch of the kind of processing involved is given below. But first, I am going to go and eat.
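As a hedged illustration only (not the author's code, which is described in the follow-up posts), building a graph of exchanges from hashtag tweets can look roughly like this in R, where the tweets data frame is a made-up stand-in for the 174 collected tweets:

    # Build a directed "who mentions whom" graph from tweets sharing a hashtag.
    library(igraph)
    library(stringr)

    tweets <- data.frame(
      screen_name = c("alice", "bob", "carol"),
      text = c("@bob totally agree #EnLD",
               "@alice @carol not so sure #EnLD",
               "listening now #EnLD"),
      stringsAsFactors = FALSE
    )

    # One edge per @mention: author -> mentioned user
    mentions <- str_extract_all(tweets$text, "@\\w+")
    edges <- do.call(rbind, lapply(seq_along(mentions), function(i) {
      if (length(mentions[[i]]) == 0) return(NULL)
      data.frame(from = tweets$screen_name[i],
                 to   = sub("^@", "", mentions[[i]]),
                 stringsAsFactors = FALSE)
    }))

    g <- graph_from_data_frame(edges, directed = TRUE)
    plot(g)   # any colouring added here would be purely decorative, as the author notes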

Scraping Twitter with R – a "How to…" | sytpp | the datablog. To analyse what is going on in the Twittersphere, Twitter provides an API (Application Programming Interface) for mining tweets and metadata about users and locations. If you want to analyse tweets quantitatively, the online interface is not an option, but luckily there are more tools out there to help you mine Twitter than I can possibly list. One of them is PeopleBrowsr: amongst other social media platforms it mines Twitter, collecting and saving tweets going back 1,500 days – for all keywords and hashtags.

For private use you can work with tweets that date back 60 days (here is a "How To…" from Jack); beyond that there is a paywall. Then there is ScraperWiki, an open-source library based on Python. The analysis programme of my choice is R – because it is a powerful tool for statistics, it has some very beautiful visualisation packages and, importantly, it is free. Twitter has two APIs for getting the data – the REST API (covered by the twitteR package) and the Streaming API (covered by streamR) – so the first step is to install both packages:

> install.packages('twitteR')
> install.packages('streamR')

OK. Step #2: authentication.
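The excerpt stops at the authentication step; a hedged sketch of what that step typically looks like with twitteR's OAuth helper is shown below (the four credentials are placeholders from a Twitter developer app, and Twitter's access rules have changed considerably since this post was written):

    # Step #2 (authentication), sketched with twitteR's OAuth helper.
    library(twitteR)

    consumer_key    <- "YOUR_CONSUMER_KEY"      # placeholder
    consumer_secret <- "YOUR_CONSUMER_SECRET"   # placeholder
    access_token    <- "YOUR_ACCESS_TOKEN"      # placeholder
    access_secret   <- "YOUR_ACCESS_SECRET"     # placeholder

    setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

    # After this, REST queries work, e.g. searchTwitter("#rstats", n = 50).
    # streamR needs its own ROAuth handshake before filterStream() can be used.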

Analyze Twitter Data Using R. Twitter data available through its API provides a wealth of real-time information. This article demonstrates a graph of user relationships and an analysis of tweets returned in a search using R. Keep in mind, Twitter has announced that the removal of basic authentication is going to occur on August 16, 2010. I am not sure how this code will work after that point… it depends on the state of the twitteR library at that time and the API specifics that Twitter implements.

    library(twitteR)
    library(igraph)

    twitterGraph <- function(username, password, userToPlot) {
      # Open a (pre-OAuth) basic-authentication session
      sess <- initSession(username, password)
      # Pull the first 20 friends and followers of the user of interest
      friends.object   <- userFriends(userToPlot, n = 20, sess)
      followers.object <- userFollowers(userToPlot, n = 20, sess)
      friends   <- sapply(friends.object, name)
      followers <- sapply(followers.object, name)
      # Combine the two relations into a single edge list
      relations <- merge(data.frame(User = userToPlot, Follower = friends),
                         data.frame(User = followers, Follower = userToPlot), all = TRUE)
      # Convert the edge list into a directed igraph object
      g <- graph.data.frame(relations, directed = TRUE)
      V(g)$label <- V(g)$name
      g
    }

For the sample user in the article, printing the resulting graph reports 16 vertices and 16 edges.
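For context, a hypothetical call would have looked like the following; the credentials and screen name are placeholders, and, as the article itself warns, basic authentication stopped working shortly afterwards:

    g <- twitterGraph("username", "password", "some_user")   # placeholder credentials
    plot(g)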

R: Applying a function to every row of a data frame at Mark Needham. In my continued exploration of London's meetups, I wanted to calculate the distance from meetup venues to a centre point in London. I've created a gist containing the coordinates of some of the venues that host NoSQL meetups in London town if you want to follow along. Now to do the calculation: I've chosen the Centre Point building on Tottenham Court Road as our centre point, and the distHaversine function in the geosphere library lets us do the calculation. With that we can calculate the distance from Skillsmatter to our centre point (a sketch is given below). That works pretty well, so now we want to apply it to every row in the venues data frame and add an extra column containing that value.
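The post's own code did not survive this excerpt; a hedged sketch of the single-distance step, with approximate placeholder coordinates, looks like this (distHaversine takes c(longitude, latitude) pairs and returns metres):

    library(geosphere)

    skillsmatter <- c(-0.0993, 51.5247)   # lon, lat (approximate placeholder)
    centrePoint  <- c(-0.1290, 51.5163)   # Centre Point, Tottenham Court Road (approximate)

    distHaversine(skillsmatter, centrePoint)   # distance in metres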

This was my first attempt… which didn't work quite as I'd imagined! I eventually found my way to the by function, which allows you to "apply a function to a data frame split by factors". Once everything was wired up, the distances could be added to the venues data frame as a new column (one way of doing so is sketched below). Et voilà!
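As a hedged alternative to the post's by() approach, a plain apply over the coordinate columns achieves the same per-row calculation; the venues data frame here is an illustrative placeholder for the gist mentioned above:

    library(geosphere)

    venues <- data.frame(
      venue = c("Skills Matter", "Another Venue"),
      lat   = c(51.5247, 51.5228),
      lon   = c(-0.0993, -0.1021)
    )
    centrePoint <- c(-0.1290, 51.5163)   # lon, lat (approximate)

    # Apply distHaversine to every row and store the result as a new column.
    venues$distanceFromCentre <- apply(venues[, c("lon", "lat")], 1, function(p) {
      distHaversine(as.numeric(p), centrePoint)
    })

    venues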