Data mining

Datamining Twitter

On its own, Twitter builds an image for companies, and very few of them are aware of this fact. When a big surprise happens, it is too late: a corporation suddenly sees a facet of its business, most often a looming or developing crisis, flare up on Twitter. As always when a corporation is involved, there is money to be made by converting the problem into an opportunity: social network intelligence is poised to become a big business. In theory, when it comes to assessing the social media presence of a brand, Facebook is the place to go. But as brands flock to the dominant social network, the noise becomes overwhelming and the signal, what people really say about the brand, becomes hard to extract. By comparison, Twitter reflects the mood of a product's or service's users more swiftly.

Datamining Twitter is not trivial. Companies such as DataSift (launched last month) exploit the Twitter fire hose by relying on the 40-plus metadata fields included in each post. …is a rich trove of data.
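
To make the "40-plus metadata fields" concrete, here is a minimal sketch of pulling a few of them out of a single status. It assumes a hypothetical local file tweet.json holding one tweet in the classic (v1.1) Twitter API format:

require 'json'

# A single tweet carries far more than its text: timestamps,
# author statistics, language, entities and so on.
tweet = JSON.parse(File.read("tweet.json"))

puts tweet["text"]
puts tweet["created_at"]                   # when it was posted
puts tweet["lang"]                         # detected language
puts tweet["user"]["screen_name"]          # author
puts tweet["user"]["followers_count"]      # author's reach
puts tweet["entities"]["hashtags"].inspect # hashtags; entities also list URLs and mentions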

How to use LinkedIn for data miners

After the article How to use twitter for data miners, let me offer some advice on using LinkedIn. First, you may already know that your LinkedIn account can be linked to display your tweets (see this link). Continue by adding the right keywords to your summary, so that other data miners can find you easily. Examples of such terms are data mining, predictive analytics, knowledge discovery and machine learning. Continue by searching for other people with the same interests (use the same keywords as above). To introduce yourself to people you don't know, you may use the "friend" type of connection.
The next step is to participate in data mining groups, such as: …

Scraping a website in Ruby for dummies (or almost)

# encoding: UTF-8
require 'open-uri'
require 'nokogiri'
require 'csv'

# Collapse newlines and the various Unicode spaces in a string into single spaces
def clean(str)
  str.strip.gsub(/[[:space:]]+/, ' ')
end

# the types of decisions (the definition is elided in the original)

# we will write into this CSV file
CSV.open("conseil_constitutionel.csv", "w") do |csv|
  # the header row
  csv << ["Année", "Numéro", "Date", "N°", "Type", "Intitulé", "Décision", "URL"]
  # the entry point: the archives page of the Conseil constitutionnel site (URL elided)
  main_url = "…"
  # from this page, grab every link inside the #articlesArchives div;
  # each link leads to a page listing the decisions of one year
  Nokogiri::HTML(URI.open(main_url)).search('#articlesArchives a').each do |a|
    # the link text is the year
    year = a.inner_text
    # assumption: the listing page URL is the link's href
    url_decision = a['href']
    Nokogiri::XML(URI.open(url_decision), nil, 'UTF-8').search('#articles li').each do |decision|
      # … (the loop body, starting at the index_id test, is truncated in the original)
    end
  end
end
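
Once the scraper has finished, the output can be sanity-checked in a couple of lines; a minimal sketch, assuming the conseil_constitutionel.csv file produced above:

require 'csv'

# Load the scraped decisions and print a quick summary
rows = CSV.read("conseil_constitutionel.csv", headers: true)
puts "#{rows.size} decisions scraped"
puts rows.first.to_h.inspect if rows.size > 0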

Data Mining Map

Scraping for Journalism: A Guide for Collecting Data

Our Dollars for Docs news application lets readers search pharmaceutical company payments to doctors. We've written a series of how-to guides explaining how we collected the data. Most of the techniques are within the ability of a moderately experienced programmer. The most difficult site to scrape was actually a previous Adobe Flash incarnation of Eli Lilly's disclosure site; Lilly has since released its data in PDF format. These recipes may be most helpful to journalists who are trying to learn programming and already know the basics. If you are a complete novice with no short-term plan to learn how to code, it may still be worth your time to find out what it takes to gather data by scraping websites, so that you know what you are asking for if you end up hiring someone to do the technical work for you.
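
Extracting text is the usual first step with disclosures released as PDFs. Here is a minimal sketch using the pdf-reader gem; the file name disclosures.pdf is a hypothetical stand-in:

require 'pdf-reader' # gem install pdf-reader

# Walk the PDF page by page and dump its raw text,
# the starting point before parsing structured rows out of it
reader = PDF::Reader.new("disclosures.pdf")
reader.pages.each_with_index do |page, i|
  puts "--- page #{i + 1} ---"
  puts page.text
end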

The tools

With the exception of Adobe Acrobat Pro, all of the tools we discuss in these guides are free and open source.

A Guide to the Guides

Rip and read: 6 OCR tools put to the test

Data Mining Community's Top Resource
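
When a document only exists as scanned images, OCR is the missing step before any scraping. A minimal sketch that shells out to the Tesseract command-line tool, one of the engines such round-ups typically cover; it assumes tesseract is installed and on the PATH, and the scans/ folder is hypothetical:

# Batch-OCR a folder of scanned pages with the Tesseract CLI
Dir.glob("scans/*.png").each do |img|
  base = img.sub(/\.png\z/, "")
  # tesseract writes the recognized text to "#{base}.txt"
  system("tesseract", img, base)
end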