Scraping a website in Ruby for dummies (or almost)

    # encoding: UTF-8
    require 'open-uri'
    require 'nokogiri'
    require 'csv'

    # Cleans unwanted characters out of a string: collapses newlines,
    # non-breaking spaces and repeated spaces into single spaces
    def clean(str)
      str.gsub(/[[:space:]]+/, ' ').strip
    end

    # the decision types
    # …

    # we will write to this file
    CSV.open("conseil_constitutionel.csv", "w") do |csv|
      # the header row
      csv << ["Année", "Numéro", "Date", "N°", "Type", "Intitulé", "Décision", "URL"]
      # the entry point
      main_url = "…"
      # on this page, collect every link inside the #articlesArchives div;
      # these links lead to the pages listing the decisions
      Nokogiri::HTML(URI.open(main_url)).search('#articlesArchives a').each do |a|
        # the link text corresponds to the year
        year = a.inner_text
        # …
        Nokogiri::XML(URI.open(url_decision), nil, 'UTF-8').search('#articles li').each do |decision|
          if index_id
          # …
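The `clean` step matters because text pulled out of HTML often carries newlines, runs of spaces and non-breaking spaces. The following is a minimal sketch, assuming the scraped values are already in memory: the year and the two raw strings are made up for illustration, and no network access or Nokogiri is involved. It shows how each value could be normalized before `CSV` turns it into a row (here with `CSV.generate`, which builds the CSV in a string instead of a file).

```ruby
require 'csv'

# Collapses every run of whitespace (newlines, tabs, non-breaking spaces)
# into a single space, then trims both ends.
def clean(str)
  str.gsub(/[[:space:]]+/, ' ').strip
end

# Hypothetical scraped data: one year taken from a link's text,
# plus the raw text of two list items from that year's page.
year = "2000"
raw_decisions = [
  "  Decision 1\n   (example) ",
  "Decision\u00A02   (example)"
]

# Build the CSV in memory: one header row, then one cleaned row per item.
csv_text = CSV.generate do |csv|
  csv << ["Année", "Intitulé"]
  raw_decisions.each { |text| csv << [year, clean(text)] }
end

puts csv_text
```

Writing to a string keeps the sketch testable; in the real scraper, `CSV.open(path, "w")` plays the same role and the rows come from the parsed pages.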

Data mining
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use.[1][2][3][4] Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.[5] Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[1]
Etymology
In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a priori hypothesis.

Scraping for Journalism: A Guide for Collecting Data
(Photo by Dan Nguyen/ProPublica)
Our Dollars for Docs news application lets readers search pharmaceutical company payments to doctors. We’ve written a series of how-to guides explaining how we collected the data. Most of the techniques are within the ability of a moderately experienced programmer. The hardest site to scrape was actually a previous Adobe Flash incarnation of Eli Lilly’s disclosure site. These recipes may be most helpful to journalists who are trying to learn programming and already know the basics. If you are a complete novice with no short-term plan to learn how to code, it may still be worth your time to find out what it takes to gather data by scraping websites, so you know what you’re asking for if you end up hiring someone to do the technical work for you.
The tools
With the exception of Adobe Acrobat Pro, all of the tools we discuss in these guides are free and open-source.
Ruby – The programming language we use the most at ProPublica.

Classroom 2.0
Introduction to text mining
Text-mining tools aim to automate the structuring of documents that are unstructured or only loosely structured. Starting from a text document, a text-mining tool generates information about the document’s content. This information was not present, or not explicit, in the document in its original form; it is added, and so enriches the document.
What is it good for?
- automatically classifying documents
- getting an overview of a document’s content without reading it
- automatically populating databases
- monitoring large document collections
- enriching a search engine’s index to make documents easier to find
In short, many uses and services can stem from text-mining solutions.
How does it work?
There are a few basic rules that text-mining tools must follow in their processing:
- a statistical approach
- a semantic approach
The disadvantages:
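The statistical approach mentioned above can be illustrated, very roughly, by counting how often each word occurs: the most frequent content words give an overview of a document without reading it. This is a minimal sketch, not how any particular text-mining tool works; the stop-word list, the `top_terms` helper and the sample sentence are hypothetical.

```ruby
# Hypothetical stop-word list: very common words that carry little meaning.
STOP_WORDS = %w[the a an of to and in is it for on with that this].freeze

# Returns the n most frequent content words as [word, count] pairs,
# most frequent first; ties are broken alphabetically.
def top_terms(text, n = 3)
  words = text.downcase.scan(/[[:alpha:]]+/)
  counts = words.reject { |w| STOP_WORDS.include?(w) }.tally
  counts.sort_by { |word, count| [-count, word] }.first(n)
end

sample = "Text mining turns text into data. Mining text helps classify " \
         "documents, and classify them automatically."
p top_terms(sample)
```

Running it prints `[["text", 3], ["classify", 2], ["mining", 2]]`. Real tools refine this with stemming, weighting schemes and, for the semantic approach, linguistic analysis.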

How to use LinkedIn for data miners
After the article How to use Twitter for data miners, here is some advice on using LinkedIn. First, you may already know that your LinkedIn account can be linked to display your tweets. Next, add the right keywords to your summary so that other data miners can find you easily. Examples of such terms are data mining, predictive analytics, knowledge discovery and machine learning. Then search for other people with the same interests (using the same keywords as above). The next step is to join data mining groups, such as:
- ACM SIGKDD
- Advanced Business Analytics, Data Mining and Predictive Modeling
- AnalyticBridge
- Business Analytics
- CRISP-DM
- Customers DNA
- Data Miners
- Data Mining Technology
- Data Mining, Statistics, and Data Visualization
- Machine Learning Connection
- Open Source Data Mining
- SmartData Collective

Top 10 Free Online Mind Mapping Tools
As the name suggests, mind mapping means drawing your thoughts or ideas as a map, a technique well known for brainstorming and exploring many ideas. You can do it with just a pen and a sheet of paper, but it is more fun and easier with the tools below, all of which let you create mind maps online for free without downloading or installing anything.
1. Bubblus
Bubblus is very simple and easy to use: you just enter text and drag. Mind maps can be exported as image, XML or HXML files, and you can also share them with friends or embed them in your blog.
2. Mindomo
Mindomo lets you search YouTube videos, add images, videos or audio from existing URLs, upload attachments, and add many symbols. You can export a mind map as PDF, image, RTF and several other formats.
3. MindMeister
MindMeister makes it easy to add many interesting icons to a mind map.
4. Mind42
5. Dabbleboard