
Data Scraping Wikipedia with Google Spreadsheets
Prompted in part by a presentation I have to give tomorrow at an OU eLearning community session (I hope some folks turn up – the 90 minute session on Mashing Up the PLE – RSS edition is the only reason I’m going in…), and in part by Scott Leslie’s compelling programme for a similar-duration Mashing Up your own PLE session (scene setting here: Hunting the Wily “PLE”), I started having a tinker with using Google spreadsheets for data table screen scraping. So here’s a quick summary of (part of) what I found I could do.

The Google spreadsheet function =importHTML("", "table", N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 1) as the target table for data scraping. Grab the URL, fire up a new Google spreadsheet, and start to enter the formula "=importHTML" into one of the cells.

http://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/
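
A minimal completed version of the formula might look like the following (the Wikipedia page below is an illustrative stand-in, not the one used in the original post; current Google Sheets documentation indexes tables from 1):

    =ImportHtml("http://en.wikipedia.org/wiki/List_of_countries_by_population", "table", 1)

The imported table is live, in the sense that Google Sheets refreshes the imported data periodically, so the spreadsheet roughly tracks changes to the source page.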

Related:  Jornalismo de Dados (Data Journalism)

How to Scrape Websites for Data without Programming Skills Searching for data to back up your story? Just Google it, verify the accuracy of the source, and you’re done, right? Not quite.

Coding for Journalists 101: A four-part series Update, January 2012: Everything…yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles with far more practical and up-to-date examples and projects than what you’ll find here. I’m only keeping this old walkthrough up as a historical reference. I’m sure the code is so ugly that I’m not going to even try re-reading it. So check it out: The Bastards Book of Ruby. -Dan

How to be a data journalist Data journalism is huge. I don't mean 'huge' as in fashionable - although it has become that in recent months - but 'huge' as in 'incomprehensibly enormous'. It represents the convergence of a number of fields which are significant in their own right - from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful - but also intimidating. Who can do all that? The reality is that almost no one is doing all of that, but there are enough different parts of the puzzle for people to easily get involved in, and go from there.

Creating a Scraper for Multiple URLs Using Regular Expressions Important note: the tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application, accessible from the Help menu; you should run these to discover the Hub. This tutorial was created using version 0.8.2, and the Scraper Editor interface has changed considerably since then: more features have been added and some controls have new names. The following can still be a good complement for getting acquainted with scrapers.
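
As a rough illustration of the multiple-URL idea, a regex-based scraper in Python might loop over a list of pages and pull out matching fragments. The URLs and pattern below are hypothetical placeholders, not taken from the tutorial:

    import re
    import time
    import urllib.request

    # Hypothetical list of pages to scrape; substitute real targets.
    urls = [
        "http://example.com/page1.html",
        "http://example.com/page2.html",
    ]

    # Hypothetical pattern: grab the text of every level-2 heading.
    pattern = re.compile(r"<h2[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)

    for url in urls:
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        for heading in pattern.findall(html):
            print(url, heading.strip())
        time.sleep(1)  # pause between requests to be polite to the server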

Playing with heat-mapping UK data on OpenHeatMap Last night OpenHeatMap creator Pete Warden announced that the tool now allows you to visualise UK data. I’ve been gleefully playing with the heat-mapping tool today and thought I’d share some pointers on visualising data on a map. This is not a tutorial for OpenHeatMap – Pete’s done a great job of that himself – but rather an outline of the steps to get some map-ready data in the first place.
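
As a sketch of what “map-ready” can mean here, OpenHeatMap takes simple spreadsheet-style uploads, so preparation largely amounts to reducing a dataset to a location column and a value column. The column names and figures below are made up for illustration, and the exact column-name conventions are OpenHeatMap’s own:

    import csv

    # Made-up example figures; real data would come from a downloaded dataset.
    rows = [
        ("Birmingham", 1028000),
        ("Leeds", 751500),
        ("Sheffield", 530300),
    ]

    # One location column, one value column, ready to upload.
    with open("uk_heatmap.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["location", "value"])
        writer.writerows(rows)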

An Introduction to Compassionate Screen Scraping Screen scraping is the art of programmatically extracting data from websites. If you think it's useful: it is. If you think it's difficult: it isn't. And if you think it's easy to really piss off administrators with ill-considered scripts, you're damn right. This is a tutorial on not just screen scraping, but socially responsible screen scraping. It's an amalgam of getting the data you want and the Golden Rule, and reading it is going to make the web a better place.
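
In that spirit, a minimal sketch of a “compassionate” scraper in Python might check robots.txt, identify itself, and throttle its requests. The site, paths, and user-agent string below are placeholders:

    import time
    import urllib.request
    import urllib.robotparser

    BASE = "http://example.com"                # placeholder site
    PAGES = ["/data/1.html", "/data/2.html"]   # placeholder paths
    USER_AGENT = "example-research-bot/0.1 (contact: you@example.com)"

    # Respect the site's robots.txt before fetching anything.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(BASE + "/robots.txt")
    rp.read()

    for path in PAGES:
        url = BASE + path
        if not rp.can_fetch(USER_AGENT, url):
            continue  # the site has asked bots not to fetch this page
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        html = urllib.request.urlopen(req).read()
        print(url, len(html), "bytes")
        time.sleep(2)  # throttle: one request every couple of seconds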

Data journalism pt5: Mashing data (comments wanted) This is a draft from a book chapter on data journalism (part 1 looks at finding data; part 2 at interrogating data; part 3 at visualisation, and part 4 at visualisation tools). I’d really appreciate any additions or comments you can make – particularly around tips and tools. UPDATE: It has now been published in The Online Journalism Handbook.

Beautiful Soup: We called him Tortoise because he taught us. You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help.

Data Journalism As our governments and businesses become increasingly flush with information, more and bigger data are becoming available from across the globe. Increasingly, investigative reporters need to know how to obtain, clean, and analyze “structured information” in this digital world. Here is a list of resources to get you started, but we want to keep updating our community with the best resources available. Do you know of a great data tutorial we haven't listed, perhaps in a language other than English? Help us keep this resource guide comprehensive by sending your favorite resource to: hello@gijn.org. ¿Habla español?
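
Since Beautiful Soup comes up here, a small Python sketch shows the kind of extraction it is built for: pulling data out of HTML you didn’t write. The HTML fragment is invented for illustration; Beautiful Soup itself is the real library, installable as beautifulsoup4:

    from bs4 import BeautifulSoup

    # Invented fragment of the kind of messy page you "didn't write".
    html = """
    <table>
      <tr><td><b>Area</b></td><td><b>Population</b></td></tr>
      <tr><td>North</td><td>1,200</td></tr>
      <tr><td>South</td><td>3,400</td></tr>
    </table>
    """

    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr")[1:]:  # skip the header row
        area, population = [td.get_text(strip=True) for td in row.find_all("td")]
        print(area, population)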

Branded journalists battle newsroom regulations With social media a big part of newsroom life, individual journalists often find their personal brands attractive selling points for future employers. But lately many of these same social media superstars are questioning whether newsrooms are truly ready for the branded journalist. In late January, Matthew Keys, Deputy Social Media Editor at Reuters, wrote a blog post in which he criticized his former employer (ABC affiliate KGO-TV in San Francisco) for taking issue with his use of social media. Keys says his supervisors questioned the language, tone and frequency of his tweets, as well as his judgment when he retweeted his competitors. Not long after Keys’ post went live, CNN’s Roland Martin was suspended for comments he tweeted during the Super Bowl.

Automated Form Submissions and Data Scraping – MySQL Hello everyone! I'm working on a project that should help me automate some processes that are extremely time-dependent, using a MySQL database. I'm presently working with two developers on this project on a contract basis to complete the job, and I'm finding them hesitant to come up with a solution for how to implement what I'm requesting.

Development of an automated climatic data scraping, filtering and display system (Computers and Electronics in Agriculture, doi:10.1016/j.compag.2009.12.006) Abstract: One of the many challenges facing scientists who conduct simulation and analysis of biological systems is the ability to dynamically access spatially referenced climatic, soil and cropland data. Over the past several years, we have developed an Integrated Agricultural Information and Management System (iAIMS), which consists of foundation-class climatic, soil and cropland databases.
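
As a rough sketch of the kind of automation the forum post describes, a Python script could submit a form and store the response in MySQL. The form endpoint, fields, credentials, and table schema below are all hypothetical; the sketch uses the requests and mysql-connector-python packages:

    import requests
    import mysql.connector

    # Hypothetical form endpoint and fields.
    resp = requests.post(
        "http://example.com/search",
        data={"query": "widgets", "page": "1"},
        timeout=30,
    )
    resp.raise_for_status()

    # Hypothetical database and table; adjust credentials and schema to suit.
    conn = mysql.connector.connect(
        host="localhost", user="scraper", password="secret", database="scrapes"
    )
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO raw_pages (url, body) VALUES (%s, %s)",
        (resp.url, resp.text),
    )
    conn.commit()
    conn.close()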

Automated Data Extraction / Web Scraping Services Web scraping, also referred to as “data extraction” or “crawling”, is the process of pulling information or content from disparate websites and organising this data to your requirements, whether in a form that allows it to be displayed on a website or used for offline purposes. Automated Data Collection: some clients need to collect data on a scheduled basis or on demand.
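
A scheduled collection job can be as simple as a loop that re-fetches a page at a fixed interval and saves a timestamped snapshot; in practice this would more often be a cron job. The target URL and interval below are placeholders:

    import time
    import urllib.request

    URL = "http://example.com/prices.html"  # placeholder target
    INTERVAL = 60 * 60                      # once an hour, as an example

    while True:  # runs until interrupted (Ctrl+C)
        html = urllib.request.urlopen(URL).read()
        # Timestamped snapshot; downstream code would parse and store it.
        fname = time.strftime("snapshot-%Y%m%d-%H%M%S.html")
        with open(fname, "wb") as f:
            f.write(html)
        time.sleep(INTERVAL)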

Data Feed Scraping Product feed creation, automated website data extraction and scraping. Feed Optimise™ specialises in master product feed creation, which can then be used as a data backbone to feed price comparison engines, affiliate networks, shopping channels and more. We deliver high-quality, data-rich product feeds extracted from your website's data.
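
For a sense of what such a feed looks like, a sketch might write scraped product records out as a simple CSV feed. The field names and records are invented for illustration; real comparison engines each define their own required fields:

    import csv

    # Invented product records; in practice these come from a scraper.
    products = [
        {"sku": "A-100", "title": "Blue Widget", "price": "9.99", "url": "http://example.com/a100"},
        {"sku": "B-200", "title": "Red Widget", "price": "14.50", "url": "http://example.com/b200"},
    ]

    with open("product_feed.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sku", "title", "price", "url"])
        writer.writeheader()
        writer.writerows(products)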
