
Welcome

http://scraperwiki.com/

Gapminder Desktop
This software has been renamed Gapminder World Offline. Because of technical problems, the software on this page is no longer being maintained; please visit Gapminder World Offline (Beta) instead. With Gapminder Desktop you can show animated statistics from your own laptop! Install the free software and watch the how-to video with Hans Rosling.

CourseWiki - CS448B Data Visualization
The world is awash with increasing amounts of data, and we must keep afloat with our relatively constant perceptual and cognitive abilities. Visualization provides one means of combating information overload, as a well-designed visual encoding can supplant cognitive calculations with simpler perceptual inferences and improve comprehension, memory, and decision making. Furthermore, visual representations may help engage more diverse audiences in the process of analytic thinking. In this course we will study techniques and algorithms for creating effective visualizations based on principles from graphic design, visual art, perceptual psychology, and cognitive science. The course is targeted both at students interested in using visualization in their own work and at students interested in building better visualization tools and systems. There are no prerequisites, and the class is open to graduate students as well as advanced undergraduates.

Solvent
Why do I need screen scrapers? Piggy Bank needs web pages to embed information in a format that it can understand. This format is called RDF (Resource Description Framework), and its main advantage is that it makes machine processing a lot easier. Unfortunately, at this very early stage, not many web pages embed or link to such "purer" RDF information. Piggy Bank, however, can execute a particular screen scraper on particular pages in order to "extract" the information it needs.

ie7-js - Project Hosting on Google Code
IE7.js is a JavaScript library to make Microsoft Internet Explorer behave like a standards-compliant browser. It fixes many HTML and CSS issues and makes transparent PNG work correctly under IE5 and IE6. Current version: 2.1 beta4.
IE7.js: upgrade MSIE 5.5-6 to be compatible with MSIE 7.
IE8.js
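
The Solvent entry describes screen scrapers that turn ordinary pages into RDF for Piggy Bank. As a rough illustration of what such a scraper does (not Piggy Bank's or Solvent's own code), the Python sketch below pulls labeled fields out of a page and emits them as RDF triples in N-Triples syntax. The markup it expects (`class="title"` and `class="author"` elements), the helper names `scrape_triples` and `to_ntriples`, and the choice of Dublin Core predicates are all assumptions made for illustration. It also assumes flat markup: text inside tags nested within a field is not collected.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace used for predicates (assumed choice)


class ItemScraper(HTMLParser):
    """Collects text from elements whose class attribute names a known field (hypothetical markup)."""

    FIELDS = {"title": DC + "title", "author": DC + "creator"}

    def __init__(self):
        super().__init__()
        self.current = None  # predicate URI for the element currently being read, if any
        self.values = []     # (predicate, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self.current = self.FIELDS[cls]

    def handle_endtag(self, tag):
        # Any closing tag ends the field; nested markup is not supported in this sketch.
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.values.append((self.current, data.strip()))


def scrape_triples(page_url, html):
    """Return (subject, predicate, object) triples, using the page URL as the subject."""
    parser = ItemScraper()
    parser.feed(html)
    parser.close()
    return [(page_url, pred, obj) for pred, obj in parser.values]


def _escape_literal(text):
    # N-Triples string literals must escape backslashes, double quotes, and line breaks.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples as N-Triples lines: <subject> <predicate> "literal" ."""
    return "\n".join(f'<{s}> <{p}> "{_escape_literal(o)}" .' for s, p, o in triples)
```

For example, feeding it `<span class="title">Piggy Bank</span>` yields one `dc:title` triple about the page. A real scraper would be written per site, which is exactly why Piggy Bank needs a separate scraper for each kind of page.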

Nokogiri: A Faster, Better HTML and XML Parser for Ruby (than Hpricot)
Yesterday, Aaron Patterson (@tenderlove) and Mike Dalessio released Nokogiri (GitHub repository), a new HTML and XML parser for Ruby. It "parses and searches XML/HTML faster than Hpricot" (Hpricot being the current de facto Ruby HTML parser) and boasts XPath support, CSS3 selector support (a big deal, because CSS3 selectors are mega powerful), and the ability to be used as a "drop in" replacement for Hpricot. In an Hpricot vs. Nokogiri benchmark, Nokogiri was 7 times faster at initially loading an XML document, 5 times faster at searching for content based on an XPath, and 1.62 times faster at searching for content via a CSS-based search. These are impressive results, since Hpricot was previously considered quite speedy itself. (Update - November 3, 2008: WHY FIGHTS BACK!
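
Nokogiri itself is Ruby, and its API is not shown here. To illustrate what "searching for content based on an XPath" means, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and no CSS selectors. The sample feed and the function name `titles_in_category` are made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical RSS-like document to search.
FEED = """<rss><channel>
  <item><title>Nokogiri released</title><category>ruby</category></item>
  <item><title>Hpricot update</title><category>ruby</category></item>
  <item><title>Gapminder Desktop</title><category>visualization</category></item>
</channel></rss>"""


def titles_in_category(xml_text: str, category: str) -> list:
    """Return the <title> text of every <item> whose <category> text equals `category`."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset: ".//item" finds <item> elements at any depth, and
    # "[category='...']" keeps those with a <category> child whose text matches exactly.
    # The value is interpolated as-is, so it must not contain a single quote.
    query = f".//item[category='{category}']"
    return [item.findtext("title") for item in root.findall(query)]
```

Calling `titles_in_category(FEED, "ruby")` returns the two Ruby-related titles in document order. For full XPath 1.0 or CSS selectors in Python you would reach for a third-party parser such as lxml (CSS selectors there need the separate cssselect package).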

Awesome: DIY Data Tool Needlebase Now Available to Everyone
If you've been within shouting distance of me over the last month, you've probably heard me singing the praises of Needlebase, a great new point-and-click tool for extracting, sorting, and visualizing data from across pages around the web. I've been using it for all kinds of things, and now you can too. When we first reviewed Needle here on ReadWriteWeb, it was in closed beta and new users had to request an account. Now it's open and available to all: free for personal use or by subscription for commercial use.

List of search engines
Wikimedia list article, organized by content/topic (General; Powered by Bing).

HTML5 Cross Browser Polyfills - Modernizr - GitHub
The No-Nonsense Guide to HTML5 Fallbacks. Here we're collecting all the shims, fallbacks, and polyfills needed to implant HTML5 functionality in browsers that don't natively support it. The general idea is that we, as developers, should be able to develop with the HTML5 APIs, and scripts can create the methods and objects that should exist. Developing in this future-proof way means that as users upgrade, your code doesn't have to change, and users will move cleanly to the better, native experience. Looking to conditionally load these scripts (client-side), based on feature detects?

scRUBYt! Ruby's Web Scraping Power Tools
It's easy to get started on web scraping with scRUBYt. By Peter Cooper.
scRUBYt! is, officially, "a simple to learn and use, yet very powerful web extraction framework written in Ruby," developed by Peter Szinek. In reality, it's a lot more exciting than the description suggests: it makes it easy to perform complex scraping procedures, such as scraping Amazon listings, eBay auctions, Google results, and Digg entries, in just a few lines of code. scRUBYt! "extractors" can initially be programmed by example and then reused over and over with dynamic, unfixed data in the future.
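
scRUBYt! has its own Ruby DSL, which is not reproduced here. The Python sketch below only illustrates the general "programmed by example" idea: given a sample page and one value you want, it records the tag path that leads to that value, then replays the path on other pages to pull out every value in the same position. The function names `learn_path` and `extract` are hypothetical, and the sketch assumes well-formed (X)HTML, because `xml.etree.ElementTree` is a strict XML parser rather than a forgiving HTML one.

```python
import xml.etree.ElementTree as ET


def learn_path(example_page: str, example_value: str) -> str:
    """Return the tag path (e.g. 'body/ul/li/span') to the first element whose text equals example_value."""
    root = ET.fromstring(example_page)

    def search(elem, path):
        # Depth-first search, recording the tag names from the root down to a match.
        if (elem.text or "").strip() == example_value:
            return path
        for child in elem:
            found = search(child, path + [child.tag])
            if found is not None:
                return found
        return None

    path = search(root, [])
    if path is None:
        raise ValueError(f"{example_value!r} not found on the example page")
    # An empty path means the root element itself held the example value.
    return "/".join(path) or "."


def extract(page: str, path: str) -> list:
    """Apply a learned path to another page and return the text of every matching element."""
    root = ET.fromstring(page)
    return [(e.text or "").strip() for e in root.findall(path)]
```

For instance, teaching it with "Widget A" on a product list learns `body/ul/li/span`, which then extracts every product name from a differently filled page with the same layout. Because the path is purely tag-based, a more robust version would also record attributes such as class names.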
