Search / information retrieval

Folksonomy / social tag

Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox. Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software.
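
The transformation described above, from HTML into structured records, can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and an inline sample instead of a live HTTP request. The sample table, the `TableScraper` class, and the `scrape_table` helper are hypothetical names made up for this illustration; they do not come from the quoted article.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a fetched web page.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Artist</th></tr>
  <tr><td>Blue Train</td><td>John Coltrane</td></tr>
  <tr><td>So What</td><td>Miles Davis</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Use the first row as field names and the remaining rows as records."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

Each entry in `records` is a dictionary keyed by column header, which is the kind of structured form that could then be written to a database table or a spreadsheet.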

Web scraping

http://en.wikipedia.org/wiki/Web_scraping
faceted search

http://experiencinginformation.wordpress.com/2011/07/17/roi-of-faceted-navigation/

ROI Of Faceted Navigation?

17 July 2011: Faceted navigation is widespread on the web (a.k.a. faceted search and faceted browse). It’s become an expected standard.

http://www.miskatonic.org/library/facet-web-howto.html

Update February 2011: This has been translated into Dutch: Hoe maak je een facetclassificatie en hoe plaats je haar op het web? Many thanks to Janette Shew and the Information Architecture Institute's Translations Initiative for doing this. Also, How to Reuse a Faceted Classification and Put It On the Semantic Web, by Bene Rodriguez-Castro, Hugh Glaser and Les Carr, takes my example of dishwashing detergents and extends it into ontologies and RDF.

How to Make a Faceted Classification and Put It On the Web | Miskatonic University Press

Metadata is information about information: more precisely, it's structured information about resources. This can be a single set of hierarchical subject labels, such as a Yahoo or Open Directory Project category. More often, the metadata has several facets: attributes in various orthogonal sets of categories. This is often stored in database record fields and tables, especially for product catalogs. Examples of faceted metadata include: Music catalog: songs have attributes such as artist, title, length, genre, date...

http://www.searchtools.com/info/faceted-metadata.html
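
As a rough illustration of the music-catalog example, the sketch below stores each song as a record with independent facets, narrows the catalog by selected facet values, and counts the values that remain. The sample songs, the field names (with `year` standing in for the excerpt's "date"), and the two helper functions are assumptions made for this sketch, not part of the quoted report.

```python
from collections import Counter

# Hypothetical catalog records; field names loosely follow the music example.
SONGS = [
    {"title": "Song A", "artist": "Artist X", "genre": "jazz", "year": 1959},
    {"title": "Song B", "artist": "Artist X", "genre": "blues", "year": 1961},
    {"title": "Song C", "artist": "Artist Y", "genre": "jazz", "year": 1961},
]


def filter_by_facets(records, selections):
    """Keep the records whose value matches every selected facet."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(record[facet] for record in records if facet in record)


# Narrow by one facet, then summarize the remaining records by another.
jazz = filter_by_facets(SONGS, {"genre": "jazz"})
jazz_by_year = facet_counts(jazz, "year")
```

Because the facets are orthogonal, selections can be combined in any order: filtering on `genre` and then `artist` gives the same records as filtering on `artist` and then `genre`.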

Faceted Metadata Search - Search Tools Report

http://www.searchtools.com/info/database-search.html

Full-Text Searching and Database Content: SearchTools Report

As of January, 2012, this site is no longer being updated, due to work and health issues.

Databases provide the content storage for many sites, which dynamically create web pages around them, including ecommerce catalog sites, online news, and even entertainment sites. Intranets often contain large amounts of text stored in databases as well. These databases generally have their own search functions, which may appear to take the place of a full-text search engine. But that's not always the case.