WARC

WARC Files: A Challenge for Historians, and Finding Needles in Haystacks « Ian Milligan. This is the first part in a series dealing with Internet Archive WARC files. It introduces the issue; the second part introduces WARC Tools, and the third moves from that into a discussion of how to create a full-text searchable database. All rough estimates, but you get the picture. :)

WARC, Web ARChive file format.

WarcManager - Adapt. The Warc Manager is a tool to help archives quickly browse, search, and analyze archives of web crawl data. The manager is a lightweight database web application which indexes and provides a nice browsing interface to a collection of WARC data. The Warc Manager offers two ways to search collections.
1. If you know the page you want to view, start typing the full URL into the search box, starting with 'http'. You will see a drop-down box containing URLs that match what you have typed.

Archiving11-smorul.pdf (application/pdf object)