
Digitization


Digitisation. Maps tools libraries docs. CyArk. Digital Collections.

Metadata

Namespaces. ISO Archiving Standards - Overview. ISO has encouraged the development of standards in support of the long-term preservation of digital information obtained from observations of the terrestrial and space environments. ISO requested that the Consultative Committee for Space Data Systems (CCSDS) Panel 2 coordinate the development of those standards.

(CCSDS has subsequently reorganized, and the work is now situated in the Data Archive Ingest (DAI) Working Group.) The initial effort has been the development of a Reference Model for an Open Archival Information System (OAIS). The OAIS Reference Model has been reviewed and, pending some editorial updates, has been approved as an ISO Standard and as a CCSDS Recommendation. The development history of this effort can be seen by surveying the many past US, French, British, and international workshops. AWIICS identified significant interest in starting new standardization efforts and provided starting documents related to Ingest, Identification, and Certification of Archives. Past Workshops. 10 Universities with Amazing Online Collections. Virtual Library. Links to full-text books, pamphlets, newspapers, and historical documents on the World Wide Web. Methodology: The Costs of Digital Imaging. This article, by NARA staff member Steve Puglia, appeared in the October 15, 1999 issue of RLG DigiNews.

Links to several recent reports on the costs of digitization are provided by the author. Digital Preservation. The National Digital Information Infrastructure and Preservation Program, or Digital Preservation Program for short, addresses questions being raised about the preservation and accessibility of electronic media. Digital Strategy for the Library of Congress. Commissioned by the National Research Council and made available by the National Academy Press. Digital Collections: The Avalon Project (Yale Law School). Documents relevant to the fields of Law, History, Economics, Politics, Diplomacy and Government. Subject-oriented Virtual Collections: E-Book Collections (Full-text): PREMIS: Preservation Metadata Maintenance Activity. The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. Developed by an international team of experts, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open-source digital preservation tools and systems.

The PREMIS Editorial Committee coordinates revisions and implementation of the standard, which consists of the Data Dictionary, an XML schema, and supporting documentation. Data Dictionaries & Schemas. Maintenance. Guidelines and Conformance. Implementation and Tools. Supporting Documentation. The PREMIS maintenance activity is responsible for maintaining, supporting, and coordinating future revisions to the PREMIS Data Dictionary. The PREMIS 3.0 Data Dictionary was issued in June 2015. DCMI Home: Dublin Core® Metadata Initiative (DCMI). DCMI Metadata Terms. This document is an up-to-date, authoritative specification of all metadata terms maintained by the Dublin Core Metadata Initiative. Included are the fifteen terms of the Dublin Core Metadata Element Set, which have also been published as IETF RFC 5013 [RFC5013], ANSI/NISO Standard Z39.85-2007 [NISOZ3985], and ISO Standard 15836:2009 [ISO15836].

Each term is specified with a minimal set of attributes; where applicable, additional attributes provide further information about a term. This release of DCMI Metadata Terms reflects changes described more fully in the document "Maintenance changes to DCMI Metadata Terms" [REVISIONS]. Further information about change history is documented in "DCMI Metadata Terms: A complete historical record" [HISTORY]. References. NISO Publishes Themed Issue of Information Standards Quarterly on Linked Data for Libraries, Archives, and Museums. Baltimore, MD - September 10, 2012 - The National Information Standards Organization (NISO) announces the publication of a special themed issue of the Information Standards Quarterly (ISQ) magazine on Linked Data for Libraries, Archives, and Museums.
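To make the fifteen-element set concrete, here is a minimal sketch of a Dublin Core record serialized as XML with Python's standard library. The element namespace is the published dc elements namespace; the title, creator, and identifier values are invented placeholders, not drawn from any real collection.

```python
# Sketch: a minimal Dublin Core record using the fifteen-element
# namespace, built with the standard library. All values below are
# illustrative placeholders.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
for term, value in [
    ("title", "Sample Digitized Map"),        # placeholder values
    ("creator", "Example Library"),
    ("date", "1999-10-15"),
    ("format", "image/tiff"),
    ("identifier", "http://example.org/item/42"),
]:
    el = ET.SubElement(record, f"{{{DC}}}{term}")
    el.text = value

xml = ET.tostring(record, encoding="unicode")
print(xml)
```

The same fifteen terms can of course be expressed in RDF or plain HTML meta tags; XML is shown here only because it is the most common exchange form in library systems.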

ISQ Guest Content Editor Corey Harper, Metadata Services Librarian at New York University, has pulled together a broad range of perspectives on what is happening today with linked data in cultural institutions. He states in his introductory letter, "As the Linked Data Web continues to expand, significant challenges remain around integrating such diverse data sources. As the variance of the data becomes increasingly clear, there is an emerging need for an infrastructure to manage the diverse vocabularies used throughout the Web-wide network of distributed metadata." Four "in practice" articles illustrate the growth in the implementation of linked data in the cultural sector. Video: How dirty is the cloud? | Need to Know. You’ve heard about the Foxconn factory in China where your iPad is assembled. But have you ever considered the energy required to store your emails, photos, and videos in the cloud? As worldwide demand for data storage skyrockets, so do the power needs of the servers where all our digital archives live.

While some companies (like Facebook) have made great progress in ditching dirty fossil-fuel energy for cleaner renewables, a few internet giants lag far behind. Climate Desk visited Maiden, N.C., for a close-up view of what will soon be one of the world’s biggest data centers, owned by Apple and powered by the coal-heavy power behemoth Duke Energy. Apple’s new Maiden, N.C., data center is only one of many coal-fueled server farms across the country. The figures in the map are for individual data centers. Additional reporting and production was provided by Alyssa Battistoni, Azeen Ghorayshi and Tasneem Raja. The Evolution of the Web.

Digital Archives

Google Cultural Institute could be part of a national digital library. Imagine a public library that focused on providing access to digital versions of books, photographs, historical records, audio recordings and videos to anyone with access to the Internet. The library’s mission would be to efficiently store a huge amount of digital content and make it easily searchable and accessible. There is an organization trying to create such a library known as the Digital Public Library of America (DPLA). The idea is that universities, museums, libraries and anyone else with historic or cultural data would have a repository for storing their digitized content and sharing it with the world. Google seems to think such a library is a great idea too. That’s why they launched the Google Cultural Institute in 2010. A quick visit to the Cultural Institute’s website reveals an immersive experience. Yesterday, Google announced that they expanded their cultural database by adding 42 new exhibitions.

Google Books Library Project. The Google Books Library Project is an effort by Google to scan and make searchable the collections of several major research libraries.[1] The project, along with Google's Partner Program, makes up Google Books (formerly Google Book Search). Along with bibliographic information, snippets of text from a book are often viewable. If a book is out of copyright and in the public domain, the book is fully available to read or to download.[2] In March 2011 a New York federal judge rejected a $125 million legal settlement that Google had worked out with the authors and publishers over the copyright issues.[3] On November 14, 2013, the same judge issued a ruling saying that Google's use of the works was a "fair use" under copyright law.[4] The authors said they would appeal.[5] Participants: The Google Books Library Project continues to evolve;[6] however, only some of the institutional partners are listed on the web page currently maintained by Google.[7] Initial Project Partners.

NLNZ_RosettaCaseStudy. National library collections case study. Internet Archaeology. MIME. Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email to support: text in character sets other than ASCII; non-text attachments; message bodies with multiple parts; and header information in non-ASCII character sets. Although MIME was designed mainly for the SMTP protocol, its use today has grown beyond describing the content of email and now often includes descriptions of content type in general, including for the web (see Internet media type) and as storage for rich content in some commercial products (e.g., IBM Lotus Domino and IBM Lotus Quickr).

Virtually all human-written Internet email and a fairly large proportion of automated email is transmitted via SMTP in MIME format. Internet email is so closely associated with the SMTP and MIME standards that it is sometimes called SMTP/MIME email.[1] MIME is specified in six linked RFC memoranda: RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049, which together define the specifications. Internet Archive: DAT File Format Reference. Internet Archive: CDX File Format Reference. A CDX file consists of individual lines of text, each of which summarizes a single web document.
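The MIME features listed above (non-ASCII text, non-text attachments, multipart bodies) can be demonstrated with Python's standard email library; a minimal sketch, with invented addresses and a dummy attachment payload:

```python
# Sketch: building a multipart MIME message with Python's stdlib email
# package, showing the extensions MIME adds to plain RFC 822 mail.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "archivist@example.org"   # hypothetical addresses
msg["To"] = "curator@example.org"
msg["Subject"] = "Accession report: métadonnées"  # non-ASCII header text

# Text body in a non-ASCII-capable character set.
msg.set_content("Rapport d'accession ci-joint.", charset="utf-8")

# Non-text attachment: the Content-Type (an Internet media type) tells
# the receiver how to interpret the bytes. Payload is a dummy stand-in.
msg.add_attachment(b"%PDF-1.4 dummy bytes", maintype="application",
                   subtype="pdf", filename="report.pdf")

# Adding the attachment converted the message into a multipart body.
print(msg["MIME-Version"])      # header MIME adds to RFC 822 mail
print(msg.get_content_type())
```

Serializing `msg` with `str(msg)` would show the `MIME-Version: 1.0` header, the multipart boundary, and base64-encoded attachment body that the standard defines.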

The first line in the file is a legend for interpreting the data, and the following lines contain the data for referencing the corresponding pages within the host. The first character of the file is the field delimiter used in the rest of the file. This is followed by the literal "CDX" and then individual field markers as defined below. The following is a sample from a CDX file:

CDX A b e a m s c k r V v D d g M n
0-0-0checkmate.com/Bugs/Bug_Investigators.html 20010424210551 209.52.183.152 0-0-0checkmate.com:80/Bugs/Bug_Investigators.html text/html 200 58670fbe7432c5bed6f3dcd7ea32b221 a725a64ad6bb7112c55ed26c9e4cef63 - 17130110 59129865 1927657 6501523 DE_crawl6.20010424210458 - 5750

CDX Data Specifications.
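The legend-then-data layout described above makes CDX lines easy to parse generically: read the delimiter from the first character, split the legend into single-letter field markers, and zip each data line against them. A sketch using the sample above (a leading space is prepended to the legend string to stand in for the file's first-character delimiter; field-letter meanings are per the CDX specification):

```python
# Sketch: parsing the CDX sample from the text. The first character of
# the file is the field delimiter (a space here), followed by the
# literal "CDX" and one single-letter field marker per column.
legend = " CDX A b e a m s c k r V v D d g M n"
data = ("0-0-0checkmate.com/Bugs/Bug_Investigators.html 20010424210551 "
        "209.52.183.152 0-0-0checkmate.com:80/Bugs/Bug_Investigators.html "
        "text/html 200 58670fbe7432c5bed6f3dcd7ea32b221 "
        "a725a64ad6bb7112c55ed26c9e4cef63 - 17130110 59129865 1927657 "
        "6501523 DE_crawl6.20010424210458 - 5750")

delim = legend[0]                       # field delimiter for this file
markers = legend[1:].split(delim)
assert markers[0] == "CDX"              # required literal
fields = markers[1:]                    # ['A', 'b', 'e', ...]

record = dict(zip(fields, data.split(delim)))
# Per the CDX spec, e.g. 'b' is the capture timestamp, 'a' the original
# URL, 'm' the MIME type, 's' the HTTP status, 'g' the ARC file name.
print(record["b"], record["m"], record["s"])
```

Because the legend travels with the file, a parser written this way keeps working even when an archive emits CDX files with different field orderings.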

Internet Archive: ARC File Format Reference. Authors: Mike Burner and Brewster Kahle. Date: September 15, 1996, Version 1.0. Internet Archive Overview. The Archive stores the data it collects in large (currently 100 MB) aggregate files for ease of storage in a conventional file system. It is the Archive's experience that it is difficult to manage hundreds of millions of small files in most existing file systems. This document describes the format of the aggregate files. The file format was designed to meet several requirements: the file must be self-contained, permitting the aggregated objects to be identified and unpacked without the use of a companion index file; the format must be extensible to accommodate files retrieved via a variety of network protocols, including http, ftp, news, gopher, and mail; and the file must be "streamable", so that multiple archive files can be concatenated in a data stream.

The Archive File Format. The description below uses pseudo-BNF to describe the archive file format. The Version Block. The URL Record. 13. Internet Archive ARC files. By default, Heritrix writes all its crawled content to disk using ARCWriterProcessor. This processor writes the found crawl content as Internet Archive ARC files. The ARC file format is described here: Arc File Format. Heritrix writes version 1 ARC files, compressed by default. The compression is done with gzip, but rather than compressing the ARC as a whole, each ARC record is gzipped in turn.

All gzipped records are concatenated together to make up a file of multiple gzipped members. Prior to the release of Heritrix 1.0, an amendment was made to the ARC file version 1 format to allow writing of extra metadata into the first record of an ARC file. If the extra XML metadata info is present, the second '<reserved>' field of the second line of version 1 ARC files will be changed from '0' to '1'. If present, the ARC file metadata record body will contain at least the following fields (later revisions to the ARC format may add other fields): ... where.
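The record-at-a-time gzipping described above is what makes compressed ARC files both streamable and seekable: a reader can jump to a record's byte offset and decompress just that member. A minimal sketch with Python's standard library, using made-up record bodies in place of real ARC records:

```python
# Sketch: a compressed ARC file is a concatenation of independently
# gzipped members, one per record. The record bodies here are dummies.
import gzip
import io
import zlib

records = [b"record one\n", b"record two\n", b"record three\n"]

# Gzip each record separately, then concatenate the members, mimicking
# how Heritrix writes compressed version 1 ARC files.
arc_bytes = b"".join(gzip.compress(r) for r in records)

# A plain gzip read transparently decompresses all members end to end,
# which is why the whole file still behaves like one gzip stream.
with gzip.open(io.BytesIO(arc_bytes)) as f:
    whole = f.read()

# To process one record (one member) at a time, decompress with zlib
# and use .unused_data to find where the next member begins.
out, buf = [], arc_bytes
while buf:
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip framing
    out.append(d.decompress(buf))
    buf = d.unused_data
```

The member-by-member loop is also how an offset from a CDX index can be used: seek to the offset, then decompress a single gzip member to recover exactly one record.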

ARC (file format). The .arc file extension is used for several archive-like file types. For example, the Internet Archive uses its own ARC format to store multiple web resources in a single file.[1][2] The FreeArc archiver also uses the .arc extension, but with a completely different file format. Nintendo uses an unrelated 'ARC' format for resources, such as MIDI and voice samples, in GameCube and Wii games. Several unofficial extractors exist for this type of ARC file.

The source code for ARC was released by SEA in 1986 and subsequently ported to Unix and the Atari ST in 1987 by Howard Chu. This more portable code base was then ported to other platforms, including VAX/VMS and IBM System/370 mainframes. Later, Phil Katz developed his own shareware utilities, PKARC and PKXARC, to create archive files and extract their contents, which prompted the System Enhancement Associates, Inc. vs. PKWARE Inc. and Phillip W. Katz lawsuit.

List of archive formats. Alexa - The Web Information Company. Internet Archive Forums: Digital Lending Library. Checking out digital versions of books that are automatically returned after two weeks is as easy as logging onto the Internet Archive’s Open Library site, announced digital librarian and Internet Archive founder Brewster Kahle. By integrating this new service, more than seventy thousand current books – best sellers and popular titles – are borrowable by patrons of libraries that subscribe to Overdrive.com's Digital Library Reserve. Additionally, many other books that are not commercially available but are still of interest to library patrons are available to be borrowed from participating libraries using the same digital technology. According to Kahle, "Digital technologies promise increased access to both old and new books.

The Internet Archive, through its OpenLibrary.org site, is thrilled to be adding the capacity to lend newer books over the internet, in addition to continuing to provide the public with all access, free downloadable older materials.” Jeffrey R. Internet Archive: About IA. The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Founded in 1996 and located in San Francisco, the Archive has been receiving data donations from Alexa Internet and others. In late 1999, the organization started to grow to include more well-rounded collections. Now the Internet Archive includes: texts, audio, moving images, and software as well as archived web pages in our collections, and provides specialized services for adaptive reading and information access for the blind and other persons with disabilities.

Why the Archive is Building an 'Internet Library'. Libraries exist to preserve society's cultural artifacts and to provide access to them. Many early movies were recycled to recover the silver in the film. Related Projects and Research. Internet Archive Frequently Asked Questions. Why Preserve Books? The New Physical Archive of the Internet Archive. National Archives and Records Administration.

Wayback

Home. Pandora Archive - Summary of Progress. Internet Archive Ethics. Brewster Kahle's Internet Archive. WARC, Web ARChive file format. [0911.1112] Memento: Time Travel for the Web. Memento: Time Travel for the Web Webcast. Tools for a Preservation-Ready Web - Partners. Web Archiving. Memento: Adding Time to the Web. List of Web archiving initiatives. Using Wayback Machine for Research. 2012-10-10: Zombies in the Archives.