The Great WARC Adventure: WARCs from creation to use

Date

2014-06-26

Authors

Ruest, Nick
Milligan, Ian

Abstract

We live in a reality where documents are born, revised, and disseminated online. Every day, users record their thoughts, feelings, locations, ratings, votes, comments, reviews, jokes, and so forth: an assemblage of traces of the past that historians will be able to mold into historical narratives. Luckily, we have some established standards for preserving and disseminating web archives, and emerging processes for analysis.

This presentation covers four topics: a historical overview of web archiving; how best to capture and preserve websites and make them discoverable and usable with open-source tools that other organizations can easily replicate; the interplay of the archivist and the historian with respect to web archives; and ways to access web archives beyond the Wayback Machine using open-source tools such as WARC Tools, Apache Solr, and Carrot2 Workbench. Two web archive examples are used for this practical, hands-on component: a collection of websites concerning the Dale Askey legal case with Edwin Mellen Press (the #freedaleaskey collection) and a case study collection of archived websites from the .ca top-level domain (amounting to 4.7% in total).

Description

ACA 2014

Keywords

web archiving, warc, #freedaleaskey, visualizations, textual analysis, digital history, digital preservation
