Web Archives Analysis at Scale with the Archives Unleashed Cloud

Date

2019-04-08

Authors

Ruest, Nick
Milligan, Ian

Abstract

Web archives, repositories of born-digital information dating back to the Internet Archive and national libraries in the mid-1990s, are fantastic resources covering topics of interest to humanities and social sciences scholars. Imagine a political historian studying elections, a historian studying youth culture in the late 1990s, or a scholar of the military or policy exploring how wars were reflected online. Yet while we have been collecting this information for over two decades, access has lagged: most scholars are limited to working with web archives one page at a time through portals such as the Wayback Machine. With the rise of the digital humanities, the computational social sciences, and web science more generally, scholars increasingly have the ability and desire to work with data at scale. In this presentation, we introduce the Archives Unleashed Cloud, currently supported through a grant from The Andrew W. Mellon Foundation. This service facilitates (a) the transfer of web archival data to the Cloud; (b) the analysis of that data and its transformation into standard scholarly derivatives; and (c) the building of a community around it via in-person events and learning guides. Our presentation begins by introducing the Cloud and its motivation, then discusses its technical underpinnings, and finally explores our current sustainability plan to keep the Archives Unleashed Cloud running after our foundation funding ends in 2020.

Description

CNI 2019 Spring Membership Meeting

Keywords

web archives, web archive analysis, sustainability, cloud computing