Arch-It!

Date

2022-06-24

Authors

Holzmann, Helge
Ruest, Nick
Bailey, Jefferson
Dempsey, Alex
Fritz, Samantha
Milligan, Ian
Willis, Kody

Abstract

Over the past quarter-century, web archive collection has emerged as a user-friendly process thanks to cloud-hosted solutions such as the Internet Archive’s Archive-It subscription service. Despite these advances in collecting web archive content, no equally user-friendly, cloud-hosted system for analysis has emerged. Web archive processing and research require significant hardware resources and cumbersome tools that interdisciplinary researchers find difficult to work with. In this paper, we present ARCH (Archives Research Compute Hub), an interactive interface, closely connected with Archive-It, engineered to provide analytical actions, specifically generating datasets and in-browser visualizations. It streamlines research workflows while eliminating the burden of computing requirements. Building on past work by both the Internet Archive (Archive-It Research Services) and the Archives Unleashed Project (the Archives Unleashed Cloud), this merged platform achieves a scalable processing pipeline for web archive research.

Keywords

Web archives, data analytics, distributed systems, information retrieval
