Building Community and Tools for Analyzing Web Archives through Datathons
Starting in March 2016, the Archives Unleashed team and our collaborators have brought together social scientists, humanists, archivists, librarians, computer scientists, and other stakeholders to explore web archives as ...
The Cost of a WARC: Analyzing Web Archives in the Cloud
The value of web archives to support scholarship in the humanities and social sciences is slowly being realized by the increasing availability of scalable tools and platforms. The cost of providing scholarly access is a ...
Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit
We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis ...
The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration
This paper introduces the Archives Unleashed Notebook, which is designed to work with derivative datasets from the Archives Unleashed Cloud, a platform for analyzing web archives. These datasets contain common starting ...
Warclight: A Rails Engine for Web Archive Discovery
This paper describes the development of Warclight, a portmanteau of the open-source Blacklight platform and the ISO-standard Web ARChive file format. Warclight allows users to explore web archives that have been indexed ...
Solr Integration in the Anserini Information Retrieval Toolkit
Anserini is an open-source information retrieval toolkit built around Lucene to facilitate replicable research. In this demonstration, we examine different architectures for Solr integration in order to address two current ...