We Could, but Should We? Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections
(ACM, 2020-03)
We live in an era in which the ways that we can make sense of our past are evolving as more artifacts from that past become digital. At the same time, the responsibilities of traditional gatekeepers who have negotiated the ...
Building Community at Distance: A Datathon during COVID-19
(Digital Library Perspectives, 2020-08-04)
This paper aims to use the experience of an in-person event that was forced to go virtual in the wake of COVID-19 as an entryway into a discussion on the broader implications around transitioning events online. It gives ...
Solr Integration in the Anserini Information Retrieval Toolkit
(2019)
Anserini is an open-source information retrieval toolkit built around Lucene to facilitate replicable research. In this demonstration, we examine different architectures for Solr integration in order to address two current ...
Content-Based Exploration of Archival Images Using Neural Networks
(ACM/IEEE, 2020-08)
We present DAIRE (Deep Archival Image Retrieval Engine), an image exploration tool based on latent representations derived from neural networks, which allows scholars to "query" using an image of interest to rapidly find ...
The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives
(ACM/IEEE, 2020-08)
The Archives Unleashed project aims to improve scholarly access to web archives through a multi-pronged strategy involving tool creation, process modeling, and community building -- all proceeding concurrently in mutually ...
Building Community and Tools for Analyzing Web Archives through Datathons
(2019)
Starting in March 2016, the Archives Unleashed team and our collaborators have brought together social scientists, humanists, archivists, librarians, computer scientists, and other stakeholders to explore web archives as ...
The Cost of a WARC: Analyzing Web Archives in the Cloud
(2019)
The value of web archives to support scholarship in the humanities and social sciences is slowly being realized by the increasing availability of scalable tools and platforms. The cost of providing scholarly access is a ...
Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit
(2019)
We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis ...
The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration
(2019)
This paper introduces the Archives Unleashed Notebook, which is designed to work with derivative datasets from the Archives Unleashed Cloud, a platform for analyzing web archives. These datasets contain common starting ...
Warclight: A Rails Engine for Web Archive Discovery
(2019)
This paper describes the development of Warclight, a portmanteau of the open-source Blacklight platform and the ISO-standard Web ARChive file format. Warclight allows users to explore web archives that have been indexed ...