    • The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration 

      Deschamps, Ryan; Ruest, Nick; Lin, Jimmy; Fritz, Samantha; Milligan, Ian (2019)
      This paper introduces the Archives Unleashed Notebook, which is designed to work with derivative datasets from the Archives Unleashed Cloud, a platform for analyzing web archives. These datasets contain common starting ...
    • Building Community and Tools for Analyzing Web Archives through Datathons 

      Milligan, Ian; Casemajor, Nathalie; Fritz, Samantha; Lin, Jimmy; Ruest, Nick; Weber, Matthew S.; Worby, Nicholas (2019)
      Starting in March 2016, the Archives Unleashed team and our collaborators have brought together social scientists, humanists, archivists, librarians, computer scientists, and other stakeholders to explore web archives as ...
    • Content Selection and Curation for Web Archiving: The Gatekeepers vs. the Masses 

      Milligan, Ian; Ruest, Nick; Lin, Jimmy (2016)
      Any preservation effort must begin with an assessment of what content to preserve, and web archiving is no different. There have historically been two answers to the question "what should we archive?" The Internet Archive's ...
    • The Cost of a WARC: Analyzing Web Archives in the Cloud 

      Deschamps, Ryan; Fritz, Samantha; Lin, Jimmy; Milligan, Ian; Ruest, Nick (2019)
      The value of web archives to support scholarship in the humanities and social sciences is slowly being realized by the increasing availability of scalable tools and platforms. The cost of providing scholarly access is a ...
    • Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities 

      Jackson, Andrew; Lin, Jimmy; Milligan, Ian; Ruest, Nick (2016)
      Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. In this paper, we describe initial experiences in providing an exploratory search interface to web archives ...
    • Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit 

      Yang, Hsiu-Wei; Liu, Linqing; Milligan, Ian; Ruest, Nick; Lin, Jimmy (2019)
      We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis ...
    • Solr Integration in the Anserini Information Retrieval Toolkit 

      Clancy, Ryan; Eskildsen, Toke; Ruest, Nick; Lin, Jimmy (2019)
      Anserini is an open-source information retrieval toolkit built around Lucene to facilitate replicable research. In this demonstration, we examine different architectures for Solr integration in order to address two current ...
    • Warclight: A Rails Engine for Web Archive Discovery 

      Ruest, Nick; Milligan, Ian; Lin, Jimmy (2019)
      This paper describes the development of Warclight, a portmanteau of the open-source Blacklight platform and the ISO-standard Web ARChive file format. Warclight allows users to explore web archives that have been indexed ...

      All items in the YorkSpace institutional repository are protected by copyright, with all rights reserved except where explicitly noted.