Project Sustainability and Research Platforms: The Archives Unleashed Cloud Project
(IIPC Web Archiving Conference 2019, 2019-06-07)
The Archives Unleashed Project, founded in 2017 with funding from the Andrew W. Mellon Foundation, aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent ...
See a little Warclight: building an open-source web archive portal with project blacklight
(IIPC Web Archiving Conference 2019, 2019-06-06)
In 2014-15, due to close collaboration between UK-based researchers and the UK Web Archive, the open-source Shine project was launched. It allowed faceted search, trend diagram exploration, and other advanced methods of ...
Lowering the Barrier to Access: The Archives Unleashed Cloud Project
(The web that was: archives, traces, reflections RESAW 2019*, 2019-06-19)
The Archives Unleashed Project aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. We respond to one of the major issues facing web archiving ...
Web Archives Analysis at Scale with the Archives Unleashed Cloud
Web archives, repositories of born-digital information dating back to the Internet Archive and national libraries in the mid-1990s, are fantastic resources of information covering topics of interest to humanities and social ...
Solr Integration in the Anserini Information Retrieval Toolkit
Anserini is an open-source information retrieval toolkit built around Lucene to facilitate replicable research. In this demonstration, we examine different architectures for Solr integration in order to address two current ...
Open Source Sustainability in Digital Curation/Preservation Software
Open source sustainability is hard. This talk will outline what the Islandora and Fedora communities have done to address sustainability in their projects, as well as touch on the critical need for sustainability around ...
The Great WARC Adventure: WARCs from creation to use
We live in a reality where documents are born, revised and disseminated online. Every day, users record their thoughts, feelings, locations, ratings, votes, comments, reviews, jokes, and so forth; an assemblage of traces ...
Capturing the Web Today for Tomorrow: Innovations in capturing and analyzing social media and websites for the new scholarly record
The growth of digital sources since the advent of the World Wide Web in 1991, and the commencement of widespread web archiving in 1996, presents profound new opportunities for social and cultural analysis. In simple terms, ...
Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit
We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis ...
The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration
This paper introduces the Archives Unleashed Notebook, which is designed to work with derivative datasets from the Archives Unleashed Cloud, a platform for analyzing web archives. These datasets contain common starting ...
Warclight: A Rails Engine for Web Archive Discovery
This paper describes the development of Warclight, a portmanteau of the open-source Blacklight platform and the ISO-standard Web ARChive file format. Warclight allows users to explore web archives that have been indexed ...
Building Community and Tools for Analyzing Web Archives through Datathons
Starting in March 2016, the Archives Unleashed team and our collaborators have brought together social scientists, humanists, archivists, librarians, computer scientists, and other stakeholders to explore web archives as ...
The Cost of a WARC: Analyzing Web Archives in the Cloud
The value of web archives to support scholarship in the humanities and social sciences is slowly being realized by the increasing availability of scalable tools and platforms. The cost of providing scholarly access is a ...
Sustainability of Community-owned Repository Software: A Call to Action
Sustainability of open-source software is a continual challenge in the relatively small world of cultural heritage institutions. The challenge is amplified due to the critical preservation implications tied to institutional ...
Active Digital Preservation and Data/Metadata Migration
Digital preservation activities increasingly focus on the movement of data and metadata between systems. This panel will present case studies in moving content through preservation activities with APTrust, the Digital ...
Digital Preservation Tools, Practices, and Policies in Islandora
There exist many standards and best practices in the digital preservation community, but not many of these practices are implemented as easy-to-use tools in our digital repository platforms. This presentation will focus ...
It’s dangerous to go alone! How about *we* do this!?
We’re all worried about preserving digital assets at some level. One of the most concerning parts of this process is the storage component, and as new and larger objects and collections come under libraries’ care, the ...
OCUL Digital Curation Summit: Digital Curation Life-cycle & Blue Ribbon Task Force on Sustainable Digital Preservation & Access
Presentation slides for Digital Curation Life-cycle and Blue Ribbon Task Force on Sustainable Digital Preservation & Access, and corresponding worksheets + scenarios.
The Islandora Web ARChive Solution Pack
We are now living in a reality where official records are born and disseminated via the Internet. Many institutions have a strategy in place for transferring official university records that are print or tactile to university ...
d3 Data Visualization Bootcamp
A brief introduction to data visualization concepts and to d3, followed by a walkthrough of three exercises using library datasets.