Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit
MetadataShow full item record
We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis capabilities. By applying pretrained deep neural networks for object detection, we are able to extract images of common objects from a 4TB web archive of GeoCities, which we then compile into browsable collages. This case study illustrates the types of interesting analyses enabled by combining big data and deep learning capabilities.