Title: Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit
Authors: Yang, Hsiu-Wei; Liu, Linqing; Milligan, Ian; Ruest, Nick; Lin, Jimmy
Date: 2019-04-23
Type: Article
Language: en
Citation: Hsiu-Wei Yang, Linqing Liu, Ian Milligan, Nick Ruest, and Jimmy Lin. “Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit.” Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Vol. 19 (2019).
Identifier: 978-1-7281-1547-4/19
URI: http://hdl.handle.net/10315/36161
DOI: https://doi.org/10.1109/JCDL.2019.00107
Keywords: TensorFlow; machine learning; image analysis; web archives; Apache Spark; PySpark

Abstract: We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis capabilities. By applying pretrained deep neural networks for object detection, we are able to extract images of common objects from a 4TB web archive of GeoCities, which we then compile into browsable collages. This case study illustrates the types of interesting analyses enabled by combining big data and deep learning capabilities.