An exploratory look at 3,039,804 #elxn42 tweets

Date

2016-04-14

Authors

Ruest, Nick
Milligan, Ian

Abstract

This presentation examines the tools, approaches, collaboration, and findings of the Web Archives for Historical Research Group around the capture and analysis of Twitter for the 2015 Canadian Federal Election.

While Twitter is not a representative sample of broader society (Pew Research notes that it skews young, college-educated, and affluent, with household incomes above $50,000), it still represents an exponential increase in the amount of information generated, retained, and preserved from non-elite people. Therefore, when historians study the 2015 federal election, Twitter will be a prime source.

On August 3, 2015, the team initiated both a search API and a stream API collection with twarc using the hashtag #elxn42. Data collection ceased on November 5, 2015, the day after Justin Trudeau was sworn in as the 23rd Prime Minister of Canada. We collected for a total of 102 days, 13 hours, and 50 minutes.

To analyze the data set, we used a number of utilities available within twarc and twarc-report, as well as jq, Mathematica, and Apache Spark Notebook. In accordance with the Twitter Terms of Service, we also hosted the tweet IDs in an institutional repository.
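
Sharing only tweet IDs means reducing each collected tweet to its identifier. The following is a minimal sketch of that step in Python, assuming the collection is stored as line-delimited JSON with one tweet per line and that each tweet carries the standard id_str field. The function name and the sample records are hypothetical, and this is not necessarily the script the team used.

```python
import json


def tweet_ids(lines):
    """Yield the id_str of every tweet in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines rather than failing on them.
            continue
        tweet = json.loads(line)
        yield tweet["id_str"]


# Hypothetical sample records standing in for a collected file.
sample = [
    '{"id_str": "628000000000000001", "text": "first #elxn42"}',
    "",
    '{"id_str": "628000000000000002", "text": "second #elxn42"}',
]

ids = list(tweet_ids(sample))
print("\n".join(ids))
```

In practice the same function could be fed an open file handle instead of the in-memory list, so the full collection never has to be loaded at once.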

Our analytics included:

  • breaking tweet text down by day to track change over time;
  • client analysis, allowing us to see how the prevalence of mobile devices shaped interactions on the platform;
  • URL analysis, comparing tweeted URLs against both Archive-It collections and the Wayback Availability API to gauge crawl completeness;
  • and image analysis, using an archive of extracted images.
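
Several of the analyses listed above reduce to counting fields in the tweet JSON. The sketch below illustrates the per-day, client, and URL tallies in Python using only the standard library. It assumes line-delimited tweets in the Twitter API v1.1 format (created_at, source, and entities.urls fields), buckets days in UTC, and builds a query URL for the Internet Archive's Wayback Availability endpoint without sending it. The function names and sample tweets are hypothetical; the team's own work relied on twarc utilities, jq, and the other tools named earlier.

```python
import json
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlencode

# Timestamp layout used by the v1.1 created_at field.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# The source field is an HTML anchor; strip the tags to get the client name.
TAG_RE = re.compile(r"<[^>]+>")


def summarize(lines):
    """Count tweets per UTC day, per client, and per expanded URL."""
    per_day, clients, urls = Counter(), Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        per_day[created.date().isoformat()] += 1
        clients[TAG_RE.sub("", tweet.get("source", ""))] += 1
        for entry in tweet.get("entities", {}).get("urls", []):
            urls[entry.get("expanded_url") or entry.get("url")] += 1
    return per_day, clients, urls


def availability_query(url):
    """Build (but do not send) a Wayback Availability API request URL."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


# Hypothetical sample tweets, trimmed to the fields used above.
IPHONE = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
WEB = '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
sample_tweets = [
    {"created_at": "Mon Aug 03 14:22:01 +0000 2015", "source": IPHONE,
     "entities": {"urls": [{"expanded_url": "http://example.com/a"}]}},
    {"created_at": "Mon Aug 03 18:05:40 +0000 2015", "source": WEB,
     "entities": {"urls": []}},
    {"created_at": "Mon Oct 19 23:59:59 +0000 2015", "source": IPHONE,
     "entities": {"urls": [{"expanded_url": "http://example.com/a"}]}},
]

per_day, clients, urls = summarize(json.dumps(t) for t in sample_tweets)
for day, count in sorted(per_day.items()):
    print(day, count)
print(clients.most_common())
print(availability_query("http://example.com/a"))
```

Counting by UTC day is a simplification; an analysis of election-night activity would likely convert timestamps to a Canadian time zone before bucketing.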

Our presentation introduces our collecting work, describes the analysis we have done, and provides a framework for other collecting institutions to do similar work with our off-the-shelf open-source tools. We hope that national libraries and other institutions will find our model useful as they consider how to archive ongoing events using Twitter.

Keywords

social media, web archives, text analysis, iipc, #elxn42, twitter, json
