From January to August 2012, we collected data from the Twitter accounts of 464 athletes and organisations involved in the London 2012 Summer Olympics. The selection is a subset of a user list previously curated by The Telegraph (UK). Each account was manually assigned to one of 28 disjoint "ground truth" communities, each corresponding to a different Olympic sport.

In total, we collected ~726k tweets, ~5k user lists, and ~11k follower links within the set of 464 users. From the tweets, we extracted mention and retweet information. By combining these different "views" of the data using a rank aggregation method, we constructed a "unified" graph representation of the relations between the Twitter accounts. This graph preserves the most informative underlying associations between users in the original views. A detailed description of the methodology is provided in the accompanying paper.
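To give a rough feel for how several views can be merged by rank aggregation, here is a minimal sketch. It uses a simple Borda count, which is a stand-in chosen for illustration and not necessarily the method used to build this dataset; the paper describes the actual procedure. The function name `borda_aggregate`, the input format (one dictionary of neighbour weights per view), and the parameter `k` are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(views, k=2):
    """Merge several views into one directed top-k neighbour graph.

    views: list of dicts, each mapping user -> {neighbour: weight}.
    Returns a dict mapping every user to a list of at most k neighbours.
    Illustrative Borda count only; not the dataset's actual method.
    """
    # Collect every user that appears in any view, as source or neighbour.
    users = set()
    for view in views:
        for user, neighbours in view.items():
            users.add(user)
            users.update(neighbours)

    max_points = len(users) - 1  # points awarded to a first-place neighbour
    unified = {}
    for user in sorted(users):
        points = defaultdict(int)
        for view in views:
            weights = view.get(user, {})
            # Rank this view's neighbours by descending weight (ties by name).
            ranked = sorted(weights, key=lambda v: (-weights[v], v))
            for position, neighbour in enumerate(ranked):
                points[neighbour] += max_points - position
        # Keep the k neighbours with the most points across all views.
        unified[user] = sorted(points, key=lambda v: (-points[v], v))[:k]
    return unified


# Toy views for three made-up users.
mentions = {"a": {"b": 5, "c": 1}, "b": {"a": 2}}
retweets = {"a": {"c": 3}, "b": {"a": 1, "c": 4}}
follows = {"a": {"b": 1}, "c": {"a": 1}}
unified_graph = borda_aggregate([mentions, retweets, follows], k=1)
```

In this toy example, user "a" ranks "b" first in two of the three views, so "b" becomes its single retained neighbour. A neighbour that is strong in only one view can be outvoted by one that ranks consistently well across views.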

Below is a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. sport). The size of each node is proportional to its in-degree (i.e. number of incoming links).
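The node sizing can be reproduced from any directed edge list. The sketch below is a plain-Python illustration rather than the Gephi configuration actually used; the function name `node_sizes` and the `scale` parameter are hypothetical.

```python
from collections import Counter


def node_sizes(edges, scale=1.0):
    """Return a size for each node proportional to its in-degree.

    edges: iterable of (source, target) pairs in a directed graph.
    scale: hypothetical multiplier converting in-degree to a display size.
    """
    edges = list(edges)
    # In-degree is the number of edges pointing at a node.
    in_degree = Counter(target for _, target in edges)
    nodes = {node for edge in edges for node in edge}
    # Nodes with no incoming links get size 0; a renderer would
    # usually enforce some minimum size on top of this.
    return {node: scale * in_degree[node] for node in nodes}


sizes = node_sizes([("a", "b"), ("c", "b"), ("b", "a")], scale=3.0)
```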

Hovering over a node with the mouse shows the corresponding Twitter screen name and hides all nodes and edges except those connected to the highlighted node. Left-clicking a node opens the user's Twitter page in a new window.


[Download GEXF File]   [Datasets]   [Paper]