MovieLists Dataset

This page contains supplementary material for the paper:
D. Greene, P. Cunningham. (2013), "Producing a Unified Graph Representation from Multiple Social Network Views". ACM Web Science 2013.


In many social networks, several different link relations exist between the same set of users. Users also typically have attribute or textual information associated with them, such as user-generated content or demographic details. For many data analysis tasks, such as community finding and data visualisation, the availability of multiple heterogeneous types of user data complicates the analysis. We have developed an unsupervised method that integrates multiple data views into a single unified graph representation by combining the k-nearest neighbour sets of users derived from each view. These views can be either relation-based or feature-based. The method is evaluated on the annotated multi-view Twitter datasets listed below, where it is shown to support the discovery of the underlying community structure in the data.
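The page summarises the integration step in a single sentence. The sketch below illustrates one simple way to combine per-view kNN sets; it is not the method from the paper. It assumes each view is a dict mapping a user to a sparse vector (a row of an adjacency matrix for a relation-based view, or term weights for a feature-based view), uses cosine similarity to find each user's k nearest neighbours, and weights each edge of the unified graph by the number of views in which the two users are kNN-linked. The function names, the similarity measure and the vote-count weighting are all illustrative assumptions.

```python
import math
from collections import defaultdict

# Illustrative sketch only: cosine similarity and vote-count edge weighting
# are assumptions, not the procedure described in the paper.

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_sets(view, k):
    """Map each user to the set of its k most similar users in one view.
    Users with zero similarity are never counted as neighbours."""
    neighbours = {}
    for u in view:
        scores = [(cosine(view[u], view[v]), v) for v in view if v != u]
        # Highest similarity first; ties broken by user ID for determinism.
        scores.sort(key=lambda s: (-s[0], s[1]))
        neighbours[u] = {v for sim, v in scores[:k] if sim > 0.0}
    return neighbours

def unified_graph(views, k):
    """Combine per-view kNN sets into one undirected, weighted edge dict.
    Edge weight = number of views in which the two users are kNN-linked."""
    weights = defaultdict(int)
    for view in views:
        edges = set()
        for u, nbrs in knn_sets(view, k).items():
            for v in nbrs:
                # Symmetrise: u->v or v->u yields the same undirected edge.
                edges.add(tuple(sorted((u, v))))
        for edge in edges:
            weights[edge] += 1
    return dict(weights)

# Two toy feature-based views over four hypothetical users.
view1 = {"a": {"x": 1.0}, "b": {"x": 1.0}, "c": {"y": 1.0}, "d": {"y": 1.0}}
view2 = {"a": {"p": 1.0}, "b": {"p": 1.0, "q": 1.0}, "c": {"q": 1.0}, "d": {"r": 1.0}}
print(sorted(unified_graph([view1, view2], k=1).items()))
# [(('a', 'b'), 2), (('b', 'c'), 1), (('c', 'd'), 1)]
```

Counting agreement across views gives an edge supported by several views a higher weight than one supported by a single view; the paper may weight or prune edges differently.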


For evaluation purposes, we collected five topical Twitter datasets based on curated user lists, each with a manually-curated ground truth set of communities. The datasets are as follows:


We make the five datasets available for non-commercial and research purposes only. They are provided solely in pre-processed matrix format: to comply with the Twitter Terms of Service, we do not include any raw tweets or other full-text content. Users and user lists are referenced by their unique Twitter IDs rather than by full names or screen names.
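This page does not describe the exact file layout. Assuming, for illustration only, that each view file uses Matrix Market coordinate format (a size line followed by 1-based "row col value" entries) and that a separate list gives the Twitter ID for each row, a view could be loaded into per-user sparse rows as below. Both the file format and the helper names read_coordinate_matrix and rows_by_id are hypothetical; check the files in the archive before relying on this.

```python
import io

# Assumption: views are stored in Matrix Market coordinate format with a
# companion list of Twitter IDs in row order. The real files may differ.

def read_coordinate_matrix(lines):
    """Parse coordinate-format lines into ((n_rows, n_cols), {(row, col): value}).
    Skips '%' header/comment lines; indices are converted from 1-based to 0-based.
    Does not expand 'symmetric' storage (only stored entries are returned)."""
    shape = None
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        parts = line.split()
        if shape is None:
            # First non-comment line: "rows cols nnz"
            n_rows, n_cols, _nnz = (int(p) for p in parts[:3])
            shape = (n_rows, n_cols)
            continue
        # 'pattern' matrices have no value column; treat each entry as 1.0
        value = float(parts[2]) if len(parts) > 2 else 1.0
        entries[(int(parts[0]) - 1, int(parts[1]) - 1)] = value
    return shape, entries

def rows_by_id(entries, ids):
    """Group non-zero entries by row, keyed by Twitter ID (ids[i] labels row i)."""
    rows = {user_id: {} for user_id in ids}
    for (r, c), value in entries.items():
        rows[ids[r]][c] = value
    return rows

# Toy example with hypothetical IDs; a real file would be opened with open().
sample = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "% toy view\n"
    "3 3 3\n"
    "1 2 1\n"
    "2 1 1\n"
    "3 1 0.5\n"
)
ids = ["1001", "1002", "1003"]
shape, entries = read_coordinate_matrix(sample)
rows = rows_by_id(entries, ids)
print(shape)  # (3, 3)
print(rows)   # {'1001': {1: 1.0}, '1002': {0: 1.0}, '1003': {0: 0.5}}
```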

The datasets are provided in a single archive. Each dataset is contained in its own sub-directory, and nine different "views" (criteria) of each dataset are provided as sparse matrices. For a dataset <name>, the view files have the following prefixes:

>> Download Multi-View Twitter Datasets (January 2013)