Dynamic Spectral Co-Clustering

This page contains supplementary materials for the paper "A Spectral Co-Clustering Approach for Dynamic Data" by D. Greene and P. Cunningham (2011).


Synthetic Datasets

We provide four synthetic datasets generated for evaluating dynamic co-clustering. Each dataset consists of 10 time step matrices. Each matrix contains 1,000 objects and 1,500 features, which are assigned to 8 embedded clusters of objects and features.

>> Download Synthetic Collection (64MB) 
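
To make the shape of this data concrete, the following is a minimal sketch of a sequence of sparse time step matrices with planted co-clusters. It is an illustration only and is not the generator used to produce the datasets above: the function names, the density parameters, the round-robin cluster assignment, and the fact that clusters stay fixed across time steps are all assumptions made for the example.

```python
import random


def planted_labels(n_items, n_clusters):
    # Hypothetical assignment: items are spread round-robin over the clusters.
    return [i % n_clusters for i in range(n_items)]


def make_time_step(n_objects, n_features, n_clusters,
                   density_in, density_out, rng):
    """Return one sparse matrix as a dict {(object, feature): 1.0}.

    Cells whose object and feature share a planted cluster are non-zero
    with probability density_in, all other cells with density_out.
    """
    object_labels = planted_labels(n_objects, n_clusters)
    feature_labels = planted_labels(n_features, n_clusters)
    entries = {}
    for i in range(n_objects):
        for j in range(n_features):
            same = object_labels[i] == feature_labels[j]
            p = density_in if same else density_out
            if rng.random() < p:
                entries[(i, j)] = 1.0
    return entries


def make_dataset(n_steps=10, n_objects=1000, n_features=1500, n_clusters=8,
                 density_in=0.2, density_out=0.01, seed=0):
    # Defaults mirror the sizes described above; the densities are assumed.
    rng = random.Random(seed)
    return [
        make_time_step(n_objects, n_features, n_clusters,
                       density_in, density_out, rng)
        for _ in range(n_steps)
    ]
```

For a quick experiment, a call such as `make_dataset(n_steps=3, n_objects=40, n_features=60, n_clusters=4)` returns a list of three small sparse matrices. The default sizes match the description above but take noticeably longer to generate in pure Python.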


Text Datasets

We provide two text datasets for evaluating dynamic co-clustering. Note that both datasets are made available in pre-processed sparse format, rather than in their original raw form.

The first dataset, RCV1-5topic, is constructed from a subset of the widely used Reuters RCV1 corpus. Its 10,116 news articles cover a seven-month period. Each article is annotated with a single ground-truth topic label, one of: health, religion, science, sport, weather. All five topics are present across the entire time period of the corpus. The data provided here is divided into 28 weekly time step graphs.

>> Download RCV1-5topic Weekly Collection (4MB)  

The second dataset consists of a set of bookmarks from the Del.icio.us web portal, originally collected by Gorlitz et al. The subset provided here covers the 2,000 top tags and 5,000 top sites over an eleven-month period from January to November 2006. The data is divided into 44 weekly time step graphs. Note that ground-truth annotations are not provided for this dataset.

>> Download Delicious Top 5000 Bookmark Collection (14MB) 


Data Formats

All datasets are provided in sparse matrix form. The data has already been pre-processed and divided into time steps. Each time step graph is stored using the following files: