Dynamic Spectral Co-Clustering
This page contains supplementary materials for the paper "A Spectral Co-Clustering Approach for Dynamic Data". D. Greene, P. Cunningham (2011).
Synthetic Datasets
We provide four synthetic datasets that have been generated for evaluating dynamic co-clustering. Each dataset contains 10 time step matrices, containing 1,000 objects and 1,500 features, assigned to 8 embedded clusters of objects and features.
>> Download Synthetic Collection (64MB)
Text Datasets
We provide two text datasets for evaluating dynamic co-clustering. Note that both datasets are made available in pre-processed sparse format, rather than in their original raw form.
The first dataset, RCV1-5topic, is constructed from a subset of the widely used Reuters RCV1 corpus. The 10,116 news articles cover a seven-month period. Each article is annotated with a single ground truth topic label, one of: health, religion, science, sport, weather. All five topics are present across the entire time period of the corpus. The data provided here is divided into 28 weekly time step graphs.
>> Download RCV1-5topic Weekly Collection (4MB)
The second dataset consists of a set of bookmarks from the Del.icio.us web portal, originally collected by Gorlitz et al. The subset here covers the 2,000 top tags and 5,000 top sites across an eleven-month period from January to November 2006. This data is divided into 44 weekly time step graphs. Note that ground truth annotations are not provided for this dataset.
>> Download Delicious Top 5000 Bookmark Collection (14MB)
Data Formats
All datasets are provided in sparse matrix form. The data has already been pre-processed and divided into time steps. Each time step graph is stored using the following files:
- *.mtx: Sparse feature-object matrix in Matrix Market format.
- *.fids: List of features present in each step, with each line corresponding to a row (feature) of the corresponding feature-object matrix.
- *.ids: List of object identifiers for each step, with each line corresponding to a column (object) of the corresponding feature-object matrix.
- *.clist: Where available, a list of ground truth object clustering annotations for each step, with each line corresponding to a column (object) of the corresponding feature-object matrix.
- *.fclist: Where available, a list of ground truth feature clustering annotations for each step, with each line corresponding to a row (feature) of the corresponding feature-object matrix.
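
The file layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the original materials. It assumes that all files for one time step share a common path prefix (for example a hypothetical "step1.mtx", "step1.fids", "step1.ids"), that the matrix uses the sparse "coordinate" variant of Matrix Market with general (non-symmetric) storage, and that each list file holds one entry per line. The function names read_lines, read_mtx_coordinate and load_time_step are illustrative only.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def read_mtx_coordinate(path):
    """Parse a Matrix Market 'coordinate' file.

    Returns (n_rows, n_cols, entries), where entries maps zero-based
    (row, col) pairs to float values. Assumes general (non-symmetric) storage.
    """
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().lower().split()
        if header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
            raise ValueError("expected a sparse (coordinate) Matrix Market file")
        is_pattern = len(header) > 3 and header[3] == "pattern"

        # Skip comment and blank lines until the size line is reached.
        line = handle.readline()
        while line and (line.startswith("%") or not line.strip()):
            line = handle.readline()
        n_rows, n_cols, _n_entries = (int(tok) for tok in line.split())

        entries = {}
        for line in handle:
            tokens = line.split()
            if not tokens:
                continue
            # Matrix Market indices are 1-based; convert to 0-based.
            row, col = int(tokens[0]) - 1, int(tokens[1]) - 1
            entries[(row, col)] = 1.0 if is_pattern else float(tokens[2])
    return n_rows, n_cols, entries


def load_time_step(prefix):
    """Load one time step, assuming all of its files share the path `prefix`."""
    prefix = str(prefix)
    n_rows, n_cols, entries = read_mtx_coordinate(prefix + ".mtx")
    step = {
        "shape": (n_rows, n_cols),               # (features, objects)
        "entries": entries,
        "features": read_lines(prefix + ".fids"),  # one per matrix row
        "objects": read_lines(prefix + ".ids"),    # one per matrix column
    }
    # Ground truth files are optional ("where available").
    for key, ext in (("object_labels", ".clist"), ("feature_labels", ".fclist")):
        path = Path(prefix + ext)
        step[key] = read_lines(path) if path.exists() else None

    # Rows are features and columns are objects, as described above.
    if len(step["features"]) != n_rows or len(step["objects"]) != n_cols:
        raise ValueError("identifier lists do not match the matrix dimensions")
    return step
```

For larger matrices, scipy.io.mmread reads the same *.mtx files into a SciPy sparse matrix and is likely the more practical choice; the pure-Python parser here is only meant to make the row and column conventions explicit.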