Dynamic Spectral Co-Clustering
We provide four synthetic datasets generated for evaluating dynamic co-clustering. Each dataset contains 10 time step matrices over 1,000 objects and 1,500 features, which are assigned to 8 embedded clusters of objects and features.
Text Datasets
We provide two text datasets for evaluating dynamic co-clustering. Note that both datasets are made available in pre-processed sparse format, rather than in their original raw form.
The first dataset, RCV1-5topic, is constructed from a subset of the widely used Reuters RCV1 corpus. Its 10,116 news articles cover a seven-month period. Each article is annotated with a single ground-truth topic label, one of: health, religion, science, sport, or weather. All five topics are present across the entire time period of the corpus. The data provided here is divided into 28 weekly time step graphs.
The second dataset consists of a set of
bookmarks from the Del.icio.us web portal, originally
collected by Görlitz et al. The
subset here covers the 2,000 top tags and 5,000 top sites across an
eleven-month period from January to November 2006. This data is divided
into 44 weekly time step graphs. Note that ground-truth annotations are not provided for this dataset.
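Both text datasets are split into weekly time steps. The exact week boundaries used to build them are not stated here, so the following is only a minimal sketch of the idea: it assumes consecutive 7-day windows counted from a chosen start date. The function names `weekly_step` and `group_by_week`, the start date, and the example records are all hypothetical and are not taken from either dataset.

```python
from collections import defaultdict
from datetime import date


def weekly_step(day: date, start: date) -> int:
    """Return the 0-based index of the 7-day window that contains `day`."""
    if day < start:
        raise ValueError("date precedes the start of the collection period")
    return (day - start).days // 7


def group_by_week(items, start):
    """Group (item_id, date) pairs into weekly time steps.

    Returns a dict mapping time step index -> list of item ids,
    preserving the input order within each step.
    """
    steps = defaultdict(list)
    for item_id, day in items:
        steps[weekly_step(day, start)].append(item_id)
    return dict(steps)


# Hypothetical records for illustration only.
start = date(2006, 1, 1)
records = [
    ("a", date(2006, 1, 1)),
    ("b", date(2006, 1, 7)),
    ("c", date(2006, 1, 8)),
    ("d", date(2006, 1, 20)),
]
steps = group_by_week(records, start)
```

Under this assumption, each group of items would then be turned into one time step graph. Items dated before the start date are rejected rather than silently assigned to a negative step.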
Data Formats
All datasets are provided in sparse matrix form. The data has
already been pre-processed and divided into time steps. Each time step
graph is stored using the following files: