A collection of novel and benchmark datasets produced by members of the Machine Learning Group and used in their experimental work:
A collection of Twitter datasets for evaluating multi-view analysis methods.
A collection of Twitter datasets for evaluating criteria for Twitter user list curation.
A dataset collected to enable the investigation of contemporary comment spam activity.
Supplementary data for an analysis of tourist behaviour based on a collection of 95 million Flickr photos with precise geographic coordinates (geo-tags).
A new sentiment analysis text collection, built from three Irish online news sources.
A multi-view text corpus, constructed from news articles from three online news services.
A set of synthetic text datasets for the evaluation of multi-view learning algorithms.
A new text corpus, mined from the biomedical literature, of the terms used to describe S. cerevisiae open reading frames (ORFs).
The network constructed from the publications of the CBR conference series (1993-2008).
Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.
An image dataset for multi-label image classification using active learning with SVMs.
A large number of artificially constructed text datasets.
A dataset for training recommendation systems for bronchiolitis treatment.