Machine Learning + Network Analysis Resources

The Clique SRC was a Strategic Research Cluster funded by Science Foundation Ireland (SFI) at the School of Computer Science, University College Dublin, from 2009 to 2013. The project focused on developing and applying new techniques for network analytics and machine learning. This page links to a variety of research resources made available by researchers from the project. For further details on these resources, please contact Derek Greene. For more recent work on machine learning and data science at UCD, please visit The Insight Research Centre for Data Analytics.


The following datasets, curated by UCD researchers, have been made available to the research community:

BBC Datasets

Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.

Stability Topic Corpora

Text corpora for benchmarking stability analysis in topic modeling.

Multi-View Twitter Datasets

A collection of Twitter datasets for evaluating multi-view analysis methods.

News Curation Datasets

A collection of Twitter datasets for evaluating criteria for Twitter user list curation.

YouTube Dataset

A dataset collected to support the investigation of contemporary spam comment activity.

Irish Economic Sentiment Collection

A new text sentiment analysis collection, produced from three Irish online news sources.

3Sources Collection

A multi-view text corpus, constructed from news articles from three online news services.

Synthetic Multi-view Datasets

A set of synthetic text datasets for the evaluation of multi-view learning algorithms.

Yeast Literature Dataset

A new text corpus, mined from biomedical literature, covering the terms used to describe S. cerevisiae ORFs.

CBR Conference Series Dataset

A network constructed from the publications of the CBR conference series (1993-2008).

20 Newsgroups Subsets

A large number of artificially constructed text datasets.


The following software is made available for research purposes:

Dynamic Community Finding

Software for finding and tracking communities on dynamic social networks.

Ensemble NMF

Software for clustering and visualising protein interaction networks.


A system for computing and visualising dynamic clusters identified in large dynamic text datasets.


Software for integrating and exploring multiple connected datasets.

EMAP Software

Tools for missing value imputation for Epistatic MAPs.


Java-based metabolomics analysis software.


Software for the extension of bicluster analysis to allow for the functional classification of Open Reading Frames (ORFs).