Insight Project Resources
This page links to research resources made available by researchers on Science Foundation Ireland (SFI)-funded projects hosted at University College Dublin (UCD) from 2006 onwards. These projects include the current Insight Centre for Data Analytics and, previously, the Clique SRC.
A collection of novel and benchmark datasets curated by UCD researchers and used in their experimental work:
Directed network based on loans on the Prosper.com peer-to-peer lending site.
Text corpora for benchmarking stability analysis in topic modeling.
A collection of Twitter datasets for evaluating multi-view analysis methods.
A dataset of user-curated movie lists from IMDb.com.
A collection of Twitter datasets for evaluating criteria for Twitter user list curation.
A dataset collected to support the investigation of contemporary spam comment activity.
Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.
Supplementary data for an analysis of tourist behaviour, based on a collection of 95 million Flickr photos for which precise geographic coordinates (geo-tags) are known.
A new text sentiment analysis collection, produced from three Irish online news sources.
A multi-view text corpus, constructed from news articles from three online news services.
A set of synthetic text datasets for the evaluation of multi-view learning algorithms.
A new text corpus, mined from biomedical literature, covering the terms used to describe S. cerevisiae ORFs.
The network constructed from the publications of the CBR conference series (1993-2008).
A large number of artificially constructed text datasets.
The following software was developed by SFI-funded UCD researchers, and is made available for research purposes:
Software for finding and tracking communities on dynamic social networks.
Software for clustering and visualising protein interaction networks.
System for computing and visualising dynamic clusters in large dynamic text datasets.
Software for integrating and exploring multiple connected datasets.
Tools for missing-value imputation for Epistatic MAPs.
Java-based metabolomics analysis software.
Software for the extension of bicluster analysis to allow for the functional classification of Open Reading Frames (ORFs).