PICA - Software for integrating and exploring multiple connected datasets

The Parallel Integration Clustering Algorithm (PICA) is a data integration approach for multi-view clustering in domains where two or more related datasets are available.

The algorithm was first described in the publication: Greene, D., Bryan, K. and Cunningham, P. (2008), "Parallel Integration of Heterogeneous Genome-Wide Data Sources", Proc. 8th International Conference on BioInformatics and BioEngineering (BIBE 2008).
[PDF] [BibTeX]


To explore the models produced by PICA, we developed the PICA Browser, a cross-platform Java application for visualizing a soft clustering produced by integrating data from multiple connected views. The application highlights the contribution of each view and how frequently each cluster appears (i.e. its reliability), with the aim of providing insight into the provenance of the cluster relationships in the model. The software is freely available for research purposes and makes use of the MTJ library.

PICA Browser

PICA - Social Network Analysis of CBR conference series data
>> Download CBR browser, data and results

PICA - Integration of diverse genome-wide biological data
>> Download Bio browser, data and results

PICA Browser
>> Generic browser only

PICA Implementation

Here we provide a Java-based implementation of the PICA framework. The software is freely available for research purposes and makes use of the args4j and MTJ libraries. Please consult the included README.txt file for usage instructions.
>> Download PICA binary

Sample data files from the CBR dataset are provided here:
>> Download CBR data

Related Links

An Analysis of Research Themes in the CBR Conference Literature

Yeast Literature Corpus