Extending Bicluster Analysis to Classify Unannotated ORFs using Expression Data

This page contains supplementary materials for the paper "A Spectral Co-Clustering Approach for Dynamic Data". D. Greene, P. Cunningham (2011). 


Microarrays can measure the expression of thousands of genes in parallel over many experimental samples. The unsupervised technique of bicluster analysis has previously been used to uncover gene expression correlations over subsets of samples, with the aim of modelling the natural gene functional classes. However, the bicluster model also has the potential to shed light on the functions of unannotated open reading frames (ORFs), an aspect of biclustering that has been under-explored. We have developed an ORF annotation approach, referred to as BALBOA, in which classifiers are constructed from the class-specific expression patterns discovered by bicluster analysis. First, a set of bicluster classifiers is built using a labelled training set drawn from the annotated ORFs in the expression data. These biclusters are then used to classify an unlabelled ORF set, i.e. the unannotated ORFs from the expression dataset.
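The two-step idea (build class-specific bicluster classifiers from labelled ORFs, then use them to label unannotated ORFs) can be illustrated with a minimal sketch. This is not the BALBOA implementation: the page does not describe how biclusters are discovered or scored, so the sketch assumes that the sample subset of each bicluster is already known and assigns an ORF to the class whose mean pattern is closest in mean squared distance. The names BiclusterSketch, Bicluster, fromLabelledRows and classify are hypothetical.

```java
import java.util.List;

final class BiclusterSketch {

    /** Hypothetical bicluster classifier: a class label, a subset of sample
     *  columns, and the mean expression pattern over those columns. */
    static final class Bicluster {
        final String label;
        final int[] samples;
        final double[] pattern;

        Bicluster(String label, int[] samples, double[] pattern) {
            this.label = label;
            this.samples = samples;
            this.pattern = pattern;
        }

        /** Mean squared distance between an ORF's expression profile and the
         *  pattern, restricted to this bicluster's samples (assumed scoring rule). */
        double distance(double[] expression) {
            double sum = 0.0;
            for (int i = 0; i < samples.length; i++) {
                double d = expression[samples[i]] - pattern[i];
                sum += d * d;
            }
            return sum / samples.length;
        }
    }

    /** Step 1 (simplified): average the labelled ORFs of one class over a
     *  given sample subset. Real bicluster discovery is not shown here. */
    static Bicluster fromLabelledRows(String label, int[] samples, double[][] rows) {
        double[] pattern = new double[samples.length];
        for (double[] row : rows) {
            for (int i = 0; i < samples.length; i++) {
                pattern[i] += row[samples[i]];
            }
        }
        for (int i = 0; i < pattern.length; i++) {
            pattern[i] /= rows.length;
        }
        return new Bicluster(label, samples, pattern);
    }

    /** Step 2: label an unannotated ORF with the class of the closest bicluster. */
    static String classify(List<Bicluster> biclusters, double[] expression) {
        Bicluster best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Bicluster b : biclusters) {
            double d = b.distance(expression);
            if (d < bestDistance) {
                bestDistance = d;
                best = b;
            }
        }
        return best == null ? null : best.label;
    }
}
```

Because each bicluster compares only its own subset of samples, an ORF can match a class even when its expression over the remaining samples is unrelated to that class, which is the property that distinguishes biclusters from clusters defined over all samples.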


An implementation of the BALBOA classification algorithm for Java 1.5 and above is available here:

Download BALBOA

The contents of the archive balboa.zip are as follows: 

We make use of the args4j library for command-line parsing.

Running BALBOA

After unzipping the balboa.zip archive, the application can be run from the terminal on UNIX systems as follows:

./runbalboa.sh [options...] labelledSet.txt unlabelledSet.txt

On Windows systems, from the Command Prompt, use the following:

java -cp balboa.jar;args4j.jar balboa.ConsoleApplication [options...] labelledSet.txt unlabelledSet.txt
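Note that Java separates classpath entries with ":" on UNIX-like systems and ";" on Windows. When BALBOA is launched from another Java program, the separator can be taken from File.pathSeparator instead of being hard-coded. The following is a minimal sketch, assuming both jars sit in the working directory; the class BalboaCommand and its methods are hypothetical helpers, not part of BALBOA, and String.join requires Java 8 or later.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class BalboaCommand {

    /** Joins jar names with the given classpath separator. */
    static String classpath(String separator, String... jars) {
        return String.join(separator, jars);
    }

    /** Builds the argument list for launching balboa.ConsoleApplication. */
    static List<String> build(String separator, String labelled, String unlabelled) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath(separator, "balboa.jar", "args4j.jar"));
        cmd.add("balboa.ConsoleApplication");
        cmd.add(labelled);
        cmd.add(unlabelled);
        return cmd;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on UNIX-like systems and ";" on Windows.
        List<String> cmd = build(File.pathSeparator, "labelledSet.txt", "unlabelledSet.txt");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder to start the process; any BALBOA options would be added before the two file names.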

A full list of command-line options and their descriptions is provided here.


For further details regarding BALBOA, please contact the authors.