Extending Bicluster Analysis to Classify Unannotated ORFs using Expression Data
Microarrays can measure the expression levels of thousands of genes in parallel across many experimental samples. The unsupervised technique of bicluster analysis has previously been used to uncover gene expression correlations over subsets of samples, with the aim of modelling the natural functional classes of genes. However, the bicluster model also has the potential to shed light on the functions of unannotated open reading frames (ORFs), an aspect of biclustering that has been under-explored. We have developed an ORF annotation approach, referred to as BALBOA, in which classifiers are constructed from the class-specific expression patterns discovered by bicluster analysis. First, a set of bicluster classifiers is built using a labelled training set drawn from the annotated ORFs in the expression data. These biclusters are then used to classify an unlabelled ORF set, i.e. the unannotated ORFs from the same expression dataset.
An implementation of the BALBOA classification algorithm for Java 1.5 and above is available here:
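The two-step workflow described above (build class-specific classifiers from labelled ORFs, then apply them to unlabelled ORFs) can be illustrated with a minimal sketch. This is not the BALBOA implementation: the class and method names (BiclusterSketch, Bicluster, train, classify) are hypothetical, the sample subset of each bicluster is assumed to be given rather than discovered by bicluster analysis, and the nearest-pattern rule is a simple stand-in for whatever scoring BALBOA actually uses.

```java
import java.util.List;

public class BiclusterSketch {

    /** Hypothetical bicluster classifier: a class label, a subset of
     *  sample indices, and the mean expression pattern over that subset. */
    static final class Bicluster {
        final String label;
        final int[] samples;
        final double[] pattern;

        Bicluster(String label, int[] samples, double[] pattern) {
            this.label = label;
            this.samples = samples;
            this.pattern = pattern;
        }

        /** Root-mean-square difference between an ORF's expression profile
         *  and this bicluster's pattern, restricted to the bicluster's samples. */
        double distance(double[] profile) {
            double sum = 0.0;
            for (int i = 0; i < samples.length; i++) {
                double d = profile[samples[i]] - pattern[i];
                sum += d * d;
            }
            return Math.sqrt(sum / samples.length);
        }
    }

    /** Step 1 (assumed form): average the labelled ORFs of one class over a
     *  given sample subset. The subset is an input here, not discovered. */
    static Bicluster train(String label, double[][] profiles, int[] samples) {
        double[] pattern = new double[samples.length];
        for (double[] profile : profiles) {
            for (int i = 0; i < samples.length; i++) {
                pattern[i] += profile[samples[i]] / profiles.length;
            }
        }
        return new Bicluster(label, samples, pattern);
    }

    /** Step 2 (assumed form): assign an unlabelled ORF the label of the
     *  closest bicluster pattern; returns null if there are no classifiers. */
    static String classify(List<Bicluster> classifiers, double[] profile) {
        Bicluster best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Bicluster b : classifiers) {
            double d = b.distance(profile);
            if (d < bestDistance) {
                bestDistance = d;
                best = b;
            }
        }
        return best == null ? null : best.label;
    }
}
```

The point of the sketch is that each classifier compares an ORF only over its own subset of samples, which is what distinguishes a bicluster pattern from a profile computed over all samples.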
The contents of the archive balboa.zip are as follows:
- balboa.jar: Implementation of BALBOA algorithm
- runbalboa.sh: Shell script for running BALBOA on UNIX systems
- labelledSet.txt: labelled ORF set (sample yeast dataset)
- unlabelledSet.txt: unlabelled ORF set (sample yeast dataset)
BALBOA uses the args4j library for command line parsing.
After unzipping the balboa.zip archive, the application can be run from the terminal on UNIX systems as follows:
./runbalboa.sh [options...] labelledSet.txt unlabelledSet.txt
On Windows systems, from the Command Prompt, use the following:
java -cp balboa.jar;args4j.jar balboa.ConsoleApplication [options...] labelledSet.txt unlabelledSet.txt
A full set of command line options and their descriptions is provided here.
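Java's classpath separator is platform-dependent: ';' on Windows and ':' on UNIX systems. As a small sketch, the helper below builds the classpath with java.io.File.pathSeparator so that the same code prints a valid command on either platform. The class name ClasspathSketch and its classpath method are hypothetical and not part of the BALBOA distribution; String.join requires Java 8 or later.

```java
import java.io.File;

public class ClasspathSketch {

    /** Joins jar names with the separator of the platform running this code. */
    static String classpath(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        // Prints the launch command with the correct separator for this OS.
        System.out.println("java -cp " + classpath("balboa.jar", "args4j.jar")
                + " balboa.ConsoleApplication [options...] labelledSet.txt unlabelledSet.txt");
    }
}
```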