Missing Value Imputation for Epistatic MAPs
This page contains supplementary material for the paper 'Missing Value Imputation for Epistatic MAPs'.
Description
Epistatic miniarray profiles (E-MAPs) are a high-throughput approach for quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets contain a significant number of missing values (up to 35%), which can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore broaden the range of analyses that can be applied, and increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data, primarily because of the pairwise nature of the data and the significantly larger proportion of missing values. In this work we propose three local (nearest-neighbor-based) strategies for E-MAP datasets: symmetric KNN, wNN and LLS imputation.
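To illustrate the general idea of nearest-neighbor imputation on a symmetric pairwise matrix, here is a minimal sketch. It is not the implementation from the paper or the archive. The function names (`knn_estimate`, `symmetric_knn_impute`), the use of `None` for missing scores, the root-mean-square distance over shared columns, and the averaging of the two per-gene estimates are all assumptions made for illustration.

```python
import math


def knn_estimate(scores, i, j, k):
    """Estimate scores[i][j] from the k rows most similar to row i.

    Hypothetical helper: only rows that have an observed value in
    column j can serve as neighbors.
    """
    candidates = []
    for r, row in enumerate(scores):
        if r == i or row[j] is None:
            continue
        # Compare the two profiles only on columns observed in both.
        shared = [c for c in range(len(row))
                  if scores[i][c] is not None and row[c] is not None]
        if not shared:
            continue
        dist = math.sqrt(
            sum((scores[i][c] - row[c]) ** 2 for c in shared) / len(shared)
        )
        candidates.append((dist, row[j]))
    if not candidates:
        return None
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def symmetric_knn_impute(scores, k=2):
    """Fill missing off-diagonal entries while keeping the matrix symmetric.

    Assumption: the estimate based on gene i's neighbors and the one
    based on gene j's neighbors are averaged, and the result is written
    to both (i, j) and (j, i). The diagonal is left untouched.
    """
    n = len(scores)
    imputed = [row[:] for row in scores]
    for i in range(n):
        for j in range(i + 1, n):
            if scores[i][j] is None:
                estimates = [
                    e for e in (knn_estimate(scores, i, j, k),
                                knn_estimate(scores, j, i, k))
                    if e is not None
                ]
                if estimates:
                    value = sum(estimates) / len(estimates)
                    imputed[i][j] = imputed[j][i] = value
    return imputed


# Toy symmetric matrix (made-up numbers); the score for pair (0, 4) is missing.
example = [
    [None, 1.0, -2.0, 0.5, None],
    [1.0, None, -1.5, 0.4, 0.9],
    [-2.0, -1.5, None, -0.3, -1.8],
    [0.5, 0.4, -0.3, None, 0.2],
    [None, 0.9, -1.8, 0.2, None],
]
filled = symmetric_knn_impute(example, k=2)
```

Because each missing pair is written to both triangles at once, the output stays symmetric, which a row-wise KNN imputer designed for microarray data would not guarantee.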
A Python implementation of the KNN, wNN and LLS imputation procedures is provided below. The archive contains documentation and a sample dataset.