Protein subcellular localization classification from multiple subsets of amino acid pair compositions

  • Published : 2004.11.04

Abstract

Subcellular localization is a key functional characteristic of proteins. With the number of sequences entering databanks increasing rapidly, the need for a powerful tool to identify protein subcellular location has become self-evident. In this paper, we introduce a novel method for predicting protein subcellular locations from protein sequences. The main idea was motivated by the observation that amino acid pair composition data is redundant. By classifying from multiple feature subsets and using many kinds of amino acid pair compositions, we forced the classifiers to make uncorrelated errors. Therefore, when we combined the predictors using a voting scheme, the prediction accuracy could be improved. Experiments were conducted on several data sets, and significant improvement was achieved in a jackknife test.
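
As an illustration only, the two ingredients named above (amino acid pair composition features and a voting scheme) might be sketched as follows. The abstract does not specify how the feature subsets are chosen, which base classifier is used, or how ties are broken, so the function names `pair_composition` and `majority_vote`, the `gap` parameter, and the first-seen tie rule are all assumptions made for this sketch rather than the authors' implementation.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(sequence, gap=0):
    """Return a 400-dimensional frequency vector of residue pairs.

    `gap` is a hypothetical parameter: gap=0 counts adjacent residues,
    gap=1 counts residues separated by one position, and so on. This is
    one way to obtain several "kinds" of pair composition.
    """
    step = gap + 1
    counts = Counter(
        sequence[i] + sequence[i + step]
        for i in range(len(sequence) - step)
    )
    # Pairs containing non-standard residues are ignored in the total.
    total = sum(counts[p] for p in PAIRS)
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]
```

In such a setup, each base classifier would be trained on one subset of the 400 features (or on one gap setting), and `majority_vote` would be applied to the list of labels those classifiers predict for a given protein.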

Keywords