Proceedings of the Korean Society for Bioinformatics Conference (한국생물정보학회:학술대회논문집)
- 2004.11a
- Pages.101-106
- 2004
Protein subcellular localization classification from multiple subsets of amino acid pair compositions
- Tung, Thai Quang (Department of Computer Engineering, Kongju National University) ;
- Lim, Jong-Tae (Department of Computer Engineering, Kongju National University) ;
- Lee, Kwang-Hyung (Department of BioSystems, KAIST) ;
- Lee, Do-Heon (Department of BioSystems, KAIST)
- Published : 2004.11.04
Abstract
Subcellular localization is a key functional characteristic of proteins. With the number of sequences entering databanks rapidly increasing, the importance of developing a powerful tool to identify protein subcellular location has become self-evident. In this paper, we introduce a novel method for predicting protein subcellular locations from protein sequences. The main idea was motivated by the observation that amino acid pair composition data is redundant. By classifying from multiple feature subsets and using many kinds of amino acid pair compositions, we forced the classifiers to make uncorrelated errors. Therefore, when we combined the predictors using a voting scheme, the prediction accuracy could be improved. Experiments were conducted on several data sets, and significant improvement was achieved in a jackknife test.
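As a rough illustration of the two ingredients named in the abstract, the sketch below computes an amino acid pair composition vector and combines class labels by majority vote. It is not the authors' implementation: the abstract does not specify the feature subsets, the base classifier, or the exact voting rule. The `gap` parameter (one possible reading of "many kinds of amino acid pair compositions"), the function names, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(seq, gap=0):
    """Relative frequency of residue pairs separated by `gap` positions.

    gap=0 gives adjacent pairs; larger gaps are a hypothetical way to
    obtain different kinds of pair composition from the same sequence.
    """
    step = gap + 1
    counts = Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))
    # Only pairs made of standard residues contribute to the total.
    total = sum(counts[p] for p in PAIRS)
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def majority_vote(predictions):
    """Most frequent label; ties go to the label that appeared first."""
    return Counter(predictions).most_common(1)[0][0]
```

In use, each base classifier would be trained on a different subset of the 400 features (or on a different `gap`), and `majority_vote` would be applied to the labels they predict for one protein, for example `majority_vote(["nuclear", "cytoplasmic", "nuclear"])`.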
Keywords