A Feature Vector Selection Method for Cancer Classification

  • Yun, Zheng (Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University)
  • Keong, Kwoh-Chee (Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University)
  • Published: 2005.09.22

Abstract

The high dimensionality and insufficiency of gene expression profiles and proteomic profiles make feature selection a critical step in efficiently building accurate models for cancer problems based on such data sets. In this paper, we use a method called the Discrete Function Learning algorithm to find discriminatory feature vectors based on information theory. The target feature vectors contain all or most of the information (in terms of entropy) of the class attribute. Two data sets are selected to validate our approach: one leukemia subtype gene expression data set and one ovarian cancer proteomic data set. The experimental results show that our method generalizes well when applied to these insufficient and high-dimensional data sets. Furthermore, the obtained classifiers are highly understandable and accurate.
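
The entropy criterion stated above (a feature vector X is a target when its mutual information with the class attribute Y equals, or nearly equals, the entropy of Y) can be illustrated with a minimal Python sketch. This is not the Discrete Function Learning algorithm itself, whose search procedure the abstract does not describe; it is a plain exhaustive search over small feature subsets on already-discretized data. The names `entropy`, `mutual_information`, `select_feature_vector`, `max_size`, and `epsilon` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(rows, subset, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for the feature columns in `subset`."""
    x = [tuple(row[i] for i in subset) for row in rows]
    xy = [xi + (y,) for xi, y in zip(x, labels)]
    return entropy(x) + entropy(labels) - entropy(xy)


def select_feature_vector(rows, labels, max_size=2, epsilon=0.0):
    """Return the first smallest subset with I(X; Y) >= (1 - epsilon) * H(Y).

    Assumption: exhaustive search up to `max_size` features; returns None
    if no subset of that size carries enough information about the class.
    """
    h_y = entropy(labels)
    n_features = len(rows[0])
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_features), k):
            # Small tolerance guards against floating-point rounding.
            if mutual_information(rows, subset, labels) >= (1 - epsilon) * h_y - 1e-9:
                return subset
    return None


# Toy data (invented for illustration): three binary features,
# class label is the XOR of feature 0 and feature 2.
rows = list(product([0, 1], repeat=3))
labels = [a ^ c for a, _, c in rows]

selected = select_feature_vector(rows, labels)
print(selected)
```

On this toy data no single feature carries any information about the class, so the search only succeeds at subset size two. Setting `epsilon` above zero corresponds to accepting feature vectors that contain "most" rather than "all" of the class entropy.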

Keywords