A Feature Selection Method Based on Fuzzy Cluster Analysis

Rhee, Hyun-Sook

doi:10.3745/KIPSTB.2007.14-B.2.135

The KIPS Transactions: Part B (정보처리학회논문지B)

Vol. 14-B, No. 2, pp. 135-140, 2007, pISSN 1598-284X

Korea Information Processing Society (한국정보처리학회)

Rhee, Hyun-Sook (이현숙), Division of Computer and Information Science, Dongyang Technical College (동양공업전문대학 전산정보학부)

Published: 2007.04.30

Abstract

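Feature selection is a data preparation step that builds an effective experimental data set by choosing, from the multidimensional data observed in a problem domain, the attributes that best reflect the structure the data describe. It is an important component in improving the performance of classification systems in areas such as document classification, image recognition, and gene selection, and it has been studied mainly through approaches from statistics and information theory, such as correlation techniques, dimensionality reduction, and mutual information. Research on feature selection is becoming more important as the data to be handled grow larger and more complex. This paper proposes a feature selection method that reflects the characteristics of the data while generalizing to new data. For each attribute of the prepared data, fuzzy cluster analysis yields optimal cluster information; on that basis, the degrees of proximity and separation are measured, and features are selected according to the resulting values. The proposed method is applied to real-world computer virus classification and compared with classification on data selected by an existing contrast-based heuristic method. The results confirm that the method can rank the given features, select features effectively, and improve system performance.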

Feature selection is a preprocessing technique commonly used on high-dimensional data. It studies how to select a subset or list of the attributes used to construct models that describe the data. Feature selection methods attempt to explore the intrinsic properties of the data by employing statistics or information theory; recent developments include correlation methods, dimensionality reduction, and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we propose a feature selection method that considers both data characteristics and generalization capability. It provides a computational approach to feature selection based on fuzzy cluster analysis of each attribute's values and the associated performance measures. We apply it to a system for classifying computer viruses and compare it with a heuristic method based on the contrast concept. Experimental results show that the proposed approach can produce a feature ranking, select features, and improve system performance.
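
The abstract describes the pipeline only in outline: cluster each attribute's values with fuzzy cluster analysis, score the clustering, and rank features by that score. The following is a minimal sketch of one way such a pipeline could look, not the paper's actual procedure. It assumes one-dimensional fuzzy c-means per attribute and substitutes the Xie-Beni index (compactness divided by separation, lower is better) for the paper's unspecified performance measures; the function names `fuzzy_c_means_1d`, `xie_beni`, and `rank_features`, the cluster count `c`, and the fuzzifier `m` are all hypothetical choices made for illustration.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Cluster scalar values with fuzzy c-means; return (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): spread centers evenly over the range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        u = []
        for x in values:
            d = [max(abs(x - v), 1e-12) for v in centers]
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ])
        # Center update: membership-weighted mean of the values.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            new_centers.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def xie_beni(values, centers, u, m=2.0):
    """Xie-Beni index: fuzzy within-cluster scatter over minimum center separation."""
    c = len(centers)
    compactness = sum(
        u[j][i] ** m * (x - centers[i]) ** 2
        for j, x in enumerate(values)
        for i in range(c)
    )
    separation = min(
        (centers[i] - centers[k]) ** 2 for i in range(c) for k in range(i + 1, c)
    )
    if separation == 0.0:
        return float("inf")  # e.g. a constant attribute: no cluster structure
    return compactness / (len(values) * separation)


def rank_features(rows, c=2):
    """Return feature indices ordered from best (lowest index value) to worst."""
    scores = []
    for f in range(len(rows[0])):
        column = [row[f] for row in rows]
        centers, u = fuzzy_c_means_1d(column, c)
        scores.append((xie_beni(column, centers, u), f))
    return [f for _, f in sorted(scores)]


# Illustrative data only: feature 0 forms two tight groups, feature 1 is evenly spread.
rows = [(0.10, 0.0), (0.20, 0.2), (0.15, 0.4), (0.90, 0.6), (1.00, 0.8), (0.95, 1.0)]
print(rank_features(rows))
```

Scoring each attribute separately keeps the sketch close to the abstract's per-attribute description, and the ratio form of the index makes the score independent of each attribute's units. A real application would also need a rule for choosing the number of clusters per attribute, which the abstract refers to only as obtaining optimal cluster information.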
