Proceedings of the Korea Society of Information Technology Applications Conference (한국정보기술응용학회 학술대회논문집)
- Korea Society of Information Technology Applications (한국정보기술응용학회), 2005, 6th International Conference on Computers, Communications and System
- Pages 17-20
- 2005
Comparing Feature Selection Methods in Spam Mail Filtering
- Kim, Jong-Wan (School of Computer and Information Technology, Daegu University) ;
- Kang, Sin-Jae (School of Computer and Information Technology, Daegu University)
- Published: 2005.11.25
Abstract
In this work, we compare several feature selection methods for spam mail filtering. The proposed fuzzy inference method outperforms the information gain and chi-squared test methods as a feature selection method in terms of error rate. Because the body of a junk mail often contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To address this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints from both the original email body and the fetched pages. A two-phase approach is applied to filter spam mails: definite hints are used first, and then less definite textual information is used. In our experiment, the proposed two-phase method improved recall by 32.4% on average over the