Analyzing the Effect of Lexical and Conceptual Information in Spam-mail Filtering System

Kang Sin-Jae; Kim Jong-Wan

doi:10.5391/IJFIS.2006.6.2.105

International Journal of Fuzzy Logic and Intelligent Systems

Volume 6, Issue 2 / Pages 105-109 / 2006 / 1598-2645 (pISSN) / 2093-744X (eISSN)

Korean Institute of Intelligent Systems (한국지능시스템학회)

Kang Sin-Jae (School of Computer and Information Technology, Daegu University) ;
Kim Jong-Wan (School of Computer and Information Technology, Daegu University)

Published: 2006.06.01

https://doi.org/10.5391/IJFIS.2006.6.2.105

Abstract

In this paper, we constructed a two-phase spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam mail from ham (non-spam) mail. The definite information consists of the mail sender's information, URLs, and a list of known spam keywords; the less definite information consists of the word list and concept codes extracted from the mail body. We first classified spam mail using the definite information, and then used the less definite information. In the second phase, the lexical information and concept codes contained in the email body served as features for SVM learning. According to our results, the ham misclassification rate was reduced when more lexical information was used as features, and the spam misclassification rate was reduced when concept codes were included in the features as well.
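The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blocked sender, URL host, keyword list, and word-to-concept-code table are hypothetical placeholders, and a small hinge-loss linear classifier trained by subgradient descent stands in for the SVM learner, whose actual configuration is not given here.

```python
import re
from collections import Counter

# Hypothetical placeholder data; none of these values come from the paper.
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_URL_HOSTS = {"cheap-pills.example"}
SPAM_KEYWORDS = {"viagra", "lottery"}
# Toy word -> concept-code table standing in for a thesaurus lookup.
CONCEPT_CODES = {"loan": "C_MONEY", "cash": "C_MONEY", "credit": "C_MONEY",
                 "meeting": "C_WORK", "report": "C_WORK"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def phase1_is_spam(sender, body):
    """Phase 1: decide from the definite information only."""
    if sender.lower() in BLOCKED_SENDERS:
        return True
    hosts = re.findall(r"https?://([^/\s]+)", body.lower())
    if any(host in BLOCKED_URL_HOSTS for host in hosts):
        return True
    return any(tok in SPAM_KEYWORDS for tok in tokenize(body))


def phase2_features(body, use_concepts=True):
    """Phase 2 input: word counts, optionally extended with concept codes."""
    feats = Counter()
    for tok in tokenize(body):
        feats["w:" + tok] += 1
        code = CONCEPT_CODES.get(tok)
        if use_concepts and code:
            feats["c:" + code] += 1
    return dict(feats)


def score(w, b, x):
    """Linear decision value for a sparse feature dict x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items()) + b


def train_linear_svm(samples, labels, epochs=50, lr=0.1, reg=0.01):
    """Hinge-loss subgradient descent; labels are +1 (spam) or -1 (ham)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * score(w, b, x)
            for k in w:                      # L2 shrinkage of the weights
                w[k] *= 1.0 - lr * reg
            if margin < 1:                   # sample violates the margin
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + lr * y * v
                b += lr * y
    return w, b


def classify(sender, body, model):
    """Run phase 1 first; fall back to the learned phase-2 model."""
    if phase1_is_spam(sender, body):
        return "spam"
    w, b = model
    return "spam" if score(w, b, phase2_features(body)) >= 0 else "ham"
```

To try it, build feature dicts with `phase2_features` for a few labeled mail bodies, pass them to `train_linear_svm`, and hand the returned `(w, b)` pair to `classify`. Setting `use_concepts=False` gives a lexical-only feature set, which is one way to compare the two feature configurations the abstract contrasts.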

