Implementation and Experimental Results of Neural Network and Genetic Algorithm based Spam Filtering Technique

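Original Korean title: Implementation and Performance Evaluation of a Spam Mail Filtering Technique Using a Neural Network and a Genetic Algorithm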

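  • 김범배 (Department of Computer Engineering, School of Information and Communication Engineering, Sungkyunkwan University)
  • 최형기 (Department of Computer Engineering, School of Information and Communication Engineering, Sungkyunkwan University)
  • Published: 2006.04.01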

Abstract

As the volume of spam has grown to extreme levels, many anti-spam filtering techniques have been proposed. Among these, machine-learning filtering is one of the most popular. In this paper, we propose a machine-learning spam filtering technique based on a neural network, a genetic algorithm, and the $\chi^2$ statistic. The proposed technique is designed to overcome the problems of existing filtering techniques and to achieve high spam filtering accuracy. It classifies spam and legitimate email with 95.25 percent and 95.31 percent accuracy, respectively. This spam filtering accuracy is 7.75 percent and 12.44 percent higher than that of rule-based filtering and Bayesian filtering, respectively.

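As the volume of spam mail has surged, a variety of spam filtering techniques have been proposed. Among them, learning-based filtering is currently one of the most widely used. This paper presents a learning-based filtering technique that uses a neural network, a genetic algorithm, and the chi-square statistic. The proposed technique addresses the problems of existing filtering techniques and provides high accuracy in spam filtering. It achieves 95.25% accuracy on spam mail and 95.31% accuracy on legitimate mail. These results are more than 7% and 12% higher than those of existing rule-based filtering and Bayesian filtering, respectively.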
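The abstract names the chi-square statistic but does not say how it is applied. In text classification it is commonly used to score how strongly a term is associated with a class, so that only the most discriminative terms are passed to the classifier. The sketch below illustrates that common usage only; it is an assumption, not the authors' implementation. The function name `chi_square_scores`, the whitespace tokenizer, and the toy messages are all hypothetical. For a term and the spam class it computes N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) from a 2x2 table of document counts.

```python
from collections import Counter


def chi_square_scores(spam_docs, legit_docs):
    """Score each term by the chi-square statistic of its 2x2 table.

    Assumption: a term counts once per document (presence/absence),
    and tokens are produced by a plain lowercase whitespace split.
    """
    n_spam, n_legit = len(spam_docs), len(legit_docs)
    n = n_spam + n_legit

    # Document frequency of every term within each class.
    spam_df = Counter(t for doc in spam_docs for t in set(doc.lower().split()))
    legit_df = Counter(t for doc in legit_docs for t in set(doc.lower().split()))

    scores = {}
    for term in set(spam_df) | set(legit_df):
        a = spam_df[term]    # spam messages containing the term
        b = legit_df[term]   # legitimate messages containing the term
        c = n_spam - a       # spam messages without the term
        d = n_legit - b      # legitimate messages without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term (or a class) carries no signal.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    spam = ["free money now", "free offer now"]
    legit = ["meeting agenda today", "project meeting notes"]
    ranked = sorted(chi_square_scores(spam, legit).items(),
                    key=lambda item: item[1], reverse=True)
    for term, score in ranked[:5]:
        print(f"{term}: {score:.3f}")
```

Terms that appear only in one class score highest, and a term present in every message scores zero. Under this assumed pipeline, the top-ranked terms would form the input features for the neural network.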

Cited by

  1. Spam Message Filtering for Internet Communities using Collection and Frequency Analysis vol.18C, pp.2, 2011, https://doi.org/10.3745/KIPSTC.2011.18C.2.061