Spam Message Filtering with Bayesian Approach for Internet Communities

A Technique for Blocking Harmful Messages in Internet Communities Using a Bayesian Approach

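  • 김범배 (Department of Computer Engineering, Sungkyunkwan University)
  • 최형기 (School of Information and Communication Engineering, Sungkyunkwan University)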
  • Published : 2006.10.30

Abstract

Spam messages have been causing widespread damage on the Internet. One source of the problem is messages posted anonymously to bulletin boards in Internet communities. Such spam messages attempt to advertise products, harm others' reputations, deliver religious messages, and so on. In this paper we present a spam message filtering method based on the Bayesian approach. To make the spam filter more useful for bulletin boards in Internet communities, we built it to classify spam messages into six categories, including advertisement, pornography, abuse, religion, and other. The filter was tested against messages posted on popular web sites.

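As the damage caused by spam spreads beyond e-mail services and surges across the Internet as a whole, anonymity is being abused to post messages unrelated to a community's shared interests, namely spam messages such as commercial advertisements, mutual slander, and religious promotion, and this is causing serious social problems. To address spam messages in Internet communities, this paper introduces a blocking method that applies the Bayesian approach already used for blocking spam e-mail. Furthermore, to increase the effectiveness of spam message filtering in Internet communities, the filter is designed so that spam messages can be subdivided into various subcategories; this is intended to meet the needs of the diverse users of Internet communities. The accuracy of the implemented Bayesian filtering technique was measured against sites currently in operation.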
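
The abstract does not give implementation details, so the following is only a minimal sketch of how a Bayesian filter might assign a bulletin-board message to one of several categories. It assumes a multinomial naive Bayes model over whitespace-separated tokens with add-one smoothing; the class name `NaiveBayesFilter`, the "normal" category, and the toy training messages are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Hypothetical multi-category naive Bayes filter (not the authors' code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token -> count
        self.doc_counts = Counter()              # category -> number of messages
        self.vocab = set()                       # all tokens seen in training

    def train(self, category, text):
        """Record one labeled message."""
        tokens = text.lower().split()
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        """Return the category with the highest log posterior score."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            # Add-one (Laplace) smoothing keeps unseen tokens from zeroing the score.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Toy training data, invented purely for illustration.
flt = NaiveBayesFilter()
flt.train("advertisement", "buy cheap watches now discount sale")
flt.train("advertisement", "limited offer buy now free shipping")
flt.train("religion", "join our church and find salvation")
flt.train("abuse", "you are a stupid idiot loser")
flt.train("normal", "does anyone know when the meeting starts")
flt.train("normal", "thanks for the helpful answer about the schedule")

print(flt.classify("cheap sale buy now"))
```

Working in log space avoids numerical underflow on long messages, and returning a category label rather than a spam/non-spam flag is what would let a community choose which subcategories to block.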

Cited by

  1. Spam Message Filtering for Internet Communities using Collection and Frequency Analysis vol.18C, pp.2, 2011, https://doi.org/10.3745/KIPSTC.2011.18C.2.061