Context-based classification for harmful web documents and comparison of feature selecting algorithms

Kim, Young-Soo;Park, Nam-Je;Hong, Do-Won;Won, Dong-Ho;

Journal of Korea Multimedia Society (한국멀티미디어학회논문지)

Volume 12, Issue 6 / Pages 867-875 / 2009 / 1229-7771 (pISSN) / 2384-0102 (eISSN)

Korea Multimedia Society (한국멀티미디어학회)


Kim, Young-Soo (Cryptography Research Team, Electronics and Telecommunications Research Institute) ;
Park, Nam-Je (RFID/USN Security Research Team, Electronics and Telecommunications Research Institute) ;
Hong, Do-Won (Cryptography Research Team, Electronics and Telecommunications Research Institute) ;
Won, Dong-Ho (School of Information and Communication Engineering, Sungkyunkwan University)

Published : 2009.06.30


Abstract

More and richer information sources and services become available on the web every day. However, harmful information, such as adult content, is not appropriate for all users, notably children. Because the internet is a worldwide open network, each country's national laws and systems can regulate providers of harmful content only to a limited extent. Moreover, developing classification technology tied to one specific system is not desirable, because internet users can encounter harmful content in diverse ways, for example through porn sites, harmful spam, or peer-to-peer networks. Research and development of context-based core technologies for classifying harmful content is therefore being emphasized. In this paper, we propose an efficient text filter that blocks harmful text in web documents using context-based technologies. Through implementation and experiment, we also examine which feature selection algorithms are suitable for classifying harmful content, where feature selection is the process of choosing, from all content terms that occur in the documents, those terms that are useful as features for text categorization.
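To make the idea of feature selection concrete, the following is a minimal sketch of one commonly used criterion, the chi-square statistic, applied to a two-class (harmful versus harmless) document set. The abstract does not say which algorithms the paper compares, so the choice of chi-square is an assumption, and the names `chi_square_scores` and `select_features` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its chi-square statistic against the harmful class.

    docs   -- list of token lists, one per document
    labels -- list of bools, True if the document is harmful
    Assumption: chi-square is used here as one common feature selection
    criterion; it is not confirmed to be the method used in the paper.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos = Counter()  # document frequency of each term in harmful docs
    df_neg = Counter()  # document frequency of each term in harmless docs
    for tokens, is_harmful in zip(docs, labels):
        target = df_pos if is_harmful else df_neg
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # harmful docs containing the term
        b = df_neg[term]   # harmless docs containing the term
        c = n_pos - a      # harmful docs without the term
        d = n_neg - b      # harmless docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

On a toy corpus, a term that appears in every harmful document and no harmless one receives the highest score, while a term spread evenly across both classes scores zero and would be discarded. A classifier would then be trained only on the selected terms.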


