A System for Automatic Classification of Traditional Culture Texts

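Korean title: 전통문화 콘텐츠 표준체계를 활용한 자동 텍스트 분류 시스템 (An Automatic Text Classification System Using the Standard Classification System for Traditional Culture Content)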

  • Hur, YunA (Department of Computer Science and Engineering, Korea University) ;
  • Lee, DongYub (Department of Computer Science and Engineering, Korea University) ;
  • Kim, Kuekyeng (Department of Computer Science and Engineering, Korea University) ;
  • Yu, Wonhee (Department of Computer Science and Engineering, Korea University) ;
  • Lim, HeuiSeok (Department of Computer Science and Engineering, Korea University)
  • Received : 2017.10.28
  • Accepted : 2017.12.20
  • Published : 2017.12.28

Abstract

The Internet has increased the number of digital web documents related to the history and traditions of Korean culture. However, users searching for creators or materials related to traditional culture often cannot find the information they want, and the search results they do get are insufficient. Document classification is therefore needed to give users effective access to this information. In the past, documents were classified manually, but manual classification has become impractical because it requires a great deal of time and money. This paper therefore develops an automatic text classification model for traditional cultural content, based on data from the Korean information culture field that is organized according to a systematic classification of traditional cultural content. We applied a TF-IDF model, a Bag-of-Words model, and a combined TF-IDF/Bag-of-Words model to extract word-frequency features from the 'Korea Traditional Culture' data. Using these features, we developed an automatic text classification model for traditional cultural content with the Support Vector Machine (SVM) classification algorithm.
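The abstract does not give implementation details, so the following is only a minimal pure-Python sketch of the general technique it names: Bag-of-Words counts, TF-IDF weights, a combined feature vector, and a linear SVM. Several choices here are assumptions rather than the authors' method: documents are assumed to be already tokenised (Korean morphological analysis is not shown); the smoothed IDF formula is one common variant; the "combined" TF-IDF/Bag-of-Words model is assumed to be a simple concatenation of the two vectors; and the SVM is trained by plain subgradient descent on the hinge loss with a one-vs-rest scheme for multiple categories, instead of a library such as LIBSVM. All function names, toy documents, and category labels are hypothetical.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map every token seen in the training documents to a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow_vector(doc, vocab):
    """Bag-of-Words: raw term counts; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, cnt in Counter(doc).items():
        if tok in vocab:
            vec[vocab[tok]] = float(cnt)
    return vec


def idf_weights(docs, vocab):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (one common variant, assumed)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = [0.0] * len(vocab)
    for tok, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[tok])) + 1.0
    return idf


def tfidf_vector(doc, vocab, idf):
    """TF-IDF: term count times IDF, then L2-normalised to unit length."""
    vec = [c * w for c, w in zip(bow_vector(doc, vocab), idf)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def combined_vector(doc, vocab, idf):
    """Assumed combined model: BoW counts concatenated with TF-IDF weights."""
    return bow_vector(doc, vocab) + tfidf_vector(doc, vocab, idf)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Linear SVM: subgradient descent on lam/2 * ||w||^2 + hinge loss.

    y holds +1/-1 labels; the bias b is left unregularised.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (dot(w, x) + b)
            w = [(1 - lr * lam) * wi for wi in w]  # step on the L2 term's gradient
            if margin < 1:  # hinge loss is active: step towards this sample
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per category (one-vs-rest)."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Return the category whose SVM gives the largest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Hypothetical toy data: already-tokenised documents with made-up categories.
train_docs = [["pottery", "kiln", "glaze"], ["kiln", "glaze", "porcelain"],
              ["drum", "gayageum", "rhythm"], ["gayageum", "melody", "rhythm"]]
train_labels = ["craft", "craft", "music", "music"]

vocab = build_vocab(train_docs)
idf = idf_weights(train_docs, vocab)
X = [combined_vector(d, vocab, idf) for d in train_docs]
models = train_one_vs_rest(X, train_labels)
print(predict(models, combined_vector(["glaze", "kiln"], vocab, idf)))  # craft
```

Building `X` with `bow_vector` or `tfidf_vector` instead of `combined_vector` gives the other two feature settings, which is one way to compare the three feature models with the same classifier.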

Keywords

Text Classification;Big Data;Supervised Learning;Machine Learning;Natural Language Processing

Acknowledgement

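Grant: Development of an intelligent search platform to support the convergence of traditional culture (전통문화 융복합 지원을 위한 지능형 검색 플랫폼 구축), 2017

Supported by: Korea Creative Content Agency (한국콘텐츠진흥원)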
