Hangeul Stem Extraction Algorithm for Text Mining Based on Natural Language Processing

자연어 처리 기반 텍스트 마이닝을 위한 한글 어간 추출 알고리즘

  • Choi, Ki-won (최기원) (Institute of Digital Anti-aging Healthcare, Inje University) ;
  • Choi, Seong-hun (최성훈) (Institute of Digital Anti-aging Healthcare, Inje University) ;
  • Jo, Sang-hyeon (조상현) (Institute of Digital Anti-aging Healthcare, Inje University) ;
  • Kim, Hee-cheol (김희철) (Institute of Digital Anti-aging Healthcare, Inje University)
  • Published : 2017.05.31

Abstract

Natural language processing, the foundation of text mining, must be adapted to the language being processed. Hangeul in particular allows relatively free expression compared with other languages, so a single word can take many surface forms depending on the ending attached to it. The part of a word that stays unchanged across these forms is called the stem. Effective text mining requires extracting words and unifying their variant forms. This paper therefore proposes a stem extraction algorithm for Hangeul words to support effective text mining of Hangeul documents.
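To make the idea of a stem concrete, the following is a minimal sketch of suffix stripping in Python. It is not the algorithm proposed in the paper, which the abstract does not detail. The ending list `ENDINGS` and the function names `extract_stem` and `unify` are hypothetical and chosen only for illustration, and the list covers only a handful of endings.

```python
# Hypothetical, deliberately tiny list of verb endings; real Hangeul text
# uses far more endings than this illustration covers.
ENDINGS = ["습니다", "었다", "았다", "는다", "으면", "고", "다", "면", "지"]


def extract_stem(word, endings=ENDINGS):
    """Strip the longest matching ending; return the word unchanged if none matches."""
    # Try longer endings first so that "으면" wins over "면".
    for ending in sorted(endings, key=len, reverse=True):
        # Require at least one syllable to remain as the stem.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word


def unify(words):
    """Group surface forms under their extracted stem."""
    groups = {}
    for word in words:
        groups.setdefault(extract_stem(word), []).append(word)
    return groups


if __name__ == "__main__":
    print(unify(["먹다", "먹고", "먹었다", "가다", "가면"]))
    # {'먹': ['먹다', '먹고', '먹었다'], '가': ['가다', '가면']}
```

Plain suffix stripping of this kind does not handle contracted or irregular forms. For example, it reduces 갔다 to 갔 rather than 가, because the past-tense marker is fused into the final syllable of the stem. A practical stemmer would need syllable decomposition or morphological rules on top of this.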

Acknowledgement

Supported by : Ministry of Trade, Industry and Energy