A study on Korean language processing using TF-IDF

TF-IDF를 활용한 한글 자연어 처리 연구 (Korean title: "A Study on Korean Natural Language Processing Using TF-IDF")

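  • 이종화 (Department of e-Business, Dong-eui University);
  • 이문봉 (Department of Business Administration, Dong-eui University);
  • 김종원 (Department of Management Information Systems, Dong-eui University)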
  • Received : 2019.06.17
  • Accepted : 2019.08.08
  • Published : 2019.09.30


Purpose: One reason for the expansion of information systems in the enterprise is the gain in data-analysis efficiency. This matters all the more because complex, unstructured data types such as video, voice, images, and conversations inside and outside social networks are growing rapidly. The purpose of this study is to analyze customer needs from the voice of the customer, i.e., text data, in the web environment.

Design/methodology/approach: Previous studies indicate that extracting the words that actually interpret a sentence is more effective than analyzing the raw frequency of the words it contains. In this study, we applied to Korean-language text the TF-IDF method, which extracts the keywords that are important within actual sentences, rather than the TF method, which represents a sentence by simple word frequency alone. We visualized the results of the two techniques through cluster analysis and describe how they differ.

Findings: Applying both the TF and the TF-IDF techniques to Korean natural language processing, this research shows the value of moving from frequency analysis toward semantic analysis, and it is expected to prompt Korean language processing researchers to reconsider which technique they use.
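The contrast between TF and TF-IDF described above can be illustrated with a minimal sketch. It assumes the Korean text has already been reduced to noun tokens (the study used KoNLP for morphological analysis; that step is not reproduced here), uses raw counts for TF and the common weighting idf(t) = log(N / df(t)), and runs on hypothetical review data. The exact weighting and normalization used in the study are not stated in the abstract, so this is an illustration of the general technique only.

```python
import math
from collections import Counter


def term_frequencies(doc):
    """TF representation: raw count of each token in one tokenized document."""
    return dict(Counter(doc))


def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)), where df(t) = number of documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tf_idf(docs):
    """TF-IDF representation: each raw count scaled by the term's idf."""
    idf = inverse_document_frequency(docs)
    return [{t: c * idf[t] for t, c in term_frequencies(doc).items()} for doc in docs]


def top_keywords(weights, k=2):
    """Return the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical pre-tokenized nouns from three customer reviews (illustrative only).
# 배송 = delivery, 고객 = customer, 서비스 = service, 가격 = price, 품질 = quality
docs = [
    ["배송", "배송", "고객", "서비스"],
    ["가격", "고객", "서비스", "품질"],
    ["품질", "품질", "고객", "가격"],
]

print(top_keywords(term_frequencies(docs[0])))  # plain TF: ['배송', '고객']
print(top_keywords(tf_idf(docs)[0]))            # TF-IDF:  ['배송', '서비스']
```

Because "고객" (customer) occurs in every review, its idf is log(3/3) = 0, so TF-IDF drops it from the keyword list even though plain TF ranks it second. The resulting per-document weight vectors are the kind of input a cluster analysis would then group; the specific clustering algorithm used in the study is not given in the abstract.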


Keywords: Big Data; TF-IDF; Text Mining; Cluster Analysis; KoNLP


