Proceedings of the Korean Institute of Information and Commucation Sciences Conference (한국정보통신학회:학술대회논문집)The Korea Institute of Information and Commucation Engineering (한국정보통신학회)
Case Study on Public Document Classification System That Utilizes Text-Mining Technique in BigData Environment
빅데이터 환경에서 텍스트마이닝 기법을 활용한 공공문서 분류체계의 적용사례 연구
- Shim, Jang-sup (IITP (Institute for Information & Communication Technology Promotion)) ;
- Lee, Kang-wook
- Published : 2015.10.26
Text-mining technique in the past had difficulty in realizing the analysis algorithm due to text complexity and degree of freedom that variables in the text have. Although the algorithm demanded lots of effort to get meaningful result, mechanical text analysis took more time than human text analysis. However, along with the development of hardware and analysis algorithm, big data technology has appeared. Thanks to big data technology, all the previously mentioned problems have been solved while analysis through text-mining is recognized to be valuable as well. However, applying text-mining to Korean text is still at the initial stage due to the linguistic domain characteristics that the Korean language has. If not only the data searching but also the analysis through text-mining is possible, saving the cost of human and material resources required for text analysis will lead efficient resource utilization in numerous public work fields. Thus, in this paper, we compare and evaluate the public document classification by handwork to public document classification where word frequency(TF-IDF) in a text-mining-based text and Cosine similarity between each document have been utilized in big data environment.