A Study on Patent Literature Classification Using Distributed Representation of Technical Terms

기술용어 분산표현을 활용한 특허문헌 분류에 관한 연구

Choi, Yunsoo;Choi, Sung-Pil

  • Received : 2019.03.28
  • Accepted : 2019.05.20
  • Published : 2019.05.31


In this paper, we examine feature extraction methods, machine learning models, and deep learning models for classifying patent literature, and identify the best-performing combination through experiments. For feature extraction, we compared the traditional bag-of-words (BoW) method with a distributed representation method (word embedding vectors). For constructing the document collection, we compared morphological analysis with multi-gram tokenization. Classification performance was then evaluated with both a traditional machine learning model and a deep learning model. The experimental results show that the best performance is achieved when the deep learning model is combined with feature extraction based on distributed representation and morphological analysis. In the Section, Class, and Subclass classification experiments, this configuration improved performance by 5.71%, 18.84%, and 21.53%, respectively, compared with traditional classification methods.
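The difference between the two feature extraction methods compared above can be illustrated with a minimal sketch. It is not the authors' implementation: the documents, the two-dimensional `toy_embeddings` table, and the helper names `bow_vector`, `mean_embedding`, and `cosine` are all hypothetical, and averaging word vectors is only one simple way to turn word embeddings into a document feature.

```python
import math
from collections import Counter

# Toy documents, already tokenized by whitespace (hypothetical data).
docs = [
    "battery electrode lithium cell",
    "lithium battery charging circuit",
    "engine piston cylinder valve",
]
tokenized = [d.split() for d in docs]

# Vocabulary for the BoW representation: one dimension per distinct term.
vocab = sorted({t for tokens in tokenized for t in tokens})

def bow_vector(tokens, vocab):
    """Sparse-style count vector: one term-frequency entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

# Hypothetical 2-dimensional embeddings; real ones would be learned
# from a large patent corpus and have far more dimensions.
toy_embeddings = {
    "battery": [0.9, 0.1], "electrode": [0.8, 0.2],
    "lithium": [0.9, 0.1], "cell": [0.8, 0.2],
    "charging": [0.7, 0.3], "circuit": [0.6, 0.4],
    "engine": [0.1, 0.9], "piston": [0.2, 0.8],
    "cylinder": [0.1, 0.9], "valve": [0.2, 0.8],
}

def mean_embedding(tokens, embeddings, dim=2):
    """Dense document vector: the average of the known word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

bow = [bow_vector(tokens, vocab) for tokens in tokenized]
emb = [mean_embedding(tokens, toy_embeddings) for tokens in tokenized]

# Similarity of document 0 to documents 1 and 2 under each representation.
print("BoW:", cosine(bow[0], bow[1]), cosine(bow[0], bow[2]))
print("Embedding:", cosine(emb[0], emb[1]), cosine(emb[0], emb[2]))
```

In this sketch the BoW vectors have one dimension per vocabulary term and only register exact term overlap, while the averaged embedding vectors are low-dimensional and place documents with related terms close together. Either representation could then be passed to a classifier.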


Patent Literature Classification;Distributed Representation;Word Embedding Vector;Deep Learning

