A Study on Patent Literature Classification Using Distributed Representation of Technical Terms

기술용어 분산표현을 활용한 특허문헌 분류에 관한 연구

Choi, Yunsoo; Choi, Sung-Pil

  • Received : 2019.03.28
  • Accepted : 2019.05.20
  • Published : 2019.05.31


In this paper, we examine feature extraction methods, machine learning models, and deep learning models for classifying patent literature, and identify through experiments the combination that performs best. For feature extraction, we compared the traditional bag-of-words (BoW) method with a distributed representation method (word embedding vectors). For constructing the document collection, we compared morphological analysis with the multi-gram approach. Classification performance was then evaluated with both a traditional machine learning model and a deep learning model. The experimental results show that the best performance is achieved by the deep learning model combined with distributed representation and feature extraction based on morphological analysis. In the Section, Class, and Subclass classification experiments, performance improved by 5.71%, 18.84%, and 21.53%, respectively, compared with traditional classification methods.
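
As a rough illustration of the two feature extraction styles compared above, the following minimal sketch builds a BoW count vector over multi-gram terms and an averaged word embedding vector for a toy document. Everything in it is an assumption made for illustration: the two sample documents, the 2-dimensional embedding table `EMB`, and the helper names `multi_gram`, `bow_vector`, and `mean_embedding` are hypothetical, whitespace splitting stands in for morphological analysis, and averaging is only one simple way to turn word vectors into a document vector. It is not the authors' pipeline, and the classifier itself is omitted.

```python
from collections import Counter

# Toy documents, invented for illustration (not from the paper's data).
docs = [
    "battery electrode material",
    "electrode coating method",
]

# Hypothetical 2-dimensional embedding table; real word embeddings
# would be trained on a large patent corpus.
EMB = {
    "battery": [1.0, 0.0],
    "electrode": [0.0, 1.0],
    "material": [1.0, 1.0],
}


def multi_gram(tokens, max_n=2):
    """Return all word n-grams for n = 1..max_n, shorter n-grams first."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bow_vector(terms, vocab):
    """Count how often each vocabulary term occurs in `terms`."""
    counts = Counter(terms)
    return [counts[term] for term in vocab]


def mean_embedding(tokens, table, dim):
    """Average the vectors of the tokens found in `table`."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Whitespace splitting stands in for morphological analysis here.
tokenized = [doc.split() for doc in docs]
term_lists = [multi_gram(tokens) for tokens in tokenized]

# Sparse representation: one dimension per unigram or bigram.
vocab = sorted({term for terms in term_lists for term in terms})
bow = [bow_vector(terms, vocab) for terms in term_lists]

# Dense representation: fixed size regardless of vocabulary size.
dense = [mean_embedding(tokens, EMB, dim=2) for tokens in tokenized]
```

The BoW vector grows with the vocabulary (here one dimension per unigram or bigram), while the embedding-based vector keeps a fixed dimensionality; either one could then be passed to a classifier.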


Patent Literature Classification; Distributed Representation; Word Embedding Vector; Deep Learning