A Language Model and Clue-Based Machine Learning Method for Discovering Technology Trends from Patent Text

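  • 전영실 (Department of Information and Communications Engineering, KAIST) ;
  • 김영호 (Department of Information and Communications Engineering, KAIST) ;
  • 정윤재 (Department of Computer Science, KAIST) ;
  • 류지희 (Department of Computer Science, KAIST) ;
  • 맹성현 (Department of Information and Communications Engineering, KAIST)
  • Published: 2009.05.15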

Abstract

Patent text is a rich source for discovering technological trends. To automate this discovery process, we identify phrases corresponding to a problem and its solution method, which together form a technology. Problem and solution phrases are identified by an SVM classifier using features that combine a language modeling approach with linguistic clues. From the occurrence statistics of the phrases, we identify the time span of each problem and solution and finally generate a trend. Our experiment shows that the proposed semantic phrase identification method is promising, reaching 77% R-precision. We also show that the unsupervised method for discovering technological trends yields meaningful results.
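The R-precision figure reported above can be read against the standard definition of the metric: with R relevant items, it is the fraction of the top R ranked candidates that are relevant. The following is a minimal sketch of that definition only; the function name `r_precision` and the example phrases are hypothetical and are not taken from the paper's data or code.

```python
from typing import Sequence, Set


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return the fraction of the top-R ranked items that are relevant,
    where R is the number of relevant (gold) items."""
    r = len(relevant)
    if r == 0:
        return 0.0  # convention chosen here for an empty gold set
    top_r = ranked[:r]
    return sum(1 for item in top_r if item in relevant) / r


# Hypothetical candidate phrases, ordered by classifier confidence.
ranked_phrases = [
    "reduce power consumption",
    "the present invention",
    "improve signal quality",
    "a method",
]
# Hypothetical gold-standard phrases for the same document.
gold_phrases = {
    "reduce power consumption",
    "improve signal quality",
    "prevent data loss",
}

score = r_precision(ranked_phrases, gold_phrases)
```

Here R is 3, and two of the top three candidates are in the gold set, so `score` is 2/3.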

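Patent documents are a useful resource for detecting advances in science and technology and for predicting future trends by understanding existing ones. In this study, we regard a unit technology as consisting of a "problem" and a "solution method". We use a hybrid model that combines linguistic clues with a language model to find the semantic keyphrases corresponding to these two parts, and we extract the unit technologies expressed by those semantic keyphrases. Based on the extracted results, we propose a new approach, Technological Trend Discovery (TTD), which discovers technology trends with an unsupervised learning method. In our experiments, the proposed method achieved an R-precision of 77% in extracting semantic keyphrases that represent technologies and, as a result, discovered meaningful technology trends.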
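The abstract states that the time span of each problem and solution phrase is derived from occurrence statistics, but it does not give the rule. The sketch below shows one simple way this could look, under a loudly stated assumption: a phrase is "active" in a year when it occurs at least `min_count` times, and its span runs from the first to the last active year. The function name, the threshold, and the sample data are all hypothetical and are not the authors' method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def phrase_time_spans(
    occurrences: Iterable[Tuple[str, int]], min_count: int = 2
) -> Dict[str, Tuple[int, int]]:
    """Map each phrase to (first_year, last_year) among the years in which
    it occurs at least min_count times. Assumed rule, not from the paper."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phrase, year in occurrences:
        counts[phrase][year] += 1

    spans: Dict[str, Tuple[int, int]] = {}
    for phrase, by_year in counts.items():
        active_years = [y for y, c in by_year.items() if c >= min_count]
        if active_years:  # phrases that never reach the threshold are skipped
            spans[phrase] = (min(active_years), max(active_years))
    return spans


# Hypothetical (phrase, filing year) observations extracted from patents.
observations = [
    ("reduce power consumption", 2001),
    ("reduce power consumption", 2001),
    ("reduce power consumption", 2002),
    ("reduce power consumption", 2003),
    ("reduce power consumption", 2003),
    ("low-voltage circuit", 2003),
    ("low-voltage circuit", 2003),
    ("low-voltage circuit", 2004),
    ("low-voltage circuit", 2004),
]

spans = phrase_time_spans(observations)
```

With this data, the problem phrase spans 2001 to 2003 and the solution phrase spans 2003 to 2004. Ordering such spans over time is one way a trend could be laid out, though other span definitions (for example, smoothing counts over adjacent years) are equally plausible.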
