Search | Korea Science

A Novel Classification Model for Efficient Patent Information Research (효율적인 특허정보 조사를 위한 분류 모형)

Kim, Youngho;Park, Sangsung;Jang, Dongsik
- Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.103-110 / 2019
A patent contains detailed information about a developed technology and is disclosed to the public. Patents can therefore be used to overcome the limitations of traditional technology trend research and prediction techniques. Recently, because of the advantages of patent-based analysis, IP R&D has been carried out worldwide. Patent data is big data: it is large in volume, spans many domains, and combines structured and unstructured content. For this reason, collecting and researching patent information is difficult. In a typical patent search, a search query is written to collect patent documents from a database. The collected documents include noise patents that are irrelevant to the purpose of the analysis and must be removed. However, eliminating noise patents is a manual task of reading each document and classifying its technology, which is time-consuming and expensive. In this study, we propose a model that automatically classifies noise patents for efficient patent information research. The proposed method performs patent embedding using Word2Vec and generates noise seed labels. Noise patent classification is then performed using a random forest. The experimental data consists of patents related to Ocean Surveillance & Tracking Network technology that were published and registered with the USPTO. In experiments, the proposed model showed 73% accuracy against the labels assigned by experts.
https://doi.org/10.17662/ksdim.2019.15.4.103
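The first two stages described in the abstract above (patent embedding, then seed labeling) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how seed labels are generated, so the cosine-similarity thresholds here are an assumption, and the `WORD_VECTORS` table is a hypothetical stand-in for vectors that a trained Word2Vec model would supply. The names `embed`, `cosine`, and `seed_label` are likewise illustrative.

```python
import math

# Hypothetical toy word vectors standing in for a trained Word2Vec model;
# real vectors would be learned from the patent corpus.
WORD_VECTORS = {
    "ocean": [0.9, 0.1],
    "vessel": [0.8, 0.2],
    "tracking": [0.7, 0.3],
    "sonar": [0.9, 0.0],
    "coffee": [0.1, 0.9],
    "grinder": [0.0, 0.8],
    "network": [0.5, 0.5],
}


def embed(tokens):
    """Embed one patent as the mean of its known word vectors."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def seed_label(patent_vec, topic_vec, low=0.5, high=0.9):
    """Assumed rule: 'noise' below `low`, 'relevant' above `high`, else None."""
    sim = cosine(patent_vec, topic_vec)
    if sim < low:
        return "noise"
    if sim > high:
        return "relevant"
    return None  # left unlabeled for a downstream classifier


# Embedding of the target technology, built from illustrative keywords.
topic = embed(["ocean", "tracking", "network"])

patents = {
    "P1": ["sonar", "vessel", "tracking"],
    "P2": ["coffee", "grinder"],
    "P3": ["network", "coffee"],
}
labels = {pid: seed_label(embed(toks), topic) for pid, toks in patents.items()}
print(labels)
```

In a fuller pipeline, the seed-labeled embeddings would presumably serve as training data for a random forest (for example, scikit-learn's `RandomForestClassifier`), which would then classify the patents left unlabeled here.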

Korean Patent ELECTRA : a pre-trained Korean Patent language representation model for the study of Korean Patent natural language processing(KorPatELECTRA) (Korean Patent ELECTRA : 한국 특허문헌 자연어처리 연구를 위한 사전 학습된 언어모델(KorPatELECTRA))

Min, Jae-Ok;Jang, Ji-Mo;Jo, Yu-Jeong;Noh, Han-Sung
- Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.69-71 / 2021
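In the patent domain, natural language processing tasks are hard to solve because of the linguistic peculiarities of patent documents, so research on a language model optimized for Korean patent documents is urgently needed. This paper proposes the Korean Patent ELECTRA model, pre-trained on a large volume of Korean patent document data, together with a tokenization method. Comparative experiments against existing general-purpose pre-trained models confirm its potential for advancing natural language processing of Korean patent documents.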

Patent Statistics of the USPTO (USPTO의 특허통계)

Robert Johnson
- Patent21 / s.65 / pp.23-28 / 2006
This report is a Korean-language rewrite of "Patent Statistics of the U.S. Patent and Trademark Office," a presentation given by Mr. Robert Johnson of the USPTO at PATINEX (PATent INformation EXpo), held on November 3, 2005. <Translated by Kim Min-a, Innovation Planning Team>