Search | Korea Science

Kang, In-Su
- Journal of the Korea Society of Computer and Information, v.26 no.8, pp.47-54, 2021
Keyphrase extraction automatically extracts the words or phrases that topically represent the content of a given document. Unsupervised keyphrase extraction approaches first extract candidate words or phrases from the input document, then compute a score for each candidate, and finally select keyphrases based on those scores. For the scoring step, this study proposes adjusting the scores of keyphrase candidates according to their type: word-type or phrase-type. To this end, the type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document and used to adjust the candidate scores. In experiments on four keyphrase extraction evaluation datasets constructed from full-text articles in English, the proposed method outperformed a baseline method and comparison methods on three of the datasets.
https://doi.org/10.9708/jksci.2021.26.08.047
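
The two per-type statistics named in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the values are combined into a score adjustment, so that step is left out. The function names, the whitespace test used to separate word-type from phrase-type candidates, the `top_k` parameter, and the reading of "information content" as the mean of -log2(relative frequency) are all assumptions made for illustration.

```python
import math
from collections import Counter


def type_token_ratio(candidates):
    """Number of distinct candidates (types) divided by total occurrences (tokens)."""
    if not candidates:
        return 0.0
    return len(set(candidates)) / len(candidates)


def information_content(candidates, top_k=3):
    """Mean of -log2(relative frequency) over the top_k most frequent candidates.

    Assumption: the abstract only says "information content of high-frequency
    candidates"; this surprisal-based definition is one plausible reading.
    """
    counts = Counter(candidates)
    total = sum(counts.values())
    top = counts.most_common(top_k)
    if not top:
        return 0.0
    return sum(-math.log2(count / total) for _, count in top) / len(top)


def candidate_type_stats(candidates, top_k=3):
    """Collect both statistics separately for word-type and phrase-type candidates.

    Assumption: a candidate containing a space is phrase-type, otherwise word-type.
    """
    words = [c for c in candidates if " " not in c]
    phrases = [c for c in candidates if " " in c]
    return {
        "word": {
            "ttr": type_token_ratio(words),
            "ic": information_content(words, top_k),
        },
        "phrase": {
            "ttr": type_token_ratio(phrases),
            "ic": information_content(phrases, top_k),
        },
    }


if __name__ == "__main__":
    # Hypothetical candidate occurrences extracted from one document.
    occurrences = [
        "graph", "graph", "model",
        "neural network", "neural network", "neural network", "topic model",
    ]
    print(candidate_type_stats(occurrences))
```

A scoring method along the lines described would then use these per-type values to raise or lower the scores of word-type and phrase-type candidates; the exact adjustment is given in the paper itself.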

Tian, Yingshi;Kim, Young-Ho;Jeong, Yoon-Jae;Ryu, Ji-Hee;Myaeng, Sung-Hyon
- Journal of KIISE:Software and Applications, v.36 no.5, pp.420-429, 2009
Patent text is a rich source for discovering technological trends. To automate this discovery process, we attempt to identify phrases corresponding to a problem and its solution method, which together form a technology. Problem and solution phrases are identified by an SVM classifier using features that combine a language modeling approach with linguistic clues. Based on the occurrence statistics of the phrases, we identify the time span of each problem and solution and finally generate a trend. Our experiment shows that the proposed semantic phrase identification method is promising, achieving 77% in R-precision. We also show that the unsupervised method for discovering technological trends is meaningful.
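
The evaluation measure reported in the abstract above, R-precision, is the precision over the top R ranked items, where R is the number of relevant items. A minimal sketch follows; the function name and the example phrases are hypothetical, and the paper's own evaluation setup may differ in detail.

```python
def r_precision(ranked, relevant):
    """Fraction of the top-R ranked items that are relevant, with R = len(relevant)."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    top_r = ranked[:r]
    hits = sum(1 for item in top_r if item in relevant)
    return hits / r


if __name__ == "__main__":
    # Hypothetical ranked phrase candidates and gold-standard phrases.
    ranked_phrases = ["reduce noise", "metal plate", "improve yield", "first step"]
    gold_phrases = ["reduce noise", "improve yield", "lower cost"]
    print(r_precision(ranked_phrases, gold_phrases))
```

Because the cutoff equals the number of relevant items, precision and recall coincide at that rank, which makes the measure convenient when the number of gold-standard phrases varies from document to document.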