Korean Homograph Tagging Model based on Sub-Word Conditional Probability

Shin, Joon Choul; Ock, Cheol Young

doi:10.3745/KTSDE.2014.3.10.407

KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학)

Volume 3, Issue 10 / Pages 407-420 / 2014 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society (한국정보처리학회)


부분어절 조건부확률 기반 동형이의어 태깅 모델

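Joon Choul Shin (Intelligent Computer Laboratory, University of Ulsan);
Cheol Young Ock (IT Convergence Major, School of Electrical Engineering, University of Ulsan)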

Received : 2014.06.20
Accepted : 2014.09.03
Published : 2014.10.31

https://doi.org/10.3745/KTSDE.2014.3.10.407

Abstract

In general, Korean morphological analysis is divided into two steps. The first step, ambiguity generation, analyzes an Eojeol into multiple candidate morpheme sequences. The second step chooses the one appropriate candidate by using contextual information, and a Hidden Markov Model (HMM) is typically applied there. This paper proposes the Sub-word Conditional Probability (SCP) model as an alternative algorithm for the second step. SCP first uses sub-word information from the adjacent Eojeol. If that fails, SCP falls back to a restricted use of morpheme information. In a comparative test of accuracy and speed, the HMM reached an accuracy of 96.49% and SCP was only 0.07% lower, while SCP reduced processing time by 53%.

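Korean morphological analysis and tagging is broadly divided into two steps. The first step analyzes an Eojeol to generate candidates; an Eojeol with several meanings yields a variety of candidates at this step. The second step uses contextual information to select the single most appropriate candidate and is commonly called tagging. A Hidden Markov Model (HMM) is frequently used in the second step, but this paper proposes a Sub-word Conditional Probability model that improves processing speed. The model first uses information from the adjacent Eojeol to determine the sense of the Eojeol being processed and, as an exception, uses only a very small part of the candidate information when a predicate (용언) is adjacent. In the experiment, accuracy was 0.07% lower than the HMM's 96.49%, but processing time was reduced by about 53%.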
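The two-stage selection described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define how sub-words are formed or what the restricted morpheme fallback looks like, so the code assumes that a sub-word is a left-anchored prefix of the adjacent Eojeol (longest first) and substitutes a plain candidate-frequency back-off for the fallback. The class `ScpSketch`, the helper `sub_words`, the sense labels, and the toy training pairs are all hypothetical.

```python
from collections import defaultdict


def sub_words(eojeol):
    """Assumed sub-word definition: prefixes of the Eojeol, longest first."""
    return [eojeol[:n] for n in range(len(eojeol), 0, -1)]


class ScpSketch:
    """Toy selector: P(candidate | sub-word of the adjacent Eojeol) from counts."""

    def __init__(self):
        # pair_counts[sub_word][candidate] = co-occurrence count
        self.pair_counts = defaultdict(lambda: defaultdict(int))
        # candidate_counts[candidate] = overall frequency, used only as a back-off
        self.candidate_counts = defaultdict(int)

    def train(self, samples):
        """samples: iterable of (adjacent_eojeol, correct_candidate) pairs."""
        for adjacent, candidate in samples:
            self.candidate_counts[candidate] += 1
            for sub in sub_words(adjacent):
                self.pair_counts[sub][candidate] += 1

    def choose(self, candidates, adjacent):
        """Pick the candidate with the highest conditional probability."""
        for sub in sub_words(adjacent):
            seen = self.pair_counts.get(sub)
            if not seen:
                continue
            total = sum(seen.get(c, 0) for c in candidates)
            if total > 0:
                return max(candidates, key=lambda c: seen.get(c, 0) / total)
        # Assumed fallback (stand-in for the restricted morpheme information):
        # choose the candidate that was most frequent in training.
        return max(candidates, key=lambda c: self.candidate_counts.get(c, 0))


# Hypothetical toy data for the homograph "배" (pear vs. boat),
# conditioned on the following Eojeol.
model = ScpSketch()
model.train([
    ("먹었다", "배_pear"),
    ("먹는다", "배_pear"),
    ("탔다", "배_boat"),
    ("타고", "배_boat"),
    ("탔다", "배_boat"),
])
candidates = ["배_pear", "배_boat"]
print(model.choose(candidates, "먹었고"))  # matches the sub-word "먹었"
print(model.choose(candidates, "타는"))    # matches the sub-word "타"
print(model.choose(candidates, "보았다"))  # no sub-word seen, uses the back-off
```

Because the lookup touches only count tables keyed by sub-words of one adjacent Eojeol, each decision is a handful of dictionary reads rather than a search over whole-sentence tag sequences, which is one plausible reading of why such a model could run faster than an HMM tagger.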



Cited by

Effect of Word Sense Disambiguation on Neural Machine Translation: A Case Study in Korean, IEEE Access, vol.6, 2018, https://doi.org/10.1109/ACCESS.2018.2851281
