A Hybrid of Rule-based Method and Memory-based Learning for Korean Text Chunking

Journal of KIISE: Software and Applications

Vol. 31, No. 3 / pp. 369-378 / 2004 / pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (KIISE)

Original Korean title: 한국어 구 단위화를 위한 규칙 기반 방법과 기억 기반 학습의 결합

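Seong-Bae Park (School of Computer Science and Engineering, Seoul National University);
Byoung-Tak Zhang (School of Computer Science and Engineering, Seoul National University)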

Published: 2004.03.01

Abstract

In partially free word order languages such as Korean and Japanese, rule-based methods are effective for text chunking: thanks to well-developed overt postpositions and endings, even a small set of rules can match the performance of machine learning methods. However, rules alone have no way to handle their own exceptions. Exception handling is an important problem in natural language processing, and memory-based learning can process exceptions effectively. In this paper, we propose a hybrid of a rule-based method and memory-based learning for Korean text chunking. The proposed method relies primarily on the rules, and the chunks estimated by the rules are then verified by a memory-based classifier. In experiments on the Korean STEP 2000 corpus, the proposed method achieves a higher F-score than either the rules or various machine learning methods used alone. The final F-score is 94.19, whereas the rules and SVMs (Support Vector Machines), the best-performing machine learning method for this task, achieve 91.87 and 92.54, respectively.
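The abstract gives only the outline of the hybrid: rules propose a chunk tag, and a memory-based classifier verifies it. The following Python sketch is one possible reading of that outline, not the authors' implementation. The toy rules in `rule_tag`, the feature layout (previous, current, and next part-of-speech), the tag names, and the distance `threshold` are all assumptions made for illustration.

```python
from collections import Counter


def rule_tag(features):
    """Toy stand-in for hand-written chunking rules (hypothetical, not the paper's rules)."""
    current_pos = features[1]  # features = (previous POS, current POS, next POS)
    if current_pos in {"noun", "postposition"}:
        return "NP"
    if current_pos in {"verb", "ending"}:
        return "VP"
    return "O"


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


class ExceptionMemory:
    """Memory-based store of training instances that the rules get wrong."""

    def __init__(self, threshold=0):
        self.instances = []
        self.threshold = threshold  # assumed cutoff: how close a match must be

    def train(self, examples):
        # Keep only the examples whose gold tag disagrees with the rule output.
        for features, gold in examples:
            if rule_tag(features) != gold:
                self.instances.append((features, gold))

    def lookup(self, features, k=1):
        """Return the majority tag of the k nearest exceptions, or None if none is close."""
        if not self.instances:
            return None
        ranked = sorted(
            self.instances,
            key=lambda inst: overlap_distance(inst[0], features),
        )
        nearest = ranked[:k]
        if overlap_distance(nearest[0][0], features) > self.threshold:
            return None
        return Counter(tag for _, tag in nearest).most_common(1)[0][0]


def hybrid_tag(features, memory):
    """Trust the rule unless the memory holds a sufficiently similar exception."""
    rule_guess = rule_tag(features)
    override = memory.lookup(features)
    return override if override is not None else rule_guess


memory = ExceptionMemory(threshold=0)
memory.train([
    (("noun", "postposition", "verb"), "NP"),  # rule is right, not stored
    (("verb", "noun", "ending"), "VP"),        # rule says NP, stored as exception
])
print(hybrid_tag(("verb", "noun", "ending"), memory))
print(hybrid_tag(("noun", "noun", "verb"), memory))
```

In this sketch the rules remain the default decision, and the memory only overrides them when a stored exception matches within the assumed threshold. That mirrors the abstract's description of a method "primarily based on the rules".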
