Korean Sentence Boundary Detection Using Memory-based Machine Learning

Han Kun-Heui; Lim Heui-Seok

The Journal of the Korea Contents Association (한국콘텐츠학회논문지)

Volume 4 Issue 4 / Pages 133-139 / 2004 / 1598-4877 (pISSN) / 2508-6723 (eISSN)

The Korea Contents Association (한국콘텐츠학회)


Han Kun-Heui (Division of Information and Communication, Cheonan University);
Lim Heui-Seok (Department of Software, Hanshin University)

Published : 2004.12.01

Abstract

This paper proposes a Korean sentence boundary detection system that employs the k-nearest neighbor (kNN) algorithm. We propose three scoring functions for classifying sentence boundaries and compare them. The system uses only domain-independent linguistic features so that it remains general and robust. It was trained and evaluated on two corpora, the ETRI corpus and the KAIST corpus. In the experiments, the proposed system achieved about $98.82\%$ precision and $99.09\%$ recall even though it was trained on a relatively small corpus.

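This paper proposes a general-purpose, trainable Korean sentence boundary detector based on memory-based learning, a machine learning technique. The proposed method uses the k-nearest neighbor (kNN) algorithm, one of the memory-based learning algorithms, and applies several weighting schemes to compute the scores that decide sentence boundaries from the neighbors; these schemes are compared and analyzed. As features for sentence boundary detection, only those that apply generally, without being restricted to a particular language or genre, are used. The ETRI corpus and the KAIST corpus were used for the performance experiments, with precision and recall as the evaluation measures. In the experiments, the proposed method achieved a sentence precision of $98.82\%$ and a sentence recall of $99.09\%$ even with a small training corpus.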
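
The abstract does not name the three scoring functions or the exact features, so the following is only a minimal sketch of how a memory-based (kNN) boundary classifier with interchangeable scoring functions might look. Everything specific here is an assumption made for illustration: the character-context features, the overlap distance, the three scorers (`score_majority`, `score_inverse_distance`, `score_inverse_rank`), and the toy training sentences. None of it is taken from the paper.

```python
from collections import Counter

def features(text, i):
    """Hypothetical context features for the candidate character at index i."""
    prev_char = text[i - 1] if i > 0 else "<s>"
    next_char = text[i + 1] if i + 1 < len(text) else "</s>"
    after = text[i + 2] if i + 2 < len(text) else "</s>"
    return (text[i], prev_char, next_char, after)

def overlap_distance(a, b):
    """Number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))

# Three illustrative scoring functions (assumed, not the paper's).
# Each takes neighbors as (distance, label) pairs, nearest first.
def score_majority(neighbors):
    return Counter(label for _, label in neighbors)

def score_inverse_distance(neighbors):
    scores = Counter()
    for dist, label in neighbors:
        scores[label] += 1.0 / (1.0 + dist)
    return scores

def score_inverse_rank(neighbors):
    scores = Counter()
    for rank, (_, label) in enumerate(neighbors, start=1):
        scores[label] += 1.0 / rank
    return scores

def classify(memory, query, k=3, scorer=score_majority):
    """Label the query with the best-scoring class among its k nearest neighbors."""
    ranked = sorted(
        ((overlap_distance(feats, query), label) for feats, label in memory),
        key=lambda pair: pair[0],
    )
    return scorer(ranked[:k]).most_common(1)[0][0]

# Toy memory: (text, is the first period a sentence boundary?)
examples = [
    ("집에 갔다. 그리고 잤다.", True),
    ("값은 3.14 이다.", False),
    ("밥을 먹었다. 이제 간다.", True),
    ("버전 2.0 출시", False),
]
memory = [(features(t, t.index(".")), y) for t, y in examples]

sentence = "원주율 3.14159"
query = features(sentence, sentence.index("."))
by_vote = classify(memory, query, scorer=score_majority)
by_distance = classify(memory, query, scorer=score_inverse_distance)
by_rank = classify(memory, query, scorer=score_inverse_rank)
```

On this toy memory the unweighted vote labels the decimal point in "3.14159" as a boundary, because two distant boundary examples outvote the single exact match, while both weighted scorers label it as a non-boundary. This only illustrates why the choice of scoring function can matter; it says nothing about the results reported in the paper.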
