Recognizing Unknown Words and Correcting Spelling errors as Preprocessing for Korean Information Processing System

Park, Bong-Rae; Rim, Hae-Chang

The Transactions of the Korea Information Processing Society (한국정보처리학회논문지)

Volume 5, Issue 10 / Pages 2591-2599 / 1998 / 1226-9190 (pISSN)

Korea Information Processing Society (한국정보처리학회)

한국어 정보처리 시스템의 전처리를 위한 미등록어 추정 및 철자 오류의 자동 교정

Park, Bong-Rae (Dept. of Computer Science, Korea University) ;
Rim, Hae-Chang (Dept. of Computer Science, Korea University)

박봉래 (고려대학교 컴퓨터학과) ;
임해창 (고려대학교 컴퓨터학과)

Published : 1998.10.01

Abstract

In this paper, we propose a method of recognizing unknown words and correcting spelling errors (including spacing errors) to improve the performance of Korean information processing systems. Unknown words are recognized through comparative analysis of two or more morphologically similar eojeols (spacing units in Korean) that contain the same unknown-word candidate. Spacing errors and spelling errors are corrected using lexicalized rules that are automatically extracted from a very large raw corpus. The extraction of the lexicalized rules is based on morphological and contextual similarities between error eojeols and their correction eojeols, which are confirmed to occur in the corpus. The experimental results show that our system recognizes unknown words with an accuracy of 98.9% and corrects spacing errors and spelling errors with accuracies of 98.1% and 97.1%, respectively.

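This paper proposes a method that recognizes unknown words in input documents and automatically corrects spelling errors (including spacing errors) in order to improve the performance of Korean information processing systems. Unknown words in an input document are recognized by comparing and analyzing two or more morphologically similar eojeols that contain the same unknown-word candidate. Lexical rules for error correction are generated automatically from a large raw corpus, based on the morphological and contextual similarity between error eojeols and correction eojeols found in the corpus, and are then used to correct spacing and spelling errors in the input document. According to the experimental results, a system implemented with the proposed method recognizes unknown words with an accuracy of about 98.9% and corrects spacing errors and spelling errors with accuracies of 98.1% and 97.1%, respectively.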
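
The unknown-word step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their lexicon, morphological analyzer, or similarity measure. The sketch assumes a toy lexicon (`KNOWN_WORDS`), a short particle list (`PARTICLES`), and a hypothetical helper `recognize_unknown_words` that accepts a candidate stem only when two or more distinct eojeols share it.

```python
from collections import defaultdict

# Hypothetical toy resources (assumptions for illustration only); the paper's
# actual lexicon and morphological analyzer are not given in the abstract.
KNOWN_WORDS = {"학교", "사람"}
PARTICLES = ["은", "는", "이", "가", "을", "를", "에서", "에"]


def candidates(eojeol):
    """Yield (stem, particle) splits of an eojeol; the particle may be empty."""
    yield eojeol, ""
    for p in PARTICLES:
        if eojeol.endswith(p) and len(eojeol) > len(p):
            yield eojeol[: -len(p)], p


def recognize_unknown_words(eojeols, min_support=2):
    """Return stems outside the lexicon that are shared by at least
    `min_support` distinct eojeols."""
    support = defaultdict(set)
    for e in set(eojeols):
        splits = list(candidates(e))
        # Skip eojeols that the toy lexicon can already analyze.
        if any(stem in KNOWN_WORDS for stem, _ in splits):
            continue
        for stem, _particle in splits:
            support[stem].add(e)
    return {stem for stem, es in support.items() if len(es) >= min_support}


if __name__ == "__main__":
    sample = ["인터넷을", "인터넷이", "학교에서", "사람은", "컴퓨터는"]
    print(recognize_unknown_words(sample))
```

In this toy input, the stem shared by "인터넷을" and "인터넷이" is accepted, while "컴퓨터는" is left undecided because no second eojeol supports its stem. Requiring agreement between several eojeols is what keeps a single odd string from being treated as a new word.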
