한국어 텍스트에 사용된 이음표의 자동 전사

Automatic Transcription of the Union Symbols in Korean Texts

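  • 윤애선 (Department of French Language and Literature, College of Humanities, Pusan National University)
  • 권혁철 (School of Electronics, Electrical, Information and Computer Engineering, College of Engineering, Pusan National University)
  • Published: 2003.06.01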

Abstract

In this paper, we propose Auto-TUS, a module that automatically transcribes three union symbols, the hyphen, dash and tilde (‘-’, ‘―’, ‘∼’), using their linguistic contexts. Few previous studies have addressed the ambiguity that arises when symbols are transcribed into Korean alphabetic letters. We classified six different reading formulae for the union symbols, analyzed the left and right contexts of the symbols, and investigated the selection rules and distributions that hold between the symbols and their contexts. Based on these linguistic features, we suggest 86 stereotyped patterns, 78 rules and 8 heuristics that determine which reading formula applies in Auto-TUS. The module operates in three steps. A pilot test was conducted with three test suites, which contain 418, 987 and 1,014 clusters of words with a union symbol, respectively. The module achieved encouraging accuracies of 97.36%, 98.48% and 96.55% on the three test suites. Our next steps are to develop a guessing routine for unknown contexts of the union symbols by using statistical information; to refine the module that detects proper nouns and terminology; and to apply Auto-TUS on a larger scale.
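
The idea of choosing a reading from the left and right contexts of a union symbol can be illustrated with a minimal sketch. This is not the Auto-TUS implementation: the abstract does not list the six reading formulae or any of the patterns, rules and heuristics, so the three readings below, the two toy rules, and the names `choose_reading` and `transcribe_cluster` are all assumptions made for illustration only.

```python
import re

# Hypothetical reading labels (assumptions; the abstract does not enumerate
# the paper's six reading formulae).
RANGE = "에서"   # assumed reading for a tilde between two numbers
DASI = "다시"    # assumed reading for a hyphen between two numbers
SILENT = ""      # symbol is left unpronounced

UNION_SYMBOLS = {"-", "―", "∼", "~"}
TILDES = {"∼", "~"}


def choose_reading(left: str, symbol: str, right: str) -> str:
    """Pick a reading for `symbol` from its immediate left and right context."""
    if symbol not in UNION_SYMBOLS:
        raise ValueError(f"not a union symbol: {symbol!r}")
    # Toy context test: is the symbol flanked by digits on both sides?
    both_numeric = left[-1:].isdigit() and right[:1].isdigit()
    if symbol in TILDES and both_numeric:
        return RANGE
    if symbol == "-" and both_numeric:
        return DASI
    return SILENT


def transcribe_cluster(cluster: str) -> str:
    """Replace the first union symbol in a word cluster with its chosen reading."""
    match = re.fullmatch(r"(.*?)([-―∼~])(.*)", cluster)
    if match is None:
        return cluster  # no union symbol, nothing to transcribe
    left, symbol, right = match.groups()
    reading = choose_reading(left, symbol, right)
    # Drop empty parts so that a silent symbol leaves a single space.
    return " ".join(part for part in (left, reading, right) if part)


if __name__ == "__main__":
    for sample in ["3∼5", "123-4", "A-B"]:
        print(sample, "->", transcribe_cluster(sample))
```

A real system would need far richer context tests than the digit check used here, which is presumably where the stereotyped patterns and the proper-noun and terminology detection mentioned in the abstract come in.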

Keywords