Search | Korea Science

Korean Word Segmentation and Compound-noun Decomposition Using Markov Chain and Syllable N-gram (마코프 체인 및 음절 N-그램을 이용한 한국어 띄어쓰기 및 복합명사 분리)

권오욱
- The Journal of the Acoustical Society of Korea, v.21 no.3, pp.274-284, 2002
Word segmentation errors occurring in text preprocessing often insert incorrect words into the recognition vocabulary and lead to poor language models for Korean large-vocabulary continuous speech recognition. We propose an automatic word segmentation algorithm using Markov chains and syllable-based n-gram language models in order to correct word segmentation errors in text corpora. We assume that a sentence is generated from a Markov chain. Spaces and non-space characters are generated on self-transitions and other transitions of the Markov chain, respectively. The word segmentation of the sentence is then obtained by finding the maximum likelihood path using syllable n-gram scores. In experiments, the algorithm achieved 91.58% word accuracy and 96.69% syllable accuracy when segmenting 254 sentences from newspaper columns written without any spaces. It improved word accuracy from 91.00% to 96.27% when correcting word segmentation at line breaks, and achieved 96.22% accuracy in compound-noun decomposition.
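The maximum-likelihood search described above can be sketched as a Viterbi pass over space/no-space decisions between syllables. This is a minimal sketch under stated assumptions, not the author's Markov-chain formulation (the abstract does not give its state topology or smoothing): it assumes an add-k smoothed syllable trigram model in which the space is an ordinary token, and the names `SyllableTrigram`, `segment` and the toy training sentences are hypothetical. Because a trigram can span a space, whether a space precedes the current syllable is carried as the search state.

```python
import math
from collections import Counter

SP, BOS, EOS = "<sp>", "<s>", "</s>"


def to_tokens(sentence):
    """Spaced sentence -> syllable tokens with an explicit space token."""
    out = []
    for i, word in enumerate(sentence.split()):
        if i > 0:
            out.append(SP)
        out.extend(word)  # one precomposed Hangul syllable per character
    return out


class SyllableTrigram:
    """Hypothetical add-k smoothed syllable trigram; the space is a token."""

    def __init__(self, corpus, k=0.01):
        self.k = k
        self.tri, self.ctx = Counter(), Counter()
        vocab = {SP, EOS}
        for sent in corpus:
            seq = [BOS, BOS] + to_tokens(sent) + [EOS]
            vocab.update(seq[2:])
            for a, b, c in zip(seq, seq[1:], seq[2:]):
                self.tri[(a, b, c)] += 1
                self.ctx[(a, b)] += 1
        self.vocab_size = len(vocab)

    def logp(self, a, b, c):
        num = self.tri[(a, b, c)] + self.k
        den = self.ctx[(a, b)] + self.k * self.vocab_size
        return math.log(num / den)


def segment(text, lm):
    """Viterbi search over space/no-space decisions between syllables."""
    syl = list(text.replace(" ", ""))
    n = len(syl)
    if n == 0:
        return ""

    def prev_token(i, spaced):
        # token emitted just before syllable i
        if spaced:
            return SP
        return syl[i - 1] if i > 0 else BOS

    # best[i][s]: best log-prob of a prefix ending at syllable i,
    # where s says whether a space was emitted right before it
    best = [{False: -math.inf, True: -math.inf} for _ in range(n)]
    back = [{False: None, True: None} for _ in range(n)]
    best[0][False] = lm.logp(BOS, BOS, syl[0])
    for i in range(n - 1):
        for s in (False, True):
            if best[i][s] == -math.inf:
                continue
            p, cur, nxt = prev_token(i, s), syl[i], syl[i + 1]
            joined = best[i][s] + lm.logp(p, cur, nxt)
            spaced = best[i][s] + lm.logp(p, cur, SP) + lm.logp(cur, SP, nxt)
            for t, score in ((False, joined), (True, spaced)):
                if score > best[i + 1][t]:
                    best[i + 1][t], back[i + 1][t] = score, s
    # close the path with the end-of-sentence token
    final = {s: best[n - 1][s] + lm.logp(prev_token(n - 1, s), syl[-1], EOS)
             for s in (False, True)}
    s = max(final, key=final.get)
    # trace back which gaps received a space
    spaces = [False] * n
    for i in range(n - 1, 0, -1):
        spaces[i] = s
        s = back[i][s]
    return "".join((" " if spaces[i] else "") + syl[i] for i in range(n))
```

For example, with `lm = SyllableTrigram(["나는 학교에 간다", "나는 집에 간다", "학교에 간다"])`, `segment("나는학교에간다", lm)` returns `"나는 학교에 간다"`. A small k keeps unseen trigrams far less likely than seen ones; on realistic corpora a better smoothing method would be needed.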

A Unit Selection Methods using Variable Break in a Japanese TTS (일본어 TTS의 가변 Break를 이용한 합성단위 선택 방법)

Na, Deok-Su;Bae, Myung-Jin
- Proceedings of the IEEK Conference, 2008.06a, pp.983-984, 2008
This paper proposes a variable break that can offset prediction errors, together with a pre-selection method based on the variable break, for improved unit selection. In Japanese, a sentence consists of several APs (accentual phrases) and MPs (major phrases), and the breaks between these phrases must be predicted in a text-to-speech system. An MP consists of several APs and plays a decisive role in making synthetic speech natural and understandable, because short pauses appear at its boundaries. A variable break is defined as a break that can easily change from an AP boundary to an MP boundary, or from an MP boundary to an AP boundary. The variable break is modeled stochastically using CART (Classification and Regression Trees), and candidate units are then pre-selected in the unit-selection process. The experimental results show that the method can compensate for break prediction errors and improve the naturalness of synthetic speech.
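The idea of a variable break can be illustrated with a small pre-selection filter. This is a minimal sketch under loud assumptions, not the authors' method: it assumes the CART model yields a probability of an MP boundary at each phrase boundary (for instance the class proportions at a tree leaf, as scikit-learn's `DecisionTreeClassifier.predict_proba` reports), and that a boundary counts as variable when that probability falls inside a hypothetical uncertainty band `(lo, hi)`. The `Unit` record and the threshold values are illustrative only.

```python
from dataclasses import dataclass

AP, MP = "AP", "MP"


@dataclass
class Unit:
    unit_id: str
    boundary: str  # hypothetical: break type recorded at this unit's boundary


def allowed_breaks(p_mp, lo=0.3, hi=0.7):
    """Map a predicted P(MP boundary) to the permitted break types.
    Probabilities strictly inside (lo, hi) are treated as a variable break."""
    if p_mp >= hi:
        return {MP}
    if p_mp <= lo:
        return {AP}
    return {AP, MP}  # variable break: keep candidates of both types


def preselect(candidates, p_mp, lo=0.3, hi=0.7):
    """Keep only candidate units whose boundary type is permitted."""
    allowed = allowed_breaks(p_mp, lo, hi)
    return [u for u in candidates if u.boundary in allowed]
```

With `units = [Unit("u1", AP), Unit("u2", MP)]`, a confident prediction such as `p_mp=0.9` keeps only `u2`, while an uncertain `p_mp=0.5` keeps both, so a break prediction error near the decision boundary does not remove the right units from the search.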

Consonant Inventories of the Better Cochlear Implant Children in Korea (말지각 능력이 우수한 인공와우 착용 아동들의 조음 능력;음소의 정밀 전사)

Chang, Son-A;Kim, Su-Jin;Sin, Ji-Yeong
- Proceedings of the KSPS conference, 2007.05a, pp.274-277, 2007
The purpose of this study is 1) to describe the phoneme inventories of cochlear implant (CI) children and 2) to describe their utterances using narrow phonetic transcription. All the subjects had more than two years of experience with a CI and scored above 87% in open-set sentence perception. Average consonant accuracy was 81.36%, rising to 87.41% when distortion errors were not counted. Their error patterns differed from those of hearing-aid users. The most prominent error pattern was consonant weakening.

An Analysis of Pronunciation Errors in Word-initial Onglides in English and a Suggestion of Teaching Method (어두에 나타나는 상향 이중모음의 오류분석 및 지도방안 연구)

Choi, Ju-Young;Park, Han-Sang
- Proceedings of the KSPS conference, 2007.05a, pp.183-186, 2007
This study analyzes Korean high school students' pronunciation errors in word-initial onglides in English. Twenty-four Korean high school students read, in a frame sentence, 34 English words, including words with word-initial glide-vowel sequences and vowel-initial words. The results showed two error types: glide deletion and vowel distortion. After the first recording was analyzed, the subjects were taught how to pronounce glide-vowel sequences properly in a 60-minute class. A comparison of the first and second recordings showed that the subjects' pronunciation of glide-vowel sequences improved. After the training, errors on [jɪ], a diphthong unique to English, decreased substantially. However, most subjects still had difficulty pronouncing [wʊ], [wu], and [wo]. There was no significant correlation between English course grades and error reduction.

Semi-Automatic Annotation Tool to Build Large Dependency Tree-Tagged Corpus

Park, Eun-Jin;Kim, Jae-Hoon;Kim, Chang-Hyun;Kim, Young-Kill
- Proceedings of the Korean Society for Language and Information Conference, 2007.11a, pp.385-393, 2007
Corpora annotated with rich linguistic information are required to develop robust, statistical natural language processing systems. Building such corpora, however, is expensive, labor-intensive, and time-consuming. To support this work, we design and implement an annotation tool for building a Korean dependency tree-tagged corpus. Compared with other annotation tools, our tool has the following features: application independence, error localization, powerful error checking, instant sharing of annotated information, and user-friendliness. Using our tool, we have annotated 100,904 Korean sentences with dependency structures. Thirty-three annotators took part, the average annotation time was about 4 minutes per sentence, and the annotation took 5 months in total. We are confident that the tool yields accurate and consistent annotations while reducing labor and time.
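The kind of structural error checking an annotation tool can run on each dependency tree is sketched below. This is a minimal sketch under assumptions: the abstract does not list the tool's actual checks, so this one only verifies a single root, head indices in range, no self-dependency, and no cycles, and it assumes a CoNLL-style encoding (`heads[i]` is the 1-based head of token `i+1`, with 0 marking the root). The function name `check_dependency_tree` is hypothetical.

```python
def check_dependency_tree(heads):
    """Return a list of error messages; an empty list means well-formed.
    heads[i] is the 1-based head of token i+1, and 0 marks the root."""
    n = len(heads)
    errors = []
    roots = [i + 1 for i, h in enumerate(heads) if h == 0]
    if len(roots) != 1:
        errors.append(f"expected exactly one root, found {len(roots)}")
    for i, h in enumerate(heads, start=1):
        if not 0 <= h <= n:
            errors.append(f"token {i}: head {h} out of range")
        elif h == i:
            errors.append(f"token {i}: depends on itself")
    if errors:
        return errors  # the cycle check below assumes valid head indices
    # follow head links from every token; reaching 0 means no cycle
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                errors.append(f"token {i}: cycle through token {node}")
                break
            seen.add(node)
            node = heads[node - 1]
    return errors
```

Reporting the offending token with each message is one way to provide the error localization the abstract mentions; a Korean-specific tool might add further checks, such as a head-final constraint, which are not shown here.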

A study on the Character Correction of the Wrongly Recognized Sentence Marks, Japanese, English, and Chinese Character in the Off-line printed Character Recognition (오프라인 인쇄체 문장부호, 일본 문자, 영문자, 한자 인식에서의 오인식 문자 교정에 관한 연구)

Lee, Byeong-Hui;Kim, Tae-Gyun
- The Transactions of the Korea Information Processing Society, v.4 no.1, pp.184-194, 1997
In recent years, a number of commercial off-line character recognition systems have appeared in the Korean market. This paper describes a "self-organizing" data structure for representing a large dictionary that can be searched in real time with a practical amount of memory, and presents a study on correcting misrecognized characters in off-line recognition of printed sentence marks and Japanese, English, and Chinese characters. A self-organizing algorithm can be recommended as particularly appropriate when there is reason to suspect that the access probabilities of individual words will change with time and theme. The misrecognized characters produced by OCR systems are collected and analyzed. Error types of English characters are reclassified, and 0.5% of the errors are corrected using an English character confusion table with a self-organizing dictionary containing 25,145 English words. Error types of Chinese characters are also classified, and 6.1% of the errors are corrected using a Chinese character confusion table with a self-organizing dictionary containing 34,593 Chinese words.
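Confusion-table correction against a self-organizing dictionary can be sketched as follows. This is a minimal sketch, not the paper's data structure: it assumes the move-to-front heuristic (one standard self-organizing list scheme), a hypothetical confusion table mapping each recognized character to the characters it is often confused with, and a tie-breaking rule that prefers the matching word nearest the head of the list. The names `SelfOrganizingDictionary` and `correct` are illustrative.

```python
from itertools import product


class SelfOrganizingDictionary:
    """Move-to-front list: each successful access moves a word to the head,
    so recently or frequently used words are found earlier."""

    def __init__(self, words):
        self.words = list(words)

    def position(self, word):
        try:
            return self.words.index(word)
        except ValueError:
            return None

    def touch(self, word):
        self.words.insert(0, self.words.pop(self.words.index(word)))


def correct(word, dictionary, confusion):
    """Return `word` if known; otherwise the dictionary word reachable through
    the confusion table that sits nearest the head of the list."""
    if dictionary.position(word) is not None:
        dictionary.touch(word)
        return word
    # every position may hold the recognized character or a confusable one
    options = [[ch] + confusion.get(ch, []) for ch in word]
    found = []
    for chars in product(*options):
        cand = "".join(chars)
        pos = dictionary.position(cand)
        if pos is not None:
            found.append((pos, cand))
    if not found:
        return word  # leave uncorrectable words unchanged
    _, best = min(found)
    dictionary.touch(best)
    return best
```

For example, with the dictionary `["cold", "word", "world"]` and the hypothetical table `{"0": ["o"], "1": ["l"]}`, the OCR output `"w0r1d"` is corrected to `"world"`, which then moves to the head of the list. The candidate set grows exponentially with the number of confusable characters, so a practical system would bound it.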

Arithmetic Fluctuation Effect affected by Induced Emotional Valence (유발된 정서가에 따른 계산 요동의 효과)

Kim, Choong-Myung
- Journal of the Korea Academia-Industrial cooperation Society, v.19 no.2, pp.185-191, 2018
This study examined the type and extent of interference between induced emotion and a subsequent arithmetic operation. The experiment was carried out to determine the influence of induced emotions (anger, joy, and sorrow) and stimulus types (picture and sentence) on the cognitive processing load that may block interactions among the components of working memory. The subjects were 32 undergraduates of similar age and education, who were specifically instructed to attend to the induced emotion by imitating facial expressions and to make correct decisions in a remainder calculation task. The stimulus types did not differ, but there was a significant difference among the induced emotion types: responses were slower under the positive emotion (joy) than under the other emotions (anger and sorrow). To determine which processing phase the slower responses were associated with, error rates and delayed-correct-response rates were analyzed for each emotion type. In the joy condition, the delay with sentence-inducing stimuli was associated with a difference in error rate, whereas the delay with picture-inducing stimuli was associated with the delayed-correct-response rate. These findings suggest not only that induced positive emotion increases response time compared with negative emotions, but also that picture-inducing stimuli readily produce arithmetic fluctuation, whereas sentence-inducing stimuli lead to arithmetic failure.
https://doi.org/10.5762/KAIS.2018.19.2.185

CHART PARSER FOR ILL-FORMED INPUT SENTENCES (잘못 형성된 입력문장에 대한 CHART PARSER)

Min, Kyongho
- Korean Journal of Cognitive Science, v.4 no.1, pp.177-212, 1993
My research is based on the parser for ill-formed input proposed by Mellish in the Proceedings of the 27th Annual Meeting of the ACL (1989). My system is composed of two parsers: WFCP and IFCP. When WFCP fails to produce a parse tree for the input sentence, the sentence is identified as ill-formed and is parsed by IFCP for error detection and recovery at the syntactic level. My system is independent of grammatical rules. It does not take semantic ill-formedness into account. My system uses a grammar of 25 context-free rules. It consists of two major parsing strategies: top-down expectation and bottom-up satisfaction. With top-down expectation, rules are retrieved under the inference condition and expanded by inactive arcs. For bottom-up parsing, my parser uses two modes: Left-to-Right parsing and Right-to-Left parsing. My system successfully repairs errors when the input contains an omitted word or an unknown word substituted for a valid word. Left-corner and right-corner errors are more easily detected and repaired than errors in the middle of an ill-formed sentence. The deviance note, with repair details, is kept in new inactive arcs generated by the error correction procedure. The implementation of my system is quite different from Mellish's. When rules are invoked, my system invokes all rules with minimal inference. My bottom-up parsing strategy uses Left-to-Right mode and Right-to-Left mode. My system is bottom-up-parsing-oriented, like the chart parser. Errors are repaired in two ways: using top-down hypotheses, and using a Need-Chart, which keeps information on the expectation and completion of goals expanded by rules. To reduce the number of top-down cycles, all rules are invoked simultaneously, and this invocation information is kept in the Need-Chart. This idea will be extended to implement a multiple-error recovery system.
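The two error classes the system repairs (an omitted word, and an unknown word substituted for a valid word) can be illustrated with a much simpler generate-and-test scheme around a CKY chart recognizer. This is a swapped-in technique, not Mellish's method or the author's Need-Chart: when the well-formed parse fails, the sketch hypothesizes either a category for a single unknown word or a single missing preterminal at each position, and re-runs the recognizer. The toy grammar, lexicon, and function names are hypothetical (the paper's grammar had 25 context-free rules).

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (B, C) -> {A} for A -> B C
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICON = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "cat": {"N"},
           "saw": {"V"}, "chased": {"V"}}
PRETERMINALS = sorted({c for cats in LEXICON.values() for c in cats})


def recognizes(cats):
    """CKY chart recognizer: cats[i] is the category set of token i."""
    n = len(cats)
    if n == 0:
        return False
    chart = {(i, i + 1): set(c) for i, c in enumerate(cats)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for b, c in product(chart[(i, k)], chart[(k, j)]):
                    cell |= BINARY.get((b, c), set())
            chart[(i, j)] = cell
    return "S" in chart[(0, n)]


def parse_with_repair(words):
    """Return (kind, position, category) describing how the input was accepted."""
    cats = [LEXICON.get(w) for w in words]
    unknown = [i for i, c in enumerate(cats) if c is None]
    if not unknown and recognizes(cats):
        return ("ok", None, None)
    if len(unknown) == 1:
        # substitution repair: the unknown word stands in for one category
        i = unknown[0]
        for cat in PRETERMINALS:
            if recognizes(cats[:i] + [{cat}] + cats[i + 1:]):
                return ("unknown", i, cat)
        return ("fail", None, None)
    if not unknown:
        # omission repair: hypothesize a single missing word at each position
        for pos in range(len(words) + 1):
            for cat in PRETERMINALS:
                if recognizes(cats[:pos] + [{cat}] + cats[pos:]):
                    return ("omitted", pos, cat)
    return ("fail", None, None)  # this sketch repairs at most one error
```

For example, `parse_with_repair("the dog saw cat".split())` reports `("omitted", 3, "Det")`, and `"the dog saw a zog"` is accepted with `("unknown", 4, "N")`. Re-running the whole recognizer for every hypothesis is what a chart-based approach avoids by reusing arcs already built.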

Context-sensitive Spelling Error Correction using Eojeol N-gram (어절 N-gram을 이용한 문맥의존 철자오류 교정)

Kim, Minho;Kwon, Hyuk-Chul;Choi, Sungki
- Journal of KIISE, v.41 no.12, pp.1081-1089, 2014
Context-sensitive spelling-error correction methods are broadly classified into rule-based methods and statistical methods, the latter of which are often preferred in research. Statistical error correction methods treat context-sensitive spelling errors as word-sense disambiguation problems: according to the context, they choose between the two members of a correction pair, which consists of a correction-target word and a replacement-candidate word. This paper proposes integrating an eojeol (word-phrase) n-gram model into a conventional probability model based on such correction pairs, developed in a previous study by this research team, in order to improve its performance. Two integrated models are proposed: one interpolates the sentence probabilities computed by each model, and the other applies the two models sequentially. Both integrated models achieve relatively high accuracy and recall compared with conventional models or a model that uses only an n-gram.
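The interpolation variant can be sketched as choosing, for one position, the member of a correction pair that maximizes a weighted sum of sentence probabilities from several models. This is a minimal sketch under assumptions: the eojeol bigram with add-k smoothing stands in for the paper's n-gram model, the pair-based model from the earlier study is not reconstructible from the abstract and is represented only as an arbitrary callable, and the weights, names, and toy corpus are hypothetical.

```python
from collections import Counter


class EojeolBigram:
    """Add-k smoothed bigram model over whitespace-separated eojeols."""

    def __init__(self, corpus, k=0.1):
        self.k = k
        self.uni, self.bi = Counter(), Counter()
        vocab = {"</s>"}
        for sent in corpus:
            toks = ["<s>"] + sent.split() + ["</s>"]
            vocab.update(toks[1:])
            for a, b in zip(toks, toks[1:]):
                self.bi[(a, b)] += 1
                self.uni[a] += 1
        self.vocab_size = len(vocab)

    def prob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for a, b in zip(toks, toks[1:]):
            p *= (self.bi[(a, b)] + self.k) / (self.uni[a] + self.k * self.vocab_size)
        return p


def correct_pair(words, i, pair, models, weights):
    """Replace words[i] with whichever member of the correction pair gives the
    higher linearly interpolated sentence probability sum(w * P_m(sentence))."""
    def score(candidate):
        sentence = " ".join(words[:i] + [candidate] + words[i + 1:])
        return sum(w * model(sentence) for w, model in zip(weights, models))
    best = max(pair, key=score)
    return words[:i] + [best] + words[i + 1:]
```

With `lm = EojeolBigram(["아이를 낳았다", "병이 나았다"])`, the pair `("나았다", "낳았다")`, and a constant placeholder for the second model, `["아이를", "나았다"]` is corrected to `["아이를", "낳았다"]`, while `["병이", "낳았다"]` becomes `["병이", "나았다"]`. The sequential variant would instead consult one model first and fall back to the other.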
https://doi.org/10.5626/JOK.2014.41.12.1081

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

Kang, Seung-Shik;Chang, Du-Seong
- Journal of KIISE:Software and Applications, v.35 no.6, pp.386-391, 2008
Colloquial word errors that violate grammatical or orthographic rules occur frequently in communication environments such as mobile phone messaging and instant messengers. These unexpected errors cause problems for language processing systems in many applications, such as speech recognition and text-to-speech. In this paper, we propose and implement a system that automatically corrects ill-formed words and word-spacing errors in SMS sentences, the major sources of poor accuracy. We experimented with three methods of constructing the word-correction dictionary and evaluated their results: (1) manual construction of error words from a vocabulary list of ill-formed communication language, (2) automatic construction of the error dictionary from a manually constructed corpus, and (3) a context-dependent method of automatically constructing the error dictionary.
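Method (2), building the error dictionary automatically from a hand-corrected corpus, can be sketched by aligning each ill-formed sentence with its correction and recording which eojeol was replaced by what. This is a minimal sketch under assumptions: the alignment uses Python's `difflib.SequenceMatcher` on eojeol lists (the paper's alignment procedure is not described), only one-eojeol replacements are kept, and the function names and example pairs are hypothetical. Letting a single eojeol map to several eojeols also captures spacing corrections.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_error_dictionary(pairs):
    """pairs: (ill-formed sentence, hand-corrected sentence).
    Align eojeols and count which correction each ill-formed eojeol receives."""
    counts = defaultdict(Counter)
    for noisy, clean in pairs:
        src, tgt = noisy.split(), clean.split()
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, src, tgt).get_opcodes():
            # a one-eojeol replacement may expand into several eojeols,
            # which also captures spacing corrections such as 머해 -> 뭐 해
            if tag == "replace" and i2 - i1 == 1:
                counts[src[i1]][" ".join(tgt[j1:j2])] += 1
    # keep the most frequent correction for each ill-formed eojeol
    return {bad: c.most_common(1)[0][0] for bad, c in counts.items()}


def correct_sms(sentence, table):
    """Replace every eojeol found in the error dictionary."""
    return " ".join(table.get(tok, tok) for tok in sentence.split())
```

For instance, training on the pairs `("나 넘 피곤해", "나 너무 피곤해")`, `("머해 지금", "뭐 해 지금")` and `("안녕하세여", "안녕하세요")` yields entries for 넘, 머해 and 안녕하세여, and `correct_sms("머해 넘 심심해", table)` returns `"뭐 해 너무 심심해"`. A context-dependent variant, like method (3), would condition each entry on neighboring eojeols instead of applying it everywhere.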
