• Title/Summary/Keyword: Sentence Error

Sentence Unit De-noising Training Method for Korean Grammar Error Correction Model (한국어 문법 오류 교정 모델을 위한 문장 단위 디노이징 학습법)

  • Hoonrae Kim;Yunsu Kim;Gary Geunbae Lee
    • Annual Conference on Human and Language Technology / 2022.10a / pp.507-511 / 2022
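  • A grammatical error correction model detects grammatical errors in the input text and corrects them, and it needs high accuracy and recall to give learners a better learning experience. To this end, recent studies take a model that has completed paragraph-level pre-training and fine-tune it on a spelling correction dataset. In this study, however, we judged that the existing pre-training method is not well suited to grammar correction, so we split a paragraph-level dataset into sentences and built a dataset by adding G2P noise and edit-distance-based noise to each sentence. We then applied additional sentence-level denoising pre-training with this dataset to the paragraph-level pre-trained model, which improved performance. To verify the effect of sentence-level splitting, we also ran denoising pre-training on the sentence-split dataset without noise and confirmed that the resulting model outperformed the existing model without denoising pre-training. In addition, the two models pre-trained with only one of the two noise types showed no large difference in performance, which indicates that a model using only artificial random edit-distance noise can match a model using only G2P noise, which requires linguistic knowledge.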

Performance Improvement of Context-Sensitive Spelling Error Correction Techniques using Knowledge Graph Embedding of Korean WordNet (alias. KorLex) (한국어 어휘 의미망(alias. KorLex)의 지식 그래프 임베딩을 이용한 문맥의존 철자오류 교정 기법의 성능 향상)

  • Lee, Jung-Hun;Cho, Sanghyun;Kwon, Hyuk-Chul
    • Journal of Korea Multimedia Society / v.25 no.3 / pp.493-501 / 2022
  • This paper studies context-sensitive spelling error correction. It uses the Korean WordNet (KorLex)[1], which defines the relationships between words as a graph, to improve the performance of a correction technique[2] that relies on embedded word vectors. The Korean WordNet adapts WordNet[3], developed at Princeton University in the United States, with additional construction for Korean. To learn a semantic network in graph form, or to combine it with already learned vector information, the graph must be transformed into vector form through embedding learning. For the transformation, a limited number of nodes from the network-form graph are listed in a line, like a sentence, before being used as training input. One learning technique that uses this strategy is DeepWalk[4]. DeepWalk is used to learn the graph between words in the Korean WordNet. The graph embedding is concatenated with the word vector of the learned language model for correction, and the final correction word is determined by the cosine distance between the vectors. To test whether the graph embedding improves the performance of context-sensitive spelling error correction, confused word pairs were constructed and tested from the perspective of Word Sense Disambiguation (WSD). In the experimental results, the average correction performance over all confused word pairs improved by 2.24% compared to the baseline correction performance.
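
The concatenation and cosine comparison described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy vectors, and the way the context vector is built are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_correction(context_vec, candidates):
    """Return the candidate whose concatenated vector is closest to the context.

    candidates maps a word to (language_model_vector, graph_embedding_vector).
    context_vec is a hypothetical context representation with the same
    dimensionality as the concatenated candidate vector.
    """
    best_word, best_score = None, -2.0  # cosine similarity is always >= -1
    for word, (lm_vec, graph_vec) in candidates.items():
        combined = list(lm_vec) + list(graph_vec)  # concatenation step
        score = cosine_similarity(context_vec, combined)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy confused word pair with made-up 2-dimensional vectors.
toy_candidates = {
    "word_a": ([1.0, 0.0], [1.0, 0.0]),
    "word_b": ([0.0, 1.0], [0.0, 1.0]),
}
chosen, score = pick_correction([1.0, 0.0, 1.0, 0.0], toy_candidates)
```

Selecting the highest cosine similarity is equivalent to selecting the smallest cosine distance, which is the criterion the abstract names.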

A Modified Binary n-gram Algorithm for the postprocessing of the Automatic Document Reading (자동문서판독 후처리를 위한 수정된 n-gram 알고리즘)

  • Kim, Il-Hwoe;Ryoo, Keun-Ho;Lee, Cheol-Hee
    • Proceedings of the KIEE Conference / 1987.07b / pp.1352-1355 / 1987
  • This paper proposes a modified binary n-gram algorithm for a contextual postprocessing system for English sentences. A backward gram is used to correct an error in the first position of a word. The algorithm requires no additional storage but more comparisons, and it allows an interactive correction routine. Experiments were implemented in Pascal on an IBM PC/XT microcomputer. The algorithm improves the correction rate by around 4 to 5% in a limited experimental environment.

Home Network Control System using SMS Dialog Interface (SMS를 통한 홈네트워크 제어 시스템)

  • Chang, Du-Seong;Kim, Hyun-Jeong;Eun, Ji-Hyun;Kang, Seung-Shik;Koo, Myoung-Wan
    • Proceedings of the KSPS conference / 2007.05a / pp.330-333 / 2007
  • This paper presents a dialogue interface that uses a dialogue management system to control home appliances in Home Network Services. To realize this type of dialogue interface, we annotated a dialogue set of 96,000 utterance pairs and developed an example-based dialogue system. The paper also introduces an automatic error correction module for SMS-style sentences, which increases the accuracy of the natural language understanding (NLU) module. Our NLU module shows an accuracy of 86.2%, an improvement of 5.25% over the baseline. The task completeness of the proposed SMS dialogue interface was 82%.

Effects of Feedback Types on Writing Accuracy, Fluency, and Complexity

  • Park, Chongwon
    • English Language & Literature Teaching / v.17 no.4 / pp.207-227 / 2011
  • This paper investigates how two modes of feedback (selective vs. comprehensive) affect students' writing development in terms of three measures (accuracy, fluency, and complexity). A total of 139 university students participated in the study, and 278 writing samples were analyzed. The results indicate that participants who received selective feedback wrote more accurately and fluently than their counterparts. In terms of complexity, however, neither the selective nor the comprehensive group showed signs of improvement in semester-based investigations. The results support Skehan's (2009) theory of trade-off effects, which suggests that a 'natural' tension exists between accuracy and complexity when resources are limited. This finding contrasts with the Cognition Hypothesis, which proposes that task complexity is associated with increases in both complexity and accuracy. Selected participants (N=21) identified their main sources of error as unfamiliarity with key words, usage, transitions, and sentence types. This study contributes to current knowledge in the related area of theory and offers educational implications for those who teach intermediate-level students and must decide which teaching content to prioritize within a limited instructional period.

Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging (비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.3 / pp.155-164 / 2014
  • This paper proposes a new method to determine the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes, and position information is added to the morphemes. Submorpheme units are then obtained by splitting the morpheme units through the maximization of posterior probability terms. These terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model built from a corpus of 300 million eojeols (word phrases). The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to reduce the syllable error rate by 14.0% relative.
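
The final merging step in the abstract above, sequentially merging the most frequent pair, can be sketched in Python as a greedy loop. This is only an illustration under stated assumptions: the function names, the stopping rule (stop when no pair occurs at least twice), and the toy input are hypothetical, and the paper's position information and posterior-based splitting are not reproduced here.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return ((left, right), count) for the most frequent adjacent pair, or None."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0]


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of pair in seq with the joined unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def merge_units(sequences, num_merges):
    """Greedily merge the most frequent adjacent pair, at most num_merges times."""
    sequences = [list(seq) for seq in sequences]
    for _ in range(num_merges):
        best = most_frequent_pair(sequences)
        # Assumed stopping rule: a pair seen only once is not worth merging.
        if best is None or best[1] < 2:
            break
        sequences = [merge_pair(seq, best[0]) for seq in sequences]
    return sequences


# Toy input: each inner list stands for one sentence split into small units.
units = merge_units([["a", "b", "c"], ["a", "b", "d"]], num_merges=2)
```

In this toy run the pair ("a", "b") occurs twice and is merged first; the loop then stops because every remaining pair occurs only once.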

Problems of Traditional Medicine Research Papers in Korea (한국 한의학 논문의 몇 가지 문제점 -학술논문작성법과 비교를 중심으로-)

  • Lee Sun-Dong;Lee Yong-Bum
    • Journal of Society of Preventive Korean Medicine / v.7 no.2 / pp.35-44 / 2003
  • Research papers must be written in fixed formats, following set rules and commonly understood language, so that study results can be communicated to many people. When the research papers published to date were reviewed against this standard, papers in Oriental medicine showed several problems: content too difficult to convey clearly, use of archaic language, little regard for logic, history, and continuity, use of non-standard formats and rules, and errors in statistical analysis and research design. They also tended to be clinically centered, focusing on studies of drugs, acupuncture, and moxibustion as the main treatment tools. In brief, papers on Korean traditional medicine have several problems. Oriental medicine currently attracts strong interest not only in Korea but also in other countries, and research on it is expected to grow. To respond to this interest, researchers in Korean traditional medicine need more training in related fields and should give these issues greater attention.

ERRATUM : 'LYMANα EMITTERS BEYOND REDSHIFT 5: THE DAWN OF GALAXY FORMATION' (JKAS, 36, 123, [2003])

  • Taniguchi, Yoshiaki;Shioya, Yasuhiro;Ajiki, Masaru;Fujita, Shinobu S.;Nagao, Tohru;Murayama, Takashi
    • Journal of The Korean Astronomical Society / v.36 no.4 / pp.283-283 / 2003
  • The first sentence in the second paragraph of INTRODUCTION, 'The first discovery of a galaxy beyond z=5 was reported by Weymann et al. (1998); HDF 4-470.3 at z=5.60.' should be read as 'The first discovery of a galaxy beyond z=5 was reported by Dey et al. (1998); 0140+326 RD1 at z=5.34'. The authors sincerely regret this error.

The Usage of Phoneme Duration Information for Rejecting Garbage Sentences (소음문장 제거를 위한 음소지속시간 사용)

  • Koo Myoung-Wan;Kim Ho-Kyoung;Park Sung-Joon;Kim Jae-In
    • Proceedings of the KSPS conference / 2003.05a / pp.219-222 / 2003
  • In this paper, we study the use of phoneme duration information for rejecting garbage sentences. First, we build a phoneme duration model in a speech recognition system based on decision tree state tying. We assume that phone duration has a Gamma distribution. Next, we build a verification module that uses a word-level confidence measure. Finally, we make a comparative study of phoneme duration with a speech DB obtained from the live system. This DB consists of OOT (out-of-task) and ING (in-grammar) utterances. Using phone duration information improves the OOT recognition rate by 46% and reduces the error rate by a further 8.4% when combined with the utterance verification module.

Fast speaker adaptation using extended diagonal linear transformation for deep neural networks

  • Kim, Donghyun;Kim, Sanghun
    • ETRI Journal / v.41 no.1 / pp.109-116 / 2019
  • This paper explores new techniques based on a hidden-layer linear transformation for fast speaker adaptation in deep neural networks (DNNs). Conventional methods using affine transformations are ineffective because they require a relatively large number of parameters. Methods that employ singular-value decomposition (SVD) are used because they are effective at reducing the number of adaptation parameters. However, a matrix decomposition is computationally expensive in online services. We propose an extended diagonal linear transformation method that minimizes adaptation parameters without SVD and increases performance for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error-reduction rates of 8.4% and 17.1% in five and 50 conversational sentence adaptations, respectively. Compared with the adaptation methods using SVD, recognition performance increases with fewer parameters.
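
To make the parameter argument in the abstract above concrete, the following Python sketch compares the parameter count of a full affine hidden-layer transformation with that of a plain diagonal one, and applies the diagonal form to a toy vector. It is an illustration under assumptions: the function names and the layer width are hypothetical, and the sketch shows only an ordinary diagonal scale-and-bias, not the paper's extended variant, whose details the abstract does not give.

```python
def affine_param_count(width):
    """Parameters of a full affine transform: a width x width matrix plus a bias."""
    return width * width + width


def diagonal_param_count(width):
    """Parameters of a diagonal transform: one scale and one bias per unit."""
    return 2 * width


def apply_diagonal(hidden, scale, bias):
    """Apply h'[i] = scale[i] * h[i] + bias[i] to a hidden-layer activation vector."""
    if not (len(hidden) == len(scale) == len(bias)):
        raise ValueError("hidden, scale and bias must have the same length")
    return [s * h + b for h, s, b in zip(hidden, scale, bias)]


# Assumed layer width of 1024 units, chosen only for illustration.
full_params = affine_param_count(1024)
diag_params = diagonal_param_count(1024)

# Identity initialisation (scale 1, bias 0) leaves the activations unchanged,
# so an unadapted model behaves exactly like the speaker-independent one.
unchanged = apply_diagonal([1.0, 2.0], [1.0, 1.0], [0.0, 0.0])
adapted = apply_diagonal([1.0, 2.0], [2.0, 0.5], [0.5, -1.0])
```

For a width of 1024, the affine form has 1,049,600 parameters against 2,048 for the diagonal form, which shows why diagonal-style transformations suit adaptation from only a few sentences.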