• Title/Abstract/Keywords: sentence structure

Search Result 228, Processing Time 0.263 seconds

Korean Syntactic Analysis by Using Clausal Segmentation of Embedded Clause (내포문의 단문 분할을 이용한 한국어 구문 분석)

  • Lee, Hyeon-Yeong;Lee, Yong-Seok
    • Journal of KIISE: Software and Applications / v.35 no.1 / pp.50-58 / 2008
  • Most Korean sentences are complex sentences consisting of a main clause and an embedded clause. These complex sentences have more than one predicate, which causes various syntactic ambiguities in syntactic analysis. The ambiguities stem from phrase attachment problems that arise from the modifying scope of the embedded clause. To resolve them, we determine the scope of the embedded clause in the sentence and treat this clause as a unit of syntactic category. In this paper, we use sentence pattern information (SPI) and the syntactic properties of Korean to determine the scope of an embedded clause. First, we split the complex sentence into an embedded clause and a main clause under the principle that the embedded clause must have maximal arguments. This is done using the SPI of the predicate in the embedded clause. Then, the embedded clause is converted into a noun phrase or an adverbial phrase in the main clause according to the properties of Korean syntax. In this way, the structure of the complex sentence is reduced to a single clause, and some phrase attachment problems, which are mainly caused by the modifying scope, are easily resolved. We call this method clausal segmentation of embedded clauses. In experiments parsing 1,000 sentences, we found that our method reduces syntactic ambiguities by 88.32% compared with a method that does not use SPI and splits the sentence into basic clauses.

A Comparative study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification tasks (한국어 단어 및 문장 분류 태스크를 위한 분절 전략의 효과성 연구)

  • Kim, Jin-Sung;Kim, Gyeong-min;Son, Jun-young;Park, Jeongbae;Lim, Heui-seok
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.39-47 / 2021
  • Constructing high-quality input features through effective segmentation is essential for improving a language model's sentence comprehension. Improving their quality directly affects the performance of the downstream task. This paper compares segmentation strategies that effectively reflect the linguistic characteristics of Korean for word and sentence classification. The segmentation types are defined in four categories (eojeol, morpheme, syllable, and subchar), and pre-training is carried out using the RoBERTa model structure. By dividing the tasks into a sentence group and a word group, we analyze the tendency within each group and the difference between the groups. In sentence classification, the model with subchar-level segmentation outperformed the other strategies by up to +0.62% on NSMC, +2.38% on KorNLI, and +2.41% on KorSTS. In word classification, the model with syllable-level segmentation outperformed the others by up to +0.7% on NER and +0.61% on SRL. These experimental results confirm the effectiveness of those schemes.
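
Three of the four segmentation levels named in this abstract can be illustrated with the Python standard library alone. The sketch below is not the authors' tokenizer: it assumes that eojeol boundaries coincide with whitespace and that "subchar" means the conjoining jamo obtained by Unicode canonical decomposition (NFD). Morpheme-level segmentation is omitted because it needs a morphological analyzer. The function names are hypothetical.

```python
import unicodedata
from typing import List


def to_eojeols(sentence: str) -> List[str]:
    """Eojeol level: split on whitespace (assumed to mark eojeol boundaries)."""
    return sentence.split()


def to_syllables(sentence: str) -> List[str]:
    """Syllable level: one precomposed Hangul syllable per unit, spaces dropped."""
    composed = unicodedata.normalize("NFC", sentence)
    return [ch for ch in composed if not ch.isspace()]


def to_subchars(sentence: str) -> List[str]:
    """Subchar level (assumption): conjoining jamo from NFD decomposition."""
    return [
        jamo
        for syllable in to_syllables(sentence)
        for jamo in unicodedata.normalize("NFD", syllable)
    ]


sentence = "밥을 먹고 싶다"
eojeols = to_eojeols(sentence)      # 3 units
syllables = to_syllables(sentence)  # 6 units
subchars = to_subchars(sentence)    # 16 units
print(len(eojeols), len(syllables), len(subchars))
```

The same sentence grows from 3 units at the eojeol level to 16 at the subchar level, which shows how strongly the choice of segmentation changes the input length a model sees.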

Exploiting Chunking for Dependency Parsing in Korean (한국어에서 의존 구문분석을 위한 구묶음의 활용)

  • Namgoong, Young;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.11 no.7 / pp.291-298 / 2022
  • In this paper, we present a method for dependency parsing with chunking in Korean. Dependency parsing is the task of determining a governor for every word in a sentence. In general, the syntactic governor is determined in Korean, and the syntactic structure must then be transformed into a semantic structure for further processing, such as semantic analysis in natural language processing. Deciding whether to determine the syntactic or the semantic governor is a notorious problem. For example, the syntactic governor of the word "먹고 (eat)" in the sentence "밥을 먹고 싶다 (would like to eat)" is "싶다 (would like to)", which is an auxiliary verb and therefore cannot be a semantic governor. To mitigate this problem, we propose Korean dependency parsing after chunking, which is the process of segmenting a sentence into constituents. A constituent is a word or a group of words that functions as a single unit within a dependency structure and is called a chunk in this paper. Compared to traditional dependency parsing, the proposed method has some advantages: (1) The number of input units in parsing can be reduced, so parsing can be faster. (2) The effectiveness of parsing can be improved by considering the relation between the head words of two chunks. Through experiments on the Sejong dependency corpus, we have shown that the UAS and LAS of the proposed method are 86.48% and 84.56%, respectively, and that the number of input units is reduced by about 22%p.
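
Attachment scores such as those reported in this abstract are conventionally computed per token: the unlabeled attachment score (UAS) counts tokens whose predicted governor is correct, and the labeled attachment score (LAS) additionally requires the correct dependency label. The sketch below follows that standard definition. It is not the authors' evaluation script, and the labels in the toy example are hypothetical rather than taken from the Sejong corpus.

```python
from typing import List, Tuple

# One arc per token: (index of the governor, dependency label).
# Token indices are 1-based; 0 denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions in [0, 1]."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    total = len(gold)
    # UAS: only the governor has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: governor and label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / total, full_hits / total


# Toy example for "밥을 먹고 싶다" with hypothetical labels.
# The syntactic governor of "먹고" (token 2) is "싶다" (token 3).
gold = [(2, "obj"), (3, "vp"), (0, "root")]
pred = [(2, "obj"), (3, "dep"), (0, "root")]  # one label is wrong
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2%} LAS={las:.2%}")
```

Because LAS is the stricter measure, it can never exceed UAS on the same data, which matches the ordering of the two figures in the abstract.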

Sijo Works seen in terms of Sentence Structure (문장구조에서 본 현대시조 연구)

  • Im, Jong-Chan
    • Sijohaknonchong / v.25 / pp.5-27 / 2006
  • This paper examines how sijo works, including ancient sijo works, those published before the 1960s, those written by China-residing Koreans, and those published in the 2000s, convey poetic meaning in terms of sentence structure. Firstly, ancient sijo works, those published before the 1960s, and those written by China-residing Koreans have sentences whose meaning readers can easily grasp, with simple structures and few rhetorical words. But modern works published in the 2000s (modern sijo works hereafter) are mingled with too many rhetorical expressions, which are sometimes misused. Secondly, ancient sijo works, those published before the 1960s, and those written by China-residing Koreans, having a clear subject-verb context, are easily understood by readers. But in modern sijo works there are many cases with an unclear subject-verb context and redundant rhetorical words, which cause the meaning of the work to be misunderstood. Thirdly, in ancient sijo works, those published before the 1960s, and those written by China-residing Koreans, each of the three statements (called jang) in a stanza is separate from the others in context. But in some modern sijo works, the first and second statements (called chojang and jungjang) become mere rhetorical parts for the last statement (called jongjang), and neither is read as an independent statement. Fourthly, there are some cases whose forms are distant from those of sijo works but which are written in three statements like traditional sijo works. Regular poems, though written in regular rhythm, should also be acoustically regular. Sijo works should be easily understood when recited. If not, they are fundamentally far from sijo works. If modern sijo works are to overcome their easy expressions and simplicity of themes, they should be composed not with complicated sentence structures but with brand-new metaphors, clear images, and fresh themes.


Expertise Service Model Aiding Local Information Writing on the Web (PC통신과 웹에서 지역알림정보의 작성을 돕는 전문가적인 서비스 모형에 관한 연구: 지역주민의 견문을 중심으로)

  • 이태영
    • Journal of the Korean Society for Information Management / v.16 no.1 / pp.89-117 / 1999
  • (1) The structure of writings, (2) the construction of paragraphs, (3) the structure of sentences, and (4) the use of words were the most important factors in composing local information. To serve as an expert system, the knowledge base contained writing frames for essay structure and paragraph frames for paragraph construction. Example dictionaries were offered for reference on clause and sentence structure and on the use of words. For a more advanced system in the future, more precise rules are needed for (1) extracting the subject names of the frames and (2) producing the representative words of sentences.


A Study on Rhythmic Units in Korean -with Respect to Syntactic Structure- (한국어의 리듬 단위에 관한 연구 - 문법 구조와 관련하여)

  • Kim, Sun-Mi
    • Proceedings of the KSPS conference / 1996.10a / pp.224-228 / 1996
  • This paper studies how an utterance is divided into rhythmic units in Standard Korean with respect to its syntactic structure. As data, I used 150 sentences that contained a similar number of words and a variety of syntactic structures. The sentences were read by 7 speakers of the Seoul dialect in a conversational style. Each sentence was read twice at normal speed and twice at fast speed. In total, 4,200 sentences were recorded. Listening to them, the author then marked the sentences with two kinds of boundaries, strong and weak. To explore the relationship between rhythmic units and syntactic structure, I devised a framework of grammatical symbols. Each symbol is designed to carry both syntactic and morphological information, and I assigned these grammatical symbols to the sentences. With the sentences marked with grammatical symbols on the one hand and with rhythmic boundaries on the other, I could show the relationship between rhythmic units and syntactic structure: which syntactic structures are likely to be pronounced as one rhythmic unit, and which fall on rhythmic boundaries.


Space Structure Character of Hangeul Typography (한글 타이포그래피의 공간 구조적 특성)

  • Kim, Young-Kook;Park, Seong-Hyeon
    • The Journal of the Korea Contents Association / v.8 no.3 / pp.86-96 / 2008
  • The general basis for the development of a writing system is recognized through its formative value in terms of function and structure. The principle of clustered writing is the most significant feature of Hangeul typography, given that it is based on function and formativeness. Thus, not only through changes in form but also through its characteristic syllable combination, a space structure is created as consonants and vowels are combined into a single letter; the combinations then develop into words, sentences, and paragraphs, producing second and third space structure characteristics. This characteristic has a significant impact on readability, which is the core function of typography. For this reason, the space structure characteristic is regarded as a very important component of Hangeul typography. First, the space structure characteristic of Hangeul typography is reviewed in relation to the visual perception of gestalt psychology, and square-framed letters and framed letters are compared. By applying square-framed letters and framed letters in the same sentence, legibility and readability were studied. The study found that the space structure characteristic of Hangeul typography has a significant impact on its function and that, for future design, it is critical not only for design but also for the communication environment, because the space structure formativeness of Hangeul typography interacts with communication, which is the basic concept.

A Study on the Automatic Abstracting System for Journal Articles in Korean in the Field of Microbiology (한국어 초록 작성의 자동화에 관한 연구 -미생물학분야 학술지의 논문을 대상으로-)

  • 이태영
    • Journal of the Korean Society for Information Management / v.9 no.2 / pp.43-79 / 1992
  • This study proposes a Korean automatic abstracting system for microbiology that applies Case Grammar, Concept Dependency Grammar, and Unification-Based Grammar (PATR-II, DCG). Sample abstracts are analyzed to clarify the ideal structure of an abstract: a purpose sentence as the first sentence, 2-3 method and result sentences as the middle sentences, and a conclusion sentence as the last sentence. Extracting and refining the representative sentences that make up an automated abstract requires rules that assign role features to nouns. Rules for rearranging the extracted sentences and rules for generating the abstract sentences are also required. An evaluation of the system's efficiency shows that the method used in this automatic abstracting system needs more precise role features and sentence generation rules to reach the level of author abstracts.


Symbolizing Numbers to Improve Neural Machine Translation (숫자 기호화를 통한 신경기계번역 성능 향상)

  • Kang, Cheongwoong;Ro, Youngheon;Kim, Jisu;Choi, Heeyoul
    • Journal of Digital Contents Society / v.19 no.6 / pp.1161-1167 / 2018
  • The development of machine learning has enabled machines to perform delicate tasks that only humans could do, and many companies have therefore introduced machine learning based translators. Existing translators perform well, but they have problems translating numbers. They often mistranslate numbers when the input sentence includes a large number. Furthermore, the structure of the output sentence can change completely even if only one number in the input sentence changes. In this paper, we first optimized a neural machine translation model architecture that uses a bidirectional RNN, LSTM, and the attention mechanism, through data cleansing and by changing the dictionary size. Then, we implemented a number-processing algorithm specialized in number translation and applied it to the neural machine translation model to solve the problems above. The paper covers the data cleansing method, an optimal dictionary size, and the number-processing algorithm, as well as experimental results for translation performance based on the BLEU score.
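
The abstract does not describe the number-processing algorithm itself, so the following is only a sketch of one common way to symbolize numbers around a translation model: replace each number with an indexed placeholder before translation and restore the original digits afterwards. The placeholder format `<NUM0>`, the regular expression, and the function names are all assumptions made for illustration, and the translation step is left out.

```python
import re
from typing import List, Tuple

# Assumed number pattern: digits with optional comma or dot groups, e.g. 1,250,000 or 3.14.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# Assumed placeholder format produced by symbolize().
PLACEHOLDER = re.compile(r"<NUM(\d+)>")


def symbolize(sentence: str) -> Tuple[str, List[str]]:
    """Replace each number with an indexed placeholder and remember the originals."""
    numbers: List[str] = []

    def to_symbol(match: "re.Match[str]") -> str:
        numbers.append(match.group(0))
        return f"<NUM{len(numbers) - 1}>"

    return NUMBER.sub(to_symbol, sentence), numbers


def restore(sentence: str, numbers: List[str]) -> str:
    """Put the remembered numbers back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: numbers[int(m.group(1))], sentence)


source = "I paid 1,250,000 won for 3 tickets"
symbolized, numbers = symbolize(source)
print(symbolized)  # placeholders stand in for the two numbers
# A translation model would be applied to `symbolized` here (omitted).
print(restore(symbolized, numbers))
```

Indexed placeholders let the restoration step survive word reordering in the output, since each symbol carries the position of its original number.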

Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

  • Jung, Young-Im;Cho, Sun-Ho;Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.19 no.1 / pp.89-105 / 2008
  • In this paper, we propose a rule-based system for predicting natural prosodic phrase breaks from Korean texts. To implement the rule-based system, (1) sentence constituents are sub-categorized according to their syntactic functions, (2) syntactic phrases are recognized using the dependency relations among the sub-categorized constituents, and (3) rules for predicting prosodic phrase breaks are created. In addition, (4) the length of syntactic phrases and sentences, the position of syntactic phrases in a sentence, and the sense information of contextual words are considered in order to determine the variable prosodic phrase breaks. Based on these rules and features, we obtained an accuracy of over 90% in predicting the positions of major breaks and no breaks, which correlate highly with the syntactic structure of the sentence. As for the overall accuracy in predicting all prosodic phrase breaks, the proposed system shows a Break_Correct of 87.18% and a Juncture Correct of 89.27%, which is higher than that of other models.
