• Title/Abstract/Keywords: computational linguistics

Search results: 50 (processing time 0.019 s)

A Computational Treatment of Word Order and Case Assignment in Korean

  • 이기용 (Lee, Ki-Yong)
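    • Korean Institute of Information Scientists and Engineers (KIISE), SIG on Language Engineering: Conference Proceedings (Hangul and Korean Language Information Processing) / 13th Conference on Hangul and Korean Language Information Processing, 2001 / pp. 431-438 / 2001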
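  • In Korean sentences, nouns are generally assigned case by the valency of the predicate. For this reason, Korean word order is relatively free, apart from the general constraint that the predicate comes at the end of the sentence. However, various irregular phenomena in case assignment and free word order cause problems for sentence analysis and generation. For example, a sentence may be generated with no case particle on a noun, as in "나 머리 아프다" ('I have a headache'), and when a noun combines with a special particle such as "은/는" or "도", its case is not overtly shown at all. As for word order, when the nominative appears twice, as in "물이 얼음이 되었다" ('the water became ice') ≠ "얼음이 물이 되었다" ('the ice became water'), the order is not free; conversely, when the predicate carries a sentence-final ending, it need not come last, as in "어서 가자 백두산으로" ('Let's go quickly, to Mt. Baekdu'). The aim of this paper is to show how these problems of Korean word order and case assignment can be handled. To solve them as explicitly as possible, the paper formalizes rules and constraints for sentence analysis and generation, and implements and tests the processes of case assignment, phrase combination, and ordering that occur during sentence processing in the programming language Malaga.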
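
The interplay described above (case particles that mark functions, special particles such as 은/는 and 도 that hide case, and the double-nominative pattern in which order decides meaning) can be illustrated with a small Python sketch. It is not the paper's Malaga implementation: the particle table, the valency frames in `FRAMES`, the labels `SUBJ`/`OBJ`/`COMP`, and the rule of trying frame slots in canonical order are all illustrative assumptions, and the predicate-final ordering constraint is not modeled.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Illustrative assumption: grammatical functions each case particle can mark.
# 이/가 mark both subjects and the complement of 되다 ('become').
PARTICLE_FUNCS: Dict[str, Set[str]] = {
    "이": {"SUBJ", "COMP"}, "가": {"SUBJ", "COMP"},
    "을": {"OBJ"}, "를": {"OBJ"},
}
# Bare nouns ("") and special particles leave the case unmarked.
UNMARKED = {"", "은", "는", "도"}

# Hypothetical valency frames, slots listed in canonical order.
FRAMES: Dict[str, List[str]] = {
    "먹었다": ["SUBJ", "OBJ"],   # 'ate'
    "되었다": ["SUBJ", "COMP"],  # 'became'
}

NP = Tuple[str, str]  # (noun, particle)


def assign(nps: List[NP], predicate: str) -> Optional[Dict[str, str]]:
    """Map each noun to one valency slot, or return None if impossible."""
    frame = FRAMES[predicate]
    if len(nps) != len(frame):
        return None

    def candidates(particle: str) -> List[str]:
        if particle in UNMARKED:
            return list(frame)  # case hidden: any slot of the frame
        marks = PARTICLE_FUNCS.get(particle, set())
        return [slot for slot in frame if slot in marks]

    def search(i: int, used: FrozenSet[str]) -> Optional[Dict[str, str]]:
        if i == len(nps):
            return {}
        noun, particle = nps[i]
        # Slots are tried in canonical order, so with two nominatives
        # the leftmost noun becomes the subject.
        for slot in candidates(particle):
            if slot not in used:
                rest = search(i + 1, used | {slot})
                if rest is not None:
                    return {noun: slot, **rest}
        return None

    return search(0, frozenset())
```

Here `assign([("물", "이"), ("얼음", "이")], "되었다")` maps 물 to `SUBJ` and 얼음 to `COMP`, and swapping the two nouns swaps the functions. In `assign([("철수", "는"), ("사과", "를")], "먹었다")`, 철수 (case hidden by 는) is resolved to `SUBJ` because 사과를 must fill `OBJ`.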


How has belief modality contributed to formal semantics?

  • Tojo, Satoshi
    • Korean Society for Language and Information (KSLI): Conference Proceedings / KSLI 2007 Regular Conference / pp. 42-53 / 2007
  • Looking back on the history of formal approaches to linguistics, we cannot disregard the contribution of possible-world semantics. The intensional logic of Montague semantics, DRT (Discourse Representation Theory), mental spaces, and situation theory are all closely related to, or have been compared with, the notion of possible worlds. All these theories have clarified the structure of belief contexts and uncertain knowledge by employing hypothesized worlds. In this talk, I first briefly review the lineage of these theories. Next, I will introduce recent developments in modal logic for representing (i) knowledge and belief and (ii) time, in which belief modality is discussed precisely in terms of the accessibility relations among possible worlds. I will refer to BDI (belief-desire-intention) logic, CTL (computation tree logic), and sphere-based models of belief revision. Finally, I will discuss how these theories could be applied to further analyses of natural language.
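
The possible-world treatment of belief mentioned above can be sketched as a Kripke model: an agent believes φ at world w exactly when φ holds in every world that the agent's accessibility relation links to w. The Python below is a minimal illustration under assumed names (`KripkeModel`, `believes`, `considers_possible`); it does not impose the KD45 frame conditions usually required of belief, and it is not drawn from the talk itself.

```python
from typing import Callable, Dict, Set

World = str
Formula = Callable[[World], bool]  # a proposition, evaluated world by world


class KripkeModel:
    """Worlds, one accessibility relation per agent, and a valuation of atoms."""

    def __init__(self, access: Dict[str, Dict[World, Set[World]]],
                 valuation: Dict[World, Set[str]]) -> None:
        self.access = access        # access[agent][w] = worlds agent considers possible at w
        self.valuation = valuation  # valuation[w] = atoms true at w

    def atom(self, p: str) -> Formula:
        return lambda w: p in self.valuation[w]

    def believes(self, agent: str, phi: Formula) -> Formula:
        # B_agent(phi) holds at w iff phi holds in every accessible world
        # (vacuously true if none is accessible; KD45 seriality would forbid that).
        return lambda w: all(phi(v) for v in self.access[agent].get(w, set()))

    def considers_possible(self, agent: str, phi: Formula) -> Formula:
        # Dual modality: phi holds in at least one accessible world.
        return lambda w: any(phi(v) for v in self.access[agent].get(w, set()))


def neg(phi: Formula) -> Formula:
    return lambda w: not phi(w)


m = KripkeModel(
    access={"a": {"w1": {"w1", "w2"}}, "b": {"w1": {"w1", "w3"}}},
    valuation={"w1": {"rain"}, "w2": {"rain"}, "w3": set()},
)
rain = m.atom("rain")
print(m.believes("a", rain)("w1"))                 # True
print(m.believes("b", rain)("w1"))                 # False
print(m.considers_possible("b", neg(rain))("w1"))  # True
```

At the same actual world w1, agent a believes it is raining while agent b does not, purely because their accessibility relations differ.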


Recent Progresses in the Linguistic Modeling of Biological Sequences Based on Formal Language Theory

  • Park, Hyun-Seok;Galbadrakh, Bulgan;Kim, Young-Mi
    • Genomics & Informatics / Vol. 9, No. 1 / pp. 5-11 / 2011
  • Treating genomes as languages raises the possibility of producing concise generalizations about the information in biological sequences. Grammars used in this way would constitute a model of the underlying biological processes or structures, and may in fact serve as an appropriate tool for theory formation. The growing number of available biological sequences further highlights the need to develop grammatical systems in bioinformatics. The intent of this review is therefore to list bibliographic references on recent progress in the grammatical modeling of biological sequences. The review also includes sections that briefly introduce basic concepts of formal language theory, such as the Chomsky hierarchy, for non-experts in computational linguistics, and provides pointers for a deeper investigation of the field.
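
As a hedged illustration of why the Chomsky hierarchy matters here: perfectly nested RNA base pairs (a stem whose second half is the reverse complement of the first) are a classic example of a language that needs context-free rather than regular power. The sketch below recognizes the toy grammar S → a S u | u S a | g S c | c S g | ε; real RNA grammars add loops, G-U wobble pairs, and bulges, all omitted here.

```python
# Watson-Crick pairs allowed across the stem.
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


def derives(seq: str) -> bool:
    """Recognize S -> a S u | u S a | g S c | c S g | epsilon."""
    seq = seq.lower()
    if seq == "":
        return True                      # S -> epsilon
    if len(seq) < 2:
        return False                     # a lone base cannot be paired
    # S -> x S y: the outermost bases must pair, and the inside must derive from S.
    return (seq[0], seq[-1]) in PAIRS and derives(seq[1:-1])


print(derives("GGCAUGCC"))  # True: g-c, g-c, c-g, a-u nested
print(derives("GAUG"))      # False: g-g is not a pair
```

Each recursive call mirrors one application of a rule, which is what makes the grammar itself a readable model of the nested pairing structure.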

Temporal Structure on Discourse Level within the Controlled Information Packaging Theory

  • Lee, Min-Haeng;Lee, Ik-Hwan
    • Language and Information (Journal of the Korean Society for Language and Information) / Vol. 6, No. 1 / pp. 91-103 / 2002
  • The temporal structure of events at the discourse level has long been of great interest in both theoretical and computational linguistics. In this paper, we offer a unified approach to temporal relations within a hierarchical discourse structure. We apply the method of pronoun resolution to the interpretation of tense, based on an analysis within the framework of the controlled information packaging theory. A distinctive aspect of our account is that temporal interpretation across discourse segments in global discourse is subject to the same principles as the interpretation of global anaphora, so there is no need to postulate independent principles to account for the discourse behaviour of tense. In this way, we can neatly explain the general view that tense parallels pronouns in its anaphoric nature.


A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora

  • Lee, Ki-Yong;Schulze, Markus
    • Language and Information (Journal of the Korean Society for Language and Information) / Vol. 6, No. 2 / pp. 105-128 / 2002
  • Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis, which provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work therefore aims (1) to introduce Kormoran, a rule-based Korean morphological analyzer based on the principle of linearity, which prohibits any combination of left-to-right and right-to-left analysis as well as backtracking, and (2) to show how it can be used as a POS tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information from the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS taggers. The focus of our present analysis is on Korean text.
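
The principle of linearity can be illustrated by a left-to-right analyzer that reads one character at a time and keeps every competing analysis in parallel, so that no step is ever undone. The Python sketch below is a toy under explicit assumptions: the lexicon, the category names, and the continuation table `NEXT` are hypothetical, and Kormoran's actual allomorph rules and feature structures are far richer.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: surface morpheme -> possible categories.
LEXICON: Dict[str, Set[str]] = {
    "나": {"PRON", "VSTEM"},   # 'I' / stem of 나다 'come out'
    "는": {"TOP", "ADN"},      # topic particle / adnominal ending
    "먹": {"VSTEM"},           # stem of 먹다 'eat'
    "먹이": {"NOUN"},          # 'feed'
    "었": {"PAST"},
    "다": {"END"},
}
# Hypothetical continuation table: which category may follow which.
NEXT: Dict[str, Set[str]] = {
    "START": {"PRON", "VSTEM", "NOUN"},
    "PRON": {"TOP"},
    "NOUN": {"TOP"},
    "VSTEM": {"PAST", "ADN", "END"},
    "PAST": {"END"},
}
FINAL = {"TOP", "ADN", "END"}
# Strings that are proper prefixes of some longer morpheme.
PROPER_PREFIXES = {m[:k] for m in LEXICON for k in range(1, len(m))}

Analysis = Tuple[Tuple[str, str], ...]


def analyze(word: str) -> List[Analysis]:
    # A hypothesis is (morphemes so far, last category, unconsumed characters).
    hyps = [((), "START", "")]
    for ch in word:  # one left-to-right pass; all hypotheses advance together
        advanced = []
        for found, last, pending in hyps:
            buf = pending + ch
            for cat in LEXICON.get(buf, set()):
                if cat in NEXT.get(last, set()):
                    advanced.append((found + ((buf, cat),), cat, ""))
            if buf in PROPER_PREFIXES:
                advanced.append((found, last, buf))  # a longer morpheme may follow
        hyps = advanced  # dead hypotheses are dropped, never revisited
    return sorted(found for found, last, pending in hyps
                  if pending == "" and last in FINAL)
```

With this toy lexicon, `analyze("나는")` keeps both readings (pronoun + topic particle, and verb stem + adnominal ending), while for `먹이는` the hypothesis that starts with the verb stem 먹 simply dies at 이 and the noun 먹이 survives.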


Consideration of Semantic Roles of Korean Subcategory in Computational Linguistics

  • 김윤정 (Kim, Yun-Jeong);김완수 (Kim, Wan-Su);옥철영 (Ock, Cheol-Young)
    • Language and Information (Journal of the Korean Society for Language and Information) / Vol. 18, No. 2 / pp. 169-199 / 2014
  • This study was performed to posit the semantic roles of the obligatory arguments of predicates in Korean sentences and to attach the posited thematic roles to a real corpus. The study determined the maximum number of semantic roles, 22, and set criteria for assigning them. It organized the semantic roles associated with each case marker and attached semantic roles to the predicates of sentences in the Standard Korean Dictionary. A program for attaching thematic roles, UTagger-SR, was developed; it is equipped with the semantic roles of case markers and a case frame dictionary. Attaching semantic roles showed that the most important semantic role in Korean sentences is the theme of the predicate, followed by the subject of the predicate.
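
A case-frame lookup of the kind described above can be sketched as follows: a role is taken from the predicate's own frame when available and otherwise from a default role for the case marker. This is not UTagger-SR; the dictionary entries, the role names, and the fallback table are hypothetical and do not reproduce the paper's inventory of 22 roles.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical case frame dictionary: predicate -> {case marker: semantic role}.
CASE_FRAMES: Dict[str, Dict[str, str]] = {
    "주다": {"이": "Agent", "가": "Agent", "에게": "Goal", "을": "Theme", "를": "Theme"},
    "좋다": {"이": "Theme", "가": "Theme"},  # 'X-가 좋다': X is what is liked
}
# Hypothetical default role per case marker, used when no frame entry exists.
DEFAULT_ROLE: Dict[str, str] = {
    "이": "Agent", "가": "Agent", "을": "Theme", "를": "Theme",
    "에게": "Goal", "에서": "Location",
}


def attach_roles(predicate: str,
                 args: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Tag each (noun, case marker) pair with a semantic role, or None if unknown."""
    frame = CASE_FRAMES.get(predicate, {})
    tagged = []
    for noun, marker in args:
        # Prefer the predicate-specific entry; fall back to the marker's default.
        role = frame.get(marker, DEFAULT_ROLE.get(marker))
        tagged.append((noun, marker, role))
    return tagged
```

The frame for 좋다 overrides the default `Agent` reading of 가, which is the kind of predicate-specific information a case frame dictionary is meant to supply.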


Sentiment Analysis Main Tasks and Applications: A Survey

  • Tedmori, Sara;Awajan, Arafat
    • Journal of Information Processing Systems / Vol. 15, No. 3 / pp. 500-519 / 2019
  • The blooming of social media has stimulated interest in sentiment analysis. Sentiment analysis aims to determine, from a specific piece of content, the overall attitude of its author toward a specific item, product, brand, or service. In sentiment analysis, the focus is on subjective sentences. Hence, to discover and extract subjective information from a given text, researchers have applied various methods from computational linguistics, natural language processing, and text analysis. The aim of this paper is to provide an in-depth, up-to-date study of sentiment analysis algorithms that familiarizes readers with other work on the subject. The paper focuses on the main tasks and applications of sentiment analysis. State-of-the-art algorithms, methodologies, and techniques have been categorized and summarized to facilitate future research in this field.
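
Among the method families such surveys cover, the simplest is lexicon-based polarity scoring. The sketch below sums hypothetical word scores and flips the sign of the word right after a negator; the lexicon `POLARITY`, the negator list, and the one-word negation scope are illustrative assumptions, not a method taken from the survey.

```python
import re
from typing import Dict

# Hypothetical toy lexicon; real systems use curated sentiment lexicons.
POLARITY: Dict[str, int] = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> str:
    """Classify text as positive, negative, or neutral by summed word scores."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
            continue
        value = POLARITY.get(token, 0)
        score += -value if negate else value
        negate = False  # assumed scope: a negator affects only the next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this crude scope rule turns "not bad at all" into a positive judgment, which hints at why the survey's more elaborate methods are needed for real subjective text.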

Citation-based Article Summarization using a Combination of Lexical Text Similarities: Evaluation with Computational Linguistics Literature Summarization Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / Vol. 24, No. 7 / pp. 31-37 / 2019
  • Citation-based article summarization creates a shortened text for an academic article that reflects the content of citing sentences, which express other researchers' views of the target article. To address this problem, this study introduces an extractive summarization method based on a linear combination of several sentence salience scores, which represent the degree to which a candidate sentence reflects the content of the author's abstract, readers' citing text, and the target article itself. In the current study, salience scores are obtained by computing surface-level textual similarities. Experiments using the CL-SciSumm datasets show that the proposed method matches or outperforms previous approaches in ROUGE evaluations against SciSumm-2017 human summaries and SciSumm-2016/2017 community summaries.
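
The linear combination of salience scores, score(s) = w_abs·sim(s, abstract) + w_cite·sim(s, citing text) + w_art·sim(s, article), can be sketched as below. The abstract only says the similarities are surface-level, so cosine similarity over word-count vectors, the equal default weights, and the function names are assumptions; the paper's actual similarity measures and tuned weights may differ.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple


def bow(text: str) -> Counter:
    """Bag of lowercase word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences: List[str], abstract: str, citing: Sequence[str],
              weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
              k: int = 2) -> List[str]:
    """Pick the k sentences with the highest combined salience, in document order."""
    w_abs, w_cite, w_art = weights
    abstract_v = bow(abstract)
    citing_v = bow(" ".join(citing))
    article_v = bow(" ".join(sentences))
    scored = []
    for idx, sentence in enumerate(sentences):
        v = bow(sentence)
        score = (w_abs * cosine(v, abstract_v)
                 + w_cite * cosine(v, citing_v)
                 + w_art * cosine(v, article_v))
        scored.append((score, idx))
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]  # ties broken by position
    return [sentences[idx] for idx in sorted(idx for _, idx in best)]
```

Returning the selected sentences in their original order is a common choice for extractive summaries, since it keeps the summary readable.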

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research / Vol. 2, No. 2 / pp. 31-41 / 2021
  • This paper makes a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale, roughly annotated corpus and computing statistics over it. The texts of the Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute word frequency, the standardized type/token ratio, and the proportions of monosyllabic and disyllabic words. The data analysis yields four findings. First, high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious disyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the richest vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles that carried real power. Breaking away from the qualitative methods of traditional research on Chinese language history, this study instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
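
Two of the statistics named above can be sketched directly. One common definition of the standardized type/token ratio averages the TTR of consecutive fixed-size chunks (1,000 tokens is a usual choice) and drops the final partial chunk; the syllable proportions rely on each Chinese character corresponding to one syllable. The chunk size and the choice to count tokens rather than types are assumptions, since the abstract does not specify them.

```python
from typing import Sequence, Tuple


def sttr(tokens: Sequence[str], chunk: int = 1000) -> float:
    """Mean type/token ratio over consecutive full chunks of `chunk` tokens."""
    ratios = [len(set(tokens[i:i + chunk])) / chunk
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


def syllable_proportions(tokens: Sequence[str]) -> Tuple[float, float]:
    """Shares of monosyllabic and disyllabic word tokens.

    Assumes one Chinese character = one syllable, so a word's length in
    characters equals its number of syllables.
    """
    if not tokens:
        raise ValueError("no tokens")
    n = len(tokens)
    mono = sum(1 for t in tokens if len(t) == 1)
    di = sum(1 for t in tokens if len(t) == 2)
    return mono / n, di / n
```

Standardizing by chunk matters for a diachronic comparison: a raw type/token ratio falls as text length grows, so dynasties with more surviving text would otherwise look lexically poorer.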

A Simple Syntax for Complex Semantics

  • Lee, Kiyong
    • Korean Society for Language and Information (KSLI): Conference Proceedings / KSLI 2002: Language, Information, and Computation, Proceedings of the 16th Pacific Asia Conference / pp. 2-27 / 2002
  • As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment will be restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work relies heavily on Hausser's (1999, 2000) SLIM theory of language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving multiple interacting agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-like implementation language that was enriched for both programming and debugging purposes and was made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step, time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.
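
The time-linear, left-associative idea, in which each step combines the current sentence start with the next word into a new sentence start, can be sketched in a few lines. This Python toy is not Malaga and not the author's grammar: the lexicon, the case labels, and the single `combine` function standing in for LAG rules with their rule packages are assumptions. Nominals accumulate in any order (semi-free word order), and the sentence-final verb checks its valency against them to build a small semantic structure.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: word form -> (category, feature).
# Nominals carry their case; verbs carry their valency (required cases).
LEXICON: Dict[str, Tuple[str, object]] = {
    "철수가": ("N", "NOM"),
    "영희에게": ("N", "DAT"),
    "책을": ("N", "ACC"),
    "읽었다": ("V", ("NOM", "ACC")),
    "주었다": ("V", ("NOM", "DAT", "ACC")),
}

State = Dict[str, object]  # {"args": {case: word}, "pred": verb or None}


def combine(ss: State, nw: str) -> Optional[State]:
    """One left-associative step: sentence start + next word -> new sentence start."""
    cat, feat = LEXICON[nw]
    if ss["pred"] is not None:
        return None                      # toy rule: nothing may follow the verb
    args = ss["args"]
    if cat == "N":
        if feat in args:
            return None                  # each case may be filled only once
        return {"args": {**args, feat: nw}, "pred": None}
    if set(feat) != set(args):           # verb: valency must match collected cases
        return None
    return {"args": args, "pred": nw}


def parse(words: List[str]) -> Optional[State]:
    ss: State = {"args": {}, "pred": None}
    for nw in words:                     # strictly time-linear, no lookahead
        ss = combine(ss, nw)
        if ss is None:
            return None
    return ss if ss["pred"] is not None else None
```

Because every intermediate sentence start is already a partial semantic structure, the final result (for example `{"args": {"NOM": "철수가", "DAT": "영희에게", "ACC": "책을"}, "pred": "주었다"}`) is built incrementally rather than from a completed parse tree.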
