• Title, Summary, Keyword: Corpus Linguistics

GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

  • Oh, So-Yeon;Kim, Ji-Hyeon;Kim, Seo-Jin;Nam, Hee-Jo;Park, Hyun-Seok
    • Genomics & Informatics / v.16 no.3 / pp.75-77 / 2018
  • Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, since information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining.

Corpus Linguistics as Necessary Concept for Korean Lexicography (뭉치 언어학 : 사전 편찬의 필수적 개념)

  • Lee, Sang-Sup
    • Annual Conference on Human and Language Technology / pp.73-76 / 1989
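  • Existing Korean dictionaries have a fatal flaw: they are not based on substantive empirical research into Korean as a natural language. Corpus linguistics (뭉치 언어학), recently developed and applied in Europe, has benefited from rapid advances in computing and makes it possible to devise methods for processing large volumes of natural language data from many angles. For example, the COBUILD project at the University of Birmingham in the UK succeeded in compiling an English dictionary based on an entirely new concept. Believing that Korean dictionaries can likewise be compiled by adopting corpus-linguistic methods, the author presents examples drawn from a small corpus compiled by the author.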

-eullanjira Construction of the Southwestern Dialect in Korea (서남방언의 '-을란지라' 구문 연구)

  • KIM, Ji-eun
    • Korean Linguistics / v.74 / pp.1-24 / 2017
  • This paper investigates the -eullanjira sentence as a construction of the Southwestern dialect of Korean. Five informants were selected to form the main corpus of -eullanjira. By analyzing the corpus, its semantic, syntactic, and morphological characteristics were identified. First, a construction grammar approach was adopted to capture the semantic and syntactic characteristics of -eullanjira. The construction was established as "Xdo Yeullanjira Z". Syntactically, -do was found to be a common auxiliary particle, which allows nouns, adverbs, verbs, and adjectives to appear in position X, while only verbs and adjectives can appear in position Y. Subject-honorific, causative, and passive prefinal endings can co-occur with Y, while tense and modal prefinal endings cannot. Z is an embedded clause with the semantic feature [-DOUBT], meaning 'it should be done undoubtedly'. The formation of -eullanjira was then examined both diachronically and synchronically. Middle Korean was found to have a conjunctive ending corresponding to -eullanjira, namely -landai. Finally, -eullanjira was newly analyzed as [[-eulla-]+[-n-ji-ra]].

KKMA : A Tool for Utilizing Sejong Corpus based on Relational Database (꼬꼬마 : 관계형 데이터베이스를 활용한 세종 말뭉치 활용 도구)

  • Lee, Dong-Joo;Yeon, Jong-Heum;Hwang, In-Beom;Lee, Sang-Goo
    • Journal of KIISE:Computing Practices and Letters / v.16 no.11 / pp.1046-1050 / 2010
  • Corpora are widely used as a fundamental resource for various purposes in linguistic studies. Several large corpora exist in Korea, such as the Sejong corpus. However, it is hard to find a tool for working with such large corpora. In this paper, we propose a method of utilizing the Sejong corpus based on a relational database. We designed a relational database schema to store the corpus and implemented a Web-based application so that many researchers can easily access and use the Sejong corpus.

Using Corpora for the Study of Word-Formation: A Case Study in English Negative Prefixation

  • Kwon, Heok-Seung
    • Korean Journal of English Language and Linguistics / v.1 no.3 / pp.369-386 / 2001
  • This paper will show that traditional approaches to the derivation of different negative words have been of an essentially hypothetical nature, based on either linguists' intuitions or rather scant evidence, and that native-speaker dictionary entries show meaning potentials (rather than meanings) which are in fact linguistic and cognitive prototypes. The purpose of this paper is to demonstrate that using a large corpus of natural language can provide better answers to questions about word-formation (i.e., with particular reference to negative prefixation) than any other source of information.

A Corpus based Analysis of the Argument Structure of Korean Perception Verbs (코퍼스를 이용한 한국어 지각동사의 논항구조 분석)

  • Chung, Eu-Gene;Kang, Beom-Mo
    • Annual Conference on Human and Language Technology / pp.316-323 / 1999
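  • The polysemy of verbs is explained by meaning extension that depends on the words they combine with. Starting from the argument relations of the basic meanings of Korean perception verbs, this paper uses a corpus to observe their collocational relations with other words, thereby systematizing the co-occurring words and examining the actual usage frequencies of the basic meanings and their extensions.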

A Language Model Approach to "The Vegetarian" (채식주의자: 랭귀지 모델 접근)

  • Kim, Jaejun;Kwon, Junhyeok;Kim, Yoolae;Park, Myung-Kwan;Song, Sanghoun
    • Annual Conference on Human and Language Technology / pp.260-263 / 2017
  • This paper aims to broaden the range of possible analyses of the Korean novel "The Vegetarian" by using computational linguistic tools. We apply a language model, commonly used for bigram analysis in corpus linguistics, to this Man Booker International Prize-winning novel and investigate its characteristics by comparing it with the English-language novel "A Little Life".

A Language Model Approach to "The Vegetarian" (채식주의자: 랭귀지 모델 접근)

  • Kim, Jaejun;Kwon, Junhyeok;Kim, Yoolae;Park, Myung-Kwan;Song, Sanghoun
    • Proceedings of the Korean Language Information Society Conference (한국어정보학회:학술대회논문집) / pp.260-263 / 2017
  • This paper aims to broaden the range of possible analyses of the Korean novel "The Vegetarian" by using computational linguistic tools. We apply a language model, commonly used for bigram analysis in corpus linguistics, to this Man Booker International Prize-winning novel and investigate its characteristics by comparing it with the English-language novel "A Little Life".

Corpus-Based Literary Analysis (코퍼스에 기반한 문학텍스트 분석)

  • Ha, Myung-Jeong
    • The Journal of the Korea Contents Association / v.13 no.9 / pp.440-447 / 2013
  • Corpus linguistic analyses now enable researchers to examine meanings and structural features of data that cannot be detected by intuition alone. While the potential of corpus linguistic techniques has been established and demonstrated for non-literary data, corpus stylistic analyses have rarely been applied to literature. Specifically, this paper explores keywords and their role in text analysis, a primary part of corpus linguistic analysis. It focuses on the application of techniques from corpus linguistics and the interpretation of their results, and addresses the question of what is to be gained from keyword analysis by scrutinizing keywords in Shakespeare's Romeo and Juliet.

An Analysis on Korean Intonation Patterns Using Momel (Momel을 이용한 한국어의 억양 패턴 분석)

  • Kim, Sun-Hee;Yoo, Hyun-Ji
    • Proceedings of the KSPS conference / pp.243-246 / 2007
  • This paper proposes an intonation labeling method using Momel and presents the results of applying it to a speech corpus of 80 passages read by 4 speakers (2 male and 2 female). The results show that Momel works well enough to derive meaningful pitch targets, which can be labeled with H and L tones. In addition, the results of the analysis of the Korean speech corpus are consistent with earlier work.