• Title, Summary, Keyword: 코퍼스언어학

Search Result 23, Processing Time 0.038 seconds

Corpus-Linguistical Analysis of Newspaper Articles (신문 기사의 코퍼스 언어학적 분석)

  • Song, Kyung-Hwa;Kang, Beom-Mo
    • Annual Conference on Human and Language Technology
    • /
    • /
    • pp.7-14
    • /
    • 2006
  • 본 연구에서는 신문 기사에 대한 실증적 언어 분석을 목적으로 하여, <21세기 세종계획>에 의해 구축된 대용량의 신문 기사 말뭉치를 다양한 각도로 계량화하여 분석한다. 신문 기사를 표제, 전문, 본문의 구성으로 나누고 각 구성의 특징에 따라 형태 분석 말뭉치, 형태의미 분석 말뭉치, 구문 분석 말뭉치를 이용하여 분석한다. 본 연구는 대량의 신문 기사 말뭉치를 이용한 계량적 방법이라는데 의의가 있다 이러한 연구 방법을 통하여 기존의 직관을 이용한 연구 방법들과 차별화 된 실증적 연구로서 신문 이론을 검증하고, 신문 기사의 새로운 언어 현상을 발견할 수 있을 것이다.

  • PDF

Disambiguation on the Analysis of Korean Complex Nominals, Using Probabilistic CFG Parsing (확률적 CFG 파싱을 활용한 한국어 복합명사 구조 분석의 중의성 해소)

  • Kim, Dong-Sung
    • Annual Conference on Human and Language Technology
    • /
    • /
    • pp.61-66
    • /
    • 2011
  • 본 논문은 한국어 복합명사 구조의 분석을 목적으로 한다. 연구는 이론 언어학뿐만이 아니라 정보처리, 정보검색과 같은 언어의 전산적 처리에서도 중요한다. 복합명사 구조는 크게 외심구조와 내심구조로 나뉘며 내심구조의 경우에 좌분지나 우분지 구조로 분석이 되어야 하는 중의성이 있다. 기존의 Lauer 모델은 사전적 정보에서 발견되는 확률 정보를 구조 정보에 연결하기 위한 모델로 의존모델과 인접모델을 제시하였다. 본 연구에서는 구조에 기반을 둔 확률정보를 결합하기 위한 확률적 CFG 파싱 방법을 활용하고자 하였다. 이를 위해서 실제 코퍼스상에서 발견되는 복합명사 패턴을 대상으로 구조적 분석을 화자 직관을 통해서 진행하고, 이를 다시 Lauer 모델과 확률적 CFG 파싱 방법 응용과 비교해 보았다. 결과적으로 화자 직관에 가장 일치한 예측을 하였으며, 구조에 대한 정보 해석이 가능하였다.

  • PDF

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science
    • /
    • v.23 no.3
    • /
    • pp.295-322
    • /
    • 2012
  • In this paper, we cluster similar word senses applying vector space model and HAL (Hyperspace Analog to Language). HAL measures corelation among words through a certain size of context (Lund and Burgess 1996). The similarity measurement between a word pair is cosine similarity based on the vector space model, which reduces distortion of space between high frequency words and low frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce a large amount of dimensions caused by similarity matrix. For sense similarity clustering, we adopt supervised and non-supervised learning methods. For non-supervised method, we use clustering. For supervised method, we use SVM (Support Vector Machine), Naive Bayes Classifier, and Maximum Entropy Method.

  • PDF

Critical Discourse Analysis of '5.18' in 'Honam' and 'Yeongnam' Local Newspapers by Using Corpus (코퍼스를 이용한 '호남'과 '영남' 지역신문에서의 '5.18'에 대한 비판적 담화분석)

  • Lee, Sukeui;Jin, Duhyeon
    • Korean Linguistics
    • /
    • v.76
    • /
    • pp.83-112
    • /
    • 2017
  • In this paper, newspaper articles were collected through '5.18' keyword search results and the news corpus was constructed from the collected data. In the articles of local newspapers 'Honam' and 'Yeongnam', the ideological differences regarding '5.18' were investigated. The ideological differences of local newspaper discourse through objective figures was analyzed.. The subjects of the newspaper articles, the frequency of nouns and predicates were analyzed. The use and meaning of the intended vocabulary were examined. As a result of analyzing the title of the newspaper article, the discourse written in 'Honam' emphasized the necessity of re - recognition of 5.18. In both regions, the word "Gwangju" is often used. However, 'Gwangju' in 'Honam' newspaper means spiritual space, not physical space. In Honam regional newspapers, there are many vocabularies describing the events such as 'shoot' and 'fire', this calls for recollection and memory of '5.18'. In the analysis of newspaper discourse, the analysis of the contrast between the local newspapers was very insignificant, but, this study was conducted to analyze the discourse among local newspapers.

Concept and Application of Deep learning-based Automatic Spacing (문장 정보를 고려한 딥 러닝 기반 자동 띄어쓰기의 개념 및 활용)

  • Cho, Won Ik;Cheon, Sung Jun;Kim, Ji Won;Kim, Nam Soo
    • Annual Conference on Human and Language Technology
    • /
    • /
    • pp.181-184
    • /
    • 2018
  • 본 논문에서는, 한국어 활용에 있어 중의성을 해소해 주고 심미적 효과를 줄 수 있는 개념인 띄어쓰기를, 교정이 아닌 입력 보조의 관점에서 접근한다. 사람들에게 자판을 통한 텍스트 입력이 언어활동의 보편적인 수단이 되면서 가독성을 포기하고서라도 편의를 택하는 경우가 증가하게 되었는데, 본 연구에서는 그러한 문장들의 전달력을 높여 줄 수 있는 자동 띄어쓰기 및 그 활용 방안을 제시한다. 전체 시스템은 dense word embedding과 딥 러닝 아키텍쳐를 활용하여 훈련되었으며, 사용된 코퍼스는 비표준어 및 비정형을 포함하는 대화체 문장으로 구성되어 user-generate된 대화형 문장 입력의 처리에 적합하다.

  • PDF

Modelling Grammatical Pattern Acquisition using Video Scripts (비디오 스크립트를 이용한 문법적 패턴 습득 모델링)

  • Seok, Ho-Sik;Zhang, Byoung-Tak
    • Annual Conference on Human and Language Technology
    • /
    • /
    • pp.127-129
    • /
    • 2010
  • 본 논문에서는 다양한 코퍼스를 통해 언어를 학습하는 과정을 모델링하여 무감독학습(Unsupervised learning)으로 문법적 패턴을 습득하는 방법론을 소개한다. 제안 방법에서는 적은 수의 특성 조합으로 잠재적 패턴의 부분만을 표현한 후 표현된 규칙을 조합하여 유의미한 문법적 패턴을 탐색한다. 본 논문에서 제안한 방법은 베이지만 추론(Bayesian Inference)과 MCMC (Markov Chain Mote Carlo) 샘플링에 기반하여 특성 조합을 유의미한 문법적 패턴으로 정제하는 방법으로, 랜덤하이퍼그래프(Random Hypergraph) 모델을 이용하여 많은 수의 하이퍼에지를 생성한 후 생성된 하이퍼에지의 가중치를 조정하여 유의미한 문법적 패턴을 탈색하는 방법론이다. 우리는 본 논문에서 유아용 비디오의 스크립트를 이용하여 다양한 유아용 비디오 스크립트에서 문법적 패턴을 습득하는 방법론을 소개한다.

  • PDF

Aspects of Language Use in Newspaper Articles: A Corpus Linguistic Perspective (신문 기사의 언어 사용 양상: 코퍼스언어학적 접근)

  • Song, Kyung-Hwa;Kang, Beom-Mo
    • Korean Journal of Cognitive Science
    • /
    • v.17 no.4
    • /
    • pp.255-269
    • /
    • 2006
  • The purpose of this study is to analyze newspaper articles from corpus linguistic point of view. We used a large corpus of newspaper articles built from <21st century Sejong Project> and counted occurrences of certain expressions. A newspaper article is divided into the headline, the lead and the body. We tried to figure out how to measure the characteristics of indication and compression which are typical to headlines. Then, we focused on the differences between the headline and the lead. finally, we analyzed the sentence structure and measured the ratio of the frequency of common nouns in the body. This study verifies the existing stylistic theories of newspapers and shows new aspects of language use in newspaper articles. Texts like newspaper articles are the results of human language processing and they in turn affect the development of cognitive ability of language.

  • PDF

An Exploratory Study of Collective E-Petitions Estimation Methodology Using Anomaly Detection: Focusing on the Voice of Citizens of Changwon City (이상탐지 활용 전자집단민원 추정 방법론에 관한 탐색적 연구: 창원시 시민의 소리 사례를 중심으로)

  • Jeong, Ha-Yeong
    • Informatization Policy
    • /
    • v.26 no.4
    • /
    • pp.85-106
    • /
    • 2019
  • Recently, there have been increasing cases of collective petitions filed in the electronic petitions system. However, there is no efficient management system, raising concerns on side effects such as increased administrative workload and mass production of social conflicts. Aimed at suggesting a methodology for estimating electronic collective petitions using anomaly detection and corpus linguistics-based content analysis, this study conducted the followings: i) a theoretical review of the concept of collective petitions, ii) estimation of electronic collective petitions using anomaly detection based on nonparametric unsupervised learning, iii) a content similarity analysis on petitions using n-gram cosine angle distance, and iv) a case study on the Voice of Citizens of Changwon City, through which the utility of the proposed methodology, policy implications and future tasks were reviewed.

This study revises Lee Hyo-seok's The Buckwheat Season, utilizing Novel Corpus, intermediate learners' level (소설텍스트의 난이도 조정 방안 연구 -이효석의 「메밀꽃 필 무렵」을 중심으로-)

  • Hwang, Hye ran
    • Journal of Korean language education
    • /
    • v.29 no.4
    • /
    • pp.255-294
    • /
    • 2018
  • The Buckwheat Season, evaluated as the best of Lee Hyo-seok's literature, is one of the short stories that represent Korean literature. However, vivid literary expressions such as lyrical and beautiful depictions, figurative expressions and dialects, which show the Korean beauty, rather make learners have difficulty and become a factor that fails in reading comprehension. Thus, it is necessary to revise and present the text modified for the learners' language level. The methods of revising a literary text include the revision of linguistic elements such as cryptic vocabulary or sentence structure and the revision of the composition of the text, e.g. suggestion of characters or plot, or insertion of illustration. The methods of revising the language of the text can be divided into methods of simplification and detailing. However, in the process of revising the text, many depend on the adapter's subjective perception, not revising it with objective criteria. This paper revised the text, utilizing by the Academy of Korean Studies, , and the by the National Institute of Korean Language to secure objectivity in revising the text.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for information Management
    • /
    • v.36 no.1
    • /
    • pp.191-213
    • /
    • 2019
  • The process of obtaining scientific knowledge is conducted through research. Researchers deal with the uncertainty of science and establish certainty of scientific knowledge. In other words, in order to obtain scientific knowledge, uncertainty is an essential step that must be performed. The existing studies were predominantly performed through a hedging study of linguistic approaches and constructed corpus with uncertainty word manually in computational linguistics. They have only been able to identify characteristics of uncertainty in a particular research field based on the simple frequency. Therefore, in this study, we examine pattern of scientific knowledge based on uncertainty word according to the passage of time in biomedical literature where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS and DMR topic modeling which is useful method to identify patterns in disciplines is applied to understand the trend of entity based topic with uncertainty. As time goes by, the development of research has been confirmed that uncertainty in scientific knowledge is moving toward a decreasing pattern.