• Title/Summary/Keyword: text linguistics

Using Collective Citing Sentences to Recognize Cited Text in Computational Linguistics Articles

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.21 no.11 / pp.85-91 / 2016
  • This paper proposes a collective approach to cited text recognition that exploits a set of citing texts from different articles citing the same article. First, the proposed method uses the group of citing texts to gather highly ranked cited sentences from the cited article, creating collective information about probable cited sentences. This collective information is then used to select the final cited sentences from among the highly ranked sentences returned by similarity-based cited text recognition. Experiments were conducted on a data set of research articles from the computational linguistics domain. Evaluation results showed that the proposed method could improve the performance of similarity-based baseline approaches.
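
The two steps described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's method: Jaccard word overlap stands in for the similarity measure, the collective information is a simple vote count over each citing text's top-k sentences, and the names `jaccard`, `top_k`, and `collective_recognition` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Word-overlap similarity (an assumed stand-in for the paper's measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def top_k(citing, cited_sentences, k):
    """Similarity-based baseline: indices of the k best-matching cited sentences."""
    ranked = sorted(
        range(len(cited_sentences)),
        key=lambda i: jaccard(citing, cited_sentences[i]),
        reverse=True,
    )
    return ranked[:k]


def collective_recognition(citing_texts, cited_sentences, k=2, n_final=1):
    """Re-rank each citing text's top-k candidates using votes from all citing texts."""
    # Step 1: gather highly ranked sentences for every citing text and
    # count how often each cited sentence is a candidate (assumed voting scheme).
    candidates = [top_k(c, cited_sentences, k) for c in citing_texts]
    votes = Counter(i for cand in candidates for i in cand)

    # Step 2: pick final sentences among each citing text's own candidates,
    # preferring sentences with more votes and breaking ties by similarity.
    results = []
    for citing, cand in zip(citing_texts, candidates):
        reranked = sorted(
            cand,
            key=lambda i: (votes[i], jaccard(citing, cited_sentences[i])),
            reverse=True,
        )
        results.append(reranked[:n_final])
    return results
```

In this sketch a sentence that many citing texts rank highly can displace the single best match of one citing text, which is one simple way to read "collective information"; the paper may combine the evidence differently.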

100 K-Poison: Poisonous Texts Resistance Test Dataset For Korean Generative Models (100 K-Poison: 한국어 생성 모델을 위한 독성 텍스트 저항력 검증 데이터셋)

  • Li Fei;Yejee Kang;Seoyoon Park;Yeonji Jang;Hansaem Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.149-154 / 2023
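  • To test how well Korean generative models resist toxic text, this paper builds a pilot dataset for Korean generative models, '100 K-Poison', from 100 high-difficulty toxic question-answer pairs extracted from the 'CVALUE' dataset. Using this dataset, we ran 'ZeroShot TextClassification' and 'Text Generation' experiments on four representative Korean generative models. This let us examine how well current Korean generative models identify and respond to toxic text, analyze the gap in toxic-text resistance between models, and propose a new 'fight poison with poison (以毒攻毒)' training strategy for further strengthening the toxic-text identification and response performance of Korean generative models.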

A Term Importance-based Approach to Identifying Core Citations in Computational Linguistics Articles

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.22 no.9 / pp.17-24 / 2017
  • Core citation recognition identifies the influential articles among the prior articles that a scholarly article cites. Previous approaches have used citing-text occurrence information, textual similarity between the citing and cited articles, and similar evidence. This study proposes a term-based approach to core citation recognition that exploits the importance of the individual terms appearing in in-text citations to calculate an influence strength for each cited article. Term importance is computed from several kinds of frequency information: term frequency (tf) in the in-text citation, tf in the citing article, inverse sentence frequency in the citing article, and inverse document frequency in a collection of articles. Experiments on an existing test set of computational linguistics articles show that the term-based approach performs comparably with the previous approaches. The proposed technique could easily be extended by employing other term units such as n-grams and phrases, or by using new term-importance formulae.
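
As a rough illustration of how the four frequency signals named in this abstract could be combined, here is a minimal sketch. The product formula, the smoothing constants, the whitespace tokenizer, and the function name `influence_strength` are all assumptions made for the example; the abstract does not give the paper's actual term-importance formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercased whitespace tokens (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def influence_strength(citation_contexts, article_sentences, collection_docs):
    """Score each cited article by summing the importance of its citation terms.

    citation_contexts: dict mapping a cited-article id to its in-text citation strings.
    article_sentences: sentences of the citing article.
    collection_docs:   documents of a background collection.
    """
    sent_tokens = [set(tokenize(s)) for s in article_sentences]
    tf_article = Counter(t for s in article_sentences for t in tokenize(s))
    doc_tokens = [set(tokenize(d)) for d in collection_docs]
    n_sent, n_docs = len(sent_tokens), len(doc_tokens)

    scores = {}
    for cited_id, contexts in citation_contexts.items():
        tf_citation = Counter(t for c in contexts for t in tokenize(c))
        total = 0.0
        for term, tf_cit in tf_citation.items():
            sf = sum(1 for s in sent_tokens if term in s)  # sentence frequency
            df = sum(1 for d in doc_tokens if term in d)   # document frequency
            isf = math.log((1 + n_sent) / (1 + sf)) + 1.0  # smoothed inverse sentence frequency
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed inverse document frequency
            # Assumed combination: product of the four signals.
            total += tf_cit * (1.0 + math.log(1 + tf_article[term])) * isf * idf
        scores[cited_id] = total
    return scores
```

Ranking the cited articles by the returned score and keeping the top few would give a candidate set of core citations. Swapping `tokenize` for an n-gram or phrase extractor corresponds to the extension the abstract mentions.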

Using a Prosodic Labeling Text (PLT) in the Synthesis of Spoken Chinese

  • Wu, Zong-Ji
    • Proceedings of the KSPS conference / 1996.10a / pp.473-475 / 1996
  • The prosodic features of spoken Chinese play an important role in the naturalness of synthesized speech. This paper gives a list of prosodic labeling symbols that represent all of the prosodic features, and attaches a paragraph of 'Prosodic Labeling Text' (PLT) as an example.

Text Context and Chinese-Korean Translation (텍스트 맥락과 중한번역)

  • Park, Eun-Suk
    • 중국학논총 / no.70 / pp.61-86 / 2021
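  • This paper discusses the role of context in Chinese-Korean text translation. The author applies Halliday's three-way classification of context to the practice of translating Chinese texts into Korean. In summary, co-textual context (also called linguistic context) is the content that precedes and follows a word, a phrase, or a longer stretch of text. Situational context corresponds to the register variables, which fall into three kinds: field, tenor, and mode. Cultural context is the social, cultural, economic, religious, and political background that a text involves. Dividing context into co-textual, situational, and cultural context, the author examines in depth the problems of context in Chinese-Korean translation. Cultural context is further treated in two parts: the influence and constraints of culture-specific words and of words with cultural connotations, and translation strategies for culture-specific words. These parts discuss the difficulties of translating culture-specific words and the techniques for doing so. The contextual analysis shows that, in Chinese-Korean translation practice, contextual factors can be used to eliminate ambiguity, and that the specific situational meaning in the context helps reconstruct in the target text the meaning that the source text expresses through grammar, pragmatics, and style. Finally, relying on cultural context during translation lets the translator determine the meaning that a culture-specific word carries in the source text.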

Improving spaCy dependency annotation and PoS tagging web service using independent NER services

  • Colic, Nico;Rinaldi, Fabio
    • Genomics & Informatics / v.17 no.2 / pp.21.1-21.6 / 2019
  • Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service delivering improved dependency parses by taking into account named entity annotations obtained by third party services. Our evaluation shows improved results and better speed.

KTARSQI: The Annotation of Temporal and Event Expressions in Korean Text (KTARSQI: 한국어 텍스트의 시간 및 사건 표현 주석)

  • Im, Seohyun;Kim, Yoon-Shin;Jo, Yoomi;Jang, Hayun;Ko, Minsoo;Nam, Seungho;Shin, Hyopil
    • Annual Conference on Human and Language Technology / 2009.10a / pp.130-135 / 2009
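  • Extracting information about time and events is an important part of natural language processing applications such as information extraction and question-answering systems. Nevertheless, this line of research has not yet begun in earnest for Korean natural language processing applications. Building on the results of the TARSQI project in the United States, the KTARSQI project started in 2008 with the goal of developing a specification language (KTimeML), an annotated corpus (KTimeBank), and an automatic tagging system (KTarsqi Toolkit: KTTK) for annotating, extracting, and reasoning about temporal and event expressions in Korean text. This paper gives an overall introduction to the goals and tasks of the KTARSQI project and, as a result of the work done so far, adds a discussion of the specification and annotation of event tags.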
