• Title/Summary/Keyword: 구어체요약 (colloquial-style summarization)

Text summarization of dialogue based on BERT

  • Nam, Wongyung;Lee, Jisoo;Jang, Beakcheol
    • Journal of the Korea Society of Computer and Information / v.27 no.8 / pp.41-47 / 2022
  • This paper describes how to summarize colloquial text that is not clearly organized. The study uses SAMSum, a dataset of colloquial dialogues, and applies the BERTSumExtAbs model proposed in earlier work on automatic summarization. More than 70% of the SAMSum dataset consists of conversations between two people, and the rest consists of conversations among three or more people. Applying the automatic text summarization model to colloquial data yielded a ROUGE-1 (R-1) score of 42.43 or higher, and fine-tuning the previously proposed BERTSum model yielded a higher score of 45.81. The study demonstrates the performance of generative (abstractive) summarization on colloquial text, and the authors hope that the results will serve as basic data for tasks that require computers to understand human natural language as it is.
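
The R-1 figures quoted above are ROUGE-1 scores, which measure unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that metric, assuming simple lowercase whitespace tokenization; it is not the evaluation script used in the paper, and the example strings are made up for illustration. Published scores are usually reported multiplied by 100.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Unigram-overlap ROUGE-1 with clipped counts (illustrative sketch)."""
    # Assumption: lowercase whitespace tokenization, no stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical reference and system summaries, not taken from SAMSum.
scores = rouge_1(
    "amanda baked cookies and will bring jerry some tomorrow",
    "amanda will bring jerry cookies",
)
print(scores)
```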

An Extractive Summarization System for Real-time Call Center Agent (실시간 콜센터 상담사 보조를 위한 주요 상담 발화 추출 요약 시스템)

  • Ian Jung;YongTaek Lee;Hyunmok Kim;Yongchol Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.53-58 / 2022
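  • As artificial intelligence technology advances, AI systems that assist human work are being applied across many industries. Natural language processing technology that assists call center agents with consultations is one actively studied area. Before it can assist an agent, a call center agent assistance system must accurately recognize what a consultation, conducted as a dialogue between customer and agent, is about. If the system can identify the utterances that represent the purpose of the consultation, it can recognize the content more clearly. This paper focuses on the characteristics of consultation scripts conducted in colloquial language and proposes a model that analyzes consultation content in real time and recognizes and extracts the utterances that carry important meaning. In experiments, the proposed model outperformed existing extractive summarization. The proposed model can be used to extract key consultation utterances and can be applied to tasks such as retrieving related consultation documents and classifying consultation content.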

Automatic Product Feature Extraction for Efficient Analysis of Product Reviews Using Term Statistics (효율적인 상품평 분석을 위한 어휘 통계 정보 기반 평가 항목 추출 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.16B no.6 / pp.497-502 / 2009
  • In this paper, we introduce an automatic product feature extraction system that improves the efficiency of product review analysis. The system consists of two parts: a review collection and correction part and a product feature extraction part. The former collects reviews from internet shopping malls and revises spoken-style or ungrammatical sentences. The latter automatically extracts product features, that is, items that can serve as evaluation criteria (such as 'size' and 'style' for a skirt), by using term statistics from reviews and web documents. We choose nouns in reviews as candidate product features and calculate the degree of association between each candidate noun and a product by combining an inner association degree and an outer association degree. The inner association degree is calculated from noun frequency in reviews, and the outer association degree is calculated from the co-occurrence frequency of a candidate noun and a product name in web documents. In the evaluation, our extraction method showed an average recall of 90%, which is better than the results of previous approaches.
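
The abstract names the two statistics but does not give their formulas. The sketch below shows one possible way to combine them, under assumptions that are not from the paper: the inner degree is taken as relative noun frequency in reviews, the outer degree as the share of product-mentioning web documents that also contain the noun, and the two are mixed linearly with a hypothetical `weight` parameter. All function names and the toy data are illustrative.

```python
from collections import Counter


def inner_association(noun, review_nouns):
    """Assumed form: relative frequency of the noun among all review nouns."""
    if not review_nouns:
        return 0.0
    return Counter(review_nouns)[noun] / len(review_nouns)


def outer_association(noun, product, web_docs):
    """Assumed form: share of documents mentioning the product that also
    mention the noun. Each document is a set of tokens."""
    with_product = [doc for doc in web_docs if product in doc]
    if not with_product:
        return 0.0
    return sum(1 for doc in with_product if noun in doc) / len(with_product)


def association(noun, product, review_nouns, web_docs, weight=0.5):
    """Linear mix of the two degrees; `weight` is a hypothetical parameter."""
    inner = inner_association(noun, review_nouns)
    outer = outer_association(noun, product, web_docs)
    return weight * inner + (1 - weight) * outer


# Toy data, invented for illustration only.
review_nouns = ["size", "style", "size", "delivery"]
web_docs = [{"skirt", "size"}, {"skirt", "style", "size"}, {"jacket", "delivery"}]

ranked = sorted(
    set(review_nouns),
    key=lambda n: association(n, "skirt", review_nouns, web_docs),
    reverse=True,
)
print(ranked)
```

In this toy setup a noun such as 'delivery' appears in reviews but never co-occurs with the product name, so its combined score stays low, which is the intuition behind using both statistics.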

Topic Continuity in Korea Narrative (한국 설화문에서의 화제표현의 연속성)

  • Hi-Ja Chong
    • Korean Journal of Cognitive Science / v.2 no.2 / pp.405-428 / 1990
  • Language has a social function: to communicate information. Since the 1960s, linguists have paid increasing attention to the function of language, especially to the relationship among form, meaning, and function. This relationship can be grasped more clearly through discourse-based analysis than through sentence-based analysis. Much research has centered on the discourse-functional notion of topic. In the early 1970s, the subject was defined as the grammaticalized topic, and the topic as a discrete single constituent of the clause. In the late 1970s, several linguists, including Givon, suggested that the topic was not an atomic, discrete entity and that a clause could have more than one topic. The purpose of the present study is, following Givon, to study the grammatical coding devices of topic and to measure the relative topic continuity/discontinuity of participant arguments in Korean narratives. By so doing, I would like to shed some light on effective ways of communicating information. The grammatical coding devices analyzed are the following eight structures: zero anaphora, personal pronouns, demonstrative pronouns, names, noun phrases following demonstratives, noun phrases following possessives, definite noun phrases, and indefinite referentials. The narrative studied for the count was taken from Korean CIA Chief's Testimony: Revolution and Idol by Hyung Wook Kim. It was chosen because it was assumed that Kim's purpose in the novel was to tell a true story, which would not distort the natural use of language for literary effect. The measures taken in the analysis were 'lookback', 'persistence', and 'ambiguity'. The first, 'lookback', measures the size of the gap between the previous occurrence of a referent and its current occurrence in the clause. The second, 'persistence', is a measure of the speaker's topical intent and reflects the topic's importance in the discourse. The third, 'ambiguity', is necessary for assessing the disruptive effects that other topics within the five previous clauses may have on topic identification: the more other topics are present within the five previous clauses, the more difficult it is to identify a topic correctly. The results of the present study show that the humanness of entities is the most powerful factor in topic continuity in narrative discourse. The semantic roles of human arguments in narrative discourse tend to be agents or experiencers. Since agents and experiencers have high topicality in discourse, human entities clearly become clausal or discoursal topics. The results also show that the grammatical devices signal varying degrees of topic continuity/discontinuity in continuous discourse. The more continuous a topic argument is, the less it is coded. For example, personal pronouns have the most continuity and indefinite referentials have the least. The study strongly shows that topic continuity/discontinuity is controlled not only by the grammatical devices available in the language but also by socio-cultural factors and the writer's intentions.
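
The three measures described in this abstract can be sketched in code. The following is a simplified illustration, not the study's counting procedure: each clause is modeled as a set of referent labels, the `cap` on lookback is an assumed parameter (the abstract does not state one), and ambiguity is approximated as the number of other referents in the five previous clauses. The referent labels in the example are invented.

```python
def lookback(clauses, index, referent, cap=20):
    """Number of clauses back to the previous mention of `referent`.
    Returns `cap` (an assumed maximum) if no earlier mention is found."""
    for distance in range(1, min(index, cap) + 1):
        if referent in clauses[index - distance]:
            return distance
    return cap


def persistence(clauses, index, referent):
    """Number of immediately following clauses that keep mentioning `referent`."""
    count = 0
    for clause in clauses[index + 1:]:
        if referent not in clause:
            break
        count += 1
    return count


def competing_referents(clauses, index, referent, window=5):
    """Simplified ambiguity measure: how many other referents appear
    in the `window` clauses before the current one."""
    start = max(0, index - window)
    others = set().union(*clauses[start:index]) - {referent}
    return len(others)


# Toy narrative: each set lists the referents mentioned in one clause.
clauses = [{"kim"}, {"kim", "park"}, {"park"}, {"kim"}, {"kim"}]

print(lookback(clauses, 3, "kim"))
print(persistence(clauses, 3, "kim"))
print(competing_referents(clauses, 3, "kim"))
```

A small lookback and a large persistence value both point to a highly continuous topic, while a large number of competing referents makes the topic harder to identify.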