Search | Korea Science

Learning Contextual Meaning Representations of Named Entities for Correcting Factual Inconsistent Summary (개체명 문맥의미표현 학습을 통한 기계 요약의 사실 불일치 교정)

Park, Junmo;Noh, Yunseok;Park, Seyoung
- Annual Conference on Human and Language Technology / 2020.10a / pp.54-59 / 2020
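Factual inconsistency correction is the task of making the output of a machine summarization system consistent with the actual facts. The most common problem in summary generation research is that generated summaries contain incorrect facts, which is one of the major obstacles to deploying summarization models in commercial services. In this paper, we propose a method that corrects a summary into a factually consistent sentence by bringing named entities over from the source text. To this end, we train a language model to produce good contextual representations of named entities. The trained model then compares the contextual representations of the named entities that appear in the source and in the summary, and resolves factual inconsistencies by replacing summary entities with the appropriate words. To evaluate the proposed model, we built training data from an abstractive summarization dataset, and to verify its applicability in a realistic scenario, we ran experiments on summaries generated by a summarization model. In both automatic and human evaluation, the proposed model outperformed the baseline models.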
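The replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes contextual vectors for each entity mention have already been produced by the trained language model, and it assumes cosine similarity with an argmax over source entities as the comparison rule. The names `cosine` and `correct_entities` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def correct_entities(
    summary_tokens: List[str],
    summary_entities: List[Tuple[int, Sequence[float]]],
    source_entities: List[Tuple[str, Sequence[float]]],
) -> List[str]:
    """Replace each summary entity with the source entity whose contextual
    vector is most similar (hypothetical stand-in for the comparison step).

    summary_entities: (token index in the summary, contextual vector)
    source_entities:  (surface form in the source, contextual vector)
    """
    corrected = list(summary_tokens)
    for index, vector in summary_entities:
        best_surface, _ = max(source_entities, key=lambda ent: cosine(vector, ent[1]))
        corrected[index] = best_surface
    return corrected


# Toy vectors standing in for language-model outputs.
source = [("Seoul", [1.0, 0.0]), ("Busan", [0.0, 1.0])]
summary = ["The", "meeting", "was", "held", "in", "Busan"]
print(correct_entities(summary, [(5, [0.9, 0.1])], source))
# ['The', 'meeting', 'was', 'held', 'in', 'Seoul']
```

Comparing against entities drawn only from the source keeps every replacement grounded in the original document, which is the point of the approach.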

A Study on Korean Generative Question-Answering with Contextual Summarization (문맥 요약을 접목한 한국어 생성형 질의응답 모델 연구)

Jeongjae Nam;Wooyoung Kim;Sangduk Baek;Wonjun Lee;Taeyong Kim;Hyunsoo Yoon;Wooju Kim
- Annual Conference on Human and Language Technology / 2023.10a / pp.581-585 / 2023
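Question answering (QA) is the task of producing an appropriate answer from a question and the information in a given context. Because the context text given as input is usually long, a QA model needs considerable computing resources to process it. To address this, this paper proposes a summary-based QA framework that uses a summarization model. The goal is to summarize the context information effectively, reducing the computing cost of the QA model while maintaining its performance.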
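A summarize-then-answer pipeline of this kind might be wired up as below. The summarizer here is a hypothetical word-overlap extractor standing in for the learned summarization model the paper uses, token counts are approximated by whitespace splitting, and `qa_model` is any callable taking `(question, context)`; none of these names come from the paper.

```python
import re
from typing import Callable, List, Set


def words(text: str) -> Set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_context(question: str, context: str, token_budget: int) -> str:
    """Toy extractive summarizer: keep the sentences sharing the most words
    with the question until a whitespace-token budget is used up."""
    q = words(question)
    sentences = split_sentences(context)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(q & words(sentences[i])), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= token_budget:
            chosen.append(i)
            used += n
    return " ".join(sentences[i] for i in sorted(chosen))  # restore document order


def answer(question: str, context: str,
           qa_model: Callable[[str, str], str], token_budget: int = 64) -> str:
    """Summarize first, then run the (hypothetical) QA model on the shorter context."""
    return qa_model(question, summarize_context(question, context, token_budget))
```

The QA model only ever sees at most `token_budget` words, which is where the compute saving would come from; whether accuracy holds up depends on how well the summarizer keeps answer-bearing content.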

A Modular Pointer Analysis using Function Summaries (함수 요약을 이용한 모듈단위 포인터분석)

Park, Sang-Woon;Kang, Hyun-Goo;Han, Tai-Sook
- Journal of KIISE:Software and Applications / v.35 no.10 / pp.636-652 / 2008
In this paper, we present a modular pointer analysis algorithm based on update history. We use the term 'module' to mean a set of mutually recursive procedures, and 'modular analysis' to mean a program analysis that can analyze a module without the source code of the other modules. Because a modular pointer analysis cannot use any information about callers, it is difficult to design a precise analysis that does not lose information about program flow or calling context. We propose a modular, flow-sensitive, and context-sensitive pointer analysis algorithm based on update history, which can represent the memory states of a procedure independently of calling-context information while preserving the order in which side effects are performed. This memory representation not only allows the analysis to be formalized as a modular analysis but also helps it effectively identify killed side effects and relevant alias contexts.
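To make the idea of an order-preserving side-effect summary concrete, here is a heavily simplified sketch. It assumes a summary is just an ordered list of `*lhs = rhs` updates over formal parameters, instantiated at a call site by binding each formal to the abstract locations its argument points to; it is not the paper's update-history domain. It shows why order matters: when two formals alias, the later write kills the earlier one.

```python
from typing import Dict, FrozenSet, List, Tuple

Locs = FrozenSet[str]
Update = Tuple[str, str]  # (lhs_formal, rhs_formal) records "*lhs = rhs" in terms of formals


def apply_summary(summary: List[Update], binding: Dict[str, Locs],
                  store: Dict[str, Locs]) -> Dict[str, Locs]:
    """Instantiate an ordered update history at one call site.

    binding maps each formal parameter to the abstract locations its actual
    argument points to (the caller's alias context); store maps abstract
    locations to the locations they point to.
    """
    out = dict(store)
    for lhs, rhs in summary:  # replay side effects in the order they were recorded
        targets, value = binding[lhs], binding[rhs]
        if len(targets) == 1:
            (loc,) = targets
            out[loc] = value  # strong update: earlier writes to loc are killed
        else:
            for loc in targets:
                out[loc] = out.get(loc, frozenset()) | value  # weak update
    return out


# Summary of a hypothetical procedure f(a, b, x, y) { *a = x; *b = y; }
f_summary = [("a", "x"), ("b", "y")]
```

Because the summary is written only in terms of formals, it can be computed without seeing any caller; the alias context is supplied later through `binding`.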

A Study on the Construction of the Automatic Summaries - on the basis of Straight News in the Web - (자동요약시스템 구축에 대한 연구 - 웹 상의 보도기사를 중심으로 -)

Lee, Tae-Young
- Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.41-67 / 2006
A writing frame and various rules based on discourse structure and knowledge-based methods were applied to build an automatic Ext/sums (extracts and summaries) system for straight news on the web. The frame contains slots and facets representing the roles of paragraphs, sentences, and clauses in news articles, together with rules that determine the slot type. Candidate summary sentences were rearranged (unified, separated, and synthesized) while maintaining coherence of meaning, using rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, along with context plots defined by the syntactic/semantic signatures of nouns and verbs and by the category of verb suffixes. We also attempted to insert critique sentences into the summary.
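The similarity-based unification of candidate sentences could look roughly like this. The paper combines similarity with syntactic, discourse, and knowledge-based rules; this sketch keeps only a word-level Jaccard similarity and a hypothetical threshold of 0.5, dropping a candidate when it is too close to one already kept.

```python
import re
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def unify_candidates(candidates: List[str], threshold: float = 0.5) -> List[str]:
    """Greedily keep candidates in order, skipping near-duplicates of kept ones."""
    kept: List[str] = []
    for sentence in candidates:
        t = tokens(sentence)
        if all(jaccard(t, tokens(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept
```

Processing candidates in their given order means earlier (for example, lead-paragraph) sentences win ties, which suits the inverted-pyramid structure of straight news.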
https://doi.org/10.3743/KOSIM.2006.23.4.041

A Sentiment Classification Method Using Context Information in Product Review Summarization (상품 리뷰 요약에서의 문맥 정보를 이용한 의견 분류 방법)

Yang, Jung-Yeon;Myung, Jae-Seok;Lee, Sang-Goo
- Journal of KIISE:Databases / v.36 no.4 / pp.254-262 / 2009
As e-business activity grows, customers increasingly encounter products through online shopping sites, and many consult product reviews before purchasing online. However, as the volume of reviews grows, reading and evaluating them takes customers a great deal of time and effort. Opinion mining (OM) has recently attracted attention as an effective solution to this problem. In this paper, we propose an efficient method for classifying the sentiment of opinions in product reviews using product-specific context information about the words that occur in the reviews. We define the context information of words, propose how to apply it to sentiment classification, and demonstrate the method's performance through experiments. In addition, to replace the traditional manual process of building a word corpus, we propose a method that constructs it automatically from review texts and review scores. As a result, accurate sentiment polarities of opinion words in product reviews can be obtained easily.
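One plausible way to derive product-specific word polarities automatically from review texts and scores is sketched below. The averaging rule (mean of score minus a neutral midpoint of 3 on a 1 to 5 scale) and the function names are assumptions for illustration, not the paper's definition of context information.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def build_lexicon(reviews: Iterable[Tuple[str, str, float]],
                  neutral: float = 3.0) -> Dict[Tuple[str, str], float]:
    """Polarity of each (product context, word) pair = mean of (review score - neutral)
    over the reviews in that context containing the word."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for context, text, score in reviews:
        for word in set(words(text)):  # count each word once per review
            sums[(context, word)] += score - neutral
            counts[(context, word)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def classify_review(context: str, text: str,
                    lexicon: Dict[Tuple[str, str], float]) -> str:
    total = sum(lexicon.get((context, w), 0.0) for w in words(text))
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"
```

Keying the lexicon on the product context lets the same word carry opposite polarities, for instance "small" for a phone versus a hotel room.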

Contextual Advertisement System based on Document Clustering (문서 클러스터링을 이용한 문맥 광고 시스템)

Lee, Dong-Kwang;Kang, In-Ho;An, Dong-Un
- The KIPS Transactions:PartB / v.15B no.1 / pp.73-80 / 2008
In this paper, we propose an advertisement-keyword finding method based on document clustering to address problems caused by ambiguous words and misidentified main keywords. News articles with similar content and the same advertisement keywords are clustered to build contextual information for those keywords. Besides news articles, product web pages and product summaries are also used to build this contextual information. A given document is classified into one of the news-article clusters, and the advertisement keywords associated with that cluster are then used to identify keywords in the document. The proposed method improved precision by 21%.
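The classify-then-look-up step can be sketched as follows, assuming bag-of-words centroids and cosine similarity for assigning a document to a cluster; the paper does not specify these choices here, and the cluster data and keyword lists in the example are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[w] for w, count in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_centroids(clusters: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Sum the term counts of each cluster's member articles."""
    return {name: sum((bow(a) for a in articles), Counter())
            for name, articles in clusters.items()}


def ad_keywords_for(doc: str, centroids: Dict[str, Counter],
                    cluster_keywords: Dict[str, List[str]]) -> List[str]:
    """Assign doc to the nearest cluster, then keep that cluster's ad keywords found in doc."""
    d = bow(doc)
    best = max(centroids, key=lambda name: cosine(d, centroids[name]))
    return [k for k in cluster_keywords[best] if d[k] > 0]
```

Restricting the candidates to the chosen cluster's keywords is what filters out senses of an ambiguous word that belong to a different topic.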
https://doi.org/10.3745/KIPSTB.2008.15-B.1.73

A Way of Avoiding Spurious Paths in Interprocedural Static Analysis (함수 호출을 구별하는 분석에서 가짜 경로를 없애는 한 방법)

Heo, Ki-Hong;Oh, Hak-Joo;Yi, Kwang-Keun
- Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.474-477 / 2011
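Program analyses that summarize function calling contexts inevitably suffer serious performance degradation because of the spurious path problem. This happens because, once calling contexts are summarized, there are cases where the analysis cannot tell exactly where analysis information should flow. This paper describes a new algorithm that eliminates spurious paths in an analysis that distinguishes function calls. By restricting the analysis order to resemble actual program execution and slightly modifying part of the algorithm, all spurious paths can be removed for non-recursive functions. The approach produces results that are as precise as or more precise than the existing approach, and it is much faster.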
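The spurious-path problem can be illustrated with a toy analysis of one global variable `g` and a procedure `f` that leaves it unchanged. When `f` has one summarized state, its exit value flows back to every caller, so a value from one call site leaks into another; matching each return to its own call removes that leak. This is only an illustration of the problem under these assumptions (matched returns are exact here only because `f` does not touch `g`), not the authors' ordering-based algorithm.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Each step of main is ("assign", value), which sets global g, or ("call", site),
# which calls a procedure f that leaves g unchanged.
Step = Tuple[str, Union[int, str]]


def analyze(program: List[Step], match_returns: bool) -> Dict[str, FrozenSet[int]]:
    """Possible values of g right after each call returns, computed to a fixpoint."""
    entry: Dict[str, FrozenSet[int]] = {}  # call site -> g on entry to f
    while True:
        # f's single summarized state: the join of g over all of its callers
        f_exit: FrozenSet[int] = frozenset().union(*entry.values())
        g: FrozenSet[int] = frozenset()
        new_entry: Dict[str, FrozenSet[int]] = {}
        after_return: Dict[str, FrozenSet[int]] = {}
        for kind, arg in program:
            if kind == "assign":
                g = frozenset({arg})
            else:
                new_entry[arg] = g
                if not match_returns:
                    g = g | f_exit  # summarized exit flows back to every caller
                after_return[arg] = g
        if new_entry == entry:
            return after_return
        entry = new_entry
```

For `[("assign", 1), ("call", "s1"), ("assign", 2), ("call", "s2")]`, the unmatched version reports `{1, 2}` after both calls, although only `1` can reach `s1` and only `2` can reach `s2`; the extra values arrive along spurious call-to-wrong-return paths.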

Long-KE-T5: Korean-English Language model for Long Sequences (Long-KE-T5: 긴 맥락 파악이 가능한 한국어-영어 언어 모델 구축)

San Kim;Jinyea Jang;Minyoung Jeung;Saim Shin
- Annual Conference on Human and Language Technology / 2023.10a / pp.168-170 / 2023
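This paper introduces Long-KE-T5, a language model trained on 74 million Korean and English documents to take up to 4,096 tokens as input and generate up to 1,024 tokens. Long-KE-T5 was trained to generate highly representative sentences from a document, and because the training documents are long, it can be used for tasks that require long context. Long-KE-T5 performed well on a variety of Korean benchmarks, and because its pre-training objective resembles text summarization, it outperformed existing models on document summarization tasks.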
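The abstract says Long-KE-T5 is trained to generate a document's most representative sentences. One way to build such examples, assumed here by analogy with gap-sentence generation and not confirmed by the abstract, is to mask the sentences that overlap most with the rest of the document and use them as the target, truncating input and output to 4,096 and 1,024 tokens. Whitespace words stand in for subword tokens and `MASK` is a hypothetical sentinel.

```python
import re
from typing import List, Tuple

MASK = "<mask_sent>"  # hypothetical sentinel token, not Long-KE-T5's actual vocabulary


def unigram_f1(a: List[str], b: List[str]) -> float:
    """Set-based unigram F1, a rough stand-in for ROUGE-1."""
    sa, sb = set(a), set(b)
    overlap = len(sa & sb)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(sa), overlap / len(sb)
    return 2 * precision * recall / (precision + recall)


def make_example(sentences: List[str], k: int = 1, max_in: int = 4096,
                 max_out: int = 1024) -> Tuple[List[str], List[str]]:
    """Mask the k sentences most similar to the rest of the document and use them as the target."""
    toks = [re.findall(r"\w+", s.lower()) for s in sentences]

    def score(i: int) -> float:
        rest = [w for j, t in enumerate(toks) if j != i for w in t]
        return unigram_f1(toks[i], rest)

    picked = set(sorted(range(len(sentences)), key=score, reverse=True)[:k])
    source = " ".join(MASK if i in picked else s for i, s in enumerate(sentences)).split()
    target = " ".join(sentences[i] for i in sorted(picked)).split()
    # whitespace tokens stand in for subword tokens when applying the length limits
    return source[:max_in], target[:max_out]
```

An objective like this is close to summarization, which is consistent with the abstract's explanation for the model's strong summarization results.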

Discovery of Coordinate Terms and Context using the Title and Snippet in Web Search (Web 검색 엔진의 제목과 문서요약을 이용한 동위어와 문맥의 발견)

Han, Sang-Yong;Lee, Sang-Hoon
- Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.210-215 / 2007
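As the amount of information on the web grows, users come to understand a word they are interested in through its related words. Coordinate terms are words that share a common hypernym. Many previous studies have searched for coordinate terms, hypernyms, and hyponyms, but analyzing a huge corpus built from web documents to obtain results took a long time. This paper therefore proposes a method that quickly discovers coordinate terms and context for a user's query from the titles and snippets returned by a web search engine. Exploiting the fact that coordinate terms of a word are joined by the coordinating particle '와' ("and"), we construct queries for the web search engine and obtain coordinate terms from the search results. At the same time, we also obtain the context underlying the discovered coordinate terms and the query. The method is therefore applicable to a wide range of areas, such as query expansion in web search and the discovery of comparison targets.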
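The pattern-based extraction might be sketched like this. It assumes snippets have already been fetched from a search engine (no search API is shown), picks 와 or 과 from the final syllable's batchim via Unicode Hangul arithmetic, and counts Hangul words on the other side of the particle. Captured words may still carry trailing particles (for example 배를), which a real system would strip with a morphological analyzer.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def particle_for(word: str) -> str:
    """Choose the coordinating particle: 과 after a final consonant (batchim), 와 otherwise."""
    last = word[-1]
    if "가" <= last <= "힣":
        return "과" if (ord(last) - 0xAC00) % 28 else "와"
    return "와"


def coordinate_terms(query: str, snippets: Iterable[str],
                     top_n: int = 5) -> List[Tuple[str, int]]:
    """Count Hangul words joined to the query by 와/과 in either order."""
    q = re.escape(query)
    after = re.compile(rf"{q}[와과]\s*([가-힣]+)")    # "<query>와 X"
    before = re.compile(rf"([가-힣]+?)[와과]\s*{q}")  # "X와 <query>"
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(after.findall(snippet))
        counts.update(before.findall(snippet))
    counts.pop(query, None)
    return counts.most_common(top_n)
```

For example, `particle_for("사과")` gives 와, so a phrase query would be built around 사과와; the text surrounding each match in the snippet is where the accompanying context could be read off.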

Multi-task learning for entity-centric fact correction on machine summaries (기계 요약의 개체명 사실 수정을 위한 다중 작업 학습 방법 제안)

Shin, JeongWan;Noh, Yunseok;Park, SangHeon;O, YoungSun;Park, Seyoung
- Annual Conference on Human and Language Technology / 2021.10a / pp.124-130 / 2021
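Factual inconsistency in machine summarization is the phenomenon in which a generated summary conveys factual information that differs from the source; it severely undermines the reliability of machine summaries, especially when named entities are used incorrectly. Correcting named entities requires two tasks: first, determining whether each named entity in the summary is used correctly, and then fixing the incorrect ones. In this paper, we assume that both tasks can be solved by understanding each named entity in context, and accordingly propose a multi-task learning method for the two tasks. A model trained with the proposed method can perform post-hoc correction of generated machine summaries. To evaluate the proposed model, we measured its performance on summary data with deliberately corrupted named entities and on machine-generated summaries, and compared it with other entity correction models. The proposed model achieved 92.9% correction accuracy at the entity level and improved the factual accuracy of machine summaries produced by a KoBART summarization model by 4.88 percentage points.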
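A sketch of how training data and a joint objective for the two tasks could be set up is shown below. It assumes corrupted summaries are made by swapping entities with same-type candidates and that the multi-task loss is a weighted sum of detection cross-entropy and correction negative log-likelihood; the paper's actual corruption scheme, architecture, and loss weighting are not given in the abstract, and all names here are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def corrupt_entities(
    tokens: List[str],
    entities: List[Tuple[int, str]],  # (token position, entity type)
    pool: Dict[str, List[str]],       # entity type -> replacement candidates
    rate: float,
    rng: random.Random,
) -> Tuple[List[str], List[int], List[str]]:
    """Build one training example for both tasks.

    Returns the corrupted tokens, a detection label per entity (1 = swapped),
    and the correction target per entity (the original surface form).
    """
    corrupted = list(tokens)
    labels: List[int] = []
    targets: List[str] = []
    for pos, etype in entities:
        original = tokens[pos]
        swaps = [c for c in pool.get(etype, []) if c != original]
        if swaps and rng.random() < rate:
            corrupted[pos] = rng.choice(swaps)
            labels.append(1)
        else:
            labels.append(0)
        targets.append(original)
    return corrupted, labels, targets


def multitask_loss(detect_probs: List[float], labels: List[int],
                   gold_probs: List[float], weight: float = 1.0) -> float:
    """Mean binary cross-entropy for detection plus weighted mean NLL of the
    original entity for correction (probabilities must lie strictly in (0, 1))."""
    bce = -sum(math.log(p) if y else math.log(1.0 - p)
               for p, y in zip(detect_probs, labels)) / len(labels)
    nll = -sum(math.log(q) for q in gold_probs) / len(gold_probs)
    return bce + weight * nll
```

Keeping the untouched entities (label 0) in the data matters: the detector must learn to leave correct entities alone, not just to flag swaps.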
