• Title/Abstract/Keywords: Text processing


The Applicability of Schema Theory to Scientific Texts

  • Im, Byung-Bin;Lee, Jong-Hee
    • English Language & Literature Teaching
    • /
    • Vol. 10, No. 1
    • /
    • pp.1-22
    • /
    • 2004
  • The primary purpose of this study is to investigate the applicability of content and formal schemata to the processing of scientific texts, which encompass human knowledge of the physical world. In general, schema theory is grounded in the culture-oriented background of a text. From this point of view, whether both content and formal schemata apply to the comprehension of a scientific text deserves focal attention in terms of information processing modes. The results of an empirical study indicate that whereas the universality of general knowledge about the natural world attenuates the tenets of schema theory, the rhetorical organization of scientific texts encourages a schema-based approach: readers' familiarity with the structural patterns of a text facilitates their reading comprehension.


Korean Text Automatic Summarization Using Semantically Expanded Sentence Similarity

  • 김희찬;이수원
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2014 Fall Conference
    • /
    • pp.841-844
    • /
    • 2014
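  • Automatic text summarization is an important research area for processing large volumes of text data, and extractive summarization is currently its most actively studied branch. TextRank, a leading work in extractive summarization, does not sufficiently account for the semantic similarity between the words in two sentences when it computes sentence-to-sentence similarity. This study proposes a new word-to-word similarity measure that takes semantic similarity into account. The resulting sentence similarities are represented as a graph and evaluated experimentally with the same ranking algorithm that TextRank uses. The experiments confirm that, when measuring similarity between sentences, the semantic components of words must be fully considered in order to minimize information loss.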
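
The abstract describes representing sentence similarities as a graph and ranking it with TextRank's algorithm. The following is a minimal sketch, assuming sentences are already tokenized and that word-level similarity is supplied as a function (`word_sim`, hypothetical). The soft-overlap sentence formula and the baseline `exact_match` are illustrative assumptions, not the paper's semantic measure; the ranking loop is the standard TextRank iteration with damping 0.85.

```python
from typing import Callable, List

def sentence_similarity(a: List[str], b: List[str],
                        word_sim: Callable[[str, str], float]) -> float:
    """Soft overlap (assumed formula): mean best-match word similarity, both directions."""
    if not a or not b:
        return 0.0
    a_to_b = sum(max(word_sim(w, v) for v in b) for w in a) / len(a)
    b_to_a = sum(max(word_sim(v, w) for w in a) for v in b) / len(b)
    return (a_to_b + b_to_a) / 2

def textrank(sentences: List[List[str]],
             word_sim: Callable[[str, str], float],
             damping: float = 0.85, iterations: int = 50) -> List[float]:
    n = len(sentences)
    # Weighted, undirected similarity graph without self-loops.
    w = [[0.0 if i == j else sentence_similarity(sentences[i], sentences[j], word_sim)
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from neighbours in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

def exact_match(x: str, y: str) -> float:
    # Baseline word similarity: 1 for identical tokens, otherwise 0.
    return 1.0 if x == y else 0.0
```

Replacing `exact_match` with a semantic word similarity (for example, a cosine similarity between word embeddings) is the point where a method like the one described would plug in; the graph and ranking steps stay unchanged.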

Joint Hierarchical Semantic Clipping and Sentence Extraction for Document Summarization

  • Yan, Wanying;Guo, Junjun
    • Journal of Information Processing Systems
    • /
    • 제16권4호
    • /
    • pp.820-831
    • /
    • 2020
  • Extractive document summarization aims to select a few sentences that preserve the main information of a given document, but current extractive methods do not address repeated sentence information, a problem that is especially common in news document summarization. Considering both the importance and the redundancy of information in news text, we propose a neural extractive summarization approach with joint sentence semantic clipping and selection, which effectively addresses sentence repetition in news summaries. Specifically, a hierarchical selective encoding network is constructed to produce both sentence-level and document-level representations and to extract data containing important information from news text; a sentence extractor strategy is then adopted for joint scoring and redundant-information clipping. In this way, our model strikes a balance between extracting important information and filtering redundant information. Experimental results on the CNN/Daily Mail dataset and on a Court Public Opinion News dataset that we built show the effectiveness of the proposed approach in terms of ROUGE metrics, especially for redundant information filtering.
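
The trade-off between selecting important sentences and filtering redundant ones can be illustrated without the paper's neural encoder. The sketch below substitutes a different, well-known technique: greedy Maximal Marginal Relevance (MMR) selection. It assumes importance scores are already given, measures redundancy with Jaccard overlap of tokens, and uses a weight `lam = 0.7`; all three are assumptions for illustration.

```python
from typing import List, Sequence

def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def select_sentences(sentences: List[List[str]], importance: List[float],
                     k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: maximize lam*importance - (1-lam)*max overlap with chosen sentences."""
    selected: List[int] = []
    candidates = set(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            redundancy = max((jaccard(sentences[i], sentences[j]) for j in selected),
                             default=0.0)
            return lam * importance[i] - (1 - lam) * redundancy
        # Sorting makes ties resolve to the earliest sentence index.
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-identical high-scoring sentences, the second one is penalized for overlapping the first, so a less important but novel sentence is chosen instead.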

Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words

  • Lee, Tae-Seok;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Information Processing Systems
    • /
    • 제18권3호
    • /
    • pp.344-358
    • /
    • 2022
  • Text summarization is the task of producing a shorter version of a long document while accurately preserving the main contents of the original text. Abstractive summarization generates novel words and phrases using a language generation method through text transformation and prior-embedded word information. However, newly coined words and out-of-vocabulary words degrade automatic summarization performance because they are not learned during pre-training. In this study, we demonstrate an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The ROUGE-1 score, a recall-oriented measure of overlapping generated words, was 55.11, and the word-order-based ROUGE-L score was 39.65.
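
The two reported metrics differ in what they count: ROUGE-1 recall counts reference unigrams that also appear in the candidate (with clipped counts), while ROUGE-L recall uses the longest common subsequence, so word order matters. A minimal sketch on whitespace tokens follows; the official ROUGE toolkit also applies options such as stemming, F-measures, and multiple references, so these functions are not expected to reproduce the scores above.

```python
from collections import Counter
from typing import List

def rouge1_recall(candidate: List[str], reference: List[str]) -> float:
    if not reference:
        return 0.0
    overlap = Counter(candidate) & Counter(reference)  # clipped unigram counts
    return sum(overlap.values()) / len(reference)

def lcs_length(a: List[str], b: List[str]) -> int:
    # Classic O(len(a) * len(b)) dynamic program, keeping a single previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rougeL_recall(candidate: List[str], reference: List[str]) -> float:
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)
```

For a scrambled candidate such as "mat the on" against "the cat sat on the mat", ROUGE-1 recall is 0.5 but ROUGE-L recall drops to 2/6, because only "the on" survives as an in-order subsequence.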

A Case Study on Text-to-Ontology Transformation on the Basis of Neural Translation

  • 신유진;이지항
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2021 Fall Conference
    • /
    • pp.891-894
    • /
    • 2021
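  • An ontology is an explicit specification of a conceptualization, used to share concepts and their representations between people and computers or among computers. Conventional ontology construction relies on manual work by experts and is therefore costly and time-consuming. In this paper, we draw on a case that applies the concept of deep-learning-based machine translation to implement a method that generates ontologies from text with less dependence on manual work. Specifically, using knowledge-representation sequences extracted from text with deep learning, as proposed in prior work, we convert the knowledge-representation structure into an ontology and extend it into a knowledge base, and on this basis propose an automated Text-to-Ontology transformation methodology.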
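
The abstract does not specify the format of the extracted knowledge-representation sequences. The sketch below assumes a hypothetical linearized output in which triples are separated by `;` and fields by `|` (for example `Dog | subClassOf | Animal`), parses it into triples, and serializes them as N-Triples lines. The base IRI `http://example.org/` is a placeholder; only the `rdf:type` and `rdfs:subClassOf` IRIs are standard W3C vocabulary.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/"  # hypothetical namespace for illustration
# Map a few relation labels to standard RDF/RDFS terms; everything else goes to BASE.
STANDARD = {
    "type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "subClassOf": "http://www.w3.org/2000/01/rdf-schema#subClassOf",
}

def parse_sequence(seq: str) -> List[Triple]:
    """Parse the assumed 'subject | relation | object ; ...' format."""
    triples: List[Triple] = []
    for chunk in seq.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) == 3 and all(parts):  # skip malformed model output
            triples.append((parts[0], parts[1], parts[2]))
    return triples

def to_iri(label: str) -> str:
    return "<" + BASE + label.replace(" ", "_") + ">"

def to_ntriples(triples: List[Triple]) -> List[str]:
    lines = []
    for s, p, o in triples:
        pred = "<" + STANDARD[p] + ">" if p in STANDARD else to_iri(p)
        lines.append(f"{to_iri(s)} {pred} {to_iri(o)} .")
    return lines
```

Dropping malformed chunks rather than raising an error is a design choice for noisy sequence-model output; a real pipeline would likely also log them for review.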

A Lecture Summarization Application Using STT (Speech-to-Text) and ChatGPT

  • 김진웅;금보성;김태국
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2023 Fall Conference
    • /
    • pp.297-298
    • /
    • 2023
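  • As the COVID-19 pandemic has effectively ended, university lectures have shifted from remote online classes back to in-person classes. Online lectures allowed students to review material by replaying recordings; for in-person lectures, audio recordings now serve this purpose. However, finding a desired part of a replay or recording, or summarizing its content, is time-consuming and inconvenient. This paper proposes an application that converts lecture content into text using STT (Speech-to-Text) technology and summarizes it with ChatGPT (Chat Generative Pre-trained Transformer).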
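
The application chains two steps: speech-to-text, then summarization. The paper's specific services and prompts are not described here, so the sketch below injects them as callables (`transcribe` and `summarize` are hypothetical stand-ins for an STT service and a ChatGPT request). It also assumes long transcripts are split into chunks of at most `max_words` words, summarized separately, and then combined; this two-stage scheme is an assumption, not the paper's stated design.

```python
from typing import Callable, List

def chunk_words(text: str, max_words: int = 500) -> List[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_lecture(audio_path: str,
                      transcribe: Callable[[str], str],
                      summarize: Callable[[str], str],
                      max_words: int = 500) -> str:
    transcript = transcribe(audio_path)           # STT step (hypothetical service)
    chunks = chunk_words(transcript, max_words)   # keep each request short
    partial = [summarize(c) for c in chunks]      # summarize each chunk
    if len(partial) <= 1:
        return partial[0] if partial else ""
    return summarize("\n".join(partial))          # merge the partial summaries
```

Injecting the two steps as functions keeps the pipeline testable with stubs and lets the STT engine or the language model be swapped without changing the control flow.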

Using Ontologies for Semantic Text Mining

  • 유은지;김정철;이춘열;김남규
    • Korea Society of Information Systems: The Journal of Information Systems
    • /
    • Vol. 21, No. 3
    • /
    • pp.137-161
    • /
    • 2012
  • The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The difficulty of understanding the exact semantics of terms is mainly attributable to homonym and synonym problems, which are traditional problems in the natural language processing field. Some major text mining tools provide a thesaurus to address these problems, but a thesaurus cannot resolve complex synonym problems. Furthermore, a thesaurus does not address homonyms at all and hence cannot solve homonym problems. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonym and synonym problems. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. The experiments revealed that the prediction model produced by our semantic text mining method outperformed the model produced by traditional text mining on prediction performance measures such as response, captured response, and lift.
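
The core idea is to map surface terms onto ontology concepts before mining: synonyms collapse onto one concept, and a homonym is assigned to the concept that best fits its context. The sketch below uses a tiny hypothetical ontology (the concept IDs, labels, and context cues are invented for illustration) and a simplified Lesk-style overlap rule for disambiguation; it is not the paper's methodology.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: concept id -> surface labels and context cue words.
ONTOLOGY: Dict[str, Dict[str, Set[str]]] = {
    "Refund":       {"labels": {"refund", "money back", "reimbursement"},
                     "context": {"order", "payment", "cancel"}},
    "Apple_Fruit":  {"labels": {"apple"}, "context": {"fresh", "fruit", "box"}},
    "Apple_Device": {"labels": {"apple"}, "context": {"phone", "charger", "battery"}},
}

def resolve(term: str, context_words: List[str]) -> str:
    """Map a term to a concept id; unknown terms are returned unchanged."""
    ctx = {w.lower() for w in context_words}
    candidates = [cid for cid, c in ONTOLOGY.items() if term.lower() in c["labels"]]
    if not candidates:
        return term
    # Synonyms share one concept; homonyms pick the concept whose cues overlap most.
    return max(candidates, key=lambda cid: len(ONTOLOGY[cid]["context"] & ctx))

def normalize(tokens: List[str]) -> List[str]:
    # Use the whole token list as the disambiguation context for each token.
    return [resolve(t, tokens) for t in tokens]
```

After normalization, features such as term counts are computed over concept IDs instead of raw words, so "refund" and "reimbursement" contribute to the same feature.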

String Extraction from Text-Background Mixed Documents Using Mathematical Morphology

  • 성연진;어진우
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • Vol. 34S, No. 10
    • /
    • pp.104-111
    • /
    • 1997
  • Recognizing documents in which text is mixed with a background pattern is known to be a difficult problem. In this paper, a new string extraction algorithm is proposed that uses mathematical morphology for documents consisting of text overlaid on a periodic background pattern. The algorithm consists of pattern periodicity feature extraction and background removal. The extracted periodicity feature is used to determine the shape of the structuring elements for the morphological pre- and post-processing that removes the background. The advantage of the proposed algorithm over an existing one is verified through experiments on various test documents.
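
The two ingredients named in the abstract can be sketched separately: estimating the period of the background pattern, and a morphological opening that removes foreground features smaller than its structuring element. This minimal pure-Python version assumes binary images (1 = ink), a square `k x k` structuring element, and background marks that are thinner than text strokes. The paper derives the structuring-element shape from the periodicity feature, but that mapping is not given here, so `k` is left as a free parameter.

```python
from typing import List

Image = List[List[int]]  # binary image, 1 = foreground (ink)

def estimate_period(profile: List[int]) -> int:
    """Lag in [1, n//2] with the largest unnormalized autocorrelation (first max wins)."""
    n = len(profile)
    best_lag, best_score = 1, -1
    for lag in range(1, n // 2 + 1):
        score = sum(profile[i] * profile[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def erode(img: Image, k: int) -> Image:
    # Pixel stays on only if the k x k element anchored at it fits inside the ink.
    h, w = len(img), len(img[0])
    return [[int(all(0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                     for di in range(k) for dj in range(k)))
             for j in range(w)] for i in range(h)]

def dilate(img: Image, k: int) -> Image:
    # Reflected counterpart of erode, so dilate(erode(x)) is a true opening.
    h, w = len(img), len(img[0])
    return [[int(any(0 <= i - di < h and 0 <= j - dj < w and img[i - di][j - dj]
                     for di in range(k) for dj in range(k)))
             for j in range(w)] for i in range(h)]

def opening(img: Image, k: int) -> Image:
    # Removes foreground features that cannot contain a k x k square.
    return dilate(erode(img, k), k)
```

In a small test image with isolated one-pixel background dots and a 3 x 3 "stroke", an opening with `k = 2` erases the dots and restores the stroke exactly.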


Caption Detection Algorithm Using Temporal Information in Video

  • 권철현;신청호;김수연;박상희
    • The Transactions of the Korean Institute of Electrical Engineers D: Systems and Control
    • /
    • Vol. 53, No. 8
    • /
    • pp.606-610
    • /
    • 2004
  • A novel caption text detection and recognition algorithm that exploits the temporal nature of video is proposed in this paper. A text registration technique is used to locate the temporal and spatial positions of captions in video from accumulated frame-difference information. Experimental results show that the proposed method is effective and robust. A high processing speed is also achieved, since no time-consuming operations are involved.
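
The intuition behind using accumulated frame differences is that overlaid captions stay fixed while the scene behind them changes, so caption pixels accumulate little change over time. The sketch below assumes grayscale frames given as 2-D lists and marks pixels whose accumulated absolute difference stays under `max_change` and whose intensity is at least `min_intensity` (bright captions). Both thresholds and the brightness condition are assumptions for illustration; the paper's text registration step is not reproduced.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, values 0-255

def accumulated_difference(frames: List[Frame]) -> List[List[int]]:
    """Sum of absolute differences between consecutive frames, per pixel."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for i in range(h):
            for j in range(w):
                acc[i][j] += abs(cur[i][j] - prev[i][j])
    return acc

def caption_mask(frames: List[Frame], max_change: int = 10,
                 min_intensity: int = 200) -> List[List[int]]:
    """1 where a pixel stayed nearly constant and bright across all frames."""
    acc = accumulated_difference(frames)
    last = frames[-1]
    return [[int(acc[i][j] <= max_change and last[i][j] >= min_intensity)
             for j in range(len(last[0]))] for i in range(len(last))]
```

Because this uses only additions and comparisons per pixel, it is consistent with the abstract's point that avoiding time-consuming operations keeps processing fast.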