• Title/Summary/Keyword: Automated Document Summarization (문서 요약 자동화)

Search Results: 15

Text Extraction and Summarization from Web News (웹 뉴스의 기사 추출과 요약)

  • Han, Kwang-Rok; Sun, Bok-Keun; Yoo, Hyoung-Sun
    • Journal of the Korea Society of Computer and Information, v.12 no.5, pp.1-10, 2007
  • Many types of information provided through the web, including news content, contain unnecessary clutter. This clutter makes it difficult to build automated information processing systems for tasks such as document summarization, extraction, and retrieval. We propose a system that extracts and summarizes news content from the web. The extraction system receives news content in HTML as input, builds an element tree similar to a DOM tree, and extracts text while removing clutter marked by hyperlink attributes in HTML tags from the element tree. Text extracted by the extraction system is passed to the summarization system, which extracts key sentences; the summarization system is implemented using a co-occurrence relation graph. The summarized sentences are expected to be deliverable to PDAs or cellular phones via message services such as SMS.

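The abstract above outlines two steps: stripping hyperlink clutter from the HTML and ranking sentences with a co-occurrence relation graph. The sketch below is an illustration of that idea under my own assumptions, not the authors' implementation; it uses BeautifulSoup (a third-party library the paper does not mention) for parsing and a simple word-degree score over a sentence-level co-occurrence graph.

```python
# Illustrative sketch only, not the paper's system.
# Assumes: pip install beautifulsoup4
import re
from collections import defaultdict
from itertools import combinations

from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Drop hyperlink-bearing and non-content tags (menus, ads, scripts), keep body text."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["a", "script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def summarize(text: str, k: int = 3) -> list[str]:
    """Pick the k sentences whose words have the highest co-occurrence degree."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    degree = defaultdict(int)                       # word -> co-occurrence edge count
    for words in tokenized:
        for w1, w2 in combinations(sorted(words), 2):
            degree[w1] += 1
            degree[w2] += 1
    scores = [sum(degree[w] for w in words) for words in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]      # keep original sentence order
```

A call such as `summarize(extract_text(page_html))` would return the highest-scoring sentences, which could then be trimmed to SMS length as the abstract suggests.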

International Standardization Trends in Passive UHF RFID Security Technology (수동형 UHF RFID 보안기술 국제 표준화 동향)

  • Kang, You-Sung; Choi, Doo-Ho; Cho, Hyun-Sook
    • Review of KIISC, v.22 no.1, pp.31-38, 2012
  • Recently, discussions on standardizing security technology have been actively under way to promote the adoption of passive UHF RFID tags, which are representative low-cost, lightweight electronic devices for automated object identification. This article organizes the relationship between the ISO/IEC JTC1 SC31 standards documents and the EPCglobal specifications related to passive UHF RFID security technology standardization, and analyzes the major proposed technologies expected to be discussed at upcoming ISO/IEC JTC1 SC31 meetings. The technologies under analysis were proposed by companies from various regions including Europe, North America, and Asia; this article introduces the main features of the proposals and concludes by summarizing the outlook for future international standardization.

A Study on Automated Fake News Detection Using Verification Articles (검증 자료를 활용한 가짜뉴스 탐지 자동화 연구)

  • Han, Yoon-Jin; Kim, Geun-Hyung
    • KIPS Transactions on Software and Data Engineering, v.10 no.12, pp.569-578, 2021
  • Thanks to the development of the web, we can easily access online news through various media. But just as online news is easy to access, we often encounter fake news pretending to be true. As fake news has become a global problem, fact-checking services are now provided domestically as well. However, these rely on expert-based manual detection, so research on technologies that automate fake news detection is being actively conducted. Existing research detects fake news based on the contextual characteristics of an article or a comparison between the title and the body. Such approaches have limits, and detection becomes difficult when articles are manipulated with high precision. Therefore, this study suggests using a verification article to decide whether a news item is genuine, so that the decision is not affected by manipulation of the article itself. In addition, to improve the precision of fake news detection, the study adds a step that summarizes both the subject article and the verification article with a summarization model. To validate the suggested algorithm, the study evaluates the document summarization method, the search method for verification articles, and the fake news detection precision of the final algorithm. The suggested algorithm can help identify whether an article is true before it is published online through various media.
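
As a rough illustration of the idea in the abstract above (summarize the subject article and a verification article, then compare them), the sketch below uses TF-IDF cosine similarity as the comparison step. It is an assumption on my part, not the paper's pipeline: the summarizer here is a naive lead-sentence extractor standing in for the paper's summarization model, and the similarity threshold is arbitrary.

```python
# Illustrative sketch only; stands in for the paper's summarization model and classifier.
# Assumes: pip install scikit-learn
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def lead_summary(article: str, n_sentences: int = 3) -> str:
    """Placeholder summarizer: keep the first few sentences of the article."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n_sentences])


def looks_genuine(subject_article: str, verification_article: str,
                  threshold: float = 0.5) -> bool:
    """Compare summaries of the subject and verification articles via TF-IDF cosine similarity."""
    summaries = [lead_summary(subject_article), lead_summary(verification_article)]
    tfidf = TfidfVectorizer().fit_transform(summaries)
    similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    return similarity >= threshold   # high agreement with the verification article -> genuine
```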

A Study on Layout Extraction from Internet Documents Through Xpath (Xpath에 의한 인터넷 문서의 레이아웃 추출 방법에 관한 연구)

  • Han, Kwang-Rok; Sun, Bok-Keun
    • The Journal of the Korea Contents Association, v.5 no.4, pp.237-244, 2005
  • Currently, most Internet documents, including news data, are built from predefined templates, but the templates usually cover only the main content and do not help with handling indexes, advertisements, header data, and the like. Templates in this form are not suitable when Internet documents are used as data for information retrieval. To process Internet documents in various areas of information retrieval, it is necessary to detect such additional information as advertisements and page indexes. This study therefore proposes a method for detecting the layout of web pages by identifying the characteristics and structure of the block tags that shape the layout and by calculating distances between web pages. In experiments, the method successfully extracted the layout of 640 documents out of 1,000 samples, a 64% recall rate. The method is intended to reduce the cost of automated web document processing and improve its efficiency when applied to document preprocessing for information retrieval tasks such as data extraction and document summarization.

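The abstract above describes identifying block tags and calculating distances between web pages. The sketch below is one plausible reading of that idea, not the paper's method: it represents a page's layout as the set of XPaths of its block-level elements (via lxml, an assumed dependency) and compares two pages with a Jaccard distance.

```python
# Illustrative sketch only; one possible XPath-based layout comparison.
# Assumes: pip install lxml
from lxml import html

BLOCK_TAGS = {"div", "table", "tr", "td", "ul", "ol", "li", "p"}  # layout-forming tags


def layout_paths(page_html: str) -> set[str]:
    """Collect the XPath of every block-level element in the page."""
    tree = html.fromstring(page_html)
    root = tree.getroottree()
    return {root.getpath(el) for el in tree.iter() if el.tag in BLOCK_TAGS}


def layout_distance(page_a: str, page_b: str) -> float:
    """Jaccard distance between the pages' block-tag XPath sets (0.0 = identical layout)."""
    a, b = layout_paths(page_a), layout_paths(page_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

Pages generated from the same template would share most block-tag XPaths and thus have a small distance, which is the intuition behind clustering pages by layout before extraction.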

Policy agenda proposals from text mining analysis of patents and news articles (특허 및 뉴스 기사 텍스트 마이닝을 활용한 정책의제 제안)

  • Lee, Sae-Mi; Hong, Soon-Goo
    • Journal of Digital Convergence, v.18 no.3, pp.1-12, 2020
  • The purpose of this study is to explore trends in blockchain technology by analyzing patents and news articles with text mining, and to suggest a blockchain policy agenda that reflects social interests. For this purpose, 327 blockchain-related patent abstracts in Korea and 5,941 full-text online news articles were collected and preprocessed. Twelve patent topics and 19 news topics were extracted with latent Dirichlet allocation (LDA) topic modeling. Analysis of the patents showed that topics related to authentication and transactions were largely predominant, while analysis of the news articles showed that social interest is mainly concerned with cryptocurrency. Policy agendas for blockchain development were then derived. This study demonstrates the efficient and objective use of an automated technique for analyzing large collections of text documents, and the specific policy agendas proposed here can inform future policy-making processes.
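
The topic extraction step described above (12 patent topics and 19 news topics via LDA) could be reproduced along the lines of the sketch below. This is a generic scikit-learn illustration under my own assumptions, not the authors' setup; in particular, a real run on Korean patent abstracts and news articles would need a Korean tokenizer, which is omitted here.

```python
# Illustrative LDA sketch only; the authors' preprocessing and corpora are not reproduced.
# Assumes: pip install scikit-learn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def extract_topics(documents: list[str], n_topics: int, top_words: int = 10) -> list[list[str]]:
    """Fit LDA on a list of documents and return the top words of each topic."""
    vectorizer = CountVectorizer(max_df=0.95, min_df=2)   # drop very common / very rare terms
    counts = vectorizer.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vectorizer.get_feature_names_out()
    return [[vocab[i] for i in topic.argsort()[::-1][:top_words]]
            for topic in lda.components_]


# Hypothetical usage, mirroring the topic counts reported in the abstract:
# patent_topics = extract_topics(patent_abstracts, n_topics=12)
# news_topics = extract_topics(news_articles, n_topics=19)
```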