• Title/Summary/Keyword: Single Document


Single Document Extractive Summarization Based on Deep Neural Networks Using Linguistic Analysis Features (언어 분석 자질을 활용한 인공신경망 기반의 단일 문서 추출 요약)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.8 no.8 / pp.343-348 / 2019
  • In recent years, extractive summarization systems based on end-to-end deep learning models have become popular. These systems do not require human-crafted features and adopt data-driven approaches. However, previous studies have shown that linguistic analysis features such as parts of speech, named entities, and word frequencies are useful for extracting important sentences from a document to generate a summary. In this paper, we propose an extractive summarization system based on deep neural networks that uses conventional linguistic analysis features. To demonstrate the usefulness of the linguistic analysis features, we compare models with and without them. The experimental results show that the model with the linguistic analysis features improves the ROUGE-2 F1 score by 0.5 points over the model without them.
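The ROUGE-2 F1 score reported above measures bigram overlap between a system summary and a reference summary. A minimal sketch of that metric (the example token lists are made up for illustration):

```python
from collections import Counter

def bigrams(tokens):
    # Consecutive token pairs of a tokenized summary.
    return Counter(zip(tokens, tokens[1:]))

def rouge2_f1(candidate, reference):
    """ROUGE-2 F1: harmonic mean of bigram precision and recall."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())      # precision over candidate bigrams
    r = overlap / sum(ref.values())       # recall over reference bigrams
    return 2 * p * r / (p + r)

# Hypothetical example: candidate shares 2 of 3 reference bigrams.
print(rouge2_f1("the cat sat down".split(), "the cat sat up".split()))  # → 0.6666...
```

A 0.5-point gain as in the abstract corresponds to a 0.005 increase in this value when expressed as a percentage.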

A Study on Feature Selection for kNN Classifier using Document Frequency and Collection Frequency (문헌빈도와 장서빈도를 이용한 kNN 분류기의 자질선정에 관한 연구)

  • Lee, Yong-Gu
    • Journal of Korean Library and Information Science Society / v.44 no.1 / pp.27-47 / 2013
  • This study investigated the classification performance of a kNN classifier using the feature selection methods based on document frequency(DF) and collection frequency(CF). The results of the experiments, which used HKIB-20000 data, were as follows. First, the feature selection methods that used high-frequency terms and removed low-frequency terms by the CF criterion achieved better classification performance than those using the DF criterion. Second, neither DF nor CF methods performed well when low-frequency terms were selected first in the feature selection process. Last, combining CF and DF criteria did not result in better classification performance than using the single feature selection criterion of DF or CF.
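The DF and CF criteria above differ only in what is counted: DF counts the documents a term appears in, CF counts its total occurrences across the collection. A hedged sketch with a toy corpus (all names and data my own, not from the HKIB-20000 experiments):

```python
from collections import Counter

def df_cf(corpus):
    """Compute document frequency (DF) and collection frequency (CF)
    for every term in a corpus of tokenized documents."""
    df, cf = Counter(), Counter()
    for doc in corpus:
        cf.update(doc)        # every occurrence counts toward CF
        df.update(set(doc))   # each document counts at most once toward DF
    return df, cf

def select_features(freq, k):
    # Keep the k highest-frequency terms, i.e. remove low-frequency terms.
    return {term for term, _ in freq.most_common(k)}

corpus = [["knn", "text", "text"], ["knn", "feature"], ["text", "knn"]]
df, cf = df_cf(corpus)
# "text" occurs 3 times (CF) but in only 2 documents (DF), so the two
# criteria can rank terms differently, as the study's comparison exploits.
```

The abstract's finding is that thresholding on `cf` (keeping high-CF terms) outperformed the same selection on `df` for the kNN classifier.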

Discrimination & Current Usage of Traditional Furniture (고가구에 대한 인식도 및 현대적 사용실태 조사연구)

  • 박영순
    • Journal of the Korean Home Economics Association / v.25 no.1 / pp.69-82 / 1987
  • The purpose of this study was to investigate the degree of discrimination and current usage of traditional furniture by people in contemporary society. Interest in and preference for traditional furniture were also examined. The major findings were: 1) The traditional furniture owned by respondents consisted mainly of document chests (mungab), dining tables (soban), and open etageres (sabang-takja). Bookcases (chaikjang) were rarely owned. The function of some furniture, such as the single-shelf chest (danchung-jang), kitchen cabinet (chantak), and desk (suban), has changed. 2) The most highly discriminated traditional furniture were the document chest (mungab), wardrobe (chest-jang), and dining table (soban). The degree of discrimination of the letter rack (gobi), kitchen cabinet (chantak), and bookcase (chaikjang), however, was very low. 3) There was a significant relation between discriminating ability, interest in the furniture, and possession of it. 4) Some socio-demographic variables were related to discriminating ability: groups of higher educational and economic level showed higher discriminating ability than those of lower levels. 5) There was a positive correlation between discriminating ability and interest: the greater the interest, the higher the discriminating ability. 6) The most preferred traditional furniture at present were the three-shelved clothing chest (samchung-jang), document chest (mungab), wardrobe chest (euiguri-jang), and open etagere (sabang-takja).


Running a SCRUM project within a Document Driven Process: An Experimental Case Study Report (문서 지향적 프로세스에서의 SCRUM 프로젝트 적용: 실험 사례연구)

  • Sawyer, Jonathan;Lee, Seok-Won
    • Journal of KIISE / v.42 no.9 / pp.1133-1146 / 2015
  • This paper examines how a Computer Engineering graduate student team ran their Advanced Software Engineering capstone project using SCRUM. The environment provided contextual challenges in terms of the on-site customer and the upfront requirements document, which are not uncommon in a document-driven, single-step methodology. The paper details the methodology and practices used to run the project, and reflects on some of the challenges faced by the members of a typical software team when transitioning to a SCRUM process. The paper concludes by evaluating the success of the techniques and practices against the Agile Manifesto and Henrik Kniberg's Scrum checklist. The project was undertaken at South Korea's Ajou University.

Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering / v.20 no.4 / pp.309-316 / 2022
  • This investigation is aimed at automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites, and plain text was extracted to build a dataset of 1,101,101 documents. A new single-document extractive text summarization model was then proposed and evaluated on this dataset. In this summarization model, the k-means algorithm clusters the sentences of the input document using different text representations, such as BoW (bag-of-words), TF-IDF (term frequency - inverse document frequency), Word2Vec, GloVe, and FastText. The summary algorithm then uses the trained k-means model to rank the candidate sentences and builds the summary from the highest-ranked sentences. The proposed model achieved F1 scores of 51.91% ROUGE-1, 18.77% ROUGE-2, and 29.72% ROUGE-L, compared with 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L for a competitive abstractive model. An advantage of the proposed model is that it performs well with O(n(k+2/p)) + O(n log₂ n) + O(np) + O(nk²) + O(k) time complexity.
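The ranking step described above, which the paper builds on top of a trained k-means model, can be sketched in a hedged way. The scoring rule here (distance of each sentence vector to its assigned centroid, closest first) is my assumption about one plausible ranking, and all vectors, centroids, and labels below are made up; in the paper they would come from TF-IDF, Word2Vec, GloVe, or FastText representations:

```python
import math

def rank_sentences(vectors, centroids, labels, n_keep):
    """Rank candidate sentences by Euclidean distance to their assigned
    k-means centroid; keep the closest n_keep, in document order."""
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], centroids[labels[i]]))
    return sorted(order[:n_keep])  # restore original sentence order

# Hypothetical 2-d sentence vectors, centroids, and cluster assignments.
vecs = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0), (0.9, 0.2)]
cents = [(0.05, 0.95), (0.95, 0.1)]
labels = [0, 0, 1, 1]
print(rank_sentences(vecs, cents, labels, 2))  # → [0, 1]
```

Returning the kept indices in document order preserves the readability of the extracted summary.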

A Study on the Operation of SURF in the Bolero System (볼레로 시스템상의 SURF의 운영에 관한 연구)

  • Jeon, Soon-Hwan
    • The Journal of Information Technology / v.6 no.4 / pp.163-175 / 2003
  • SURF is a compliance engine that checks document content against the details of an established agreement. It provides a single vehicle for handling documentary trade settlement, regardless of the risk profile and financing requirements of the parties involved. That is, SURF, a value-added service connected to the Core Messaging Platform, is a documentary trade settlement service. It offers users of the system automated document compliance checking and a tool to manage the workflow in connection with documentary trade settlement. The service supports varying degrees of risk transfer among buyers, sellers, and banks, and supports transactions ranging from open account to more complex letters of credit.


Bibliographic metadata development for the efficient information resource sharing (효율적 정보자원 공유를 위한 서지 메타데이터 XML DTD 개발)

  • Lee, Hye-Jin;Song, In-Seok
    • Proceedings of the Korea Contents Association Conference / 2004.11a / pp.427-433 / 2004
  • Most information providers offer integrated retrieval services based on bibliographic metadata and schemas corresponding to each document type, developed in a distributed and independent way. However, it is difficult to maintain relational consistency across these heterogeneous databases even when they follow a metadata standard such as MARC or MODS. The main reason is that these standards are restricted to describing the general properties of a document regardless of its type and cannot be applied to defining relationships between document types. It is therefore necessary to define a comprehensive meta-model that associates the related databases systematically, so that their semantically common parts can be shared and reused without additional effort such as conversion or mapping. In this paper, we first outline the document types for designing the meta-model through an empirical analysis of the data schemas of major information providers. We then propose data element definitions, a metadata model, and a modularized XML DTD that support the efficient and consistent management of multiple document types.


An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are hardly selected as classification features if they do not occur in the training document set. In addition, synonymous terms with the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by replacing input terms not in the training document set with the most similar term occurring in the training documents, using Wikipedia. For the selection of classification features, experiments were performed in various settings composed of three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35~1.85% in F1 value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement ratio is not as high as expected, several semantic as well as structural properties of Wikipedia could be used for selecting more effective classification features.
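The replacement step described above (swap an out-of-vocabulary input term for the most similar training term, subject to a similarity threshold) can be sketched as follows. The similarity table is hypothetical, standing in for the paper's Wikipedia-derived measures, and all terms are invented:

```python
def replace_terms(doc_terms, training_vocab, similarity, threshold=0.5):
    """Replace terms absent from the training vocabulary with the most
    similar training term, if its similarity clears the threshold."""
    out = []
    for term in doc_terms:
        if term in training_vocab:
            out.append(term)
            continue
        best = max(training_vocab, key=lambda t: similarity(term, t))
        if similarity(term, best) >= threshold:
            out.append(best)  # substitute the nearest known term
        # else: drop the term, as it cannot serve as a feature
    return out

# Hypothetical similarity scores (stand-in for Wikipedia-based measures).
sims = {("automobile", "car"): 0.9, ("zephyr", "car"): 0.1,
        ("zephyr", "wind"): 0.2}
sim = lambda a, b: sims.get((a, b), 0.0)
print(replace_terms(["car", "automobile", "zephyr"], {"car", "wind"}, sim))
# → ['car', 'car']: "automobile" maps to "car"; "zephyr" falls below
# the threshold and is dropped.
```

Varying `threshold` reproduces the trade-off the experiments probe: a low threshold replaces more terms but risks noisy substitutions.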

Hierarchical Organization of Neural Agents for Distributed Information Retrieval (분산 정보 검색을 위한 신경망 에이전트의 계층적 구성)

  • Choi, Yong S.
    • The Journal of Korean Association of Computer Education / v.8 no.6 / pp.113-121 / 2005
  • Since documents on the Web are naturally partitioned into many document databases, an efficient information retrieval (IR) process requires identifying the document databases most likely to provide relevant documents for a query and then querying those databases. We first introduce a neural net agent for such efficient IR, and then propose a hierarchically organized multi-agent IR system in order to scale our agent to a large number of document databases. In this system, the hierarchical organization of neural net agents reduced the total training cost to an acceptable level without degrading IR effectiveness in terms of precision and recall. In the experiment, we introduce two neural net IR systems based on the single-agent and multi-agent approaches, respectively, and evaluate their performance by comparing their experimental results to those of conventional statistical systems.


Policy System of Data Access Control for Web Service (웹 서비스를 위한 데이터 접근 제어의 정책 시스템)

  • Jo, Sun-Moon;Chung, Kyung-Yong
    • The Journal of the Korea Contents Association / v.8 no.11 / pp.25-32 / 2008
  • Access control techniques should be flexible enough to support all protection granularity levels. Since access control policies are very likely to be specified in relation to document types, it is necessary to properly manage situations in which documents are not covered by the existing access control policies. For XML documents, it is necessary to describe policies more flexibly, beyond simple authorization, and to consider selectable access control methods. This paper describes and designs an access control policy system for authorizing XML document access and for managing it efficiently, suggesting a way to use the capabilities of XML itself. The system is primarily characterized by consideration of who may exercise which access privileges on a specific XML document, and by balancing organization-wide demands from a policy manager with those of individual document writers.