• Title/Summary/Keyword: Document research

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, 10% to 36% of the keywords assigned by the authors do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, which is a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to the Web-based community service and to academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative system of the keyword extraction approach developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
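
The five steps listed in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of each candidate keyword as a dictionary of term weights, the whitespace tokenizer, and the names `vector_length`, `cosine_similarity`, and `assign_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def vector_length(weights):
    """Euclidean length of a sparse vector stored as {term: weight}."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / (len_a * len_b)


def assign_keywords(document_text, keyword_sets, top_n=3):
    """Rank candidate keywords by similarity to the target document.

    keyword_sets is a hypothetical structure: {keyword: {term: weight}}.
    """
    # Steps 2-3: naive preprocessing (lowercase, whitespace split) and
    # a term-frequency vector for the target document.
    doc_vector = Counter(document_text.lower().split())
    # Steps 1 and 4: vector lengths are computed inside cosine_similarity.
    scored = [
        (cosine_similarity(weights, doc_vector), keyword)
        for keyword, weights in keyword_sets.items()
    ]
    # Step 5: keep the highest-scoring keywords, ignoring zero scores.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_n] if score > 0]
```

Because each keyword carries its own term-weight vector, a keyword can be assigned even when the keyword string itself never occurs in the document, which is the property the abstract attributes to the assignment approach.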

A New Policy Study on Technical Document Review Changes and User-Centric Medical Device Advertising (사용자 중심의 의료기기 광고를 위한 기술문서 심사 변경의 새로운 정책 연구)

  • Ahn, Dae Ik;Ryu, Gyu Ha
    • Journal of Biomedical Engineering Research
    • /
    • v.42 no.1
    • /
    • pp.7-17
    • /
    • 2021
  • Domestic medical device advertisements may proceed after the medical device has been certified, and pre-deliberation is possible on the basis of the medical device technical document. However, customs and laws that do not consider users leave the stakeholders in the administrative procedures no choice but to misinterpret some medical device advertisements. In addition, the pre-deliberation system for medical device advertisements was ruled unconstitutional under the constitutional principle prohibiting prior censorship. This is because, in domestic medical device advertising, structural contradictions and harm to users arise from a structure centered on each individual stakeholder. It is necessary to reestablish stakeholder relationships, resolve the problems arising from customs and laws, and seek new policy proposals. In this study, we reestablish the relationships among stakeholders by applying Autopoiesis theory, and we present grounds and directions for preventing exaggerated and misleading advertisements through user-centered policies, together with the measures to be taken in response to the Constitutional Court's decision of unconstitutionality.

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

  • Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.6
    • /
    • pp.730-741
    • /
    • 2011
  • With the development of the Internet and computer technology, the amount of information available through the Internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for managing large numbers of documents effectively is necessary. Conventional document categorization methods use only the keywords of related documents for classification. In contrast, this paper proposes a keyword extraction method based on association rules. The method extracts a set of related keywords associated with a document's category and selects representative keywords using the classification rule proposed in this paper. In addition, this paper proposes a preprocessing method for efficient keyword creation and predicts the category of a new document. We design the classifier and measure its performance experimentally in order to improve the profile's classification performance. When predicting a category, applying all the classification rules one by one is the main cause of reduced processing performance in a profile. Finally, this paper suggests an automatic categorization scheme that can be applied to a hierarchical category architecture, extended from a simple category architecture.

A Study on the Acceptance Conditions of a Freight Forwarder's Transport Document under UCP (신용장통일규칙(UCP)상 운송주선인 운송서류의 수리요건에 관한 연구)

  • Kang, Ho-Kyung
    • THE INTERNATIONAL COMMERCE & LAW REVIEW
    • /
    • v.51
    • /
    • pp.285-313
    • /
    • 2011
  • The acceptance conditions of a freight forwarder's transport document under UCP can be analyzed in four stages. First, Bills of Lading issued by forwarding agents will be refused. This can be seen in Article 20 of the 1933 Revision of UCP (Brochure 82) and Article 20 of the 1951 Revision of UCP (Brochure 151). Second, unless specifically authorized in the credit, Bills of Lading issued by a forwarding agent will be rejected. This is prescribed in the first part (a) of Article 17 of the 1962 Revision of UCP (Brochure 222) and in Article 19 of the 1974 Revision of UCP (Publication No. 290). Third, acceptance conditions differ according to the type of transport document, that is, whether or not it is a Bill of Lading. This is prescribed in Articles 25 and 26 of the 1983 Revision of UCP. Unless otherwise stipulated in the credit, a transport document issued by a freight forwarder will be rejected unless it is the FIATA Combined Transport Bill of Lading approved by the International Chamber of Commerce or otherwise indicates that it is issued by a freight forwarder acting as a carrier or as the agent of a named carrier. On the other hand, unless otherwise stipulated in the credit, a marine bill of lading issued by a freight forwarder will be rejected unless it indicates that it is issued by such freight forwarder acting as a carrier or as the agent of a named carrier. Fourth, transport documents issued by a freight forwarder will be accepted. This can be found in Article 30 of the 1993 Revision of UCP (ICC Publication No. 500) and Article 14(l) of the 2007 Revision of UCP (ICC Publication No. 600). According to the former, unless otherwise authorized in the credit, a transport document issued by a freight forwarder will be accepted only if it appears on its face to indicate the name of the freight forwarder as a carrier or multimodal transport operator, or as its agent. The latter prescribes that a transport document will be accepted if it is issued by a freight forwarder acting as the agent of a carrier or by the freight forwarder itself.

The Document Clustering using Multi-Objective Genetic Algorithms (다목적 유전자 알고리즘을 이용한문서 클러스터링)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.17 no.2
    • /
    • pp.57-64
    • /
    • 2012
  • In this paper, a multi-objective genetic algorithm is proposed for document clustering, which is important in the text mining field. The most important function of a document clustering algorithm is to group similar documents in a corpus. So far, k-means clustering and genetic algorithms have been widely studied in this field. However, k-means clustering depends too heavily on the initial centroids, and the genetic algorithm tends to fall into a local optimum easily, depending on the fitness function. In this paper, the multi-objective genetic algorithm is applied to document clustering in order to compensate for these disadvantages, and its accuracy is analyzed and compared with that of the existing algorithms. In our experimental results, the multi-objective genetic algorithm introduced in this paper improves document clustering accuracy by about 20% over k-means clustering and by about 17% over the general genetic algorithm.

User-based Document Summarization using Non-negative Matrix Factorization and Wikipedia (비음수행렬분해와 위키피디아를 이용한 사용자기반의 문서요약)

  • Park, Sun;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.53-60
    • /
    • 2012
  • In this paper, we propose a new document summarization method that uses a query expanded by Wikipedia and semantic features representing the inherent structure of a document set. The proposed method expands the user's initial query using relevance feedback based on Wikipedia in order to reflect the user's requirements. It represents the inherent structure of the documents well using semantic features obtained by non-negative matrix factorization (NMF). In addition, by extracting meaningful sentences using the expanded query and the semantic features, it reduces the semantic gap between the user's requirements and the result of document summarization. The experimental results demonstrate that the proposed method achieves better performance than other document summarization methods.

Development of Ontology for Intelligent Document Transformation System (지능형 문서변환시스템을 위한 온톨로지구축)

  • Lim, Sung-Shin;Lee, Seok-Yong;Park, Nam-Kyu;Seo, Chang-Gab
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.1128-1131
    • /
    • 2005
  • Document transformation systems are increasingly used to transform business documents efficiently across diverse organizations. Existing research on document transformation systems has focused mainly on XML; in real import and export processes, however, documents are transformed not only into XML but also into EDI or local formats. In particular, the most complete prior studies on document transformation used an ontology to remove the inefficiency of connecting XML schemas manually. However, those studies lack features for constructing and modifying the domain ontology automatically, and their ontologies were not large enough for practical use. In this paper, we study the development of an ontology and a basic system, which are critical to an intelligent document conversion system. We also develop an ontology, together with an editor through which users can modify and supplement it, and apply it to a real import and export business process.

Security Elevation of XML Document Using DTD Digital Signature (DTD 전자서명을 이용한 XML문서의 보안성 향상)

  • Park, Dou-Joon;Min, Hye-Lan;Lee, Joon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.1080-1083
    • /
    • 2005
  • A DTD can be regarded as metadata that defines the meaning of the data expressed in an XML document. Therefore, if the DTD information is damaged, the security of the XML documents based on it is endangered. Rather than attaching a digital signature to the XML document itself during transmission and reception, this research proposes a method that attaches a digital signature to the DTD. The DTD file is first read to the end and parsed, and the extracted element and attribute entities are stored in a hash table. Once parsing is finished, the hash table is read and a message digest is computed. The digital signature is then created from the digest with a private key. During signing, a problem arises: because changes in order are not checked in the message digest process, an entirely different digest value can be produced. This is solved by creating the DTD's digital signature using DOM, which can represent the standard structure and the document as a tree structure.
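
The digest step and the ordering problem described in this abstract can be sketched as follows. This is an illustrative assumption, not the paper's method: the paper resolves ordering with a DOM tree, whereas this sketch simply sorts the declarations before hashing. The regular expression handles only simple `<!ELEMENT ...>` and `<!ATTLIST ...>` declarations (no `>` inside quoted default values), SHA-256 is an arbitrary choice of digest, and the private-key signing step is omitted. The names `extract_declarations` and `dtd_digest` are hypothetical.

```python
import hashlib
import re

# Simplified pattern: matches <!ELEMENT ...> and <!ATTLIST ...> declarations.
DECL_PATTERN = re.compile(r"<!(ELEMENT|ATTLIST)\s+([^>]+)>")


def extract_declarations(dtd_text):
    """Parse the DTD and store its declarations in a hash table (dict).

    Keys are (kind, name) pairs; values are whitespace-normalized bodies.
    """
    table = {}
    for kind, body in DECL_PATTERN.findall(dtd_text):
        normalized = " ".join(body.split())
        name = normalized.split(" ", 1)[0]
        table.setdefault((kind, name), []).append(normalized)
    return table


def dtd_digest(dtd_text):
    """Compute a message digest that does not depend on declaration order."""
    table = extract_declarations(dtd_text)
    hasher = hashlib.sha256()
    # Sorting gives a canonical order, so reordering the declarations in
    # the DTD file does not change the digest.
    for key in sorted(table):
        kind, _name = key
        for body in sorted(table[key]):
            hasher.update(f"<!{kind} {body}>\n".encode("utf-8"))
    return hasher.hexdigest()
```

The resulting hex digest is what would be signed with the private key; any change to an element or attribute declaration yields a different digest, while a pure reordering does not.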

A Storage and Retrieval System for Structured SGML Documents using Grove (Grove를 이용한 구조적 SGML문서의 저장 및 검색)

  • Kim, Hak-Gyoon;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.5
    • /
    • pp.501-509
    • /
    • 2002
  • SGML (ISO 8879) has proliferated as a means of supporting various document styles and transferring documents across different platforms. SGML documents carry logical structure information in addition to content. As SGML documents become widely used, there is an increasing need for database storage and retrieval systems that use the logical structure of documents. However, traditional search engines using document indexes cannot exploit the logical structure. In this paper, we have developed an SGML document storage system that is DTD-independent and stores the document type and the document instance separately by using Grove, the document model for DSSSL and HyTime. We have used ObjectStore, an object-oriented DBMS, to store the structure information appropriately without any loss. We have also provided an index structure for search efficiency, as in a relational DBMS, and constructed an effective user interface that combines content-based search with structure-based search.