• Title/Summary/Keyword: Virtual Relevant Document


Performance Improvement by a Virtual Documents Technique in Text Categorization (문서분류에서 가상문서기법을 이용한 성능 향상)

  • Lee, Kyung-Soon;An, Dong-Un
    • The KIPS Transactions:PartB / v.11B no.4 / pp.501-508 / 2004
  • This paper proposes a virtual relevant document technique for the training phase of text categorization. The method applies a simple transformation to relevant documents: it creates virtual documents by combining pairs of documents in the training set. A virtual document produced this way has an enriched term vector, with greater weights for the terms that co-occur in the two relevant documents. The experimental results showed a significant improvement over the baseline, which demonstrates the usefulness of the proposed method: a 71% improvement on the TREC-11 filtering test collection and an 11% improvement on the Reuters-21578 test set for topics with fewer than 100 relevant documents, both in micro-averaged F1. The result analysis indicates that adding virtual relevant documents contributes to a steady improvement in performance.
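The pairing step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and raw term counts, and it assumes that "combining" a pair means adding the two term vectors, so that a term present in both documents ends up with a larger weight. The function names `term_vector` and `virtual_documents` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Build a raw term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def virtual_documents(relevant_docs):
    """Create one virtual document per pair of relevant training documents.

    Assumption: a pair is combined by summing its term vectors, so terms
    that occur in both documents receive a greater weight.
    """
    vectors = [term_vector(doc) for doc in relevant_docs]
    return [a + b for a, b in combinations(vectors, 2)]


docs = ["oil price rises", "oil exports fall", "price of crude oil"]
virtual = virtual_documents(docs)
print(len(virtual), virtual[0])
```

With n relevant documents this produces n(n-1)/2 virtual documents, which is one plausible reason the technique would matter most for topics with few relevant training documents.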

An Email Vaccine Cloud System for Detecting Malcode-Bearing Documents (악성코드 은닉 문서파일 탐지를 위한 이메일 백신 클라우드 시스템)

  • Park, Choon-Sik
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.754-762 / 2010
  • Email-based targeted attacks using malcode-bearing documents have steadily increased. To improve the success rate of an attack and evade anti-virus software, attackers mainly employ zero-day exploits and related social engineering techniques. In this paper, we propose an architecture for an email vaccine cloud system that prevents targeted attacks using malcode-bearing documents. The system extracts attached document files from email messages, performs behavior analysis as well as signature-based detection in a virtual machine environment, and completely removes malicious documents from the messages. During behavior analysis, a document is regarded as malicious if it creates executable files, launches new processes, accesses critical registry entries, or connects to the Internet. The email vaccine cloud system will help prevent various forms of cyber terror, such as information leakage, by blocking email-based targeted attacks.
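The behavior-analysis rule listed in the abstract above can be expressed as a simple check. This is only a sketch under stated assumptions: the event names, the `SUSPICIOUS_BEHAVIORS` set, and the `is_malicious` function are hypothetical labels for the four behaviors named in the abstract, and the way the signature result is combined with the behavior result (a logical OR) is an assumption, not something the source specifies.

```python
# Hypothetical event labels for the four behaviors named in the abstract.
SUSPICIOUS_BEHAVIORS = frozenset({
    "create_executable",         # the document drops an executable file
    "launch_process",            # the document starts a new process
    "access_critical_registry",  # the document touches critical registry entries
    "connect_internet",          # the document opens a network connection
})


def is_malicious(observed_events, signature_hit=False):
    """Flag a document if a signature matched or any suspicious behavior occurred.

    Assumption: observed_events is an iterable of event labels recorded
    while the document was opened inside a virtual machine.
    """
    return signature_hit or any(e in SUSPICIOUS_BEHAVIORS for e in observed_events)


print(is_malicious(["open_file", "launch_process"]))
print(is_malicious(["open_file", "render_page"]))
```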

Design and Implementation of On-line Standards Development System on the World Wide Web (WWW상에서의 온라인 정보통신표준 개발 시스템 설계 및 구현)

  • 구경철;김형준;박기식;송기평;조인준;정회경
    • Journal of the Korea Institute of Information and Communication Engineering / v.2 no.4 / pp.559-573 / 1998
  • Recently, Standards Development Organizations (SDOs) in the field of information and communication have recognized that "more new and more complex standards should be developed in shorter time". To cope with this challenge, they are trying to construct a Standards Information Cooperation Network (SICN) or Electronic Document Handling (EDH) systems for an efficient standards development process. This paper presents the design and implementation of an Extranet-based Web system dedicated to effective on-line standards-making environments. The system, called SICN (Standards Information Cooperation Network), is a workflow-based network application created to foster faster standards development, with functionality such as an electronic signature mechanism, electronic voting, comment gathering, and dynamic links for ready retrieval of standards information stored in a database. This paper also describes the concept of a VSDO (Virtual Standards Development Organization) that supports all the features needed by the relevant standards-making bodies to carry out their activities in dynamic on-line environments.


Model Design and Applicability Analysis of Interactive Electronic Technical Manual for Planning Stage of Construction Projects (건설공사 기획단계 전자매뉴얼의 적용 모형 구성 및 효과 분석)

  • Kwak, Joong-Min;Kang, Leen-Seok
    • Land and Housing Review / v.12 no.2 / pp.121-139 / 2021
  • Technical documents in the construction field are changing from paper to electronic form. As a result, the industry increasingly uses portable electronic devices to search for or retrieve necessary information such as relevant regulations. Although accessibility to general technical documents has improved, access to electronic documents on regulations remains limited, which is a barrier for field engineers who want to enhance their technical knowledge. One of the major barriers is that videos, animations, and virtual reality content that would aid visual understanding of regulation-related technical material are not linked to the documents. The interactive electronic technical manual (IETM) can address such issues. The IETM is an electronic document system that enables real-time information acquisition by interacting conversationally with users and by linking multimedia functions to document types such as specifications and guidelines. This study establishes a model of the IETM that can be operated in the planning stage of a construction project and verifies its usability with a hypothetical case study. The study aims to improve the usability of the IETM in construction projects by analyzing its application effect using the AHP technique.

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order. Also, current search tools cannot retrieve the documents related to a retrieved document from a gigantic collection of documents. The most important problem for many current search systems is to increase the quality of search, that is, to provide related documents and to reduce the number of unrelated documents in the search results as far as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In detail, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articles with the cited works.
Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language and of the words in the title, keywords or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). But CiteSeer cannot index links between articles that researchers do not make, because it indexes only the links that researchers make when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. A document is converted into a tabular form in which each extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of a document from the table and then integrate the graphs of the documents. From the graph of the entire document set, we calculate the area of each document relative to the integrated documents, and we mark the relations among documents by comparing these areas. The paper also proposes a method for structural integration of documents that retrieves documents from the graph, making it easier for users to find information. We compared the performance of the proposed approaches with the Lucene search engine using the formulas for ranking. As a result, the F-measure is about 60%, an improvement of about 15%.
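The backward and forward navigation that the abstract attributes to a citation index can be illustrated with two lookup tables. This is a generic sketch of the data structure, not CiteSeer's ACI implementation and not the FCA-based method the paper proposes. The function name `build_citation_index` and the paper identifiers are hypothetical.

```python
from collections import defaultdict


def build_citation_index(citations):
    """Index (citing, cited) pairs in both directions.

    cites[p]    -> papers that p cites (navigation backward in time)
    cited_by[p] -> papers that cite p  (navigation forward in time)
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


# Hypothetical citation links between three papers.
links = [("paper_C", "paper_A"), ("paper_C", "paper_B"), ("paper_B", "paper_A")]
cites, cited_by = build_citation_index(links)
print(sorted(cites["paper_C"]))
print(sorted(cited_by["paper_A"]))
```

Such an index only contains links that authors made explicitly, which is the limitation the abstract raises: two related papers that never cite each other are not connected.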