• Title/Summary/Keyword: Document research

A Fast Algorithm for the k-Keyword Ordered Proximity Problem (순서를 고려하는 k-키워드 근접도 문제를 위한 빠른 알고리즘)

  • Kim, Jin-Wook
    • Journal of KIISE:Computing Practices and Letters / v.16 no.3 / pp.281-288 / 2010
  • In web search engines, proximity is used to compute the relevance of a document to a given query. Various research results exist on proximity problems and ordered proximity problems. In this paper, we present O(n)-time algorithms for the k-keyword ordered proximity problem, where n is the total number of occurrences of the k keywords in a document. Experimental results show that the proposed algorithms are about 1.2 times faster than previous algorithms when k=2 and over 3 times faster when k=5.
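The abstract does not spell out the algorithm itself. The Python below is a minimal sketch of one simple linear-time approach, not necessarily the paper's method, under two stated assumptions: the k keywords are distinct, and their occurrences are already merged into (position, keyword index) pairs sorted by position. For each keyword index i it remembers the latest start position of an ordered chain of keywords 0..i seen so far; every occurrence is processed once, so the scan is O(n). The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def occurrences_from_tokens(tokens: Sequence[str], query: Sequence[str]) -> List[Tuple[int, int]]:
    """Merge keyword occurrences into (position, keyword_index) pairs sorted by position.

    Assumes the query keywords are distinct (hypothetical helper).
    """
    index = {word: i for i, word in enumerate(query)}
    return [(pos, index[tok]) for pos, tok in enumerate(tokens) if tok in index]


def shortest_ordered_window(occurrences: Sequence[Tuple[int, int]], k: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the shortest window holding keywords 0..k-1 in order, or None."""
    # start[i] = latest start position of an ordered chain of keywords 0..i seen so far
    start: List[Optional[int]] = [None] * k
    best: Optional[Tuple[int, int]] = None
    for pos, i in occurrences:
        if i == 0:
            start[0] = pos  # a new chain can always begin at keyword 0
        elif start[i - 1] is not None:
            start[i] = start[i - 1]  # extend the latest-starting chain 0..i-1
        else:
            continue  # no chain 0..i-1 precedes this occurrence
        if i == k - 1:
            s = start[i]
            if best is None or pos - s < best[1] - best[0]:
                best = (s, pos)
    return best
```

For the tokens `a x b a y b c` and the query `a, b, c`, the sketch returns `(3, 6)`: the window starting at the second `a` is shorter than the one starting at the first.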

Generative probabilistic model with Dirichlet prior distribution for similarity analysis of research topic

  • Milyahilu, John;Kim, Jong Nam
    • Journal of Korea Multimedia Society / v.23 no.4 / pp.595-602 / 2020
  • We propose a generative probabilistic model with a Dirichlet prior distribution for topic modeling and text similarity analysis. The model assigns topics to documents and calculates the text correlation between documents within a corpus. It also provides the posterior probabilities assigned to each topic of a document, based on the prior distribution over the corpus. We then present a Gibbs sampling algorithm for inference about the posterior distribution and compute the text correlation among 50 abstracts of papers published by IEEE. We also conduct a supervised learning experiment as a benchmark for the performance of LDA (Latent Dirichlet Allocation). The experiments show that LDA assigns topics to documents with an accuracy of 76%. The supervised learning results show an accuracy of 61%, a precision of 93%, and an F1-score of 96%. The discussion of the experimental results justifies the topic assignments in terms of probabilities, distributions, evaluation metrics, and correlation coefficients.
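The abstract names Gibbs sampling for posterior inference but gives no details. As a minimal sketch, assuming documents are lists of integer word ids, the Python below is the standard collapsed Gibbs sampler for LDA: each token's topic is resampled from p(z = t | rest) ∝ (n_dt + α)(n_tw + β) / (n_t + Vβ), and the function returns each document's topic proportions θ. The name `lda_gibbs` and the default hyperparameters are illustrative assumptions, not the authors' settings.

```python
import random
from typing import List


def lda_gibbs(docs: List[List[int]], num_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions theta."""
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dt = [[0] * K for _ in docs]       # topic counts per document
    n_tw = [[0] * V for _ in range(K)]   # word counts per topic
    n_t = [0] * K                        # total tokens per topic
    z: List[List[int]] = []              # current topic of every token

    # Random initial topic assignment
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(K)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove this token's current assignment from the counts
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Full conditional for each topic, up to a normalizing constant
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                           for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    # Posterior mean estimate of each document's topic distribution
    return [[(n_dt[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]
```

Each row of θ sums to 1, so documents can then be compared, for example by correlating their θ rows.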

A Comparison of Electronic book metadata formats and Development of Electronic Book of Korea Standard metadata (eBook 메타데이터 비교 및 한국전자책표준의 메타데이터 개발)

  • 김경옥;김성혁;임순범;최윤철
    • Proceedings of the CALSEC Conference / 2001.08a / pp.511-521 / 2001
  • This paper develops a metadata format for the Korean eBook document standard. The metadata formats of OEBF, JepaX, and AAP were compared and analyzed on criteria such as purpose, basic elements, characteristics, compatibility, extensibility, and convertibility. An EBKS metadata format based on Dublin Core was developed with respect to ease of use, resource description and discovery, extensibility, and compatibility with other metadata formats such as MARC and Dublin Core. Finally, directions for future research on and revision of the eBook document standard are proposed.

A Study on the application of International Transport Law to electronic bill of lading (전자식(電子式) 선하증권(船荷證券)과 국제운송규칙(國際運送規則))

  • Yang, Jung-Ho
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.20 / pp.369-385 / 2003
  • Contracts of carriage evidenced by a bill of lading, made between a carrier and an unidentified number of shippers, are to a large extent regulated by statute law such as the Hague-Visby Rules and the Hamburg Rules. These Rules qualify the contractual liberty of the parties and, in particular, prevent the carrier from introducing exemptions from liability beyond those the Rules permit. However, the Rules apply only to goods for which a bill of lading or similar document of title has been issued. For this reason, the carrier's liability for goods shipped may become an issue when an electronic bill of lading is used instead of a paper one, because an electronic bill of lading is not generally recognised as a document of title under the existing rules. This article therefore discusses the relationship between a carrier who issues an electronic bill of lading and the Rules governing carrier liability. It also introduces the new Rules under examination at UNCITRAL.

A Design and Implementation of XML Repository System based on EJB Components (EJB 컴포넌트 기반의 XML 저장관리시스템 설계 및 구현)

  • 이정수;정상혁;주경수
    • Journal of Internet Computing and Services / v.3 no.3 / pp.75-85 / 2002
  • To build reliable software and reduce costs, much current research addresses component-based software development. One of the challenges in designing a component-based system is determining which components are required and where they fit in the overall system architecture. In this paper, we developed three EJB components: the first transforms an XML DTD into a relational database schema, the second stores XML documents in a relational database, and the third reconstructs XML documents by retrieving them from the relational database. By assembling these three components, we developed an XML repository system.
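The abstract does not describe how the components map XML onto tables. As a rough sketch of the general idea behind storing and reconstructing XML in a relational database, the Python below assumes a generic "edge table" mapping (one row per element: id, parent, tag, text) in SQLite rather than the DTD-derived schema the paper generates, and it is not an EJB implementation. Attributes and tail text are ignored for brevity; the table name `node` and the functions `shred` and `rebuild` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def shred(conn: sqlite3.Connection, xml_text: str) -> int:
    """Store an XML document as rows of a generic node table; return the root row id."""
    conn.execute("CREATE TABLE IF NOT EXISTS node ("
                 "id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")

    def insert(elem: ET.Element, parent_id: Optional[int]) -> int:
        text = (elem.text or "").strip() or None  # whitespace-only text is dropped
        cur = conn.execute("INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
                           (parent_id, elem.tag, text))
        node_id = cur.lastrowid
        for child in elem:  # children are inserted in document order
            insert(child, node_id)
        return node_id

    return insert(ET.fromstring(xml_text), None)


def rebuild(conn: sqlite3.Connection, node_id: int) -> ET.Element:
    """Reconstruct the element tree rooted at node_id from the node table."""
    tag, text = conn.execute("SELECT tag, text FROM node WHERE id = ?",
                             (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text = text
    child_ids = conn.execute("SELECT id FROM node WHERE parent = ? ORDER BY id",
                             (node_id,)).fetchall()
    for (child_id,) in child_ids:  # ids follow insertion (document) order
        elem.append(rebuild(conn, child_id))
    return elem
```

An edge table works for any document without a schema; generating tables from the DTD, as the paper's first component does, instead yields typed columns that are easier to query.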

Automatic Classification of Web Documents Using the Document Fingerprint Method (문서지문기법을 이용한 웹 문서의 자동 분류)

  • Kim Jin-Hwa
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.10a / pp.407-429 / 2004
  • As the number of documents on the web grows explosively with the rapid development of electronic documents, an efficient system for classifying documents automatically is required. In this study, a new document classification method, called the Document Finger Print Method, is proposed to classify web documents automatically and efficiently. Its performance is evaluated along with existing methods such as the keyword-based method, the weighted keyword-based method, neural networks, and decision trees. The experiment uses 10 document categories and 59 randomly selected words. The results show that the proposed algorithm classifies documents better than the other methods. The most important advantage of the method is that it works well without any limit on the number of words in a document.

Information Retrieval System : Condor (콘도르 정보 검색 시스템)

  • 박순철;안동언
    • Journal of Korea Society of Industrial Information Systems / v.8 no.4 / pp.31-37 / 2003
  • This paper reviews CONDOR, a large-scale information retrieval system developed by a consortium of Chonbuk National University, Searchline Co., and Carnegie Mellon University. The system is based on the probabilistic model of information retrieval. Its multi-language query processing, query-based online document summarization, and dynamic hierarchical clustering distinguish it from other systems. We tested the system successfully on 30 million web documents.
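The abstract says CONDOR is based on the probabilistic model of information retrieval but does not give its ranking function. Purely as an illustration of that family of models, the Python below implements Okapi BM25, a widely used ranking function derived from the probabilistic relevance framework. The choice of BM25, the parameter defaults k1=1.2 and b=0.75, and the name `bm25_scores` are assumptions, not CONDOR's documented design.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df: Counter = Counter()  # number of documents containing each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5))
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

Term frequency saturates through k1, and b controls how strongly long documents are penalized.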

Developing a Document-based Workflow Modeling Support System: A Case-based Reasoning Approach

  • Kim, Jaeho;Suh, Woojong;Lee, Heeseok
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.445-454 / 2001
  • A workflow model is useful for business process analysis and has often been implemented for office automation through information technology. Accordingly, the results of workflow modeling need to be managed systematically as information assets. To manage the modeling process effectively, the efficiency of reusing these results must be improved. This paper therefore develops a Document-based Workflow Modeling Support System (DWMSS) using a case-based reasoning (CBR) approach. It proposes a system architecture and develops the corresponding modeling process. Furthermore, a repository consisting of a case base and a vocabulary base is built. A case study illustrates the usefulness of the system.
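The abstract does not state how DWMSS measures similarity between cases. As a hypothetical illustration of the retrieve step of case-based reasoning, the Python below represents each stored workflow model as a set of document names drawn from a shared vocabulary and ranks cases by Jaccard similarity to a new problem. The set representation, the Jaccard measure, and the names `retrieve` and `jaccard` are all assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two feature sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(case_base: Dict[str, FrozenSet[str]], query: FrozenSet[str],
             top_n: int = 1) -> List[Tuple[str, float]]:
    """CBR retrieve step: rank stored cases by similarity to the new problem."""
    ranked = sorted(((case_id, jaccard(features, query))
                     for case_id, features in case_base.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```

With a hypothetical case `purchase-approval` described by `{purchase order, approval, invoice}` and a query `{invoice, purchase order}`, the sketch retrieves `purchase-approval` with similarity 2/3; a full CBR cycle would then reuse and adapt that workflow model.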

A Method on Associated Document Recommendation with Word Correlation Weights (단어 연관성 가중치를 적용한 연관 문서 추천 방법)

  • Kim, Seonmi;Na, InSeop;Shin, Juhyun
    • Journal of Korea Multimedia Society / v.22 no.2 / pp.250-259 / 2019
  • Big data processing technology and artificial intelligence (AI) are attracting increasing attention, and natural language processing is an important research area of AI. In this paper, we use Korean news articles to extract the topic distribution of each document and the word distribution vector of each topic through LDA-based topic modeling. We then use Word2vec to vectorize words and generate a weight matrix to derive a relevance score that considers the semantic relationships between words. We propose recommending documents in descending order of this score.
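The abstract does not give the exact form of the weight matrix or the score. A minimal sketch, assuming each document is summarized as a dictionary of word weights (for example, derived from its LDA topic distribution and the topics' word distributions) and that word vectors come from a trained Word2vec model, could score two documents as the sum over word pairs of w_a(i) · w_b(j) · cos(v_i, v_j) and rank candidates by that score. The formula and the names `relevance` and `recommend` are assumptions; a toy vector dictionary stands in for a real Word2vec model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevance(weights_a: Dict[str, float], weights_b: Dict[str, float],
              vectors: Dict[str, Vector]) -> float:
    """Sum of word-pair cosine similarities, weighted by each word's weight in each document."""
    score = 0.0
    for wa, pa in weights_a.items():
        for wb, pb in weights_b.items():
            if wa in vectors and wb in vectors:  # skip out-of-vocabulary words
                score += pa * pb * cosine(vectors[wa], vectors[wb])
    return score


def recommend(query_doc: Dict[str, float], candidates: Dict[str, Dict[str, float]],
              vectors: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank candidate documents in descending order of relevance score."""
    scored = [(name, relevance(query_doc, weights, vectors))
              for name, weights in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every word pair contributes through its cosine similarity, a document can score highly even when it shares no exact words with the query, which is the point of adding Word2vec on top of the topic model.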

Research on the Hybrid Paragraph Detection System Using Syntactic-Semantic Analysis (구문의미 분석을 활용한 복합 문단구분 시스템에 대한 연구)

  • Kang, Won Seog
    • Journal of Korea Multimedia Society / v.24 no.1 / pp.106-116 / 2021
  • To improve the quality of systems for grading subjective-type questions and classifying documents, paragraph detection is needed. It is not easy, however, because it requires semantic analysis. Much research on paragraph detection solves the problem with word-based clustering methods, but word-based methods cannot use the order of, or the dependency relations between, words. This paper proposes a paragraph detection system that uses the syntactic-semantic relations between words obtained from Korean syntactic-semantic analysis. The system is a hybrid of word-based, concept-based, and syntactic-semantic tree-based detection. Experimental results show that it performs better than the word-based system. The system will be used for Korean subjective question grading and document classification.
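For contrast with the proposed hybrid system, the Python below sketches a minimal word-based baseline: a simplified TextTiling-style lexical-cohesion method (an assumption, not necessarily the baselines the paper evaluated) that places a paragraph boundary between adjacent sentences whose bag-of-words cosine similarity falls below a threshold. Because it compares only word counts, it ignores word order and dependency relations, which is exactly the limitation the abstract points out. The name `split_paragraphs` and the 0.1 threshold are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine_counts(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def split_paragraphs(sentences: List[List[str]], threshold: float = 0.1) -> List[List[int]]:
    """Group sentence indices into paragraphs; start a new one where word overlap drops."""
    if not sentences:
        return []
    paragraphs = [[0]]
    for i in range(1, len(sentences)):
        sim = cosine_counts(Counter(sentences[i - 1]), Counter(sentences[i]))
        if sim < threshold:
            paragraphs.append([i])  # low lexical overlap: boundary before sentence i
        else:
            paragraphs[-1].append(i)
    return paragraphs
```

For the tokenized sentences `[cat sat mat] [cat ate] [stock market fell] [market rose]`, the sketch returns `[[0, 1], [2, 3]]`.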