• Title/Summary/Keyword: Intelligent document processing

An Automatic Summarization of Call-For-Paper Documents Using a 2-Phase hidden Markov Model (2단계 은닉 마코프 모델을 이용한 논문 모집 공고의 자동 요약)

  • Kim, Jeong-Hyun;Park, Seong-Bae;Lee, Sang-Jo;Park, Se-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.2
    • /
    • pp.243-250
    • /
    • 2008
  • This paper proposes a system that extracts the necessary information from call-for-paper (CFP) documents using a hidden Markov model (HMM). Although a CFP does not follow a strict form, most CFPs present their information in a relatively fixed sequence. A hidden Markov model, which is well suited to processing sequential data, is therefore adopted to analyze CFPs. However, when CFPs are modeled intuitively with a single hidden Markov model, the boundaries of the information are not recognized accurately. To solve this problem, this paper proposes a two-phase hidden Markov model. In the first phase, the P-HMM (phrase hidden Markov model), which models a document as a sequence of phrases, recognizes CFP documents locally. In the second phase, the D-HMM (document hidden Markov model) captures the overall structure and information flow of the document. Experiments on 400 CFP documents gathered from the Web yield an F-score of 0.49, an improvement of 0.15 over the intuitively modeled HMM.
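The abstract relies on HMM decoding to label a sequence of CFP fragments. The sketch below shows plain Viterbi decoding on a single toy HMM; it is not the paper's two-phase P-HMM/D-HMM. The state names, observation symbols, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    def lg(p):
        # Floor probabilities so that log(0) never occurs.
        return math.log(max(p, 1e-12))

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               state that precedes s on that path)
    first = observations[0]
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(first, 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + lg(trans_p[p].get(s, 0.0)), p)
                for p in states
            )
            column[s] = (score + lg(emit_p[s].get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model: three information fields of a CFP.
STATES = ["TITLE", "DATE", "TOPIC"]
START_P = {"TITLE": 0.8, "DATE": 0.1, "TOPIC": 0.1}
TRANS_P = {
    "TITLE": {"TITLE": 0.30, "DATE": 0.60, "TOPIC": 0.10},
    "DATE":  {"TITLE": 0.05, "DATE": 0.35, "TOPIC": 0.60},
    "TOPIC": {"TITLE": 0.05, "DATE": 0.15, "TOPIC": 0.80},
}
EMIT_P = {
    "TITLE": {"name": 0.7, "date": 0.1, "topic": 0.2},
    "DATE":  {"name": 0.1, "date": 0.8, "topic": 0.1},
    "TOPIC": {"name": 0.2, "date": 0.1, "topic": 0.7},
}

labels = viterbi(["name", "date", "date", "topic", "topic"],
                 STATES, START_P, TRANS_P, EMIT_P)
print(labels)
```

In this toy setting the transition probabilities encode the "relatively fixed sequence" of fields, so the decoder prefers field orders that follow it.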

Linear Path Query Processing using Backward Label Path on XML Documents (역방향 레이블 경로를 이용한 XML 문서의 선형 경로 질의 처리)

  • Park, Chung-Hee;Koo, Heung-Seo;Lee, Sang-Joon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.6
    • /
    • pp.766-772
    • /
    • 2007
  • As XML has become widely used, much research has been done on XML storage and query processing. However, previous work on path query processing has mainly focused on storage and retrieval methods for a single large XML document or for XML documents that share the same DTD, and does not efficiently process partial match queries over sets of differently structured documents. To resolve this problem, we propose a new index structure based on relational tables. The method builds a $B^+$-tree index over backward label paths, rather than the forward label paths used in previous research, to store path information, and uses this index to find the label paths that match a partial match query efficiently at query time.
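One way to see why backward label paths help with partial match queries: once every label path is stored in reverse, a query such as `//author/name` becomes a prefix lookup on an ordered index. The sketch below is an illustration under assumptions, not the paper's implementation. A sorted Python list with `bisect` stands in for the $B^+$-tree, and the function names and sample paths are hypothetical.

```python
from bisect import bisect_left

def backward(path):
    """'/book/author/name' -> 'name/author/book/' (labels in reverse order)."""
    labels = [label for label in path.split("/") if label]
    # The trailing '/' keeps matches on whole labels ('name/' vs 'names/').
    return "/".join(reversed(labels)) + "/"

def build_index(label_paths):
    """Sorted list of backward label paths (stand-in for a B+-tree)."""
    return sorted(backward(p) for p in set(label_paths))

def match_suffix(index, query):
    """Return the stored forward paths that end with the query's labels."""
    prefix = backward(query)
    i = bisect_left(index, prefix)
    hits = []
    # All keys sharing the prefix are contiguous in sorted order.
    while i < len(index) and index[i].startswith(prefix):
        labels = index[i].strip("/").split("/")
        hits.append("/" + "/".join(reversed(labels)))
        i += 1
    return hits

paths = [
    "/book/author/name",
    "/article/author/name",
    "/book/title",
    "/book/publisher/name",
]
index = build_index(paths)
print(match_suffix(index, "//author/name"))
```

With forward paths the same query would require scanning every stored path, because the matching part sits at the end of each key rather than at the beginning.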

Retrieval of Broadcast News Using Audio Content Analysis

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.3E
    • /
    • pp.74-79
    • /
    • 2007
  • In this paper, we report our recent work on an indexing and retrieval system for broadcast news using audio content analysis. The work addresses two major parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, both developed in the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of British broadcast news videos. We discuss the development of the retrieval system and the evaluation of each part and of the system as a whole.

Automated Essay Grading: An Application For Historical Malay Text

  • Syed Mustapha, S.M.F.D;Idris, N.
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.237-245
    • /
    • 2001
  • Automated essay grading has been proposed for over thirty years, but only recently have practical implementations been constructed and tested. This paper investigates the role of the nearest-neighbour algorithm within information retrieval as a way of grading essays automatically, in a system called the Automated Essay Grading System. The system is intended to offer teachers individualized assistance in grading a student's essay. It involves several processes: indexing, structuring the model answer, and grade processing. The indexing process comprises document indexing and query processing, which are mainly used for representing the documents and the query. Structuring the model answer amounts to preparing the marking scheme, and grade processing is the process of assessing the essay. To test the effectiveness of the developed algorithms, they are tested against History texts in Malay. The results show that information retrieval and the nearest-neighbour algorithm are a practical combination that offers acceptable performance for grading essays.
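The combination of information retrieval and nearest-neighbour grading can be sketched as follows: each essay is represented as a term vector, and a new essay receives the grade of the most similar graded answer. This is a minimal illustration that assumes bag-of-words vectors and cosine similarity; the paper's actual indexing and marking scheme are not specified here, and the sample texts and grades are invented.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def grade(essay, graded_answers):
    """Return the grade of the graded answer nearest to the essay."""
    vec = vectorize(essay)
    _, best_grade = max(
        graded_answers,
        key=lambda pair: cosine(vec, vectorize(pair[0])),
    )
    return best_grade

# Hypothetical graded answers (text, grade).
graded = [
    ("the sultanate of malacca was founded by parameswara around 1400", "A"),
    ("malacca was a port", "C"),
]
result = grade("parameswara founded the malacca sultanate", graded)
print(result)
```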

Intelligent Character Recognition System for Account Payable by using SVM and RBF Kernel

  • Farooq, Muhammad Umer;Kazi, Abdul Karim;Latif, Mustafa;Alauddin, Shoaib;Kisa-e-Zehra, Kisa-e-Zehra;Baig, Mirza Adnan
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.11
    • /
    • pp.213-221
    • /
    • 2022
  • Intelligent Character Recognition System for Account Payable (ICRS AP) automation is the process of capturing text from scanned invoices, extracting the key fields, and storing the captured fields in a properly structured document format. ICRS plays a critical role in streamlining invoice data; the fields of interest include Vendor Name, Purchase Order Number, Due Date, Total Amount, and Payee Name. As companies attempt to cut costs and upgrade their processes, accounts payable (A/P) stands out as a paper-intensive procedure, and invoice processing is a candidate for digitization. For companies that deal with an enormous number of invoices, manual invoice matching procedures start to show their limitations. Receiving a paper invoice and matching it to a purchase order (PO) and general ledger (GL) code can be difficult for businesses. Lack of automation leads to more serious company issues such as accruals for financial close, excessive labor costs, and a lack of insight into corporate expenditures. The proposed system offers companies tighter control over their invoice processing so that they can make better and more appropriate decisions. AP automation solutions provide tighter controls, quicker clearances, smart payments, and real-time access to transactional data, allowing financial managers to make better decisions for the bottom line of their organizations. The proposed system extracts the fields listed above based on their x-axis and y-axis position coordinates.
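The abstract states that fields are located by their x-axis and y-axis coordinates but does not give the rule. The sketch below illustrates one simple possibility: take the nearest recognized token to the right of a field label on the same text line. The function name, the tolerance parameter, and the token data are all hypothetical, and the SVM/RBF recognition step named in the title is not shown.

```python
def value_right_of(tokens, label, y_tolerance=5):
    """Return the text of the nearest token to the right of `label`.

    tokens: list of (text, x, y) tuples, e.g. from an OCR/ICR stage.
    Two tokens count as being on the same line when their y values
    differ by at most `y_tolerance` (an assumed pixel threshold).
    """
    anchors = [t for t in tokens if t[0] == label]
    if not anchors:
        return None
    _, label_x, label_y = anchors[0]
    candidates = [
        t for t in tokens
        if t[1] > label_x and abs(t[2] - label_y) <= y_tolerance
    ]
    if not candidates:
        return None
    # The closest token on the x-axis is taken as the field value.
    return min(candidates, key=lambda t: t[1])[0]

# Invented sample tokens: (text, x, y).
tokens = [
    ("Vendor Name", 40, 100), ("Acme Ltd", 200, 102),
    ("Total Amount", 40, 300), ("1,250.00", 200, 298),
    ("Due Date", 40, 340), ("2022-11-30", 200, 341),
]
print(value_right_of(tokens, "Total Amount"))
```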

A Natural Language Question Answering System-an Application for e-learning

  • Gupta, Akash;Rajaraman, Prof. V.
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.285-291
    • /
    • 2001
  • This paper describes a natural language question answering system that students can use to get solutions to their queries. Unlike AI question answering systems that focus on the generation of new answers, the present system retrieves existing ones from question-answer files. Unlike information retrieval approaches that rely on a purely lexical metric of similarity between query and document, it uses a semantic knowledge base (WordNet) to improve its ability to match questions. The paper describes the design and the current implementation of the system as an intelligent tutoring system. The main drawback of existing tutoring systems is that the computer poses a question to the students and guides them in reaching the solution to the problem. In the present approach, a student asks any question related to the topic and gets a suitable reply. Based on the query, the student gets either a direct answer to the question or a set of questions (to a maximum of 3 or 4) that bear the greatest resemblance to the user input. We further analyze application fields for this kind of system and discuss the scope for future research in this area.

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • v.2 no.2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from the related public databases, to infer more interactions from the original ones. The interactions inferred by this analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

R&D of Intelligent Document Recognition Library for utilizing image data (이미지데이터 활용을 위한 지능형 인식 라이브러리 연구 개발)

  • Kwag, Hee Kue;Kim, Sung Hun;Lee, Jung Woo;Yoo, Ji Hun;Lee, Hyun Joo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.11a
    • /
    • pp.329-330
    • /
    • 2009
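  • This study aims to advance the document recognition system that is essential for implementing a full-text search service to increase the usability of image data held by public institutions. The main research directions are to develop image analysis techniques and a library through analysis of the data held by public institutions, and to build a specialized knowledge base. In addition, with future extensibility in mind, a tool for continuously managing the knowledge base is to be developed. Development of a prototype system incorporating the intelligent recognition library has now been completed, and a test bed is being built for various performance evaluations on the vast holdings of the National Archives of Korea.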

A Study on the Development of LDA Algorithm-Based Financial Technology Roadmap Using Patent Data

  • Koopo KWON;Kyounghak LEE
    • Korean Journal of Artificial Intelligence
    • /
    • v.12 no.3
    • /
    • pp.17-24
    • /
    • 2024
  • This study aims to derive a technology development roadmap in related fields by utilizing patent documents on financial technology. To this end, patent documents are retrieved using technical keywords drawn from prior research and related reports on financial technology. The TF-IDF (Term Frequency-Inverse Document Frequency) text mining technique is applied to the extracted patent documents, and the Latent Dirichlet Allocation (LDA) algorithm is then applied to identify the keywords and topics of the core technologies of financial technology. Based on the proportion of topics by year, which is the result of LDA, promising technology fields and convergence fields were identified through trend analysis and similarity analysis between topics. A first-stage technology development roadmap for technology field development and a second-stage technology development roadmap for convergence were derived through network analysis of the identified financial technology fields: the technology data-based integrated management system of the high-dimensional payment system using RF and intelligent cards, and the security processing methodology for data information and network payment. The proposed method can serve as a sound basis for developing financial technology R&D strategies and technology roadmaps.
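The TF-IDF weighting step mentioned in the abstract can be sketched with the standard library alone. The version below assumes the common definition (term frequency times the logarithm of the number of documents over the document frequency); the study's exact weighting variant is not stated, the LDA step is not shown, and the sample documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented stand-ins for patent abstracts.
docs = [
    "mobile payment authentication",
    "payment card security",
    "blockchain payment ledger",
]
weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
```

A term that occurs in every document, such as "payment" here, receives a weight of zero, which is why TF-IDF is useful for surfacing the distinctive keywords before topic modeling.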

W3C XQuery Update facility on SQL hosts (관계형 테이블을 이용한 W3C XQuery 변경 기능의 지원)

  • Hong, Dong-Kweon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.3
    • /
    • pp.306-310
    • /
    • 2008
  • XQuery is a new recommendation for XML query. As part of efforts to extend XQuery's capabilities, XML insertion and deletion are being studied and their standardization is under way. XML databases were initially developed simply for XML document management, but their functions are now extending to OLTP. In this paper, we add update functions to an XQuery processing system that was developed only for XQuery retrieval. We propose the table structures, numbering schemes for hierarchical structures, and methods for translating XQuery updates into SQL.