• Title/Summary/Keyword: Document research


Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
  • In text mining, the document-term frequency matrix is built from terms extracted from documents for which group information exists. In this study, we generated a document-term frequency matrix for classifying documents by research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to the generated matrix, and we also applied term frequency-inverse gravity moment (TF-IGM). To improve classification accuracy, we further generated a document-keyword weighted matrix by extracting keywords. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal deep neural network model, we verified classification accuracy while varying the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and classification accuracy with TF-IGM was higher than with TF-IDF under every parameter setting. The deep neural network was also more accurate than the support vector machine. We therefore propose applying TF-IGM and a deep neural network to document classification.
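
The two weighting schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the plain idf = log(N / df) form for TF-IDF and the commonly cited TF-IGM form, 1 + λ · f₁ / Σ(f_r · r), where the class-specific frequencies of a term are sorted in descending order and r is the rank. The function names, the toy data, and the default λ = 7 are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def igm_factor(class_freqs, lam=7.0):
    """Global TF-IGM factor: 1 + lam * f_1 / sum(f_r * r), frequencies sorted descending."""
    ranked = sorted(class_freqs, reverse=True)
    denom = sum(f * r for r, f in enumerate(ranked, start=1))
    return 1.0 + lam * ranked[0] / denom if denom else 1.0


def tf_igm(docs, labels, lam=7.0):
    """Return one {term: tf * igm_factor} dict per document, using class labels."""
    classes = sorted(set(labels))
    # class_df[term][label] = number of documents with that label containing the term
    class_df = {}
    for doc, label in zip(docs, labels):
        for t in set(doc):
            class_df.setdefault(t, Counter())[label] += 1
    factor = {
        t: igm_factor([cnt[c] for c in classes], lam) for t, cnt in class_df.items()
    }
    return [{t: c * factor[t] for t, c in Counter(doc).items()} for doc in docs]


# Toy corpus (hypothetical): tokenized documents with a research-field label each.
docs = [["neural", "network", "text"], ["text", "mining", "text"], ["trade", "law", "text"]]
labels = ["cs", "cs", "law"]
print(tf_idf(docs)[1])
print(tf_igm(docs, labels)[1])
```

In this toy corpus "text" occurs in every document, so TF-IDF gives it zero weight, while TF-IGM still weights it by how unevenly it is spread across the two classes.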

A study on the problems of transport document as a proof of delivery on INCOTERMS 2000 (매도인(賣渡人)이 제공하는 인도증빙서류(引渡證憑書類)의 문제점(問題點)에 관한 연구(硏究) (INCOTERMS 2000을 중심(中心)으로))

  • Oh, Won-Suk
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.14 / pp.7-35 / 2000
  • The purpose of this paper is to examine the meaning of delivery under each trade term in INCOTERMS 2000, to investigate the various kinds of transport documents used as proof of delivery, and finally to identify their problems. The examination suggests that the following problems may arise in practice. First, the multimodal transport document referred to in the FOB term seems inappropriate, because the FOB term can be used only in sea or inland waterway transport. Second, assuming resale in transit under the CFR or CIF term, a non-negotiable Sea Waybill seems inappropriate. Third, because a Sea Waybill is not a document of title, it cannot serve as security when the bank negotiates the seller's draft. Fourth, INCOTERMS 2000 deleted the reference to the charter party in the CFR and CIF terms. This deletion may raise legal problems concerning the carrier's liabilities when the charter party B/L and the charter party contradict each other. Finally, if CFR or CIF means symbolic delivery, documents other than the B/L cannot be symbols of the goods.


A Study on the Improvement of Retrieval Effectiveness to Clustered and Filtered Document through Query Expansion (질의어 확장에 기반을 둔 클러스터링 및 필터링 문서의 검색효율 제고에 관한 연구)

  • 노동조
    • Journal of the Korean BIBLIA Society for library and Information Science / v.14 no.1 / pp.219-230 / 2003
  • The purpose of this study is to improve retrieval effectiveness for clustered and filtered documents through query expansion. The results show that extended queries and documents, information from an encyclopedia, and clustering and filtering techniques are effective in improving retrieval effectiveness.


SysML-based Document Modeling Case (SysML 기반 문서 모델링 사례)

  • Lee, Taekyong;Cha, Jae-Min;Kim, Joon-Young;Salim, Shelly
    • Journal of the Korean Society of Systems Engineering / v.14 no.2 / pp.8-15 / 2018
  • In a traditional Document-Based Configuration Management (DBCM) environment, changes in a system's configuration are hard to reflect in existing engineering documents. This characteristic of DBCM causes inconsistencies in system configurations, which can become serious risks. Model-Based Configuration Management (MBCM) was introduced to solve this problem by managing a system's configurations through a unified model. It is therefore important to model engineering documents in a general modeling language, down to low-level information items, in order to develop the traceability and flexibility of a system's engineering information. To explore the potential of the Model-Based Approach (MBA) in configuration management, this research studied the development of a system requirements document model using the SysML-based Views & Viewpoints concept.

Document Structure Understanding on Subjects Registration Table

  • Ito, Yuichi;Ohno, Masanaga;Tsuruoka, Shinji;Yoshikawa, Tomohiro;Tsuyoshi, Shinogi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.571-574 / 2003
  • This research aims to automate the generation of a database from paper-based table forms. Registration tables have complicated table structures, so we used them as an example for general table structure understanding. We propose a table structure understanding system for several table types, which works in several steps. First, document images on paper are read by an image scanner. Second, each document image is segmented into tables. Third, character strings are extracted using image processing technology and the properties of the character strings are determined. The structured database is then generated automatically. The proposed system consists of two subsystems. The "Master document generation system" is used for table form definition and does not handle handwritten characters. The "Structure analysis system for completed table" is used for filled-in forms and analyzes table forms containing handwritten characters. We implemented the system using MS Visual C++ on Windows, and it achieved a correct extraction rate of 98% on 51 registration tables filled in by different students.


A Study on the Korean Diaspora Information Resource Management (코리안 디아스포라 정보자원관리에 관한 연구)

  • Chang, Woo-Kwon
    • Journal of Korean Library and Information Science Society / v.43 no.4 / pp.403-425 / 2012
  • This research aims to present a development plan for Korean diaspora information resource management from the viewpoint of document information and archives. The study has two parts: a document investigation of the Korean diaspora and its information resources, and a practical examination based on the sites of life, settlement, and movement of overseas Koreans. Many members of the Korean diaspora emigrated, mostly to Japan, China, and Russia, chiefly for political and economic reasons. They produced a variety of document information, and information resource management and archives were formed in libraries, newspaper offices, and publishing companies. The results are expected to support R&D on information resource management, in terms of values and competencies, for the Korean diaspora.

A Study on the DB-IR Integration: Per-Document Basis Online Index Maintenance

  • Jin, Du-Seok;Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering / v.7 no.3 / pp.275-280 / 2009
  • While database (DB) and information retrieval (IR) technologies have developed independently, requirements have emerged for supporting both data management and efficient text retrieval in a single information system, in areas such as health care, customer support, XML data management, and digital libraries. The divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB has extended its SQL layer to cope with text fields because it lacks a built-in mechanism for building an IR-like index, whereas IR usually treats a block of new documents as the logical unit of index maintenance because it has no concept of integrity constraints. In DB-IR integration, however, a transaction that adds or updates a document should include maintenance of the posting lists associated with that document. Although DB-IR integration has begun to emerge in the research field, it will remain a difficult and rewarding area for some time. One of the primary reasons is the lack of efficient online transactional index maintenance. This paper evaluates the performance of several strategies for per-document transactional index maintenance: direct index update, pulsing auxiliary index, and posting segmentation index. The results show that the pulsing auxiliary strategy and the posting segmentation indexing scheme are strong candidates for text field indexing in DB-IR integration.
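
The idea of maintaining posting lists as part of each document add or update can be sketched as follows. This is a deliberately simplified toy, not the paper's pulsing auxiliary or posting segmentation strategies: the class name `PerDocumentIndex`, the `aux_limit` threshold, and the in-memory sets are all assumptions, and there is no real transaction or storage layer.

```python
from collections import defaultdict


class PerDocumentIndex:
    """Toy inverted index: recent postings go to a small auxiliary index
    that is merged into the main index after a few upserts."""

    def __init__(self, aux_limit=2):
        self.main = defaultdict(set)  # term -> doc ids (merged postings)
        self.aux = defaultdict(set)   # term -> doc ids (recent postings)
        self.doc_terms = {}           # doc id -> terms, used to undo postings on update
        self.aux_upserts = 0          # upserts since the last merge
        self.aux_limit = aux_limit

    def upsert(self, doc_id, text):
        """Add or update one document; its postings change in the same call."""
        self.delete(doc_id)
        terms = set(text.lower().split())
        for term in terms:
            self.aux[term].add(doc_id)
        self.doc_terms[doc_id] = terms
        self.aux_upserts += 1
        if self.aux_upserts >= self.aux_limit:
            self.merge()

    def delete(self, doc_id):
        """Remove every posting that belongs to the given document, if any."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.main[term].discard(doc_id)
            self.aux[term].discard(doc_id)

    def merge(self):
        """Fold the auxiliary postings into the main index and reset the buffer."""
        for term, ids in self.aux.items():
            self.main[term] |= ids
        self.aux.clear()
        self.aux_upserts = 0

    def search(self, term):
        """Return doc ids containing the term, looking in both indexes."""
        term = term.lower()
        return self.main.get(term, set()) | self.aux.get(term, set())


index = PerDocumentIndex(aux_limit=2)
index.upsert(1, "database index")
index.upsert(1, "text retrieval")  # update: old postings for doc 1 are removed first
print(index.search("retrieval"))
```

Because `search` consults both indexes, a document is visible as soon as `upsert` returns, whether or not a merge has happened yet.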

Access Control Mechanism for CouchDB

  • Ashwaq A., Al-otaibi;Reem M., Alotaibi;Nermin, Hamza
    • International Journal of Computer Science & Network Security / v.22 no.12 / pp.107-115 / 2022
  • Big data applications increasingly need databases other than relational databases. NoSQL databases are used to store and handle massive amounts of data. They have many advantages over traditional databases, such as flexibility, efficient data processing, scalability, and dynamic schemas. Most current applications are web-based, and the volume of data is increasing. NoSQL databases are expected to be used on an increasingly large scale in the future. However, NoSQL suffers from many security issues, one of which is access control. Many recent applications need fine-grained access control (FGAC). Integrating FGAC with NoSQL databases will increase their usability in various fields, offering customized data protection levels and enhanced security. There are different NoSQL database models, and the document-based database is one of them. In this research, we choose the CouchDB NoSQL document database and develop an access control mechanism that works at a fine-grained level. The proposed mechanism uses the role-based access control of CouchDB and restricts read access at the document level. The experiment shows that our mechanism works effectively at the document level in CouchDB with good execution time.
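
Document-level, role-based read restriction can be illustrated with a minimal sketch in plain Python. It does not use the CouchDB API and is not the mechanism proposed in the paper: the `allowed_roles` field, the function names, and the sample documents are hypothetical, chosen only to show a read check applied per document rather than per database.

```python
def can_read(user_roles, doc):
    """Allow a read when the user holds at least one role listed on the document.

    `allowed_roles` is a hypothetical per-document field, not a CouchDB built-in.
    """
    return bool(set(user_roles) & set(doc.get("allowed_roles", [])))


def readable_docs(user_roles, docs):
    """Filter a collection down to the documents the given roles may read."""
    return [doc for doc in docs if can_read(user_roles, doc)]


# Hypothetical documents in one database, each with its own read policy.
docs = [
    {"_id": "report-1", "allowed_roles": ["analyst", "admin"]},
    {"_id": "salary-1", "allowed_roles": ["admin"]},
    {"_id": "draft-1"},  # no policy: unreadable under this deny-by-default sketch
]

print([doc["_id"] for doc in readable_docs(["analyst"], docs)])
```

The sketch denies by default: a document without a policy is readable by no one, which is a design choice made here rather than something stated in the abstract.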

A Study on Constructing Approach of Enterprise Document Management Architecture in Semiconductor Business (반도체 산업에서의 Enterprise Document Management Architecture 구현에 관한 연구)

  • 장현성;이영중;송하석;한영준;안정삼
    • Proceedings of the Korean Operations and Management Science Society Conference / 2001.10a / pp.11-14 / 2001
  • The systematic construction and reuse of technology related to product development and production is of great importance to the semiconductor industry, which depends on processes and equipment. As a result, numerous paper outputs have been produced in the course of information management, from the creation of technologies to their recycling and disposal. In this research, the technologies and documents needed for business management in semiconductor manufacturing were classified in order to address these problems, and a document management architecture was modeled at the enterprise level, with a security system set up to prevent unauthorized disclosure of product development technology to third parties. In particular, the product and process specifications are designed to ensure real-time response when interfacing with the production system, in order to shorten development lead time and improve productivity. This paper discusses the modeling approach, the strategy for constructing the system, and the results.


An Ontology for a Content-Based Expert System Document Categorization (내용기반 문서분류 전문가시스템을 위한 온톨로지 연구)

  • Seo, Lai-Won
    • The Journal of Engineering Research / v.3 no.1 / pp.47-56 / 1998
  • This study develops an ontology for a content-based expert system for document categorization. Its objectives were to establish the concept of ontology and to determine the effect of ontology on expert systems. Based on this concept of ontology, the study found noems for developing an expert system in the field of fine arts and presented a hierarchical ontology categorization.
