• Title/Summary/Keyword: Document of Title

Automatic Title Detection by Spatial Feature and Projection Profile for Document Images (공간 정보와 투영 프로파일을 이용한 문서 영상에서의 타이틀 영역 추출)

  • Park, Hyo-Jin;Kim, Bo-Ram;Kim, Wook-Hyun
    • Journal of the Institute of Convergence Signal Processing / v.11 no.3 / pp.209-214 / 2010
  • This paper proposes an algorithm for segmentation and title detection in document images. The automated title detection method consists of two phases: segmentation and title area detection. In the first phase, the document image is extracted and segmented; the binary map is segmented by a combination of morphological operations and CCA (connected component algorithm). The first phase provides the segmented regions that serve as title-area candidates for the second phase. Candidate title areas are detected using geometric information, and the title region is then extracted by removing non-title regions. After a classification step that removes non-text regions, projection is performed to detect the title region. Because the title usually uses the largest font in a document, horizontal projection is performed within the text areas. In this paper, we propose a method of segmentation and title detection for various forms of document images using geometric features and projection profile analysis. The proposed system is expected to have various applications, such as document title recognition, multimedia data search, and real-time image processing.
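
The horizontal-projection step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`horizontal_projection`, `text_line_bands`, `tallest_band`), the toy image, and the rule "pick the tallest band" are assumptions made here to show how a row-wise projection profile can expose the text line with the largest font.

```python
def horizontal_projection(binary_image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in binary_image]


def text_line_bands(profile, min_pixels=1):
    """Return (start_row, end_row) pairs, end exclusive, for runs of rows with ink."""
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = y                      # a text line begins
        elif count < min_pixels and start is not None:
            bands.append((start, y))       # the text line ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def tallest_band(bands):
    """Assumed heuristic: the tallest band is the largest font, i.e. the title."""
    return max(bands, key=lambda band: band[1] - band[0])


# Hypothetical 9x6 text region: one 3-row line followed by two 1-row lines.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

profile = horizontal_projection(image)
bands = text_line_bands(profile)
title_band = tallest_band(bands)
```

In a real pipeline the input would be the text regions left after segmentation and non-text removal, and the band height would likely be combined with the geometric features the abstract mentions.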

A Study on the Some Points for Practical Attention of Transport Documents in the UCP 600 (UCP 600 운송서류 규정의 실무상의 유의점에 관한 연구)

  • Park, Suk-Jae
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.43 / pp.101-115 / 2009
  • More than two years have passed since the latest UCP (Uniform Customs and Practice for Documentary Credits, 2007 Revision, ICC Publication No. 600) came into effect on July 1, 2007. Since then there has been a lot of confusion about the interpretation of the new UCP, and its transport document articles are the most confusing. This work therefore studies some points for practical attention regarding transport documents in the UCP 600. Transport documents can be divided into marine transport documents and other transport documents. Most sellers and buyers, however, distinguish two kinds of transport documents: documents of title and documents of non-title. Most traders treat the document of title as important and ignore the document of non-title.

A Study on the Establishment of Master Plan on the Operation of the Electronic Bill of Lading in Korea (한국에서의 전자선화증권 운용을 위한 종합계획수립에 관한 연구)

  • Choi, Seok-Beom;Shen, Reng-Guang
    • International Commerce and Information Review / v.6 no.1 / pp.129-153 / 2004
  • For the e-Trade Project in Korea to succeed, the related parties are endeavoring to introduce an e-Title Registry model such as the Bolero model (Title Registry Model) or the TEDI model (Repository Service Provider Model). Before turning to the e-Title Registry model, it should be noted that a desirable Global e-Trade Platform is one that connects the e-Trade platforms of each country, and that Bolero's Core Messaging Platform and an e-Title Registry must be established in each country. Each e-Title Registry must be connected with the other registries. As in Korea, a Recognized Electronic Document Repository must be established as a proxy for preserving, attesting, sending, and receiving electronic documents, in order to prevent disputes between parties and to notarize electronic documents and related transactions. The purpose of this paper is to promote the introduction of the e-Title Registry in e-Trade and to streamline the procedures for the electronic bill of lading.

Cross-Lingual Style-Based Title Generation Using Multiple Adapters (다중 어댑터를 이용한 교차 언어 및 스타일 기반의 제목 생성)

  • Yo-Han Park;Yong-Seok Choi;Kong Joo Lee
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.341-354 / 2023
  • The title of a document is a brief summary of the document. Readers can easily understand a document if we provide them with its title in their preferred style and language. In this research, we propose a cross-lingual and style-based title generation model using multiple adapters. To train the model, we need a parallel corpus in several languages with different styles. It is quite difficult to construct this kind of parallel corpus; however, a monolingual title generation corpus of the same style can be built easily. Therefore, we apply a zero-shot strategy to generate a title in a different language and with a different style for an input document. The baseline model is a Transformer consisting of an encoder and a decoder, pre-trained on several languages. The model is then equipped with multiple adapters for translation, languages, and styles. After the model learns a translation task from a parallel corpus, it learns a title generation task from a monolingual title generation corpus. When training the model on a task, we activate only the adapter that corresponds to that task. When generating a cross-lingual and style-based title, we activate only the adapters that correspond to the target language and the target style. Experimental results show that our proposed model is only as good as a pipeline model that first translates into the target language and then generates a title. Natural language generation has changed significantly with the emergence of large-scale language models. However, research on improving natural language generation with limited resources and limited data needs to continue. In this regard, this study seeks to explore the significance of such research.

Method of Related Document Recommendation with Similarity and Weight of Keyword (키워드의 유사도와 가중치를 적용한 연관 문서 추천 방법)

  • Lim, Myung Jin;Kim, Jae Hyun;Shin, Ju Hyun
    • Journal of Korea Multimedia Society / v.22 no.11 / pp.1313-1323 / 2019
  • With the development of the Internet and the spread of smartphones, services designed for user convenience are increasing, so users can check news in real time, anytime and anywhere. However, online news is organized by media outlet and category and offers only a few related search terms, making it difficult to find news related to given keywords. To solve this problem, we propose a method that recommends related documents more accurately by applying Doc2Vec similarity to the specific keywords of news articles and weighting the title and contents of news articles. We collect news articles from the Naver politics category by web crawling in a Java environment, preprocess them, extract topics using LDA modeling, and find similarities using Doc2Vec. To supplement Doc2Vec, we apply TF-IDF to obtain TC (Title Contents) weights for the title and contents of news articles. We then combine the Doc2Vec similarity and the TC weight to generate a TC weight-similarity, and evaluate the similarity between words using the PMI technique to confirm the keyword association.
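
The final PMI step in this abstract can be illustrated with a small sketch. The abstract does not say how co-occurrence is counted, so this example assumes document-level co-occurrence (two words co-occur if they appear in the same article); the function name `pmi` and the toy keyword sets are hypothetical. It uses the standard definition PMI(a, b) = log2( p(a, b) / (p(a) p(b)) ).

```python
import math


def pmi(word_a, word_b, documents):
    """Pointwise mutual information of two words over a list of token sets.

    Assumption: probabilities are estimated as the fraction of documents
    that contain the word (or both words).
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if word_a in doc)
    count_b = sum(1 for doc in documents if word_b in doc)
    count_ab = sum(1 for doc in documents if word_a in doc and word_b in doc)
    if count_ab == 0:
        return float("-inf")  # the words never co-occur
    p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical keyword sets extracted from four news articles.
articles = [
    {"election", "vote"},
    {"election", "vote", "party"},
    {"party", "budget"},
    {"budget", "tax"},
]

strong = pmi("election", "vote", articles)      # always appear together
neutral = pmi("party", "budget", articles)      # co-occur as often as chance predicts
unrelated = pmi("election", "budget", articles)  # never co-occur
```

A positive score indicates that two keywords appear together more often than independence would predict, which is the kind of association the abstract uses PMI to confirm.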

A Study on the application of International Transport Law to electronic bill of lading (전자식(電子式) 선하증권(船荷證券)과 국제운송규칙(國際運送規則))

  • Yang, Jung-Ho
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.20 / pp.369-385 / 2003
  • Contracts of carriage evidenced by a bill of lading, which are made between a carrier and an unspecified number of shippers, are to a large extent regulated by statute law such as the Hague-Visby Rules and the Hamburg Rules. These Rules qualify the contractual liberty of the parties and, in particular, restrain the carrier from introducing exemptions from liability beyond those admitted by the Rules. However, these Rules apply only to goods in respect of which a bill of lading or similar document of title has been issued. For this reason, the carrier's liability for the goods shipped could become an issue where an electronic bill of lading is used instead of a paper bill of lading, because the electronic bill of lading is not generally recognised as a document of title under the existing rules. This article therefore discusses the relation between the carrier who creates an electronic bill of lading and the Rules regulating the carrier's liability. The new Rules under examination at UNCITRAL are also introduced.

Automatic Document Title Generation with RNN and Reinforcement Learning (RNN과 강화 학습을 이용한 자동 문서 제목 생성)

  • Cho, Sung-Min;Kim, Wooseng
    • Journal of Information Technology Applications and Management / v.27 no.1 / pp.49-58 / 2020
  • Recently, a large amount of textual data has been pouring out of the Internet, and technology to refine it is needed. Most of this data consists of long texts that often have no title. Therefore, in this paper, we propose a technique that combines the RNN sequence-to-sequence model with the REINFORCE algorithm to generate the title of a long text automatically. In addition, the TextRank algorithm is applied to extract a summarized text that minimizes information loss, compensating for the shortcoming of the sequence-to-sequence model, in which information is lost when long texts are used. Experiments show that the proposed techniques are superior to existing ones.

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.268-273 / 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.

An Innovative Approach of Bangla Text Summarization by Introducing Pronoun Replacement and Improved Sentence Ranking

  • Haque, Md. Majharul;Pervin, Suraiya;Begum, Zerina
    • Journal of Information Processing Systems / v.13 no.4 / pp.752-777 / 2017
  • This paper proposes an automatic method to summarize Bangla news documents. In the proposed approach, pronoun replacement is performed for the first time to minimize dangling pronouns in the summary. After pronouns are replaced, sentences are ranked using term frequency, sentence frequency, numerical figures, and title words. If two sentences have at least 60% cosine similarity, the frequency of the larger sentence is increased and the smaller sentence is removed to eliminate redundancy. Moreover, the first sentence is always included in the summary if it contains any title word. In Bangla text, numerical figures can be presented in both words and digits, in a variety of forms. All these forms are identified to assess the importance of sentences. The approach uses a rule-based system together with a hidden Markov model and a Markov chain model. To derive the rules, we analyzed 3,000 Bangla news documents and studied some Bangla grammar books. A series of experiments was performed on 200 Bangla news documents and 600 summaries (3 summaries for each document). The evaluation results demonstrate the effectiveness of the proposed technique over the four latest methods.
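
The redundancy rule in this abstract (at least 60% cosine similarity, keep the larger sentence) can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: sentences are compared as bag-of-words count vectors, "larger" is taken to mean "more tokens", and the "increased frequency" of the kept sentence is modelled as a simple boost counter. The names `cosine_similarity` and `remove_redundant` and the English toy sentences are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_redundant(sentences, threshold=0.6):
    """Drop the shorter sentence of each near-duplicate pair.

    Returns (kept_indices, boosts); boosts counts how many duplicates each
    kept sentence absorbed (an assumed stand-in for the frequency increase).
    """
    removed = set()
    boosts = Counter()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if i in removed or j in removed:
                continue
            if cosine_similarity(sentences[i], sentences[j]) >= threshold:
                if len(sentences[i]) >= len(sentences[j]):
                    longer, shorter = i, j
                else:
                    longer, shorter = j, i
                boosts[longer] += 1
                removed.add(shorter)
    kept = [k for k in range(len(sentences)) if k not in removed]
    return kept, dict(boosts)


# Hypothetical tokenized sentences (English used only for readability).
s0 = "the river flooded the town after heavy rain".split()
s1 = "the river flooded the town".split()
s2 = "officials opened two new shelters".split()

kept, boosts = remove_redundant([s0, s1, s2])
```

Here `s0` and `s1` have a cosine similarity of about 0.84, so the shorter `s1` is dropped and `s0` receives one boost, while `s2` is unaffected.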