• Title/Summary/Keyword: Document Expansion

Improving Classification Accuracy in Hierarchical Trees via Greedy Node Expansion

  • Byungjin Lim;Jong Wook Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.113-120
    • /
    • 2024
  • With the advancement of information and communication technology, we can easily generate various forms of data in our daily lives. To efficiently manage such a large amount of data, systematic classification into categories is essential. For effective search and navigation, data is organized into a tree-like hierarchical structure known as a category tree, which is commonly seen in news websites and Wikipedia. As a result, various techniques have been proposed to classify large volumes of documents into the terminal nodes of category trees. However, document classification methods using category trees face a problem: as the height of the tree increases, the number of terminal nodes multiplies exponentially, which increases the probability of misclassification and ultimately leads to a reduction in classification accuracy. Therefore, in this paper, we propose a new node expansion-based classification algorithm that satisfies the classification accuracy required by the application, while enabling detailed categorization. The proposed method uses a greedy approach to prioritize the expansion of nodes with high classification accuracy, thereby maximizing the overall classification accuracy of the category tree. Experimental results on real data show that the proposed technique provides improved performance over naive methods.
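
The abstract above does not give the algorithm itself, so the following is only a minimal sketch of greedy node expansion under stated assumptions. The `Node` fields (`accuracy`, `weight`), the weighted-mean definition of overall accuracy, and the function names are all hypothetical and not taken from the paper. Starting from the root, it repeatedly expands the frontier node whose expansion keeps the overall accuracy highest, and stops when no expansion meets the required accuracy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    accuracy: float  # assumed: estimated accuracy when classifying into this node
    weight: float    # assumed: share of documents that fall under this node
    children: List["Node"] = field(default_factory=list)


def overall_accuracy(frontier):
    """Weighted mean accuracy of the current terminal nodes (an assumption)."""
    total = sum(n.weight for n in frontier)
    return sum(n.accuracy * n.weight for n in frontier) / total


def greedy_expand(root, required_accuracy):
    """Expand nodes greedily while the overall accuracy stays acceptable."""
    frontier = [root]
    while True:
        best = None
        for node in frontier:
            if not node.children:
                continue  # leaf of the full tree, nothing to expand
            # Frontier that would result from replacing node by its children.
            candidate = [n for n in frontier if n is not node] + node.children
            acc = overall_accuracy(candidate)
            if acc >= required_accuracy and (best is None or acc > best[0]):
                best = (acc, candidate)
        if best is None:
            return frontier  # no expansion satisfies the requirement
        frontier = best[1]


# Toy tree with made-up accuracies and weights.
a = Node("A", 0.95, 0.5, [Node("A1", 0.9, 0.25), Node("A2", 0.9, 0.25)])
b = Node("B", 0.80, 0.5, [Node("B1", 0.6, 0.25), Node("B2", 0.6, 0.25)])
root = Node("root", 1.0, 1.0, [a, b])
result = greedy_expand(root, required_accuracy=0.84)
print(sorted(n.name for n in result))
```

In this toy tree, expanding B would pull the overall accuracy below the requirement, so B stays a terminal node while A is split into finer categories.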

A Study of Implementation of Defense Configuration Management System based on PLM (PLM 기반의 국방 형상관리 정보체계 구축 사례연구)

  • Lim, Chae-O
    • Korean Journal of Computational Design and Engineering
    • /
    • v.13 no.4
    • /
    • pp.305-313
    • /
    • 2008
  • A configuration management system was implemented by applying PLM to the defense field. The PLM system has recently been incorporated in a wide range of industries, and it has allowed for improvements in work productivity and expansion of related services by comprehensively managing and securing connection regarding configuration information in the defense field. Implementations include acquisition of configuration related information and reinforcement of BOM-oriented configuration management function, securing compatibility among 3D drawings of different agencies, improvement of drawing and document management functions, comprehensive systematic configuration management focused on product structure, strengthened configuration control functions, a management system according to the work flow and life cycle functions, an integrated configuration management system of 3D model CAD resources and an enhanced management system. This paper covers a case study reviewing the implementation of a PLM-based configuration management information system and its results, so that the information can be made available to other agencies and companies seeking to apply PLM in their organizations.

Local Knowledge on Trees Utilization and Their Existing Threats in Rashad District of Nuba Mountains, Sudan

  • Adam, Yahia Omar
    • Journal of Forest and Environmental Science
    • /
    • v.30 no.4
    • /
    • pp.342-350
    • /
    • 2014
  • Rural people of Sudan are endowed with deep knowledge concerning the utilization of different tree species. However, research on local knowledge related to tree species utilization has not received adequate attention. The study objectives were to identify the existing local knowledge related to the utilization of tree species and the existing threats to the availability of the trees. A total of 300 respondents were selected randomly from Rashad district in the Nuba Mountains in 2011. Semi-structured interviews, direct observation, group discussion, preference ranking and direct matrix ranking were used to collect the data. The results revealed that people of the Nuba Mountains utilize different tree species for food, medicinal purposes, fodder, firewood, construction and cultural ceremonies. The results also indicated that the availability of trees is negatively influenced by firewood collection, agricultural expansion, drought, overgrazing and charcoal production. The study concluded that local knowledge has a crucial role in tree species utilization in the Nuba Mountains. Further research to document and substantiate the local knowledge on useful tree species is highly recommended.

An Expansion of Vector Space for Document Classifications (문서 분류에 이용 가능한 벡터 공간의 확장 방법)

  • Lee, Samuel Sangkon;Yoo, Kyungseok
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.782-784
    • /
    • 2015
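  • In this paper, we propose an extended vector space model that uses ambiguous-word and resolving-word information to improve the classification precision of Korean documents. A vector used in the vector space model can have a second axis with the same degree of weight, but existing methods apply no processing to that axis, which causes problems when vectors are compared. We define a word that forms an axis with the same weight as an ambiguous word, and determine ambiguous words by computing the mutual information between words and domains. We define a word that resolves the ambiguity caused by an ambiguous word as a resolving word, and determine the strength of resolving words by computing mutual information over the words that appear in the same documents as the ambiguous word. Using ambiguous words and resolving words, we extend the dimensions of the vectors and thereby improve the precision of document classification.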
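
The abstract above relies on the mutual information between words and domains to find ambiguous words. As a rough illustration only, the sketch below computes pointwise mutual information from document-level counts and flags a word as ambiguous when two or more domains tie for its highest score. The tie rule, the `tolerance` parameter, the function names, and the toy data are assumptions for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def pmi_table(docs):
    """docs: list of (domain, words). Returns {(word, domain): PMI}.

    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), estimated from the number
    of documents in which the word appears.
    """
    n_docs = len(docs)
    domain_count = Counter(domain for domain, _ in docs)
    word_count = Counter()
    joint_count = Counter()
    for domain, words in docs:
        for w in set(words):  # count each word once per document
            word_count[w] += 1
            joint_count[(w, domain)] += 1
    return {
        (w, c): math.log((n * n_docs) / (word_count[w] * domain_count[c]))
        for (w, c), n in joint_count.items()
    }


def ambiguous_words(pmi, tolerance=1e-9):
    """Words whose best PMI score is shared by at least two domains (assumed rule)."""
    scores_by_word = {}
    for (w, _), score in pmi.items():
        scores_by_word.setdefault(w, []).append(score)
    result = set()
    for w, scores in scores_by_word.items():
        top = max(scores)
        if sum(1 for s in scores if top - s <= tolerance) >= 2:
            result.add(w)
    return result


# Toy corpus: "bank" appears equally in both domains.
docs = [
    ("sports", ["bank", "goal"]),
    ("finance", ["bank", "loan"]),
]
pmi = pmi_table(docs)
print(ambiguous_words(pmi))
```

Here "goal" and "loan" each point clearly to one domain, while "bank" scores the same for both, which is the kind of equal-weight axis the abstract describes.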

The Establishment and Change of Busan Ami-dong Crematorium in Japanese Colonial Period (일제강점기 부산 아미동 화장장의 설립과 변천)

  • Song, Hye-Young
    • Journal of the Architectural Institute of Korea Planning & Design
    • /
    • v.34 no.5
    • /
    • pp.89-96
    • /
    • 2018
  • Ami-dong Crematorium in Busan was established as a public facility in 1929 (during the period of Japanese occupation). It is the predecessor of Busan Yeongnak-Park (永樂公園), the funeral facility of Busan municipality. Crematoria were adopted in the Busan region at an early stage, inside the Japanese Concession, following the opening of the port. Because Ami-dong Crematorium was constructed as a public facility, that precedent has been maintained to this day, providing the background for the leading publicly operated facilities in the Busan area. This study is based on the expansion construction documents found in the National Archives of Korea. Above all, this research reveals the establishment and change of Busan Ami-dong Crematorium as a historical reference point in the formation process of recent public funeral facilities.

COVID19 Innate Immunity through Natural Medicine in Palau

  • Christopher U. Kitalong;Tmong Udui;Terepkul Ngiraingas;Pearl Marumoto;Victor Yano
    • Proceedings of the Plant Resources Society of Korea Conference
    • /
    • 2020.12a
    • /
    • pp.15-15
    • /
    • 2020
  • An internal document, the CORONA-VIRUS DISEASE 2019 (COVID-19) PLAN, stated that "on January 22, 2020, Palau Ministry of Health activated its emergency operations center, and since then has prepared and put in place measures in response to this global pandemic." These actions eventually led to the closure of most flights coming into Palau as a way to protect its population. The population of Palau is at high risk from COVID19 due to the very elevated rate of NCDs, as well as limited access to proper testing and treatment facilities. Increased use of traditional medicines in the population has reduced co-morbidities by reducing risk factors. Furthermore, the expansion of traditional NCD therapies, especially that of DAK, reduces pressure due to obesity and diabetes, thereby allowing unimpaired immune systems to combat deadly infectious diseases such as COVID19.

The Effect of Disclosure System through XBRL (XBRL이 전자공시 시스템에 미치는 영향)

  • Shin, Seung-Jung;Kim, Jung-Ihl;Lee, Tai-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.8 no.5
    • /
    • pp.229-234
    • /
    • 2008
  • XBRL is fundamentally defined as a structure that enterprises extend, based on KGAAP 2.1 developed by the Korea branch of XBRL. Each enterprise selects its type of business and must check and choose the tags suitable for it. Because complicated steps such as tag extension are involved, preparing a document through tag expansion and data input is difficult. Although styling standards are provided through the presentation structure and label structure built into XBRL, using an XBRL processor remains complicated. The electronic public disclosure systems of the Financial Supervisory Service (DART) and Korea Exchange (KIND, KEDIS) use the markup languages SGML, XML and XBRL as format languages. The procedure for defining documents and the processing method vary with the characteristics of each language. This study analyzes the effects, step by step, of the format language of each electronic disclosure system and considers a direction for format languages.

ER2XML: An Implementation of XML Schema Generator based on the Entity-Relationship Model (ER2XML :개체-관계 모델을 기반으로한 XML Schema 생성기의 구현)

  • Kim Chang Suk;Son Dong-Cheul
    • The KIPS Transactions:PartD
    • /
    • v.12D no.1 s.97
    • /
    • pp.1-12
    • /
    • 2005
  • XML is emerging as the standard language for data exchange on the Web. Demand is therefore increasing for XML Schema (W3C XML Schema Spec.), which validates XML documents. However, XML Schema is difficult to design because of its complexity, despite its varied data types and rich expressiveness. This paper shows a simple way to design XML Schema using a fundamental tool of database design, the Entity-Relationship model. The conversion from the Entity-Relationship model to XML Schema cannot be performed directly because of mismatches between the two models, so we present algorithms that generate XML Schema from the Entity-Relationship model. The algorithms produce XML Schema code using a hierarchical view representation. An important objective of this automatic generation is to preserve XML Schema's characteristics such as reusability, global and local ability, ability of expansion and various type changes.

Resampling Feedback Documents Using Overlapping Clusters (중첩 클러스터를 이용한 피드백 문서의 재샘플링 기법)

  • Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.16B no.3
    • /
    • pp.247-256
    • /
    • 2009
  • Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
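
For readers unfamiliar with the baseline that the abstract above starts from, the following is a minimal sketch of plain pseudo-relevance feedback: the top-k retrieved documents are assumed relevant and their most frequent new terms are appended to the query. It does not implement the cluster-based resampling that the paper proposes. The function name, the parameters `k` and `m`, and the toy documents are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=2, m=2):
    """Append the m most frequent new terms from the top-k ranked documents.

    ranked_docs is a list of token lists, already ordered by an initial
    retrieval run (the retrieval step itself is outside this sketch).
    """
    feedback_docs = ranked_docs[:k]  # assumed relevant without any judgment
    counts = Counter(
        term
        for doc in feedback_docs
        for term in doc
        if term not in query_terms
    )
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked_terms = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked_terms[:m]]


ranked_docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
]
expanded = expand_query(["jaguar"], ranked_docs, k=2, m=2)
print(expanded)
```

If a noisy document ranks in the top k, its terms enter the expanded query just as easily, which is the weakness the abstract attributes to the initial retrieval set.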

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK-Telecom. The proposed model showed a higher performance index than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information from various types of unstructured documents more effectively than the baseline model. Previous research has the limitation that performance is poor when extracting information from a document type that differs from the training data.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study can prevent unnecessary extraction attempts on documents that do not contain the answer information. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion cannot guarantee that a document contains the correct answer, because it targets unstructured documents on the real web. When question answering is performed on the real web, previous machine reading comprehension studies show a low level of precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem of data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source KoNLPy Python package, and information extraction can fail when the morphological analysis is not performed properly. To improve information extraction results, a more advanced morpheme analyzer needs to be developed. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the one the query intends.
In future research, measures are needed to distinguish people with the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We built an evaluation data set from 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to measure the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, to build an environment in which results can be evaluated more objectively.