• Title/Summary/Keyword: XML Data Index

A Study on the DB-IR Integration: Per-Document Basis Online Index Maintenance

  • Jin, Du-Seok;Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering, v.7 no.3, pp.275-280, 2009
  • While database (DB) and information retrieval (IR) systems have been developed independently, there is an emerging requirement to support both data management and efficient text retrieval simultaneously in information systems for domains such as health care, customer support, XML data management, and digital libraries. The great divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB systems have extended their SQL layer to handle text fields because they lack an intact mechanism for building IR-like indexes, whereas IR systems usually treat a block of new documents as the logical unit of index maintenance because they have no concept of integrity constraints. In DB-IR integration, however, a transaction that adds or updates a document must also maintain the posting lists associated with that document. Although DB-IR integration has emerged as a research field, it will remain a difficult and rewarding area for some time, one of the primary reasons being the lack of efficient online transactional index maintenance. In this paper, we evaluate the performance of several strategies for per-document transactional index maintenance: direct index update, the pulsing auxiliary index, and the posting segmentation index. The results show that the pulsing auxiliary strategy and the posting segmentation indexing scheme are promising candidates for text field indexing in DB-IR integration.
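
The difference between direct index update and an auxiliary index can be pictured with a small in-memory sketch. It is not the paper's implementation: the class name `PulsingIndex`, the `threshold` parameter, and whitespace tokenization are hypothetical. Each call to `add_document` is one unit of maintenance (one document per transaction); new postings land in a small auxiliary index, and once it holds `threshold` postings they are merged ("pulsed") into the main inverted index. Queries read both indexes, so a newly added document is searchable immediately.

```python
from collections import defaultdict


class PulsingIndex:
    """Hypothetical sketch of per-document index maintenance with an
    auxiliary index that is periodically merged into the main index."""

    def __init__(self, threshold=4):
        self.main = defaultdict(list)  # term -> sorted doc ids (main inverted index)
        self.aux = defaultdict(list)   # term -> doc ids not yet merged
        self.aux_size = 0              # number of postings waiting in aux
        self.threshold = threshold

    def add_document(self, doc_id, text):
        # One document is one unit of index maintenance (one transaction).
        for term in sorted(set(text.lower().split())):
            self.aux[term].append(doc_id)
            self.aux_size += 1
        if self.aux_size >= self.threshold:
            self.pulse()

    def pulse(self):
        # Merge every auxiliary posting list into the main index, then reset aux.
        for term, postings in self.aux.items():
            self.main[term] = sorted(set(self.main[term]) | set(postings))
        self.aux.clear()
        self.aux_size = 0

    def search(self, term):
        # Consult both indexes so unmerged documents are still visible.
        term = term.lower()
        return sorted(set(self.main.get(term, [])) | set(self.aux.get(term, [])))


if __name__ == "__main__":
    idx = PulsingIndex(threshold=4)
    idx.add_document(1, "XML index")
    idx.add_document(2, "XML stream")    # aux reaches 4 postings -> pulsed
    idx.add_document(3, "index update")  # stays in aux until the next pulse
    print(idx.search("index"))           # [1, 3]
```

Setting `threshold=1` degenerates into direct index update, where every document touches the main posting lists; larger thresholds amortize main-index writes at the cost of a second lookup per query.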

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

  • Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference, 2007.10a, pp.11-13, 2007
  • There has been much interest in stream query processing, and various index techniques and advanced join techniques have been proposed to process data stream queries efficiently. Previous proposals support rapid and advanced responses to data stream queries. However, the volume of stream data keeps increasing, so data stream query processing needs further speedup. In this paper, we propose a novel query processing technique, the Dissection Technique, for large numbers of incoming document streams, focusing on its use in join query processing. Our technique performs more efficiently than other proposals in the data stream environment. The proposed technique is applied to a sensor network system and an XML database.

A Study of Path-based Retrieval for JSON Data Using Suffix Arrays (접미사 배열을 이용한 JSON 데이터의 경로 기반 검색에 대한 연구)

  • Kim, Sung Wan
    • Journal of Creative Information Culture, v.7 no.3, pp.157-165, 2021
  • As the use of Web- and IoT-based application services grows and, with it, the need to manage large amounts of data, efficient data representation and exchange schemes and efficient query processing are becoming increasingly important. JSON, characterized by its simplicity, is being used in various fields as a format for data exchange and storage in place of XML, the standard data representation and exchange language on the Web. It is therefore important to develop indexing and query processing techniques that can effectively access and search large amounts of data expressed in JSON. In this paper, we model hierarchically structured JSON data as a tree and propose indexing and query processing based on the concept of paths. In particular, we design an index structure using a suffix array, a structure widely used in text search, and introduce query processing methods for both simple and complex path-based JSON queries.
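
One way to combine the path concept with a suffix array is to flatten the JSON tree into root-to-leaf path strings, concatenate them, and index every suffix; a partial path such as `/book/title` then becomes a substring search answered by binary search over the sorted suffixes. This is a minimal sketch under stated assumptions, not the paper's index structure: `leaf_paths`, `PathSuffixIndex`, the `/` step separator, the newline terminator, and ignoring array positions are all hypothetical choices. The suffix array is built naively by sorting, which suits illustration but not large data.

```python
import json


def leaf_paths(node, prefix=""):
    """Yield (path, value) for every leaf of a JSON tree.
    Assumption: array positions are dropped, so all elements share one path."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        for child in node:
            yield from leaf_paths(child, prefix)
    else:
        yield prefix, node


class PathSuffixIndex:
    """Hypothetical suffix-array index over the concatenated leaf paths."""

    def __init__(self, doc):
        self.entries = list(leaf_paths(doc))
        # Each path is terminated by '\n' so matches can be anchored at a path end.
        self.text = "".join(p + "\n" for p, _ in self.entries)
        # owner[k] = index of the entry whose path contains text offset k.
        self.owner = []
        for i, (p, _) in enumerate(self.entries):
            self.owner.extend([i] * (len(p) + 1))
        # Naive suffix array: all suffix start offsets, sorted lexicographically.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _lower_bound(self, pattern):
        # First suffix whose prefix of len(pattern) is >= pattern.
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            start = self.sa[mid]
            if self.text[start:start + len(pattern)] < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find(self, pattern):
        """Return (path, value) pairs whose path contains `pattern`."""
        hits = set()
        i = self._lower_bound(pattern)
        # Matching suffixes form one contiguous run in the suffix array.
        while i < len(self.sa) and self.text.startswith(pattern, self.sa[i]):
            hits.add(self.owner[self.sa[i]])
            i += 1
        return [self.entries[k] for k in sorted(hits)]


if __name__ == "__main__":
    doc = json.loads('{"store": {"book": [{"title": "A", "price": 10}, '
                     '{"title": "B"}], "name": "S"}}')
    idx = PathSuffixIndex(doc)
    print(idx.find("/book/title\n"))
    # [('/store/book/title', 'A'), ('/store/book/title', 'B')]
```

Appending the terminator to the pattern, as in `idx.find("/book/title\n")`, restricts matches to paths ending in that step sequence, roughly like a `//book/title` query; a pattern without it also matches paths that merely contain the substring.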

TV-Anytime Metadata Management System based on a Set-Top Box for Digital Broadcasting (디지털 방송을 위한 Set-Top Box기반 TV-Anytime 메타데이터 관리 시스템)

  • Park, Jong-Hyun;Kang, Ji-Hoon
    • Journal of the Korea Society of Computer and Information, v.13 no.4, pp.71-78, 2008
  • Digital broadcasting provides a variety of broadcasting services to satisfy customer requirements. One of the main factors in the new broadcasting environment is interoperability between providers and consumers. To achieve this interoperability, metadata standards have been proposed for digital broadcasting, and TV-Anytime metadata is one of these standards. Meanwhile, there has been research on efficiently managing broadcasting metadata on Set-Top Boxes (STBs). This paper proposes a metadata management system for efficiently managing broadcasting metadata on an STB, a low-cost, low-specification device. Our system consists of a storage engine that stores the metadata and an XQuery engine that searches the stored metadata, and it uses a special index for storage and search. We expect our system to preserve interoperability among a variety of broadcasting applications because it adopts XQuery, the standard language for querying XML data, to search the metadata.

Resource Scheduling Framework based on Resource Parameter Graph (자원인자 기반 스케줄링 프레임워크)

  • 배재환;권성호;김덕수;이강우
    • Journal of Korea Society of Industrial Information Systems, v.8 no.3, pp.19-31, 2003
  • For the implementation of large-scale GRID systems, performance scalability in resource scheduling must be addressed. In this research, we analyze existing scheduling frameworks from the viewpoint of performance and propose a novel resource scheduling framework called resource parameter based scheduling. The proposed framework consists of three components. The first is the resource parameter graph, which expresses resource information through inter-resource relations and composition based on a hierarchical structure. The second is the resource parameter tree, which is used to implement a memory-based index of resource information. The third is the resource information repository, which consists mostly of static data used for general resource information services. This paper presents the details of the framework.

Digital Humanities, and Applications of the "Successful Exam Passers List" (과거 합격자 시맨틱 데이터베이스를 활용한 디지털 인문학 연구)

  • LEE, JAE OK
    • (The)Study of the Eastern Classic, no.70, pp.303-345, 2018
  • This article discusses how the Bangmok (榜目) documents, which are essentially lists of those who passed the civil service examinations of the Chosŏn dynasty, can, once digitized, serve as a source of information that not only reveals the social backgrounds and bloodlines of Chosŏn individuals but also helps us understand the intricate nature of the Yangban network. In digital humanities, the Bangmok materials, literally a list of the leading elites of the Chosŏn period, are a very interesting and important source of information. From these materials we can see what society, as well as the Yangban community, was like. All data in these Bangmok lists is now rendered in XML (eXtensible Markup Language) format and served through a DBMS (Database Management System), so anyone who wants to examine the statistics can freely do so. In addition, by connecting the Bangmok data with genealogy records, we can identify an individual's marital relationships, hometown, and political affiliation, and thereby build a rich narrative that effectively describes that individual's life. This is a graph database that, once Bangmok data is entered, shows successful passers as individual nodes and displays blood and marital relations in a highly visible way. Clicking on the nodes gives access to all kinds of relationships formed among more than 90 thousand successful passers and, once genealogical data is entered, even to the overall marital network. Since 2005, the digitization of the Civil exam Bangmok (Mun-gwa Bangmok), Military exam Bangmok (Mu-gwa Bangmok), "Sa-ma" Bangmok, and "Jab-gwa" Bangmok materials has been completed in Korea.
They can be accessed through a website (http://people.aks.ac.kr/index.aks) that contains information on numerous notable historical Korean figures. With this source of information, we can now extract professional Jung-in figures from these lists. However, meaningful and practical studies using this data have yet to be published. This article reminds readers that this information should serve as a window onto not only the lives of individuals but also the society as a whole.

A study on the improving and constructing the content for the Sijo database in the Period of Modern Enlightenment (계몽기·근대시조 DB의 개선 및 콘텐츠화 방안 연구)

  • Chang, Chung-Soo
    • Sijohaknonchong, v.44, pp.105-138, 2016
  • The "XML Digital Collection of Sijo Texts in the Period of Modern Enlightenment" database, which includes a search function, is now provided through the Korean Research Memory (http://www.krm.or.kr), laying the foundation for developing content from Sijo texts of the Modern Enlightenment period. In this paper, I review the characteristics and problems of this digital collection and seek ways to improve it and turn it into content. The database is significant primarily because it integrates the vast body of Sijo from the Modern Enlightenment period, about 12,500 pieces, and lets them be surveyed at a glance. It is also the first Sijo database to provide a variety of search features by source publication, poet name, title, original text, period, and so on. However, the database is limited in how well it reveals the overall landscape of Sijo in the Modern Enlightenment period. Titles and original texts written in archaic words or Chinese characters cannot be searched because standardized modern-language versions of the texts are not provided. Works and individual Sijo pieces released after 1945 are also missing from the database. Extracting data by poet is inconvenient because poets are recorded in various ways, such as by real name or pen name. To solve these problems and improve the usability of the database, I propose providing standardized modern-language texts, assigning content-based index terms, and providing information on the form of each work. Furthermore, if the database could be built into a Sijo Culture Information System, it could be connected with academic and educational content.
As specific plans, I suggest: (1) learning support materials on modern history and on awareness of national territory in the modern age; (2) source material on indigenous animals and plants for creating commercial characters; and (3) use as a Sijo learning tool, such as a Sijo game.

Development of a Metamodel-Based Healthcare Service System using OSGi Component Platform (OSGi 컴포넌트 플랫폼을 이용한 메타모델 기반의 건강관리 서비스 시스템 개발)

  • Kim, Tae-Woong;Kim, Hee-Cheol
    • Journal of Korea Multimedia Society, v.14 no.1, pp.121-132, 2011
  • A healthcare system is a type of medical information system that supports early detection and prevention of disease by periodically checking a person's health condition, based on signals obtained from the body. However, existing systems differ in how they store and describe vital signs depending on the medical device, and in how they evaluate health. This leads to disadvantages such as a lack of interoperability between systems, increased development costs, and the absence of a unified system. This study therefore develops a healthcare system based on a metamodel. To this end, it describes and stores vital sign data based on the standard HL7 metamodel and applies OCL (Object Constraint Language), a mathematically based specification language, to define wellness indexes and extract data for health risk appraisal. In addition, the study implements components on OSGi and assembles them so that various devices and systems can be added easily. Describing vital sign data with the metamodel ensures interoperability between systems, and defining wellness indexes in OCL allows the evaluation of health conditions to be standardized. It also provides clear specifications.