• Title/Summary/Keyword: Schema graph

An XML Data Management System and Its Application to Genome Databases (XML 데이타 관리시스템과 유전체 데이타베이스에의 응용)

  • 이경희;김태경;김선신;이충세;조완섭
    • Journal of KIISE:Databases / v.31 no.4 / pp.432-443 / 2004
  • As XML data has come into wide use on the Internet, it has become necessary to store and retrieve XML data using DBMSs. However, relational DBMSs suffer from the mismatch between the graph structure of XML data and the tables of relational databases. We propose Xing, an ORDBMS-based, DTD-dependent XML data management system. Xing stores XML data in a DTD-dependent form in an object database. Since the object database schema has a graph structure and supports multi-valued attributes, mapping the XML data model and queries onto the object data model and OQL is straightforward. For rapid storage of large quantities of XML data, we use a SAX parser with a customized Xing-tree, which requires less memory than a DOM tree. Xing also returns query results in XML document form. We have implemented the Xing system on top of the UniSQL object-relational DBMS for validity checking and performance comparison. For XML genome data from GenBank, an experimental evaluation shows that Xing provides a significant performance improvement (up to 10 times) over the relational approach.
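
A minimal sketch of the SAX-based loading idea described above: the handler builds a lightweight tree whose nodes keep only what storage needs (tag, attributes, text, children) instead of a full DOM. The XingNode structure here is an illustrative assumption, not the paper's actual Xing-tree.

```python
# SAX streams events instead of materializing a DOM, so memory stays
# proportional to the compact tree being built.
import xml.sax
from io import StringIO

class XingNode:
    __slots__ = ("tag", "attrs", "text", "children")  # keep nodes small
    def __init__(self, tag, attrs):
        self.tag, self.attrs = tag, dict(attrs)
        self.text, self.children = "", []

class TreeBuilder(xml.sax.ContentHandler):
    def __init__(self):
        self.root, self.stack = None, []
    def startElement(self, name, attrs):
        node = XingNode(name, attrs)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)
    def endElement(self, name):
        self.stack.pop()
    def characters(self, content):
        if self.stack:
            self.stack[-1].text += content

handler = TreeBuilder()
xml.sax.parse(StringIO("<entry><gene>BRCA1</gene></entry>"), handler)
print(handler.root.children[0].text)  # -> BRCA1
```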

Ontology Knowledge based Information Retrieval for User Query Interpretation (사용자 질의 의미 해석을 위한 온톨로지 지식 기반 검색)

  • Kim, Nanju;Pyo, Hyejin;Jeong, Hoon;Choi, Euiin
    • Journal of Digital Convergence / v.12 no.6 / pp.245-252 / 2014
  • Semantic search promises to provide more accurate results than present-day keyword-matching search by using a logically represented knowledge base. However, ordinary users are not familiar with complex formal query languages or with the schema of the knowledge base, so the system must interpret the meaning of the user's keywords. In this paper, we describe a user query interpretation system for the semantic retrieval of multimedia contents. Our system is ontological-knowledge-base-driven in the sense that the interpretation process is integrated into a unified structure around a knowledge base built on domain ontologies.
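
A rough illustration of the first step such a system must perform: mapping a user keyword onto ontology concepts. The sketch below matches keywords against rdfs:label annotations with rdflib; the ontology file name and the simple substring matching are assumptions for illustration, not the paper's interpretation process.

```python
# Look up ontology resources whose labels mention the user's keyword.
from rdflib import Graph
from rdflib.namespace import RDFS

g = Graph()
g.parse("media_ontology.ttl", format="turtle")  # hypothetical domain ontology

def interpret(keyword):
    """Return ontology resources whose rdfs:label matches the keyword."""
    hits = []
    for subj, _, label in g.triples((None, RDFS.label, None)):
        if keyword.lower() in str(label).lower():
            hits.append(subj)
    return hits

print(interpret("documentary"))  # candidate concepts for the keyword
```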

Design of Visual Object-Oriented Database Query Language and Implementation of the Query Processor (시각적 객체지향 데이터베이스 질의어의 설계 및 질의처리기의 구현)

  • Lee, Suk-Kyoon;Nah, Yun-Mook;Suh, Yong-Moo
    • Asia pacific journal of information systems / v.11 no.2 / pp.121-139 / 2001
  • The recently proposed VOQL* query language is a visual language for object-oriented databases. It is based on Venn diagrams and graphs, so the underlying schema structure is naturally implied in query expressions. In VOQL*, the structural relationships among the objects used in a query expression are represented graphically; it thus has formal semantics that can be defined inductively, and it is easy to use. In this paper, we propose a revised VOQL* and introduce its query processor, InQs (Intelligent Querying System). While retaining the merit of VOQL* that the structural relationships among objects are represented visually, the revised VOQL* has the additional merit that users can formulate a query interactively using various forms supplied by InQs. As a query processor that translates queries in revised VOQL* into ODMG OQL, InQs provides an environment in which users express queries in revised VOQL* and the system automatically translates them into ODMG OQL. The translation algorithm of InQs is much simpler and more intuitive than the algorithms used in QUIVER and other systems, since it reflects the inductively defined formal semantics of VOQL*.
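
A minimal sketch of this kind of translation: a toy query-graph structure is walked inductively and emitted as an OQL string. The QueryNode type and the emitted clause shapes are illustrative assumptions, not InQs's actual semantics-driven algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class QueryNode:
    var: str            # range variable, e.g. "e"
    extent: str         # class extent, e.g. "Employees"
    predicate: str = "" # selection condition on this node, if any
    children: list = field(default_factory=list)  # (property, QueryNode)

def clauses(node):
    """Inductively collect from/where clauses for a node and its subtree."""
    frm, where = [f"{node.extent} {node.var}"], []
    if node.predicate:
        where.append(node.predicate)
    for prop, child in node.children:
        cf, cw = clauses(child)
        # the child ranges over a path expression instead of its extent
        frm += [f"{node.var}.{prop} {child.var}"] + cf[1:]
        where += cw
    return frm, where

def to_oql(root):
    frm, where = clauses(root)
    query = f"select {root.var} from " + ", ".join(frm)
    return query + (" where " + " and ".join(where) if where else "")

dept = QueryNode("d", "Departments", predicate="d.name = 'R&D'")
emp = QueryNode("e", "Employees", children=[("dept", dept)])
print(to_oql(emp))
# select e from Employees e, e.dept d where d.name = 'R&D'
```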

Simulation Application for Functional Electrical Stimulator (기능적 전기 자극 시뮬레이션 응용프로그램)

  • Jeon, Hyo Chan
    • Journal of rehabilitation welfare engineering & assistive technology / v.10 no.1 / pp.59-64 / 2016
  • In this study, an application was developed for the simulation of functional electrical stimulation. It calculates the electrical energy delivered to the patient and visualizes the electrical stimulation waveform and its Therapy, Burst, and Pulse sections for the requested time period. The application was verified by comparing its graphs against oscilloscope measurements. An XML schema was developed so that the simulation contents, which consist of standard codes identified by OIDs, can be shared and reused. Using the application, medical experts will be able to research and share simulation contents.
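
As a rough illustration of the delivered-energy calculation such a simulator performs, the sketch below assumes constant-current rectangular pulses into a resistive load, with energy E = I^2 * R * t_on per pulse; the formula and all parameter values are assumptions, not the paper's model.

```python
def stimulation_energy(current_a, load_ohm, pulse_width_s,
                       frequency_hz, duration_s):
    """Total energy (joules) delivered by a rectangular pulse train."""
    pulses = frequency_hz * duration_s                      # pulse count
    energy_per_pulse = current_a**2 * load_ohm * pulse_width_s
    return pulses * energy_per_pulse

# 20 mA pulses, 1 kOhm tissue load, 300 us width, 50 Hz, 10 s therapy
print(f"{stimulation_energy(0.020, 1000, 300e-6, 50, 10) * 1000:.1f} mJ")
# -> 60.0 mJ
```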

The Query Optimization Techniques for XML Data using DTDs (DTD를 이용한 XML 데이타에 대한 질의 최적화 기법)

  • Chung, Tae-Sun;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.28 no.4 / pp.723-731 / 2001
  • As XML has become an emerging standard for information exchange on the World Wide Web, it has gained attention in the database community, where XML is viewed as a database model from which information can be extracted. XML data can be mapped to a semistructured data model based on an edge-labeled graph, and queries can be processed against it. Here we propose new query optimization techniques that use DTDs (Document Type Definitions), which carry the schema information of the XML data. Our techniques reduce the cost of traditional index techniques. Also, since they preserve the source database structure, they can process many kinds of complex queries. We implemented our techniques and provide preliminary performance results.
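
A minimal sketch of one classic DTD-based optimization in this spirit: a descendant query (//tag) is expanded into the concrete root-to-node paths the DTD allows, so evaluation never scans irrelevant subtrees. The dict-based DTD encoding and the sample schema are illustrative assumptions, not the paper's exact technique.

```python
DTD = {  # element -> allowed child elements (assumes a non-recursive DTD)
    "db":      ["paper"],
    "paper":   ["title", "authors"],
    "authors": ["author"],
    "author":  [],
    "title":   [],
}

def expand(target, root="db"):
    """Return every path from the root to elements named `target`."""
    paths, stack = [], [(root, [root])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append("/".join(path))
        for child in DTD.get(node, []):
            stack.append((child, path + [child]))
    return paths

print(expand("author"))  # ['db/paper/authors/author']
```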

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion functions for processing a massive amount of unstructured log data, and executing a considerable number of functions to categorize and analyze the stored unstructured log data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management systems. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that keep the system operating continually after it recovers from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, the strict schemas of relational databases make it hard to expand nodes when rapidly growing data must be distributed across various nodes. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. NoSQL data models are usually classified into key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented store with a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
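
A minimal sketch of the collector-side flow described above: classify an incoming record, route real-time records toward the MySQL module, and send the rest to MongoDB's schema-free store. The connection string, database and collection names, and the classification rule are illustrative assumptions, not the paper's configuration.

```python
import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://cloud-server:27017")  # hypothetical host
logs = client["bank_logs"]["transactions"]            # hypothetical names

def store_in_mysql(record):
    print("to MySQL:", record)      # stub standing in for the MySQL module

def collect(record):
    """Route one log record the way the log collector module would."""
    record["collected_at"] = datetime.datetime.now(datetime.timezone.utc)
    if record.get("type") == "realtime":
        store_in_mysql(record)      # real-time analysis path
    else:
        logs.insert_one(record)     # MongoDB module: schema-free insert

collect({"type": "batch", "branch": "Seoul", "event": "withdrawal"})
```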

A Ranking Algorithm for Semantic Web Resources: A Class-oriented Approach (시맨틱 웹 자원의 랭킹을 위한 알고리즘: 클래스중심 접근방법)

  • Rho, Sang-Kyu;Park, Hyun-Jung;Park, Jin-Soo
    • Asia pacific journal of information systems / v.17 no.4 / pp.31-59 / 2007
  • We frequently use search engines to find relevant information in the Web but still end up with too much information. In order to solve this problem of information overload, ranking algorithms have been applied to various domains. As more information becomes available in the future, effectively and efficiently ranking search results will become more critical. In this paper, we propose a ranking algorithm for Semantic Web resources, specifically RDF resources. Traditionally, the importance of a particular Web page is estimated based on the number of keywords found in the page, which is subject to manipulation. In contrast, link-analysis methods such as Google's PageRank capitalize on the information inherent in the link structure of the Web graph. PageRank considers a page highly important if it is referred to by many other pages, and the degree of importance increases further if the importance of the referring pages is high. Kleinberg's algorithm is another link-structure-based ranking algorithm for Web pages. Unlike PageRank, Kleinberg's algorithm utilizes two kinds of scores: the authority score and the hub score. If a page has a high authority score, it is an authority on a given topic and many pages refer to it; a page with a high hub score links to many authoritative pages. As mentioned above, link-structure-based ranking has played an essential role in the World Wide Web (WWW), and its effectiveness and efficiency are now widely recognized. On the other hand, as the Resource Description Framework (RDF) data model forms the foundation of the Semantic Web, any information in the Semantic Web can be expressed as an RDF graph, making ranking algorithms for RDF knowledge bases greatly important. The RDF graph consists of nodes and directional links, similar to the Web graph, so the link-structure-based ranking method seems highly applicable to ranking Semantic Web resources. However, the information space of the Semantic Web is more complex than that of the WWW. For instance, the WWW can be considered one huge class, i.e., a collection of Web pages, which has only a recursive property, a 'refers to' property corresponding to the hyperlinks. The Semantic Web, in contrast, encompasses various kinds of classes and properties, and consequently, ranking methods used in the WWW should be modified to reflect the complexity of its information space. Previous research addressed the ranking problem of query results retrieved from RDF knowledge bases. Mukherjea and Bamba modified Kleinberg's algorithm in order to rank Semantic Web resources. They defined the objectivity score and the subjectivity score of a resource, which correspond to Kleinberg's authority score and hub score, respectively. They concentrated on the diversity of properties and introduced property weights to control the influence of one resource on another depending on the characteristic of the property linking the two resources. A node with a high objectivity score is the object of many RDF triples, and a node with a high subjectivity score is the subject of many RDF triples. They developed several Semantic Web systems in order to validate their technique and reported experimental results verifying the applicability of their method to the Semantic Web. Despite their efforts, however, some limitations remained, which they reported in their paper.
First, their algorithm is useful only when a Semantic Web system represents most of the knowledge pertaining to a certain domain. In other words, the ratio of links to nodes should be high, or overall resources should be described in detail to a certain degree, for their algorithm to work properly. Second, the Tightly-Knit Community (TKC) effect, the phenomenon that pages which are less important but densely connected receive higher scores than ones that are more important but sparsely connected, remains problematic. Third, a resource may have a high score not because it is actually important, but simply because it is very common and consequently has many links pointing to it. In this paper, we examine such ranking problems from a novel perspective and propose a new algorithm that can solve the problems of the previous studies. Our proposed method is based on a class-oriented approach. In contrast to the predicate-oriented approach entertained by previous research, under our approach a user determines the weight of a property by comparing its significance relative to the other properties when evaluating the importance of resources in a specific class. This approach stems from the idea that most queries are supposed to find resources belonging to the same class in the Semantic Web, which consists of many heterogeneous classes in RDF Schema. It closely reflects the way people evaluate things in the real world, and turns out to be superior to the predicate-oriented approach for the Semantic Web. Our proposed algorithm resolves the TKC effect and further sheds light on the other limitations posed by the previous research. In addition, we propose two ways to incorporate datatype properties, which previous methods did not employ even when they bear on resource importance. We designed an experiment to show the effectiveness of our proposed algorithm and the validity of the ranking results, which had not been attempted in previous research. We also conducted a comprehensive mathematical analysis, which was overlooked in previous research and which enabled us to simplify the calculation procedure. Finally, we summarize our experimental results and discuss further research issues.
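
A minimal sketch of rank propagation with per-property weights over an RDF-style graph, in the spirit of the class-oriented approach: a user-chosen weight per property scales how much rank flows along each triple. The toy graph, weights, and damping factor are illustrative assumptions, not the paper's exact formulation.

```python
EDGES = [  # (subject, property, object) triples
    ("paperA", "cites",    "paperB"),
    ("paperC", "cites",    "paperB"),
    ("alice",  "authorOf", "paperB"),
]
WEIGHTS = {"cites": 1.0, "authorOf": 0.5}  # user-set per-property weights
DAMPING, ITERATIONS = 0.85, 50

nodes = {n for s, _, o in EDGES for n in (s, o)}
rank = {n: 1.0 / len(nodes) for n in nodes}

for _ in range(ITERATIONS):
    incoming = {n: 0.0 for n in nodes}
    for s, p, o in EDGES:
        # normalize by the subject's total outgoing weight so that each
        # node distributes its whole rank across its outgoing triples
        out = sum(WEIGHTS[pp] for ss, pp, _ in EDGES if ss == s)
        incoming[o] += rank[s] * WEIGHTS[p] / out
    rank = {n: (1 - DAMPING) / len(nodes) + DAMPING * incoming[n]
            for n in nodes}

for n, r in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{n}: {r:.3f}")  # paperB, the common object, ranks highest
```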

A Linkage between IndoorGML and CityGML using External Reference (외부참조를 통한 IndoorGML과 CityGML의 결합)

  • Kim, Joon-Seok;Yoo, Sung-Jae;Li, Ki-Joune
    • Spatial Information Research / v.22 no.1 / pp.65-73 / 2014
  • Recently, indoor navigation services with indoor maps, such as Indoor Google Maps, have begun to be offered. These services require the construction of indoor data. CityGML and IFC are widely used standards for representing indoor data. Their data models contain spatial information for indoor visualization and analysis, but indoor navigation requires semantic and topological information, such as graphs, as well as geometry. For this reason, IndoorGML, a GML3 application schema and data model for the representation, storage, and exchange of indoor geoinformation, is under standardization by OGC. IndoorGML can directly describe geometric properties and can also refer to elements in external documents. Because a large amount of data has already been constructed in CityGML or IFC, the time and cost of constructing IndoorGML data can be greatly reduced if CityGML can help generate it. Thus, this paper suggests practical uses of CityGML, including deriving IndoorGML data from CityGML and linking to it. We analyze the relationships between IndoorGML and CityGML, and address issues and solutions for linking the two.
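
A minimal sketch of the linkage idea: an IndoorGML cell space carries an external reference whose URI points at the gml:id of the CityGML room it was derived from. Element and namespace usage follows the general externalReference pattern; treat the exact names as illustrative rather than schema-validated.

```python
import xml.etree.ElementTree as ET

NS = {
    "core": "http://www.opengis.net/indoorgml/1.0/core",
    "gml":  "http://www.opengis.net/gml/3.2",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

cell = ET.Element(f"{{{NS['core']}}}CellSpace",
                  {f"{{{NS['gml']}}}id": "cs-101"})
ext = ET.SubElement(cell, f"{{{NS['core']}}}externalReference")
# hypothetical CityGML LoD4 document holding the referenced room
ET.SubElement(ext, f"{{{NS['core']}}}informationSystem").text = "building_lod4.gml"
ET.SubElement(ext, f"{{{NS['core']}}}uri").text = "#room-101"  # target gml:id

print(ET.tostring(cell, encoding="unicode"))
```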

An XML based Mobile Information Visualization System for Mobile Devices using Information layout Techniques (Rectangle Layout을 이용한 XML 기반 모바일 정보 시각화 시스템)

  • Yoo Hee-Yong;Cheon Suh-Hyun
    • Journal of KIISE:Software and Applications / v.33 no.9 / pp.776-786 / 2006
  • This paper proposes an XML-based mobile information visualization system that uses a rectangle layout to show XML-based information effectively on mobile devices, which lack rich display features. We define an XML schema that can describe information forming a graph with cycles as well as information in tree form. We suggest the rectangle layout method, an improvement on the traditional radial layout, because the specific characteristics of mobile displays must be considered when XML information is rendered on screen. We then apply the DOI (degree of interest) of the fisheye-view algorithm to information in the rectangle layout in order to represent both the overall information and the information the user is interested in. We also suggest an effective method, mindful of mobile device capabilities, to decrease user confusion and improve awareness when a user selects a target of interest. The proposed focus+context visualization system supports an effective interface for information retrieval on mobile devices such as PDAs, cellular phones, and smart phones, which usually have less CPU power than PCs as well as display and memory constraints. We evaluate the system experimentally by comparing information visualization with the traditional radial layout against the proposed rectangle layout.
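
A minimal sketch of Furnas's fisheye degree of interest (DOI), the kind of filtering applied on top of the rectangle layout: DOI(x | focus) = API(x) - dist(focus, x), with a-priori importance API(x) = -depth(x). The toy tree and the display threshold are illustrative assumptions, not the paper's parameters.

```python
PARENT = {"a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}

def path_to_root(node):
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain  # node, parent, ..., root

def dist(x, y):
    """Tree distance: hops from x and y up to their lowest common ancestor."""
    px, py = path_to_root(x), path_to_root(y)
    common = next(a for a in px if a in py)
    return px.index(common) + py.index(common)

def doi(node, focus):
    depth = len(path_to_root(node)) - 1   # API(x) = -depth(x)
    return -depth - dist(focus, node)

focus = "a1"
for n in ["root", "a", "b", "a1", "a2", "b1"]:
    verdict = "show" if doi(n, focus) >= -4 else "hide"  # view threshold
    print(n, doi(n, focus), verdict)  # b1 falls below and is elided
```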

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data qualifies as Big Data in that it satisfies the conditions of volume (amount of data), velocity (data input and output speed), and variety (diversity of data types). If one can discover the trend of an issue in SNS Big Data, this information can serve as an important new source for creating value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over the duration of a month; (3) convey the importance of a topic through a treemap based on a scoring system and frequency; (4) visualize the daily time-series graph of keywords found by keyword search. The present study analyzes, in real time, the Big Data generated by SNS. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale from single-node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no fixed schema or tables; its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to data; interaction between data is easy, and the library is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, with its pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS).
Based on this, we confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets collected in Korea during March 2013.
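
A minimal sketch of the topic-extraction step at the heart of such a system: fit a topic model over a batch of tweets and print each topic's top keywords, analogous to the daily keyword sets above. LDA via scikit-learn is used here for illustration; the sample tweets and parameters are assumptions, and the actual pipeline also handles Korean morphological analysis, stop-word removal, and real-time ingestion.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "subway fares rise again this spring",
    "fare hike protest planned downtown",
    "new phone camera amazing in low light",
    "camera review best phone this year",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)                 # tweet-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")     # daily ranking keyword set
```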