• Title/Summary/Keyword: Graph Databases


A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases (이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법)

  • Oh, Hyun-Kyo;Yoon, Seok-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.1 / pp.80-91 / 2010
  • As the volume of image data increases dramatically, good organization of the data is crucial for efficient image retrieval. Clustering is a typical way of organizing image data. However, traditional clustering methods have the drawback of requiring the user to supply the number of clusters as a parameter before clustering. In this paper, we discuss an approach for clustering image data that does not require this parameter. The proposed approach is based on Cross-Association, which finds structures or patterns hidden in data using the relationships between individual objects. In order to apply Cross-Association to the clustering of image data, we first convert the image data into a graph. Then, we perform Cross-Association on the resulting graph and interpret the results from a clustering perspective. We also propose a hierarchical clustering method and an outlier detection method based on Cross-Association. Through a series of experiments, we verify the effectiveness of the proposed approach. Finally, we discuss how to find a good value of k for the k-nearest-neighbor search used in building the graph, and compare the clustering results obtained with symmetric and asymmetric graph construction.
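
The paper itself provides no code; the following is a minimal sketch of the graph-construction step the abstract describes, assuming image feature vectors are already available as rows of a NumPy array. The function name, the cosine-similarity choice, and the default k are illustrative assumptions, not details from the paper.

```python
import numpy as np

def build_knn_graph(features: np.ndarray, k: int = 5, symmetric: bool = True):
    """Build a k-nearest-neighbor adjacency matrix from image feature vectors.

    features : (n_images, n_dims) array of precomputed feature vectors.
    k        : number of neighbors per image (the paper discusses choosing k).
    symmetric: if True, keep an edge when either endpoint selects the other,
               mirroring the symmetric/asymmetric variants compared in the paper.
    """
    # Cosine similarity between every pair of images (illustrative choice).
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-loops

    n = features.shape[0]
    adj = np.zeros((n, n), dtype=np.uint8)
    for i in range(n):
        neighbors = np.argsort(sim[i])[-k:]  # indices of the k most similar images
        adj[i, neighbors] = 1

    if symmetric:
        adj = np.maximum(adj, adj.T)         # keep i-j edge if either direction exists
    return adj                               # binary matrix handed to Cross-Association
```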

Protein Structure Alignment Based on Maximum of Residue Pair Distance and Similarity Graph (정렬된 잔기 사이의 최대거리와 유사도 그래프에 기반한 단백질 구조 정렬)

  • Kim, Woo-Cheol;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases / v.34 no.5 / pp.396-408 / 2007
  • Since the Human Genome Project completed the sequencing of the human genome, interest in protein function has been increasing. Because protein structures are conserved in divergent evolution, protein functions are determined by structure rather than by amino acid sequence alone. Therefore, if similarities between two protein structures are observed, we can expect them to share biological functions. Much research on protein structure alignment has been carried out. However, most of it uses RMSD (Root Mean Square Deviation) as the similarity measure, which makes it hard to judge the similarity of two protein structures intuitively. In addition, most methods return only the single result with the highest alignment score, which makes it hard to satisfy users with different purposes. To overcome these limitations, we propose a novel protein structure alignment algorithm based on MRPD (Maximum of Residue Pair Distance) and SG (Similarity Graph). MRPD is a more intuitive similarity measure that enables fast filtering of unpromising protein pairs, and SG is a compact representation of multiple alignment results from which users can choose the most plausible alignment for their own needs, without compromising the time needed to align protein structures.
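
One plausible reading of the measure named in the abstract (the maximum distance over aligned residue pairs) is sketched below. The coordinate arrays, the 2 Å cutoff, and even this interpretation are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def mrpd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """Maximum of Residue Pair Distance between two aligned residue sets.

    coords_a, coords_b : (n_residues, 3) arrays of aligned C-alpha coordinates.
    Unlike RMSD, the result is the single worst pair distance, which makes it
    easy to reject an alignment as soon as any pair exceeds a cutoff.
    """
    pair_dists = np.linalg.norm(coords_a - coords_b, axis=1)
    return float(pair_dists.max())

# Hypothetical usage: filter out alignments whose worst residue pair is too far apart.
a = np.random.rand(50, 3) * 10
b = a + 0.5 * np.random.rand(50, 3)    # a slightly perturbed copy of the same fold
if mrpd(a, b) <= 2.0:                  # 2 angstrom cutoff chosen only for illustration
    print("candidate alignment kept for refinement")
```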

Design of Visual Object-Oriented Database Query Language and Implementation of the Query Processor (시각적 객체지향 데이터베이스 질의어의 설계 및 질의처리기의 구현)

  • Lee, Suk-Kyoon;Nah, Yun-Mook;Suh, Yong-Moo
    • Asia pacific journal of information systems / v.11 no.2 / pp.121-139 / 2001
  • VOQL*, a recently proposed query language, is a visual language for object-oriented databases. It is based on Venn diagrams and graphs, so that the underlying schema structure is naturally implied in query expressions. In VOQL*, the structural relationships among the objects used in a query expression are represented graphically, so the language has formal semantics that can be defined inductively while remaining easy to use. In this paper, we propose a revised VOQL* and introduce its query processor, InQs (Intelligent Querying System). While retaining the merit of VOQL* that structural relationships among objects are represented visually, the revised VOQL* has the additional merit that users can formulate queries interactively using the various forms supplied by InQs. As a query processor that translates revised VOQL* queries into ODMG OQL, InQs provides an environment in which users express queries in revised VOQL* and the system automatically translates them into ODMG OQL. The translation algorithm of InQs is much simpler and more intuitive than the algorithms used in QUIVER and other systems, since it reflects the inductively defined formal semantics of VOQL*.
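
Neither the visual syntax of VOQL* nor the InQs translation algorithm can be reproduced here; the snippet below is only a hypothetical sketch of the general idea of rendering a graph-shaped query description as an ODMG OQL string. The extent name `Students`, the variable, and the condition are invented for illustration.

```python
def to_oql(extent: str, var: str, projections, conditions) -> str:
    """Render a tiny, flat query description as an ODMG OQL string.

    This mirrors only the *shape* of a visual-to-textual translation; the real
    VOQL* translation works inductively over graphically drawn path expressions.
    """
    select = ", ".join(f"{var}.{p}" for p in projections)
    query = f"select {select} from {extent} {var}"
    if conditions:
        where = " and ".join(f"{var}.{attr} {op} {lit}" for attr, op, lit in conditions)
        query += f" where {where}"
    return query

# Hypothetical example over a Students extent.
print(to_oql("Students", "s", ["name"], [("dept.name", "=", '"CS"')]))
# select s.name from Students s where s.dept.name = "CS"
```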

The Query Optimization Techniques for XML Data using DTDs (DTD를 이용한 XML 데이타에 대한 질의 최적화 기법)

  • Chung, Tae-Sun;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.28 no.4 / pp.723-731 / 2001
  • As XML has become an emerging standard for information exchange on the World Wide Web, it has gained attention in the database community, which seeks to extract information from XML viewed as a database model. Data in XML can be mapped to a semistructured data model based on an edge-labeled graph, and queries can be processed against it. Here we propose new query optimization techniques that use DTDs (Document Type Definitions), which carry schema information about the XML data. Our techniques reduce query processing cost compared with traditional index techniques. Also, because they preserve the structure of the source database, they can process many kinds of complex queries. We implemented our techniques and provide preliminary performance results.
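
As a minimal sketch of the semistructured view mentioned above, the code below loads an XML fragment into an edge-labeled graph, with parent-to-child edges labeled by element names. The sample document and the plain dictionary representation are illustrative only; the paper's optimizations additionally exploit the DTD.

```python
import xml.etree.ElementTree as ET

def xml_to_edge_labeled_graph(xml_text: str):
    """Map an XML document to an edge-labeled graph.

    Nodes are integer ids; each edge is (parent_id, label, child_id), where the
    label is the child element's tag.  Text content is kept for leaf nodes.
    """
    root = ET.fromstring(xml_text)
    edges, text, next_id = [], {}, 0

    def visit(elem):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        if elem.text and elem.text.strip():
            text[node_id] = elem.text.strip()
        for child in elem:
            child_id = visit(child)
            edges.append((node_id, child.tag, child_id))
        return node_id

    visit(root)
    return edges, text

# Illustrative document; a DTD would constrain which edge labels may appear where.
doc = "<book><title>XML Data</title><author>Chung</author></book>"
print(xml_to_edge_labeled_graph(doc))
# ([(0, 'title', 1), (0, 'author', 2)], {1: 'XML Data', 2: 'Chung'})
```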

Join Query Performance Optimization Based on Convergence Indexing Method (융합 인덱싱 방법에 의한 조인 쿼리 성능 최적화)

  • Zhao, Tianyi;Lee, Yong-Ju
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.1 / pp.109-116 / 2021
  • Since RDF (Resource Description Framework) triples are modeled as a graph, we cannot directly adopt existing solutions from relational databases and XML technology. In order to store, index, and query Linked Data more efficiently, we propose a convergence indexing method that combines R*-trees and k-dimensional trees. The method uses a hybrid storage system based on HDD (Hard Disk Drive) and SSD (Solid State Drive) devices, and a separated filter-and-refinement index structure that filters out unnecessary data and then refines the intermediate results. We perform performance comparisons based on three standard join retrieval algorithms. The experimental results demonstrate that our method achieves remarkable performance compared to existing methods such as Quad and Darq.
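
The paper's actual index structures (an R*-tree combined with k-d trees over an HDD/SSD hybrid store) are not reproduced here; the snippet below only sketches the separated filter-and-refinement pattern the abstract describes, using an in-memory predicate hash as a stand-in coarse index. All names and data are illustrative.

```python
from collections import defaultdict

# Toy RDF-like triples.
triples = [
    ("s1", "knows", "s2"),
    ("s2", "knows", "s3"),
    ("s1", "name", "Alice"),
]

def build_filter_index(triples):
    """Cheap filter index keyed by predicate (the paper uses tree indexes instead)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, p, o))
    return index

def join_on_predicate(index, p1, p2):
    """Answer ?x p1 ?y . ?y p2 ?z by filtering candidates, then refining the join."""
    candidates1 = index.get(p1, [])        # filter step: fetch only matching triples
    candidates2 = index.get(p2, [])
    results = []
    for s1, _, o1 in candidates1:          # refinement step: verify the join variable
        for s2, _, o2 in candidates2:
            if o1 == s2:
                results.append((s1, o1, o2))
    return results

index = build_filter_index(triples)
print(join_on_predicate(index, "knows", "knows"))   # [('s1', 's2', 's3')]
```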

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize the flexible storage expansion needed to handle a massive amount of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process with the analysis tools and management systems of the existing computing infrastructure. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that keep the system operating after it recovers from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, the strict schemas of relational databases make it hard to add nodes and distribute the stored data across them when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. The data models of NoSQL stores are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented store with a schema-free structure, is used in the proposed system. MongoDB is adopted because it makes it easy to process unstructured log data through its flexible schema structure, facilitates node expansion when the amount of data grows rapidly, and provides an auto-sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and per type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log data insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through a MongoDB insert performance evaluation over various chunk sizes.
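
The abstract describes the role of the MongoDB module in this architecture; the snippet below is a small sketch of how unstructured log documents might be stored and summarized with pymongo under such a design. The connection string, database and collection names, and log fields are assumptions for illustration, not details from the paper.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Hypothetical connection; in the paper the store runs on IaaS cloud instances.
client = MongoClient("mongodb://localhost:27017")
logs = client["bank_logs"]["raw"]        # schema-free collection for log documents

# Unstructured log entries can carry different fields without any schema change.
logs.insert_one({
    "ts": datetime.now(timezone.utc),
    "branch": "seoul-01",
    "event": "transfer",
    "amount": 150000,
})
logs.insert_one({
    "ts": datetime.now(timezone.utc),
    "branch": "busan-02",
    "event": "login_failed",
    "client_ip": "10.0.0.7",
})

# Aggregate counts per event type, e.g. as input for a log graph generator.
for row in logs.aggregate([{"$group": {"_id": "$event", "count": {"$sum": 1}}}]):
    print(row)
```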

An Automatic Relational Schema Generating System for an XML Schema (XML Schema에 대한 관계형 스키마 자동 생성 시스템)

  • 김정섭;박창원;정진완
    • Journal of KIISE:Databases / v.31 no.5 / pp.527-539 / 2004
  • As more and more documents are published in XML, generating relational schemas to store XML documents in a relational database is becoming increasingly important. This paper describes a technique, together with its implementation, for producing a relational schema from an XML Schema, a standard recently recommended by the W3C. The DTD-based inlining technique cannot be applied directly to XML Schema, because XML Schema has many new features that do not exist in DTDs. Various built-in data types, inheritance, and polymorphism, for example, strengthen XML Schema but make the generation of a relational schema from an XML Schema more difficult. We propose an XML Schema inlining technique based on the previous work. The technique first maps the various data types of the XML Schema to those of the relational database. It then constructs the schema graph and the type graph from the types and elements defined in the XML Schema, and the relational schema is generated while traversing the type graphs. In addition, we describe techniques for handling xsi:type, which is used for polymorphism, and anonymous types. We also propose a couple of heuristic methods for enhancing the performance of the system. Finally, we conducted experiments showing that our technique outperforms the binary table approach.
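
The full inlining algorithm (schema graph, type graph, xsi:type handling) is beyond a short example, but the sketch below illustrates the basic step of inlining a flat complex type into one relational table. The type definition, the type mapping, and the generated DDL are simplified assumptions, not the paper's actual rules.

```python
# Map XML Schema built-in types to relational column types (simplified assumption).
XSD_TO_SQL = {"xs:string": "VARCHAR(255)", "xs:int": "INTEGER", "xs:date": "DATE"}

def inline_complex_type(type_name, elements):
    """Generate a CREATE TABLE statement for a flat complexType.

    elements: (element name, XSD built-in type) pairs declared in a sequence.
    Nested or repeated elements would instead become separate tables with
    parent-id foreign keys, which is the harder part of the real algorithm.
    """
    cols = ["  id INTEGER PRIMARY KEY"]
    cols += [f"  {name} {XSD_TO_SQL[xsd_type]}" for name, xsd_type in elements]
    return f"CREATE TABLE {type_name} (\n" + ",\n".join(cols) + "\n);"

print(inline_complex_type("BookType",
                          [("title", "xs:string"), ("year", "xs:int")]))
```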

A Rewriting Algorithm for Inferrable SPARQL Query Processing Independent of Ontology Inference Models (온톨로지 추론 모델에 독립적인 SPARQL 추론 질의 처리를 위한 재작성 알고리즘)

  • Jeong, Dong-Won;Jing, Yixin;Baik, Doo-Kwon
    • Journal of KIISE:Databases / v.35 no.6 / pp.505-517 / 2008
  • This paper proposes a rewriting algorithm for OWL-DL ontology queries expressed in SPARQL. Currently, to obtain inference results for given SPARQL queries, Web ontology repositories construct inference ontology models and match the SPARQL queries against those models. However, an inference model requires much more space than its original base model, and it cannot be reused for other inferrable SPARQL queries. Therefore, this approach is not suitable for large-scale SPARQL query processing. To resolve this issue, this paper proposes a novel SPARQL query rewriting algorithm that obtains the inference results by rewriting the SPARQL queries and executing the query operations against the base ontology model. To achieve this goal, we first define OWL-DL inference rules and apply them when rewriting the graph patterns in queries. The paper categorizes the inference rules and discusses how each category affects query rewriting. To show the advantages of our proposal, a prototype system based on Jena is implemented. For comparative evaluation, we conduct experiments with a set of test queries and compare our proposal with the previous approach. The evaluation results show that the proposed algorithm improves the efficiency of inferrable SPARQL query processing without loss of completeness or soundness.
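
As a rough sketch of what rewriting against the base model can look like, the example below expands an rdf:type triple pattern into a UNION over known subclasses, one of the standard RDFS/OWL inference rules. The class IRIs and the way subclass knowledge is supplied are illustrative; the paper's algorithm covers the full set of OWL-DL rules.

```python
def rewrite_type_pattern(var, cls, subclasses):
    """Rewrite `?var a <cls>` so it also matches instances of known subclasses.

    `subclasses` maps a class IRI to its (transitive) subclasses, read once from
    the base ontology model instead of materializing a full inference model.
    """
    classes = [cls] + subclasses.get(cls, [])
    branches = [f"{{ ?{var} a <{c}> }}" for c in classes]
    return " UNION ".join(branches)

subclass_map = {"http://ex.org/Person": ["http://ex.org/Student", "http://ex.org/Professor"]}
pattern = rewrite_type_pattern("x", "http://ex.org/Person", subclass_map)
print(f"SELECT ?x WHERE {{ {pattern} }}")
# Prints (wrapped here for readability):
# SELECT ?x WHERE { { ?x a <http://ex.org/Person> } UNION
#                   { ?x a <http://ex.org/Student> } UNION
#                   { ?x a <http://ex.org/Professor> } }
```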

An Index Structure for Substructure Searching In Chemical Databases (화학 데이타베이스에서 부분구조 검색을 위한 인덱스 구조)

  • Lee Hwangu;Cha Jaehyuk
    • Journal of KIISE:Databases / v.31 no.6 / pp.641-649 / 2004
  • The relationship between chemical structures and biological activities is actively studied in the field of medicinal chemistry. As a basis for these structure-based drug design efforts, medicinal chemists developing a new drug search for existing drugs whose chemical structures are similar to that of the target drug. Therefore, an automatic system is needed that selects drug files containing a set of chemical moieties matching a user-defined query moiety. Substructure searching is the process of identifying the set of chemical moieties that match a specific query moiety; substructure search systems have been developed since the late 1950s. In graph-theoretical terms, the problem corresponds to determining which graphs in a set are subgraph-isomorphic to a specified query moiety, and subgraph isomorphism testing has been proved to be NP-complete in the general case. Several computational approaches have been proposed to overcome this difficulty. In the 1990s, a US patent was granted on an atom-centered indexing scheme used by the RS3 system; this scheme has the virtue that the generated indexes can be searched by direct text comparison, and the system is in commercial use (http://www.acelrys.com/rs3). We identify a drawback of the RS3 system and present a new indexing scheme. The RS3 system treats substructure searching as substring matching by expressing chemical structures as predefined strings. However, it yields insufficient recall and precision because it cannot index structures uniquely with respect to the same atoms and bonds. To resolve this problem, we build the minimum-cost spanning tree for each centered atom and describe a structure with paths per level. Expressing a 2D chemical structure as a single 1D string has inherent limits, so we break the 2D chemical structure into 1D structure fragments. In this paper we present a new index technique that improves recall and precision remarkably.
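
To make the atom-centered idea concrete, here is a small sketch that, for each chosen atom, runs a breadth-first traversal (a stand-in for the paper's minimum-cost spanning tree) and records the element paths reachable at each level; such path strings are what a text index could then store. The toy molecule and path encoding are invented for illustration.

```python
from collections import deque

# Toy molecular graph: atom id -> (element symbol, neighbor ids).
molecule = {
    0: ("C", [1, 2]),
    1: ("O", [0]),
    2: ("C", [0, 3]),
    3: ("N", [2]),
}

def atom_centered_paths(graph, center, max_level=2):
    """Collect element paths from one centered atom, level by level.

    A BFS tree stands in here for the paper's minimum-cost spanning tree;
    each returned string records the elements along one root-to-node path.
    """
    element = lambda a: graph[a][0]
    paths, queue, seen = [], deque([(center, element(center), 0)]), {center}
    while queue:
        atom, path, level = queue.popleft()
        paths.append((level, path))
        if level == max_level:
            continue
        for nbr in graph[atom][1]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, path + "-" + element(nbr), level + 1))
    return paths

print(atom_centered_paths(molecule, center=0))
# [(0, 'C'), (1, 'C-O'), (1, 'C-C'), (2, 'C-C-N')]
```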

A Semantic Classification Model for e-Catalogs (전자 카탈로그를 위한 의미적 분류 모형)

  • Kim Dongkyu;Lee Sang-goo;Chun Jonghoon;Choi Dong-Hoon
    • Journal of KIISE:Databases / v.33 no.1 / pp.102-116 / 2006
  • Electronic catalogs (or e-catalogs) hold information about the goods and services offered or requested by the participants in a transaction and consequently form the basis of e-commerce. Catalog management is complicated by a number of factors, and product classification is at the core of these issues. Classification hierarchies are used for spend analysis, customs regulation, and product identification. Classification is the foundation on which product databases are designed, and it plays a central role in almost all aspects of the management and use of product information. However, product classification has received little formal treatment in terms of its underlying model, operations, and semantics. We believe that the lack of a logical model for classification introduces problems not only for the classification itself but also for the product database in general. A classification needs to accommodate diverse user views so that product information can be used efficiently and conveniently. It needs to change and evolve frequently, without breaking consistency, as new products are introduced, existing products disappear, and classes are reorganized or specialized. It also needs to be merged and mapped with other classification schemes without information loss when B2B transactions occur. To meet these requirements, a classification scheme should be dynamic enough to absorb such changes at acceptable time and cost. The existing classification schemes widely used today, such as UNSPSC and eClass, however, have many limitations with respect to these dynamic requirements. In this paper, we examine what it means to classify products and how best to represent classification schemes so as to capture the semantics behind the classifications and facilitate mappings between them. Product information carries rich semantics, such as class attributes (e.g., material, time, and place) and integrity constraints. We analyze the dynamic features of product databases and the limitations of existing code-based classification schemes, and we describe a semantic classification model that satisfies the requirements arising from the dynamic features of product databases. The model provides a means to express richer semantics for product classes explicitly and formally, and it organizes class relationships into a graph. We believe the proposed model satisfies the requirements and challenges raised by previous works.
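
As an illustration of organizing class relationships into a graph with explicit attribute semantics, the sketch below stores product classes as nodes whose attributes are inherited along is-a edges. The class names and attributes are invented, and the paper's actual model additionally covers integrity constraints and mappings between classification schemes.

```python
# Product classes as nodes of a graph; is-a edges let attributes be inherited.
classes = {
    "Product":   {"parent": None,        "attrs": {"manufacturer": "string"}},
    "Furniture": {"parent": "Product",   "attrs": {"material": "string"}},
    "Chair":     {"parent": "Furniture", "attrs": {"seat_height_cm": "number"}},
}

def effective_attributes(name):
    """Collect a class's own attributes plus everything inherited along is-a edges."""
    attrs = {}
    while name is not None:
        node = classes[name]
        attrs = {**node["attrs"], **attrs}   # child definitions win on name clashes
        name = node["parent"]
    return attrs

print(effective_attributes("Chair"))
# {'manufacturer': 'string', 'material': 'string', 'seat_height_cm': 'number'}
```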