• Title/Abstract/Keywords: Graph Databases

91 search results (processing time: 0.025 s)

The Status Quo of Graph Databases in Construction Research

  • Jeon, Kahyun;Lee, Ghang
    • International Conference Proceedings
    • /
    • The 9th International Conference on Construction Engineering and Project Management
    • /
    • pp.800-807
    • /
    • 2022
  • This study aims to review the use of graph databases in construction research. Based on the diagnosis of the current research status, a future research direction is proposed. The use of graph databases in construction research has been increasing because of the efficiency in expressing complex relations between entities in construction big data. However, no study has been conducted to review systematically the status quo of graph databases. This study analyzes 42 papers in total that deployed a graph model and graph database in construction research, both quantitatively and qualitatively. A keyword analysis, topic modeling, and qualitative content analysis were conducted. The review identified the research topics, types of data sources that compose a graph, and the graph database application methods and algorithms. Although the current research is still in a nascent stage, the graph database research has great potential to develop into an advanced stage, fused with artificial intelligence (AI) in the future, based on the active usage trends this study revealed.


Graph Database Benchmarking Systems Supporting Diversity

  • 최도진;백연희;이소민;김윤아;김남영;최재용;이현병;임종태;복경수;송석일;유재수
    • The Journal of the Korea Contents Association
    • /
    • Vol. 21, No. 12
    • /
    • pp.84-94
    • /
    • 2021
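  • Graph databases have been developed to efficiently store and query graph data, which consist of vertices and edges, in order to represent relationships between objects. Because the query types of graph databases differ greatly from those of existing NoSQL databases, verifying the performance of a graph database requires a benchmarking tool suited to graph databases. This paper proposes an efficient graph database benchmarking system that supports diversity in graph inputs and queries. The proposed system uses OrientDB to test benchmarking on a graph database. To support diversity in input graphs and query graphs, it uses LDBC, an existing graph data generation tool. Analysis of the benchmarking results demonstrates the validity and effectiveness of the proposed technique. The performance evaluation shows that the proposed system can generate user-definable synthetic graph data and can run benchmarks on the generated graph data.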

Application Plan of Graph Databases in the Big Data Environment

  • 박승범;이상원;안현섭;정인환
    • Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • Korea Institute of Information and Communication Engineering 2013 Fall Conference
    • /
    • pp.247-249
    • /
    • 2013
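  • Although relational databases are widely used in many enterprises, they do not manage relationships between entities effectively or efficiently. To analyze big data, there is a pressing need to represent the relationships among diverse entities as graphs. This paper defines the graph database and its structure and examines its characteristics, including transactions, consistency, availability, search capability, and scalability. It also examines the areas where graph databases should and should not be applied.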


Use of Graph Database for the Integration of Heterogeneous Biological Data

  • Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
    • Genomics & Informatics
    • /
    • Vol. 15, No. 1
    • /
    • pp.19-27
    • /
    • 2017
  • Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.

Structural Analysis and Performance Test of Graph Databases using Relational Data

  • 배석민;김진형;유재민;양성열;정재진
    • Journal of Korea Multimedia Society
    • /
    • Vol. 22, No. 9
    • /
    • pp.1036-1045
    • /
    • 2019
  • Relational databases have the notion of normalization, in which the model for storing data is standardized according to the organization's business processes or data operations. Graph databases, however, are at a relatively early stage of such standardization and allow a high degree of freedom in modeling, so different designers can create different models from the same data. The graph database has two essential aspects. First, it allows relationships between objects to be accessed semantically. Second, it treats relationships between entities as being as important as the individual data. These aspects increase the degree of freedom in modeling and give model developers a more creative system. This paper introduces different graph models built from test data and compares their query performance, measured as the response time to query execution per graph model, to find out how the efficiency of each model can be maximized.

Combining Local and Global Features to Reduce 2-Hop Label Size of Directed Acyclic Graphs

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of Information Processing Systems
    • /
    • Vol. 16, No. 1
    • /
    • pp.201-209
    • /
    • 2020
  • The graph data structure is popular because it can intuitively represent real-world knowledge. Graph databases have attracted attention in academia and industry because they can be used to maintain graph data and allow users to mine knowledge. Mining reachability relationships between two nodes in a graph, termed reachability query processing, is an important functionality of graph databases. Online traversals, such as the breadth-first and depth-first search, are inefficient in processing reachability queries when dealing with large-scale graphs. Labeling schemes have been proposed to overcome these disadvantages. The state-of-the-art is the 2-hop labeling scheme: each node has in and out labels containing reachable node IDs as integers. Unfortunately, existing 2-hop labeling schemes generate huge 2-hop label sizes because they only consider local features, such as degrees. In this paper, we propose a more efficient 2-hop label size reduction approach. We consider the topological sort index, which is a global feature. A linear combination is suggested for utilizing both local and global features. We conduct experiments over real-world and synthetic directed acyclic graph datasets and show that the proposed approach generates smaller labels than existing approaches.
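The 2-hop reachability test summarized in the abstract above can be illustrated with a minimal Python sketch. It assumes a generic pruned-labeling construction: nodes are processed as hubs in ranked order, and a hub is added to a label only when existing labels cannot already answer the query. The `score` function is a hypothetical linear combination of a local feature (degree) and a global feature (position in the topological order); it is not the formula from the paper, and the names `build_two_hop_labels` and `alpha` are illustrative only.

```python
from collections import deque


def topological_order(nodes, succ):
    """Kahn's algorithm; assumes the input graph is a DAG."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for w in succ[u]:
            indeg[w] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order


def build_two_hop_labels(nodes, edges, alpha=0.5):
    """Return (l_in, l_out, reachable) for a DAG given as node and edge lists."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, w in edges:
        succ[u].append(w)
        pred[w].append(u)

    topo_pos = {v: i for i, v in enumerate(topological_order(nodes, succ))}
    n = len(nodes)
    max_deg = max((len(succ[v]) + len(pred[v]) for v in nodes), default=0) or 1

    def score(v):
        # Hypothetical ranking: degree (local) blended with closeness to the
        # middle of the topological order (global). Not the paper's formula.
        local = (len(succ[v]) + len(pred[v])) / max_deg
        centre = 1.0 - abs(2 * topo_pos[v] / max(n - 1, 1) - 1.0)
        return alpha * local + (1 - alpha) * centre

    l_in = {v: set() for v in nodes}   # hubs that can reach v
    l_out = {v: set() for v in nodes}  # hubs that v can reach

    def reachable(u, v):
        # u reaches v iff some hub lies in both L_out(u) and L_in(v).
        return not l_out[u].isdisjoint(l_in[v])

    for hub in sorted(nodes, key=score, reverse=True):
        # Forward pass fills L_in of descendants; backward pass fills L_out
        # of ancestors.
        for adj, labels, forward in ((succ, l_in, True), (pred, l_out, False)):
            queue, seen = deque([hub]), {hub}
            while queue:
                x = queue.popleft()
                covered = reachable(hub, x) if forward else reachable(x, hub)
                if x != hub and covered:
                    continue  # pruned: earlier hubs already answer this pair
                labels[x].add(hub)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
    return l_in, l_out, reachable
```

In this sketch a query is a set intersection over two labels rather than an online traversal, and the total label size depends on the hub order, which is the quantity the ranking function is meant to influence.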

A Metabolic Pathway Drawing Algorithm for Reducing the Number of Edge Crossings

  • Song Eun-Ha;Kim Min-Kyung;Lee Sang-Ho
    • Genomics & Informatics
    • /
    • Vol. 4, No. 3
    • /
    • pp.118-124
    • /
    • 2006
  • For the direct understanding of flow, pathway data are usually represented as directed graphs in biological journals and texts. Databases of metabolic pathways or signal transduction pathways inevitably contain these kinds of graphs to show the flow. KEGG, one of the representative pathway databases, uses manually drawn figures, which cannot be easily maintained. Graph layout algorithms are applied to visualize metabolic pathways in some databases, such as EcoCyc. Although these can reflect data changes in real time, the number of edge crossings increases exponentially as the number of nodes increases. For understanding the genome-scale flow of metabolism, it is very important to reduce the unnecessary edge crossings that arise in automatic graph layout. We propose a metabolic pathway drawing algorithm that reduces the number of edge crossings by exploiting the fact that a metabolic pathway graph is a scale-free network. The experimental results show that considering the scale-free property reduces the number of edge crossings by about 37 to 40% compared with not considering it. We also found that an increase in nodes does not always mean an increase in edge crossings.

Graph Database Design and Implementation for Ransomware Detection

  • 최도현
    • Journal of Convergence for Information Technology
    • /
    • Vol. 11, No. 6
    • /
    • pp.24-32
    • /
    • 2021
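  • Ransomware attacks have recently spread through various channels such as email, phishing, and device hacking, and the scale of damage is rising sharply. However, existing malware analysis engines (static/dynamic) have great difficulty detecting and blocking new, advanced ransomware, such as APT (Advanced Persistent Threat) attacks. This study proposes a method that models malicious ransomware behavior on a graph database and detects new multi-stage composite malicious behaviors of ransomware. The results confirm that ransomware patterns can be detected in a graph database environment, which differs from existing relational databases. The study also demonstrates that association analysis techniques from graph theory are highly effective for ransomware analysis performance.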

A Horizontal Partition of the Object-Oriented Database for Efficient Clustering

  • Chung, Chin-Wan;Kim, Chang-Ryong;Lee, Ju-Hong
    • Journal of Electrical Engineering and information Science
    • /
    • Vol. 1, No. 1
    • /
    • pp.164-172
    • /
    • 1996
  • The partitioning of related objects should be performed before clustering for an efficient access in object-oriented databases. In this paper, a horizontal partition of related objects in object-oriented databases is presented. All subclass nodes in a class inheritance hierarchy of a schema graph are shrunk to a class node in the graph that is called condensed schema graph because the aggregation hierarchy has more influence on the partition than the class inheritance hierarchy. A set function and an accessibility function are defined to find a maximal subset of related objects among the set of objects in a class. A set function maps a subset of the domain class objects to a subset of the range class objects. An accessibility function maps a subset of the objects of a class into a subset of the objects of the same class through a composition of set functions. The algorithm derived in this paper is to find the related objects of a condensed schema graph using accessibility functions and set functions. The existence of a maximal subset of the related objects in a class is proved to show the validity of the partition algorithm using the accessibility function.


Contribution to Improve Database Classification Algorithms for Multi-Database Mining

  • Miloudi, Salim;Rahal, Sid Ahmed;Khiat, Salim
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 3
    • /
    • pp.709-726
    • /
    • 2018
  • Database classification is an important preprocessing step for the multi-database mining (MDM). In fact, when a multi-branch company needs to explore its distributed data for decision making, it is imperative to classify these multiple databases into similar clusters before analyzing the data. To search for the best classification of a set of n databases, existing algorithms generate from 1 to $(n^2-n)/2$ candidate classifications. Although each candidate classification is included in the next one (i.e., clusters in the current classification are subsets of clusters in the next classification), existing algorithms generate each classification independently, that is, without taking into account the use of clusters from the previous classification. Consequently, existing algorithms are time consuming, especially when the number of candidate classifications increases. To overcome the latter problem, we propose in this paper an efficient approach that represents the problem of classifying the multiple databases as a problem of identifying the connected components of an undirected weighted graph. Theoretical analysis and experiments on public databases confirm the efficiency of our algorithm against existing works and that it overcomes the problem of increase in the execution time.
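The connected-components view described in the abstract above can be sketched in a few lines of Python. The sketch assumes that databases are numbered 0 to n-1 and that pairwise similarities are given as weighted edges; it uses a union-find structure as a stand-in technique so that each candidate classification reuses the clusters of the previous one. It is not the authors' algorithm, and the function name `incremental_classifications` and the similarity values in the usage example are hypothetical.

```python
from itertools import groupby


def incremental_classifications(n, weighted_edges):
    """Yield candidate classifications as (threshold, clusters) pairs.

    weighted_edges: iterable of (i, j, similarity) for database pairs.
    Edges are added from the highest similarity down, so each classification
    is built on top of the previous one instead of being recomputed.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    results = []
    ordered = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    for threshold, group in groupby(ordered, key=lambda e: e[2]):
        merged = False
        for i, j, _ in group:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge two existing clusters
                merged = True
        if merged:
            # Connected components of the graph whose edges have
            # similarity >= threshold.
            clusters = {}
            for db in range(n):
                clusters.setdefault(find(db), []).append(db)
            results.append((threshold, sorted(clusters.values())))
    return results


# Hypothetical similarities between four databases.
example_edges = [
    (0, 1, 0.9), (2, 3, 0.8), (1, 2, 0.4),
    (0, 2, 0.4), (0, 3, 0.1), (1, 3, 0.1),
]
for threshold, clusters in incremental_classifications(4, example_edges):
    print(threshold, clusters)
```

With n databases there are at most $(n^2-n)/2$ distinct pairwise similarities, which is where the upper bound on the number of candidate classifications comes from; thresholds that merge no clusters are skipped in this sketch.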