• Title/Summary/Keyword: Subgraph mining

Mining Highly Reliable Dense Subgraphs from Uncertain Graphs

  • LU, Yihong;HUANG, Ruizhi;HUANG, Decai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.6 / pp.2986-2999 / 2019
  • The uncertainty of an uncertain graph makes the traditional definitions and algorithms for mining dense subgraphs of deterministic graphs inapplicable. The subgraph obtained by maximizing expected density from an uncertain graph always contains many low-probability edges, which gives it low reliability and low expected edge density. To overcome the low reliability of the densest subgraph, the concept of the optimal ${\beta}$-subgraph is proposed, building on the concept of the ${\beta}$-subgraph. An efficient greedy algorithm is also developed to find the optimal ${\beta}$-subgraph. Simulation experiments on multiple datasets show that the average edge probability of the optimal ${\beta}$-subgraph improves by nearly 40%, and the expected edge density reaches 0.9 on average. The parameter ${\beta}$ is adjustable and applicable to multiple scenarios.
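
The tension between expected density and reliability described in this abstract can be illustrated with a small sketch. It assumes definitions common in the uncertain-graph literature, which may differ from the paper's: expected density is the sum of edge probabilities divided by the number of vertices, and expected edge density is the same sum divided by the number of possible vertex pairs. The function name `subgraph_metrics` and the toy probabilities are hypothetical, and the sketch does not implement the paper's ${\beta}$-subgraph or its greedy algorithm.

```python
def subgraph_metrics(edge_prob, vertices):
    """Summarise the subgraph induced by `vertices` in an uncertain graph.

    edge_prob maps frozenset({u, v}) to the probability that edge u-v exists.
    The three measures follow assumed, commonly used definitions.
    """
    vs = set(vertices)
    # Keep only edges with both endpoints inside the chosen vertex set.
    probs = [p for edge, p in edge_prob.items() if edge <= vs]
    n = len(vs)
    total = sum(probs)
    return {
        "expected_density": total / n if n else 0.0,
        "expected_edge_density": total / (n * (n - 1) / 2) if n > 1 else 0.0,
        "avg_edge_probability": total / len(probs) if probs else 0.0,
    }


# Toy data (made up): a reliable triangle a-b-c plus a vertex d attached by weak edges.
edge_prob = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.9, frozenset("bc"): 0.9,
    frozenset("ad"): 0.35, frozenset("bd"): 0.35, frozenset("cd"): 0.35,
}
triangle = subgraph_metrics(edge_prob, "abc")
with_d = subgraph_metrics(edge_prob, "abcd")
```

On this toy input, adding vertex d raises the expected density (about 0.94 versus 0.90) while the average edge probability drops from 0.9 to 0.625, which is the kind of low-reliability result the abstract attributes to maximizing expected density alone.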

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.267-271 / 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in developing data mining algorithms that operate on graphs has increased. In particular, mining frequent patterns from structured data such as graphs has drawn the attention of many research groups. A graph is a highly adaptable representation scheme that is used in many domains, including chemistry, bioinformatics and physics. For example, the chemical structure of a given substance can be modelled by an undirected labelled graph in which each node corresponds to an atom and each edge corresponds to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge corresponds to a hypertext link between web sites. Notably, in bioinformatics, various kinds of newly discovered data such as gene regulation networks or protein interaction networks can be modelled as graphs. There have been a number of attempts to extract useful knowledge from such graph-structured data. One of the most powerful analysis tools for graph-structured data is frequent subgraph analysis. Recurring patterns in graph data can provide unique insights into that data. However, finding recurring subgraphs is computationally very expensive. At the core of the problem are two computationally challenging tasks: 1) subgraph isomorphism and 2) enumeration of subgraphs. Problems related to the former are the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same or not?). Even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known so far. The latter is also a difficult problem. Without any constraint, all $2^n$ subgraphs would have to be generated, where n is the number of vertices of the input graph. To find frequent subgraphs in a larger graph database, it is essential to impose appropriate constraints on the subgraphs sought. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a subgraph, the more attention it should receive. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some recently emerging applications suggest that other constraints, such as connectivity, could also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If the set of subgraphs to mine is restricted to more strongly connected parts, the computational complexity could decrease significantly. In this paper, we present an efficient algorithm to mine frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to larger graphs with more than ten thousand vertices.
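
The effect of a structural constraint on the size of the enumeration space can be sketched on a toy graph. The example below is only an illustration under simplifying assumptions: it enumerates vertex-induced subgraphs by brute force and uses plain connectedness as the constraint, which is weaker than the "more strongly connected" requirement in the abstract. It is not the paper's algorithm, and the helper names are hypothetical.

```python
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if `vertices` induce a connected subgraph of `adj`."""
    vs = set(vertices)
    if not vs:
        return False
    start = next(iter(vs))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            # Only follow edges that stay inside the chosen vertex set.
            if w in vs and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vs


def connected_subsets(adj):
    """Yield every non-empty vertex subset whose induced subgraph is connected.

    Brute force over all subsets, so only suitable for very small graphs.
    """
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_connected(subset, adj):
                yield subset


# Toy input (made up): the path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
connected = list(connected_subsets(path))
all_subsets = 2 ** len(path)
```

For this 4-vertex path there are 16 vertex subsets in total, of which 10 induce a connected subgraph. A practical miner would grow connected candidates directly instead of filtering all subsets as this sketch does.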

A Methodology for Searching Frequent Pattern Using Graph-Mining Technique (그래프마이닝을 활용한 빈발 패턴 탐색에 관한 연구)

  • Hong, June Seok
    • Journal of Information Technology Applications and Management / v.26 no.1 / pp.65-75 / 2019
  • As the use of the XML-based semantic web increases in the field of data management, many studies have attempted to extract useful information from data stored in ontologies using association rule mining. Ontology data has the advantage that data can be expressed freely, because it has a flexible and scalable structure, unlike a conventional database with a predefined structure. On the other hand, this flexibility makes it difficult to find frequent patterns with a uniform analysis method. The goal of this study is to provide a basis for extracting useful knowledge from ontologies by searching for frequently occurring subgraph patterns, applying transaction-based graph mining techniques to the ontology schema graph data and instance graph data that constitute an ontology. To overcome the structural limitations of existing ontology mining, the proposed frequent pattern search methodology borrows from graph mining, applying an iterative node chunking method to find frequent patterns in the graph data structure of the ontology. The suggested methodology will play an important role in knowledge extraction.

Improved approach of calculating the same shape in graph mining (그래프 마이닝에서 그래프 동형판단연산의 향상기법)

  • No, Young-Sang;Yun, Un-Il;Kim, Myung-Jun
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.251-258 / 2009
  • Data mining is a method for extracting useful knowledge from huge amounts of data. Recently, a major focus of data mining research has been finding interesting patterns in graph databases, and increasingly efficient graph mining methods have been proposed. However, graph analysis problems are NP-hard. Graph pattern mining based on the pattern growth method finds the complete set of patterns satisfying a certain property by extending graph patterns edge by edge while avoiding the generation of duplicated patterns. This paper suggests an efficient approach that reduces the computing time of the pattern growth method by exploiting its property that similar patterns cause similar tasks. We also suggest pruning methods that reduce the search space. Based on an extensive performance study, we discuss the results and future work.
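
Avoiding duplicated patterns in pattern growth comes down to recognising when two candidate patterns are the same graph under a relabelling of vertices. A minimal sketch of that check, assuming small, unlabelled, simple undirected patterns, is a brute-force canonical form: try every vertex permutation and keep the smallest relabelled edge list. The function names are hypothetical, the cost is factorial in the number of vertices, and this is not the technique proposed in the paper.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Smallest relabelled edge list over all n! vertex permutations.

    Vertices are 0..n-1 and `edges` is a list of (u, v) pairs of a simple
    undirected graph. Isomorphic graphs get the same canonical form.
    """
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges
        ))
        if best is None or relabelled < best:
            best = relabelled
    return best


def is_duplicate(n1, edges1, n2, edges2):
    """True if the two patterns are isomorphic, i.e. the same pattern."""
    if n1 != n2 or len(edges1) != len(edges2):
        return False  # cheap invariants rule out most non-matches
    return canonical_form(n1, edges1) == canonical_form(n2, edges2)


# Toy patterns (made up): two labellings of a 4-vertex path, and a star.
path_a = [(0, 1), (1, 2), (2, 3)]
path_b = [(2, 0), (0, 3), (3, 1)]
star = [(0, 1), (0, 2), (0, 3)]
```

Here `is_duplicate(4, path_a, 4, path_b)` holds because both edge lists describe a path, while the star is a different pattern with the same vertex and edge counts. Pattern-growth miners generally rely on much cheaper canonical codes, such as the minimum DFS code used by gSpan, than this exhaustive search.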

BINGO: Biological Interpretation Through Statistically and Graph-theoretically Navigating Gene Ontology$^{TM}$

  • Lee, Sung-Geun;Yang, Jae-Seong;Chung, Il-Kyung;Kim, Yang-Seok
    • Molecular & Cellular Toxicology / v.1 no.4 / pp.281-283 / 2005
  • Extraction of biologically meaningful data and their validation are very important for toxicogenomics studies because they deal with huge amounts of heterogeneous data. BINGO is an annotation mining tool for biological interpretation of gene groups. Several statistical modeling approaches using Gene Ontology (GO) have been employed in many programs for that purpose. The statistical methodologies are useful for identifying the most significant GO attributes in a gene group, but the coherence of the resultant GO attributes over the entire group is rarely assessed. BINGO complements the statistical methods with graph-theoretic measures using the GO directed acyclic graph (DAG) structure. In addition, BINGO visualizes the consistency of a gene group more intuitively with a group-based GO subgraph. The input group can be any list of genes or gene products of interest, regardless of how it was generated, provided the group is built under a functional congruency hypothesis, such as gene clusters from DNA microarray analysis.

An efficient approach of avoiding extensions of duplicated graph patterns in cyclic graph mining (순환 그래프 마이닝에서 중복된 그래프 패턴의 확장을 피하는 효율적인 기법)

  • No, Young-Sang;Yun, Un-Il;Pyun, Gwang-Bum;Ryang, Heung-Mo;Lee, Gang-In;Ryu, Keun-Ho;Lee, Kyung-Min
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.33-41 / 2011
  • In complicated graph structures, duplicated operations may be executed, which lowers efficiency. In this paper, we propose an efficient graph mining algorithm that minimizes the extension of duplicated graph patterns by considering the priorities of cyclic edges. In our approach, the cyclic edges with lower priorities are extended first, so duplicated extensions can be reduced. For performance testing, we implement our algorithm and compare it with a state-of-the-art algorithm, Gaston. Finally, we show that ours outperforms Gaston.

The Empirical Study on the Effect of Technology Exchanges in the Fourth Industrial Revolution between Korea and China: Focused on the Firm Social Network Analysis (한중 4차산업혁명 기술교류 및 효과에 대한 실증연구: 기업 소셜 네트워크 분석 중심으로)

  • Zhou, Zhenxin;Sohn, Kwonsang;Hwang, Yoon Min;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.25 no.3 / pp.41-61 / 2020
  • China's rapid development and commercialization of high technologies in the fourth industrial revolution has made effective technology exchanges between Korean and Chinese firms more important to Korea's mid-term and long-term industrial development. However, there is still a lack of empirical research on how technology exchanges between Korean and Chinese firms proceed and how effective they are. In response, this study examined the current status and effects of Korea-China technology exchanges related to the fourth industrial revolution. It conducted a social network analysis based on text mining of news articles on Korea-China business technology exchange and cooperation published from 2018 to March 2020, and a regression analysis of how network centrality affects firm performance. According to the results, most major Korean electronics firms are actively networking with Chinese firms and institutions, showing high values on the centrality indices. Korean telecommunication firms showed high betweenness centrality and subgraph centrality, and Korean Internet service providers and broadcasting content firms showed high eigenvector centrality. In addition, Chinese firms showed higher betweenness centrality than Korean firms, and Chinese service firms showed higher closeness centrality than manufacturing firms. The regression analysis showed that network centrality had a positive effect on firm performance. To the best of our knowledge, this is the first study to analyze the impact of technical cooperation between Korean and Chinese firms in the context of the fourth industrial revolution. Theoretically, this study suggests a direction for empirical research on global firm cooperation based on social network analysis. Practically, it offers guidelines for network analysis that firms or governments can use in setting the direction of technical cooperation between Korea and China.
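
As a small illustration of one of the centrality measures named in this abstract, closeness centrality of a node in a connected, unweighted network can be computed with a breadth-first search: it is the number of other nodes divided by the sum of shortest-path distances to them. The sketch below uses a made-up four-node network with hypothetical node names. It does not reproduce the study's data or tooling, and the function name is an assumption.

```python
from collections import deque


def closeness_centrality(adj, source):
    """Closeness of `source`: (reachable nodes - 1) / sum of BFS distances.

    Assumes an unweighted, undirected network given as an adjacency dict.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy network (made up): one hub firm linked to three partner firms.
network = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
hub_score = closeness_centrality(network, "hub")
partner_score = closeness_centrality(network, "a")
```

In this star-shaped network the hub reaches every partner in one step and scores 1.0, while each partner scores 0.6 because two of its three shortest paths pass through the hub.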