• Title/Summary/Keyword: FP-Growth 알고리즘

Search Result 20, Processing Time 0.032 seconds

Memory Improvement Method for Extraction of Frequent Patterns in DataBase (데이터베이스에서 빈발패턴의 추출을 위한 메모리 향상기법)

  • Park, In-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.2
    • /
    • pp.127-133
    • /
    • 2019
  • Since frequent item extraction so far requires searching for patterns and traversal for the FP-Tree, it is more likely to store the mining data in a tree and thus CPU time is required for its searching. In order to overcome these drawbacks, in this paper, we provide each item with its location identification of transaction data without relying on conditional FP-Tree and convert transaction data into 2-dimensional position information look-up table, resulting in the facilitation of time and spatial accessibility. We propose an algorithm that considers the mapping scheme between the location of items and items that guarantees the linear time complexity. Experimental results show that the proposed method can reduce many execution time and memory usage based on the data set obtained from the FIMI repository website.

Text Assocation Pattern Extraction using NFP-tree Algorithm (NFP-Algorithm 알고리즘을 기반한 텍스트 연관 패턴 추출)

  • Yu, Soo-Kung;Kim, Kio-chung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.97-100
    • /
    • 2004
  • 인터넷상에서 존재하는 많은 데이터베이스들 중 현실적으로 유용한 정보를 가지고 있는 것은 텍스트 데이타베이스이다. 텍스트 마이닝 기법에서 비구조적인 특징을 가진 텍스트 데이타로부터 유용한 정보를 분석하고 추출하여 연관된 패턴을 탐색하는 과정은 중요한 연구과제이다. 이에 본 논문은 인터넷에서 저장된 텍스트 데이터를 가지고 기존 텍스트 마이닝 기법 중 연관탐색 기법을 적용하여 사용자 중심의 연관된 패턴을 찾아서 의미있는 정보를 얻고자 한다. 탐색하기 위해 먼저 전처리 작업으로 용어의 객체를 추출하고. 추출된 각 객체들은 대용량 데이터에서 시간적, 공간적면에서 효율적인 연관탐색 기법인 NFP-Algorithm(N-most interesting k-itemsets Using FP-tree and FP-Growth)을 적용시켜서 의미있는 정보를 추출했다. 또한 Apriori계 Algorithm, FP-Algorithm, NFP-Algorithm을 비교하여 NFP-Algorithm이 시간적면에서 효율적임을 보여주었다.

  • PDF

IRFP-tree: Intersection Rule Based FP-tree (IRFP-tree(Intersection Rule Based FP-tree): 메모리 효율성을 향상시키기 위해 교집합 규칙 기반의 패러다임을 적용한 FP-tree)

  • Lee, Jung-Hun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.3
    • /
    • pp.155-164
    • /
    • 2016
  • For frequency pattern analysis of large databases, the new tree-based frequency pattern analysis algorithm which can compensate for the disadvantages of the Apriori method has been variously studied. In frequency pattern tree, the number of nodes is associated with memory allocation, but also affects memory resource consumption and processing speed of the growth. Therefore, reducing the number of nodes in the tree is very important in the frequency pattern mining. However, the absolute criteria which need to order the transaction items for construction frequency pattern tree has lowered the compression ratio of the tree nodes. But most of the frequency based tree construction methods adapted the absolute criteria. FP-tree is typically frequency pattern tree structure which is an extended prefix-tree structure for storing compressed frequent crucial information about frequent patterns. For construction the tree, all the frequent items in different transactions are sorted according to the absolute criteria, frequency descending order. CanTree also need to absolute criteria, canonical order, to construct the tree. In this paper, we proposed a novel frequency pattern tree construction method that does not use the absolute criteria, IRFP-tree algorithm. IRFP-tree(Intersection Rule based FP-tree). IRFP-tree is constituted with the new paradigm of the intersection rule without the use of the absolute criteria. It increased the compression ratio of the tree nodes, and reduced the tree construction time. Our method has the additional advantage that it provides incremental mining. The reported test result demonstrate the applicability and effectiveness of the proposed approach.

Advanced Improvement for Frequent Pattern Mining using Bit-Clustering (비트 클러스터링을 이용한 빈발 패턴 탐사의 성능 개선 방안)

  • Kim, Eui-Chan;Kim, Kye-Hyun;Lee, Chul-Yong;Park, Eun-Ji
    • Journal of Korea Spatial Information System Society
    • /
    • v.9 no.1
    • /
    • pp.105-115
    • /
    • 2007
  • Data mining extracts interesting knowledge from a large database. Among numerous data mining techniques, research work is primarily concentrated on clustering and association rules. The clustering technique of the active research topics mainly deals with analyzing spatial and attribute data. And, the technique of association rules deals with identifying frequent patterns. There was an advanced apriori algorithm using an existing bit-clustering algorithm. In an effort to identify an alternative algorithm to improve apriori, we investigated FP-Growth and discussed the possibility of adopting bit-clustering as the alternative method to solve the problems with FP-Growth. FP-Growth using bit-clustering demonstrated better performance than the existing method. We used chess data in our experiments. Chess data were used in the pattern mining evaluation. We made a creation of FP-Tree with different minimum support values. In the case of high minimum support values, similar results that the existing techniques demonstrated were obtained. In other cases, however, the performance of the technique proposed in this paper showed better results in comparison with the existing technique. As a result, the technique proposed in this paper was considered to lead to higher performance. In addition, the method to apply bit-clustering to GML data was proposed.

  • PDF

Design and Implementation of PMSL for Information Retrieval (의미있는 정보 검색을 위한 개인화된 다중 전략 학습 모듈의 설계 및 구현)

  • 유수경;김교정
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.208-210
    • /
    • 2004
  • 오늘날 인터넷상에서 존재하는 않은 정보들은 다양한 사용자의 개인 특성에 안게 새로운 정보의 지식으로 제공되어지기를 원한다. 기존의 연구는 단일 학술 기법을 통해 정보를 추출했으나 사용자에게 보다 의미 있는 정보를 제공하기 위해 다중 전략 학습 기법인 PMSL(Personalized Multi-Strategy Learning) 모듈 시스템을 제안하고자 한다. PMSL 모듈은 인터넷의 정보를 여과하여 필터링하고, 사용자 개인화의 키워드를 중심으로 연관된 객체를 추출한다. 이때 연관된 객체 추출시 대용량 데이터에서 시간적, 공간적면에서 효율적인 연관 탐색 기법인 Fp-Tree와 Fp-Growth 알고리즘을 적용시킴으로 결과의 효율성을 높이고자 하였으며, 연관규칙의 문제점을 보완하기 위해 가중치 기법인 TF*IDF 학습 기법을 적용시켰다. PMSL 모듈을 실행한 결과 기존 학습 기법에 비해 보다 더 의미 있는 연관 지식을 추출하게 되었다.

  • PDF

TF-IDF Based Association Rule Analysis System for Medical Data (의료 정보 추출을 위한 TF-IDF 기반의 연관규칙 분석 시스템)

  • Park, Hosik;Lee, Minsu;Hwang, Sungjin;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.3
    • /
    • pp.145-154
    • /
    • 2016
  • Because of the recent interest in the u-Health and development of IT technology, a need of utilizing a medical information data has been increased. Among previous studies that utilize various data mining algorithms for processing medical information data, there are studies of association rule analysis. In the studies, an association between the symptoms with specified diseases is the target to discover, however, infrequent terms which can be important information for a disease diagnosis are not considered in most cases. In this paper, we proposed a new association rule mining system considering the importance of each term using TF-IDF weight to consider infrequent but important items. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on medical ontology.

High Utility Pattern Mining using a Prefix-Tree (Prefix-Tree를 이용한 높은 유틸리티 패턴 마이닝 기법)

  • Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan;Lee, In-Gi;Yong, Hwan-Seong
    • Journal of KIISE:Databases
    • /
    • v.36 no.5
    • /
    • pp.341-351
    • /
    • 2009
  • Recently high utility pattern (HUP) mining is one of the most important research issuer in data mining since it can consider the different weight Haloes of items. However, existing mining algorithms suffer from the performance degradation because it cannot easily apply Apriori-principle for pattern mining. In this paper, we introduce new high utility pattern mining approach by using a prefix-tree as in FP-Growth algorithm. Our approach stores the weight value of each item into a node and utilizes them for pruning unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. By thorough experimentation, we also prove that our approach can give performance improvement to a degree.

A Study on the Real-time user purchase pattern analysis User Product Recommendation System in E-Commerce Environment (E-commerce 환경에서 실시간 사용자 구매 패턴 분석을 통한 사용자 상품 추천 시스템 연구)

  • Beom Jung Kim;Ji Hye Huh;Hyeopgeon Lee;Young Woon Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.413-414
    • /
    • 2023
  • IT 기술의 발달로 E-Commerce 분야는 실시간으로 발생되는 데이터양이 증가하고 있으며, 발생된 데이터는 개인화 맞춤 서비스에 많이 활용되고 있다. 그러나 신생 E-commerce 기업은 신규 상품 및 기존 상품에 대한 정보와 고객 간의 상호 작용 데이터가 존재하지 않아 콜드 스타트 문제가 발생한다. 이에 본 논문에서는 E-commerce 환경에서 실시간 사용자 구매패턴 분석을 통한 사용자 상품 추천 시스템을 제안한다. 제안하는 시스템은 Kafka와 Spark를 사용해 실시간 스트림을 데이터를 처리한다. 주요 기능은 ALS 알고리즘과, FP-Growth 알고리즘을 적용해 콜트 스타트 문제를 해결하며, 사용자 구매 패턴 분석을 통한 분석 결과에 맞는 상품을 사용자에게 추천한다.

Discovering Association Rules using Item Clustering on Frequent Pattern Network (빈발 패턴 네트워크에서 아이템 클러스터링을 통한 연관규칙 발견)

  • Oh, Kyeong-Jin;Jung, Jin-Guk;Ha, In-Ay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.1
    • /
    • pp.1-17
    • /
    • 2008
  • Data mining is defined as the process of discovering meaningful and useful pattern in large volumes of data. In particular, finding associations rules between items in a database of customer transactions has become an important thing. Some data structures and algorithms had been proposed for storing meaningful information compressed from an original database to find frequent itemsets since Apriori algorithm. Though existing method find all association rules, we must have a lot of process to analyze association rules because there are too many rules. In this paper, we propose a new data structure, called a Frequent Pattern Network (FPN), which represents items as vertices and 2-itemsets as edges of the network. In order to utilize FPN, We constitute FPN using item's frequency. And then we use a clustering method to group the vertices on the network into clusters so that the intracluster similarity is maximized and the intercluster similarity is minimized. We generate association rules based on clusters. Our experiments showed accuracy of clustering items on the network using confidence, correlation and edge weight similarity methods. And We generated association rules using clusters and compare traditional and our method. From the results, the confidence similarity had a strong influence than others on the frequent pattern network. And FPN had a flexibility to minimum support value.

  • PDF

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education
    • /
    • v.9 no.3
    • /
    • pp.453-462
    • /
    • 2005
  • Recently, due to technical improvements of storage devices and networks, the amount of data increase rapidly. In addition, it is required to find the knowledge embedded in a data stream as fast as possible. Huge data in a data stream are created continuously and changed fast. Various algorithms for finding frequent itemsets in a data stream are actively proposed. Current researches do not offer appropriate method to find frequent itemsets in which flow of time is reflected but provide only frequent items using total aggregation values. In this paper we proposes a novel algorithm for finding the relative frequent itemsets according to the time in a data stream. We also propose the method to save frequent items and sub-frequent items in order to take limited memory into account and the method to update time variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can search both frequent itemsets and relative frequent itemsets only using the action patterns of the students at each time slot. Thus, our method can enhance the effectiveness of learning and make the best plan for individual learning.

  • PDF