Improved Association Rule Mining by Modified Trimming

Improving Association Rule Mining by Modifying the Trimming Method

  • Hwang, Won-Tae (School of Electrical and Electronics Engineering, Chung-Ang University) ;
  • Kim, Dong-Seung (School of Electrical Engineering, Korea University)
  • 황원태 (중앙대학교 전자전기공학부) ;
  • 김동승 (고려대학교 전기전자전파공학부)
  • Published : 2008.05.25

Abstract

This paper presents a new association rule mining algorithm that uses two-phase sampling to shorten execution time at the cost of some precision in the mining result. The previous FAST (Finding Associations from Sampled Transactions) algorithm has a weakness: it considers only the frequent 1-itemsets when trimming or growing the sample, so it has no way to account for multi-itemsets, including 2-itemsets. The new algorithm takes multi-itemsets into account when sampling transactions. It improves the mining results by adjusting the counts of both missing itemsets (frequent in the database but not in the sample) and false itemsets (frequent in the sample but not in the database). In experiments on a representative synthetic database, the algorithm produces a sample that yields more accurate 2-itemsets while maintaining the same quality for the data set.

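This paper proposes a new mining algorithm that uses two-stage sampling to extract association rules quickly, at some cost in accuracy. The preceding work, FAST (Finding Associations from Sampled Transactions), applied only frequent 1-itemsets when forming the optimal sample, so frequent 2-itemsets and larger frequent itemsets were not reflected in sample selection. This paper addresses that weakness by weighing missing itemsets and false itemsets together during trimming, which raises mining accuracy for frequent multi-itemsets. Experiments on a representative data set confirm that, at the same quality, the new algorithm is more accurate than the previous work.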
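The trimming idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the greedy one-transaction-at-a-time removal, the symmetric-difference distance, and the weights `w_missing` and `w_false` are all assumptions made for illustration. The sketch treats the initial sample as the reference, counts frequent 1- and 2-itemsets, and repeatedly drops the transaction whose removal keeps the trimmed sample closest to that reference in terms of missing and false itemsets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (frozensets) of size <= max_size whose relative
    support in `transactions` is at least `min_support`."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    if n == 0:
        return set()
    return {s for s, c in counts.items() if c / n >= min_support}


def distance(reference, candidate, w_missing=1.0, w_false=1.0):
    """Weighted count of missing and false itemsets, normalized by the
    total number of itemsets in both sets (hypothetical weighting)."""
    missing = len(reference - candidate)      # frequent in reference only
    false = len(candidate - reference)        # frequent in candidate only
    denom = len(reference) + len(candidate)
    if denom == 0:
        return 0.0
    return (w_missing * missing + w_false * false) / denom


def trim(sample, target_size, min_support, max_size=2):
    """Greedily remove transactions until `target_size` remain, each time
    dropping the one whose removal gives the smallest distance to the
    frequent itemsets of the original sample."""
    reference = frequent_itemsets(sample, min_support, max_size)
    current = list(sample)
    while len(current) > target_size:
        best_idx, best_dist = None, None
        for i in range(len(current)):
            rest = current[:i] + current[i + 1:]
            d = distance(
                reference, frequent_itemsets(rest, min_support, max_size)
            )
            if best_dist is None or d < best_dist:
                best_idx, best_dist = i, d
        del current[best_idx]
    return current
```

Setting `max_size=1` makes the distance depend on frequent 1-itemsets only, which corresponds to the limitation the abstract attributes to FAST; `max_size=2` lets 2-itemsets influence which transactions are trimmed. The sketch recounts supports from scratch for every candidate removal, so it is only suitable for small illustrative inputs.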
