Proceedings of the IEEK Conference (대한전자공학회:학술대회논문집)
- 2003.07c
- Pages 2561-2564
- 2003
Parallel Data Mining with Distributed Frequent Pattern Trees
분산형 FP트리를 활용한 병렬 데이터 마이닝
Abstract
Data mining is an effective method for discovering useful information, such as rules and previously unknown patterns, in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining algorithm, called the Distributed Frequent Pattern Tree (DFPT) algorithm, that detects association rules on a distributed shared-nothing parallel system. The DFPT algorithm is designed for parallel execution of the FP-growth algorithm. Because it eliminates the need to generate candidate itemsets, it requires only two full scans of the database on disk. We achieved good workload balance throughout the mining process by distributing the work equally among all processors. We implemented the algorithm on a PC cluster system and observed that it outperformed the Improved Count Distribution scheme.
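The two-scan property mentioned above comes from the FP-growth approach that DFPT parallelizes. The following is a minimal single-machine sketch of FP-tree construction, not the DFPT algorithm itself: the abstract does not describe how the tree or the work is divided among processors, so that part is omitted. The names `FPNode`, `build_fp_tree`, and `min_support` are illustrative assumptions.

```python
from collections import Counter


class FPNode:
    """One node of a frequent pattern tree (hypothetical helper class)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with exactly two passes over the transactions."""
    # Scan 1: count the support of every individual item.
    support = Counter()
    for transaction in transactions:
        support.update(set(transaction))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Scan 2: insert each transaction as a path, keeping only frequent
    # items, ordered by descending support (ties broken by item name).
    root = FPNode(None)
    for transaction in transactions:
        items = sorted(
            (i for i in set(transaction) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

No candidate itemsets are generated at any point: transactions that share a prefix of frequent items share a path in the tree, and frequent patterns are later mined from that compact structure rather than from repeated database scans.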
Keywords