A single-phase algorithm for mining high utility itemsets using compressed tree structures

  • Bhat B, Anup (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)
  • SV, Harish (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)
  • M, Geetha (Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education)

  • Received: 2020.08.05
  • Reviewed: 2021.01.22
  • Published: 2021.12.01

Abstract

Mining high utility itemsets (HUIs) from transaction databases takes into account factors such as the unit profit and the purchased quantity of each item. Two-phase tree-based algorithms transform a database into compressed tree structures and generate candidate patterns through a recursive pattern-growth procedure, which consumes considerable memory and time because it must construct conditional pattern trees. To address this issue, this study employs two compressed tree structures, namely the Utility Count Tree and the String Utility Tree, which enumerate valid patterns and thereby enable fast utility computation. Furthermore, the study presents an algorithm called single-phase utility computation (SPUC) that leverages these two tree structures, together with novel pruning strategies, to mine HUIs in a single phase. Experiments on both real and synthetic datasets demonstrate that SPUC outperforms the IHUP, UP-Growth, and UP-Growth+ algorithms.
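To make the abstract's terms concrete, the following minimal Python sketch computes the utility of an itemset (quantity times unit profit, summed over the transactions that contain it) and mines HUIs for a threshold `min_util`. It assumes a hypothetical toy database (`PROFIT` and `DB` are invented for illustration), and all function names are illustrative. It is not SPUC: it builds neither the Utility Count Tree nor the String Utility Tree. Instead it uses a naive level-wise search with the transaction-weighted utility (TWU) upper bound used by two-phase HUI algorithms, only to show what an HUI is and why an anti-monotone bound permits pruning before the exact utility check.

```python
# Hypothetical toy data (not from the paper): unit profit per item,
# and each transaction as a mapping {item: purchased quantity}.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1},
]


def contains(t, itemset):
    """True if transaction t contains every item of itemset."""
    return all(i in t for i in itemset)


def transaction_utility(t):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(q * PROFIT[i] for i, q in t.items())


def itemset_utility(itemset, db=DB):
    """Exact utility: the itemset's utility summed over transactions containing it."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset) for t in db if contains(t, itemset)
    )


def twu(itemset, db=DB):
    """Transaction-weighted utility: an anti-monotone upper bound on utility."""
    return sum(transaction_utility(t) for t in db if contains(t, itemset))


def mine_huis(min_util, db=DB):
    """Naive two-phase search: TWU-pruned level-wise candidates, then exact check."""
    items = sorted({i for t in db for i in t})
    # Phase 1: grow sorted tuples one item at a time. Because TWU is
    # anti-monotone, an itemset whose TWU is below min_util, and every
    # superset of it, cannot be an HUI, so it is never extended.
    level = [(i,) for i in items if twu((i,), db) >= min_util]
    candidates = []
    while level:
        candidates.extend(level)
        level = [
            c + (i,)
            for c in level
            for i in items
            if i > c[-1] and twu(c + (i,), db) >= min_util
        ]
    # Phase 2: compute exact utilities and keep only true HUIs.
    result = {}
    for c in candidates:
        u = itemset_utility(c, db)
        if u >= min_util:
            result[c] = u
    return result
```

With this toy data, `mine_huis(40)` returns `{('b', 'c', 'd', 'e'): 40}`; lowering the threshold to 30 adds five more itemsets, for example `('b', 'd')` with utility 30. The level-wise search is chosen only for brevity: it rescans the database for every candidate, which is exactly the cost that tree-based approaches such as those compared in the abstract aim to avoid.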

Acknowledgment

This work was supported by the Manipal Academy of Higher Education Dr. T.M.A. Pai Research Scholarship under Research Registration No. 170900117.

References

  1. W. Zhang et al., Text clustering using frequent itemsets, Knowl.-Based Syst. 23 (2010), no. 5, 379-388. https://doi.org/10.1016/j.knosys.2010.01.011
  2. S. Naulaerts et al., A primer to frequent itemset mining for bioinformatics, Brief Bioinform. 16 (2015), 216-231. https://doi.org/10.1093/bib/bbt074
  3. R. Harpaz, H. S. Chase, and C. Friedman, Mining multi-item drug adverse effect associations in spontaneous reporting systems, BMC Bioinform. 11 (2010), no. 9, S7.
  4. J. Han et al., Frequent pattern mining: Current status and future directions, Data Min. Knowl. Disc. 15 (2007), no. 1, 55-86. https://doi.org/10.1007/s10618-006-0059-1
  5. H. Yao, H. J. Hamilton, and C. J. Butz, A foundational approach to mining itemset utilities from databases, in Proc. SIAM Int. Conf. Data Min. (Lake Buena Vista, FL, USA), Apr. 2004, pp. 482-486.
  6. H. Yao and H. J. Hamilton, Mining itemset utilities from transaction databases, Data Knowl. Eng. 59 (2006), no. 3, 603-626. https://doi.org/10.1016/j.datak.2005.10.004
  7. Y. Liu and W.-K. Liao, A fast high utility itemsets mining algorithm, in Proc. Int. Workshop Utility-Based Data Min. (New York, NY, USA), Aug. 2005, pp. 90-99.
  8. Y. Liu, W.-K. Liao, and A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in Advances in Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, Germany, 2005, pp. 689-695.
  9. Y. Liu et al., High utility itemsets mining, Int. J. Inf. Tech. Decis. Making 9 (2010), no. 6, 905-934. https://doi.org/10.1142/S0219622010004159
  10. R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in Proc. Int. Conf. Very Large Data Bases (Santiago, Chile), Sept. 1994, pp. 487-499.
  11. C. W. Lin, T. P. Hong, and W. H. Lu, An effective tree structure for mining high utility itemsets, Expert Syst. Appl. 38 (2011), no. 6, 7419-7424. https://doi.org/10.1016/j.eswa.2010.12.082
  12. C. F. Ahmed et al., HUC-Prune: An efficient candidate pruning technique to mine high utility patterns, Appl. Intell. 34 (2011), no. 2, 181-198. https://doi.org/10.1007/s10489-009-0188-5
  13. C. F. Ahmed et al., Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. Knowl. Data Eng. 21 (2009), no. 12, 1708-1721. https://doi.org/10.1109/TKDE.2009.46
  14. V. S. Tseng et al., UP-Growth: An efficient algorithm for high utility itemset mining, in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (New York, NY, USA), July 2010, pp. 253-262.
  15. V. S. Tseng et al., Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng. 28 (2016), no. 1, 54-67. https://doi.org/10.1109/TKDE.2015.2458860
  16. J. Han et al., Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Min. Knowl. Disc. 8 (2004), no. 1, 53-87. https://doi.org/10.1023/B:DAMI.0000005258.31418.83
  17. M. Liu and J. Qu, Mining high utility itemsets without candidate generation, in Proc. ACM Int. Conf. Inform. Knowl. Manag. (New York, NY, USA), Oct. 2012, pp. 55-64.
  18. S. Krishnamoorthy, Pruning strategies for mining high utility itemsets, Expert Syst. Appl. 42 (2015), no. 5, 2371-2381. https://doi.org/10.1016/j.eswa.2014.11.001
  19. P. Fournier-Viger et al., FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning, in International Symposium on Methodologies for Intelligent Systems, Springer, Berlin, Heidelberg, Germany, 2014, pp. 83-92.
  20. C. Zhang et al., An empirical evaluation of high utility itemset mining algorithms, Expert Syst. Appl. 101 (2018), 91-115. https://doi.org/10.1016/j.eswa.2018.02.008
  21. S. Zida et al., EFIM: A fast and memory efficient algorithm for high-utility itemset mining, Knowl. Inf. Syst. 51 (2017), no. 2, 595-625. https://doi.org/10.1007/s10115-016-0986-0
  22. J. Liu, K. Wang, and B. C. M. Fung, Direct discovery of high utility itemsets without candidate generation, in Proc. IEEE Int. Conf. Data Min. (Brussels, Belgium), Dec. 2012, pp. 984-989.
  23. J. Liu, K. Wang, and B. C. M. Fung, Mining high utility patterns in one phase without generating candidates, IEEE Trans. Knowl. Data Eng. 28 (2016), no. 5, 1245-1257. https://doi.org/10.1109/TKDE.2015.2510012
  24. S. Dawar, D. Bera, and V. Goyal, High-utility itemset mining for subadditive monotone utility functions, arXiv preprint, CoRR, 2018, arXiv:1812.07208.
  25. V. S. Ananthanarayana, D. K. Subramanian, and M. N. Murty, Scalable, distributed and dynamic mining of association rules, in High Performance Computing-HiPC 2000, vol. 1970, Springer, Berlin, Heidelberg, Germany, 2000, pp. 559-566.
  26. M. Geetha and R. J. D'souza, An efficient reduced pattern count tree method for discovering most accurate set of frequent itemsets, Int. J. Comp. Sci. Netw. Sec. 8 (2008), no. 8, 121-126.
  27. P. Fournier-Viger, SPMF: An Open-Source Data Mining Library, Developer's Guide, 2020, available at https://www.philippe-fournier-viger.com/spmf/index.php?link=developers.php
  28. P. Fournier-Viger, SPMF: An Open-Source Data Mining Library, Datasets, 2020, available at https://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php