Effective and Efficient Similarity Measures for Purchase Histories Considering Product Taxonomy

  • Yang, Yu-Jeong (Dept. of Computer Science, Sookmyung Women's University)
  • Lee, Ki Yong (Dept. of Computer Science, Sookmyung Women's University)
  • Received : 2020.03.19
  • Accepted : 2020.07.03
  • Published : 2021.02.28

Abstract

In an online shopping site or offline store, the products a customer purchases over time form that customer's purchase history. Most retailers also organize their products in a product taxonomy, which represents a hierarchical classification of products. Under such a taxonomy, the lower (more specific) the level of the category that two products share, the more similar the two products are. However, there has been little work on similarity measures for sequences that take a hierarchical classification of elements into account. In this paper, we propose new similarity measures for purchase histories that consider not only the purchase order of products but also their hierarchical classification. Unlike existing methods, in which the similarity between two sequence elements is either 0 or 1 depending on whether the elements are identical, the proposed method can assign any real number between 0 and 1 based on the hierarchical classification of the elements. We apply this idea to extend three existing representative similarity measures for sequences. We also propose an efficient method for computing the proposed similarity measures. Through various experiments, we show that the proposed method measures the similarity between purchase histories effectively and efficiently.
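The core idea of the abstract, replacing a 0-or-1 element match with a graded similarity derived from the taxonomy, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy taxonomy, the function names, the formula (depth of the lowest shared category divided by the larger of the two product depths), and the choice of edit distance as the sequence measure to extend. The paper's actual definitions may differ.

```python
def path_to_root(node, parent):
    """Return the path from the taxonomy root down to `node` (root first)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]


def product_similarity(a, b, parent):
    """Graded similarity in [0, 1] (assumed formula, not the paper's).

    Depth of the lowest shared category divided by the larger depth of
    the two products, with the root at depth 0.
    """
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    shared_depth = max(common - 1, 0)
    max_depth = max(len(pa), len(pb)) - 1
    return 1.0 if max_depth == 0 else shared_depth / max_depth


def taxonomy_edit_distance(s, t, parent):
    """Edit distance whose substitution cost is 1 - product_similarity."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)  # delete all of s[:i]
    for j in range(1, n + 1):
        d[0][j] = float(j)  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - product_similarity(s[i - 1], t[j - 1], parent)
            d[i][j] = min(d[i - 1][j] + 1.0,      # deletion
                          d[i][j - 1] + 1.0,      # insertion
                          d[i - 1][j - 1] + sub)  # graded substitution
    return d[m][n]


# Hypothetical toy taxonomy: child -> parent, with "All" as the root.
PARENT = {
    "Food": "All", "Electronics": "All",
    "Dairy": "Food", "Snacks": "Food", "Phones": "Electronics",
    "milk": "Dairy", "cheese": "Dairy", "chips": "Snacks", "phone": "Phones",
}

if __name__ == "__main__":
    print(product_similarity("milk", "cheese", PARENT))  # share "Dairy"
    print(product_similarity("milk", "phone", PARENT))   # share only the root
    print(taxonomy_edit_distance(["milk", "chips"], ["cheese", "chips"], PARENT))
```

In this sketch, two histories that differ only by milk versus cheese end up closer than two that differ by milk versus phone, which a plain 0-or-1 match cannot express. Walking both paths to the root for every pair is the naive approach; it does not reflect the efficient computation method the abstract mentions.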

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1D1A1B07045643).
