AraProdMatch: A Machine Learning Approach for Product Matching in E-Commerce

  • Alabdullatif, Aisha (Department of Management Information Systems, College of Business Administration, King Saud University)
  • Aloud, Monira (Department of Management Information Systems, College of Business Administration, King Saud University)
  • Received : 2021.04.05
  • Published : 2021.04.30

Abstract

Recently, e-commerce in Saudi Arabia has grown exponentially, bringing remarkable new challenges. A naive approach for product matching and categorization is needed to help consumers choose the right store from which to purchase a product. This paper presents a machine learning approach for product matching that combines deep learning techniques with standard artificial neural networks (ANNs). Existing methods focused on product matching, whereas our model compares products based on unstructured descriptions. We evaluated our model on an electronics dataset drawn from three business-to-consumer (B2C) online stores, collecting the matched products in one dataset. The performance evaluation, based on k-means classifier prediction on data from three real-world online stores, demonstrates that the proposed algorithm outperforms the benchmarked approach by 80% on average F1-measure.
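
The F1-measure named as the evaluation metric is the harmonic mean of precision and recall over predicted matches. The following is a minimal sketch of that calculation, assuming matches are represented as pairs of product identifiers. The function name `f1_measure` and the sample store/product IDs are hypothetical illustrations, not taken from the paper, and the sketch does not reproduce the authors' evaluation pipeline.

```python
def f1_measure(predicted, actual):
    """Return (precision, recall, f1) for predicted vs. true match pairs."""
    # Treat (a, b) and (b, a) as the same match by using unordered pairs.
    pred = {frozenset(pair) for pair in predicted}
    true = {frozenset(pair) for pair in actual}

    true_positives = len(pred & true)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0

    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical product IDs from three stores (illustrative only).
predicted_pairs = [
    ("storeA-1", "storeB-7"),
    ("storeA-2", "storeC-3"),
    ("storeB-4", "storeC-9"),
]
true_pairs = [
    ("storeB-7", "storeA-1"),
    ("storeA-2", "storeC-3"),
    ("storeA-5", "storeC-8"),
]

precision, recall, f1 = f1_measure(predicted_pairs, true_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted pairs are correct and two of the three true pairs are found, so precision, recall, and F1 all equal 2/3.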

Acknowledgement

The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.
