Acknowledgement
The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.