Empirical Study on Analyzing Training Data for CNN-based Product Classification Deep Learning Model

  • Lee, Nakyong (Division of Computer Science, Sookmyung Women's University)
  • Kim, Jooyeon (Division of Computer Science, Sookmyung Women's University)
  • Shim, Junho (Division of Computer Science, Sookmyung Women's University)
  • Received: 2021.01.06
  • Accepted: 2021.02.01
  • Published: 2021.02.28

Abstract

In e-commerce, fast and accurate automatic classification of products from their product information is important. Recent advances in deep learning have been actively applied to this task. Building a deep learning model that performs well depends heavily on the quality of the training data and on data preprocessing suited to the model. In this study, we infer product categories from textual product data with a deep learning model, and we extensively compare and analyze two factors: the degree of data preprocessing and the selection of training data. We use our CNN model as the example deep learning model. The experiments use real e-commerce data so that the results are empirically grounded. The empirical analysis and results presented here may serve as a reference for improving performance when developing deep learning product classification models.
