Enhancing Search Functionality for Website Posts and Product Reviews: Improving BM25 Ranking Algorithm Performance Using the ResNet-Transformer Model

  • Hong-Ju Yang (Dept. of Computer Science, Kangnam University)
  • In-Yeop Choi (Dept. of Computer Science, Kangnam University)
  • Submitted: 2024.10.10
  • Reviewed: 2024.11.11
  • Published: 2024.11.29

Abstract

This paper proposes a method for improving search over website posts and product reviews by combining the BM25 ranking algorithm with a ResNet-Transformer model. BM25 ranks documents by evaluating their relevance to a user query and is widely used in text-based search. However, it cannot extract the local features of words or capture the context of a sentence. To address these limitations, this study builds a classifier that combines a ResNet model, which is good at extracting local features, with a Transformer model, which is good at capturing context, and applies the classifier's output as a weight on the BM25 score. In experiments, the proposed method improved the nDCG metric by 9.38% and the aP@5 metric by 11.82% compared to BM25 alone. Applying the method to the search boxes of various websites is therefore expected to return more accurate results when searching posts and product reviews.
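The abstract does not spell out how the classifier output enters the BM25 score, so the following is only a minimal sketch under stated assumptions: a standard Okapi BM25 scorer, a hypothetical per-document weight in [0, 1] standing in for the ResNet-Transformer classifier's output, a simple multiplicative combination, and the common log2-discounted form of nDCG. The function names, the parameters `k1` and `b`, and the weight values are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def rerank(scores, class_weights):
    """Return document indices ordered by weighted score, best first.

    Assumption: the classifier weight multiplies the BM25 score. The paper
    may combine the two differently.
    """
    weighted = [s * w for s, w in zip(scores, class_weights)]
    return sorted(range(len(weighted)), key=lambda i: weighted[i], reverse=True)


def ndcg(relevances_in_rank_order, k=None):
    """nDCG with a log2 rank discount, optionally truncated at rank k."""
    rels = relevances_in_rank_order[:k] if k else relevances_in_rank_order
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(relevances_in_rank_order, reverse=True)[:len(rels)]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy corpus: the third document repeats the query term but is off-topic
# according to the hypothetical classifier weights.
docs = [
    ["dorm", "fee", "notice"],
    ["library", "hours"],
    ["dorm", "application", "dorm"],
]
scores = bm25_scores(["dorm"], docs)
plain_order = rerank(scores, [1.0, 1.0, 1.0])
weighted_order = rerank(scores, [0.9, 0.5, 0.1])
```

In this toy case plain BM25 prefers the document with the higher term frequency, while the weighted ranking promotes the document the hypothetical classifier considers more relevant. Comparing `ndcg` over the two orderings against human relevance labels is one way to measure a gain of the kind the abstract reports.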
