Search Re-ranking Through Weighted Deep Learning Model

  • Gi-Taek An (Jeonbuk National University, Computer Science and Engineering) ;
  • Woo-Seok Choi (Jeonbuk National University, Division of Computer Science and Artificial Intelligence) ;
  • Jun-Yong Park (Jeonbuk National University, Division of Computer Science and Artificial Intelligence) ;
  • Jeong-Min Park (Korea Food Research Institute) ;
  • Kyung-Soon Lee (Jeonbuk National University, Division of Computer Science and Artificial Intelligence)
  • Received : 2024.03.12
  • Reviewed : 2024.04.11
  • Published : 2024.05.31

Abstract

In information retrieval, queries come in many forms, ranging from abstract queries to queries containing specific keywords, which makes producing results that accurately match user needs a difficult task. Search systems must also handle queries that contain elements such as typos, multiple languages, and product codes. This study analyzes query types and proposes a method that decides, based on the query type, whether to apply deep-learning-based reranking. Reranking is performed by training DeBERTa, a deep learning model that has shown high performance in recent research, on documents relevant to each query. To evaluate the effectiveness of the proposed method, experiments were conducted on the test collection of the Product Search Track at TREC 2023, an international information retrieval evaluation forum. In a comparison of normalized discounted cumulative gain (NDCG) scores, the proposed method, which combines retrieval with query error handling, product-title-based query expansion through pseudo relevance feedback, and query-type-dependent reranking, achieved 0.7810, a 10.48% improvement over the BM25 baseline.
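The pipeline described in the abstract (decide from the query type whether to apply deep reranking, then evaluate with NDCG) can be illustrated with a minimal, hypothetical sketch. This is not the authors' implementation: a crude digit-based heuristic stands in for the paper's query-type analysis, and `deep_scores` is a placeholder for scores that would, in the paper, come from a fine-tuned DeBERTa model scoring (query, product) pairs.

```python
import math
import re

def detect_query_type(query: str) -> str:
    # Crude stand-in for the paper's query-type analysis: treat queries
    # dominated by digit-bearing tokens (model numbers, product codes)
    # as keyword queries, everything else as abstract queries.
    tokens = query.split()
    code_like = sum(1 for t in tokens if re.search(r"\d", t))
    return "keyword" if tokens and code_like / len(tokens) >= 0.5 else "abstract"

def ndcg_at_k(relevances, k=10):
    # NDCG@k over graded relevance labels listed in ranked order.
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def rerank(query, bm25_ranking, deep_scores):
    # Apply the (placeholder) deep scores only when the query type calls
    # for reranking; keep the first-stage BM25 order otherwise.
    if detect_query_type(query) == "keyword":
        return list(bm25_ranking)
    return sorted(bm25_ranking, key=lambda d: deep_scores[d], reverse=True)
```

For example, an abstract query such as "comfortable office chair" would be reordered by the deep scores, while a code-like query such as "SM-G998U 256GB" would keep its BM25 ranking, reflecting the paper's idea of applying reranking selectively by query type.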

Acknowledgements

This work was supported by the Korea Food Research Institute (Main Research Program E0220700), funded by the Ministry of Science and ICT in 2024. This research was also supported by the Software-Centered University Support Program (2022-0-01067) of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP) in 2024.

References

  1. N. Asadi and J. Lin, "Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures," In Proceedings of the 36th international ACM SIGIR Conference on Research and Development in Information Retrieval, pp.997-1000, 2013. 
  2. R. C. Chen, L. Gallagher, R. Blanco, and J. S. Culpepper, "Efficient cost-aware cascade ranking in multi-stage retrieval," In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.445-454, 2017.
  3. L. Gao, Z. Dai, and J. Callan, "Rethink training of BERT rerankers in multi-stage retrieval pipeline," In Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28-April 1, 2021, Proceedings, Part II 43, pp.280-286, Springer International Publishing, 2021.
  4. Y. Nie, S. Wang, and M. Bansal, "Revealing the importance of semantic retrieval for machine reading at scale," arXiv preprint arXiv:1909.08041, 2019. 
  5. A. Aizawa, "An information-theoretic perspective of tf-idf measures," Information Processing & Management, Vol.39, No.1, pp.45-65, 2003. 
  6. Y. Liu et al., "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
  7. V. Karpukhin et al., "Dense passage retrieval for open-domain question answering," arXiv preprint arXiv:2004.04906, 2020. 
  8. X. Ma, J. Guo, R. Zhang, Y. Fan, Y. Li, and X. Cheng, "B-PROP: bootstrapped pre-training with representative words prediction for ad-hoc retrieval," In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.1513-1522, 2021. 
  9. J. Zhan, J. Mao, Y. Liu, J. Guo, M. Zhang, and S. Ma, "Optimizing dense retrieval model training with hard negatives," In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.1503-1512, 2021. 
  10. J. Zhan, J. Mao, Y. Liu, M. Zhang, and S. Ma, "RepBERT: Contextualized text embeddings for first-stage retrieval," arXiv preprint arXiv:2006.15498, 2020.
  11. P. S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck, "Learning deep structured semantic models for web search using clickthrough data," In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp.2333-2338, 2013. 
  12. H. Shan, Q. Zhang, Z. Liu, G. Zhang, and C. Li, "Beyond two-tower: Attribute guided representation learning for candidate retrieval," In Proceedings of the ACM Web Conference 2023, pp.3173-3181, 2023. 
  13. Q. Zhang et al., "A semantic alignment system for multilingual query-product retrieval," arXiv preprint arXiv:2208.02958, 2022.
  14. X. Qin, N. Liang, H. Zhang, W. Zou, and W. Zhang, "Second place solution of Amazon KDD Cup 2022: ESCI Challenge for Improving Product Search," 2022. 
  15. J. Lin, L. Xue, Z. Ying, C. Meng, W. Wang, H. Wang, and X. Wu, "A Winning Solution of KDD CUP 2022 ESCI Challenge for Improving Product Search," 2022. 
  16. J. Park, W. Choi, G. An, and K. Lee, "Deep learning based reranking model for product search," Digital Contents Society, pp.131-132, 2023.
  17. G. An, W. Choi, J. Park, and K. Lee, "JBNU at TREC 2023 Product Search Track," The Thirty-Second Text REtrieval Conference (TREC 2023), 2023. 
  18. A. Vaswani et al., "Attention is all you need," arXiv preprint arXiv:1706.03762, 2017.
  19. T. Barrus, "pyspellchecker," 2024, accessed: 20.02.2024. [Internet], https://pypi.org/project/pyspellchecker/ 
  20. P. He, X. Liu, J. Gao, and W. Chen, "DeBERTa: Decoding-enhanced BERT with disentangled attention," arXiv preprint arXiv:2006.03654, 2020.