A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns

A Technique for Automatically Extracting Keywords Containing Shopping Intent through Analysis of Internet Search Records

  • Kim, Mingyu (Graduate School of Business IT, Kookmin University) ;
  • Kim, Namgyu (Graduate School of Business IT, Kookmin University) ;
  • Jung, Inhwan (Department of Computer Engineering, Hansung University)
  • Received : 2014.06.15
  • Accepted : 2014.06.22
  • Published : 2014.06.30

Abstract

Online shopping has grown as Internet use and smart mobile devices have become more prevalent. This growth has led to the creation of many Internet shopping malls and, consequently, to increasingly fierce competition among online retailers, many of which are making significant efforts to attract users to their sites. One such effort is keyword marketing, whereby a retail site pays a fee to expose its link to potential customers when they enter a specific keyword on an Internet portal site. The price of each keyword is generally estimated from the keyword's frequency of appearance. However, it is widely accepted that keyword prices cannot be based solely on frequency, because many keywords appear frequently but have little relationship to shopping. It is therefore unreasonable for an online shopping mall to spend a great deal on a keyword simply because people use it often. From the perspective of shopping malls, a specialized process is required to extract meaningful keywords, and the demand for automating this extraction is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that automatically extracts only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behavior. In other words, only search keywords for which the user moves from the search results page to a shopping-related page are extracted from the entire set of search keywords. The rankings of the extracted keywords are then compared with the rankings of the entire set of search keywords. Two types of data are used in our experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information.
The experimental data come from a website ranking site and from the largest portal site in Korea. The original sample dataset contains 150 million transaction logs. First, portal sites are selected, and the search keywords entered on those sites are extracted. Search keywords can be extracted easily by simple parsing. The extracted keywords are ranked by frequency. The experiment uses approximately 3.9 million searches from Korea's largest search portal site, from which a total of 344,822 search keywords were extracted. Next, using the web browsing history and site information, the shopping-related keywords were selected from the entire set of search keywords, yielding 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all the search keywords with those of the shopping-related keywords. To this end, we extracted 80,298 search keywords from several Internet shopping malls and chose the top 1,000 keywords as a set of true shopping keywords. We measured the precision, recall, and F-score of the entire set of keywords and of the shopping-related keywords. The F-score was calculated as the harmonic mean of precision and recall. The precision, recall, and F-score of the shopping-related keywords derived by the proposed methodology were higher than those of the entire set of keywords. This study proposes a scheme that obtains shopping-related keywords in a relatively simple manner: shopping-related keywords are extracted simply by examining transactions whose next visit is a shopping mall. The resulting shopping-related keyword set is expected to be a useful asset for shopping malls that participate in keyword marketing. Moreover, the proposed methodology can readily be applied to constructing keyword sets for other special areas as well as for shopping.
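
The extraction step described in the abstract (parse search keywords from portal URLs, rank them by frequency, and keep those whose next visit is a shopping site) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the portal host, the `query` parameter name, the shopping-site list, and the function names `search_keyword` and `rank_keywords` are all hypothetical, and the log is assumed to be one user's URLs in chronological order.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical inputs: the portal host, its query parameter name, and the
# shopping-site list are illustrative assumptions, not taken from the paper.
PORTAL_HOST = "search.portal.example"
QUERY_PARAM = "query"
SHOPPING_HOSTS = {"mall-a.example", "mall-b.example"}


def search_keyword(url):
    """Return the search keyword if url is a portal search page, else None."""
    parsed = urlparse(url)
    if parsed.hostname != PORTAL_HOST:
        return None
    values = parse_qs(parsed.query).get(QUERY_PARAM)
    return values[0].strip() if values else None


def rank_keywords(visits):
    """Count keywords in one user's chronologically ordered list of URLs.

    Returns (all_counts, shopping_counts), each a Counter of keywords.
    """
    all_counts, shopping_counts = Counter(), Counter()
    for i, url in enumerate(visits):
        keyword = search_keyword(url)
        if not keyword:
            continue
        all_counts[keyword] += 1
        # A keyword is treated as shopping-related when the very next
        # visit in the log is a known shopping site.
        if i + 1 < len(visits):
            next_host = urlparse(visits[i + 1]).hostname
            if next_host in SHOPPING_HOSTS:
                shopping_counts[keyword] += 1
    return all_counts, shopping_counts
```

Calling `all_counts.most_common()` and `shopping_counts.most_common()` on the two counters gives the two frequency rankings that the abstract compares.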

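With the spread of online access and a variety of smart devices, purchasing through online shopping has become more active. Internet shopping malls are therefore willing to pay for keywords in order to expose their links even one more time to potential customers interested in shopping, and this trend has contributed to rising advertising costs in the search advertising market. The value of a keyword is generally calculated on the basis of the search term's frequency. However, not every word frequently entered as a search term on a portal site is related to shopping, and many keywords have a high frequency but little relation to revenue from a shopping mall's perspective. For this reason, it is very inefficient to pay large advertising fees for a keyword in the expectation of purchases merely because that keyword is heavily exposed to users. A separate task of extracting the keywords that matter to shopping malls from among the frequent search terms on portal sites is therefore required, and demand is growing for an automated methodology that performs this task quickly and effectively. To meet this demand, this study presents a method for automatically extracting only those keywords entered on portal sites that are estimated to be highly likely to contain shopping intent. Specifically, from all search terms we extracted only those for which the user moved from the search results page to a shopping-related page, tallied their rankings, and compared these rankings with the rankings of all search keywords. In an experiment on approximately 3.9 million searches made on 'N', Korea's largest search portal, the shopping-intent keywords recommended by the proposed methodology showed relatively better performance than keywords based on simple frequency in all respects: precision, recall, and F-score.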
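
The evaluation measures named in the abstract can be sketched as follows, with the F-score computed as the harmonic mean of precision and recall. This is a minimal sketch: the function name `precision_recall_f` is hypothetical, and the abstract does not state how many candidate keywords were scored, so the function accepts a candidate list of any length.

```python
def precision_recall_f(candidates, truth):
    """Score candidate keywords against a set of true shopping keywords.

    Returns (precision, recall, f_score); the F-score is the harmonic
    mean of precision and recall, and is 0.0 when both are zero.
    """
    candidates, truth = set(candidates), set(truth)
    hits = len(candidates & truth)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

In the setting of the abstract, `truth` would be the top 1,000 keywords taken from the shopping malls, and `candidates` would be either the frequency-ranked keywords or the shopping-related keywords.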
