Survey on Vector Similarity Measures: Focusing on Algebraic Characteristics

  • Lee, Dongjoo (DMC R&D Center, Samsung Electronics Co.)
  • Shim, Junho (Division of Computer Science, Sookmyung Women's University)
  • Received: 2012.10.31
  • Accepted: 2012.11.20
  • Published: 2012.11.30

Abstract

Objects such as products, product reviews, and user profiles are important in the e-commerce domain. Vectors are one of the most widely used object representation schemes: an e-commerce object can be modeled as a vector whose dimensions correspond to features and hold numeric feature values. E-commerce objects are generally large in number, and many of them are in reality similar or even identical. Measuring the similarity between objects therefore plays an important role in e-commerce systems. In this paper, we survey the state-of-the-art vector similarity measures. We analyze the algebraic characteristics of these measures and the relationships among them, and classify the measures accordingly. From this analysis we present the algebraic characteristics that a standard vector similarity measure should have.
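As a rough illustration of what an algebraic characteristic of a similarity measure can look like, the sketch below compares two common vector similarity measures on toy data. The choice of measures (cosine similarity and the Tanimoto, or extended Jaccard, coefficient), the function names, and the feature vectors are all assumptions made for this example; the abstract itself does not specify which measures or properties the paper covers.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|); 0.0 if either vector is zero."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def tanimoto(u, v):
    """Tanimoto (extended Jaccard): dot / (|u|^2 + |v|^2 - dot); 0.0 if both are zero."""
    d = dot(u, v)
    denom = dot(u, u) + dot(v, v) - d
    return d / denom if denom else 0.0

# Hypothetical feature vectors for three product reviews (illustrative only).
review_a = [3.0, 0.0, 1.0, 2.0]
review_b = [6.0, 0.0, 2.0, 4.0]  # same direction as review_a, twice the magnitude
review_c = [0.0, 5.0, 0.0, 0.0]  # shares no non-zero dimension with review_a

for name, sim in (("cosine", cosine), ("tanimoto", tanimoto)):
    print(name,
          sim(review_a, review_a),   # self-similarity
          sim(review_a, review_b),   # scaled copy
          sim(review_a, review_c))   # disjoint features
```

Both functions are symmetric in their arguments and give 0 for vectors with no shared non-zero dimension, but they differ under scaling: cosine treats review_a and review_b as (numerically) identical because it ignores magnitude, whereas Tanimoto gives 28/42, about 0.667. Differences of this kind are one way measures can be grouped by their algebraic behavior.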

Acknowledgement

Supported by: Sookmyung Women's University
