A Tensor Space Model based Semantic Search Technique

  • Hong, Kee-Joo (School of Electrical and Computer Engineering, University of Seoul) ;
  • Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul) ;
  • Chang, Jae-Young (Department of Computer Engineering, Hansung University) ;
  • Chun, Jong-Hoon (Department of Data Technology, Myongji University)
  • Received : 2016.08.21
  • Accepted : 2016.10.18
  • Published : 2016.11.30

Abstract

Semantic search refers to a set of techniques that improve search accuracy by understanding the user's search intent while keeping the user's cognitive effort low. Semantic search engines usually require an ontology and semantic metadata to analyze user queries. However, building an ontology and semantic metadata for large amounts of data is time-consuming and costly, which is why semantic search has seen little commercial adoption. To resolve this problem, we propose a novel semantic search method that takes advantage of our previously proposed semantic tensor space model. In that model, each term is represented as a 2nd-order 'document-by-concept' tensor (i.e., a matrix) and each concept as a 2nd-order 'document-by-term' tensor, so the proposed method does not require building an ontology. Through extensive experiments on the OHSUMED document collection and SCOPUS journal abstract data, we show that the proposed method nevertheless outperforms the vector space model-based search method.

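Semantic search is a technology that minimizes the searcher's cognitive effort while understanding the context of the user's query, so that it can accurately find documents matching the intended meaning. Semantic search technology still faces the difficult problem of building ontologies or semantic metadata, and commercial deployments remain very scarce. To overcome the limitations of existing semantic search engines, this paper proposes a new semantic search technique that uses the Wikipedia-based semantic tensor space model devised in our previous work. The proposed technique exploits the property that, in the tensor space model, each 'term' appearing in the document collection is represented as a 2nd-order 'document-by-concept' tensor (a matrix) and each 'concept' as a 2nd-order 'document-by-term' tensor, which removes the need to build an ontology for semantic search. Nevertheless, a performance evaluation on the OHSUMED and SCOPUS datasets shows that the proposed technique outperforms the existing search technique in the vector space model.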
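
The slicing property described in the abstract can be illustrated with a small sketch. It assumes the model is stored as a 3rd-order tensor indexed by (document, term, concept); the dimensions, the placeholder weights, and the helper names `term_slice`, `concept_slice`, and `shape` are hypothetical and are not taken from the paper. The sketch shows only how fixing one index yields the two matrix views, not how the authors compute weights or rank documents.

```python
# Toy 3rd-order tensor stored as nested lists: tensor[doc][term][concept].
# The weights are made-up placeholders, not the paper's weighting scheme.
N_DOCS, N_TERMS, N_CONCEPTS = 3, 4, 2

tensor = [
    [[float(d + 1) * (t + 1) / (c + 1) for c in range(N_CONCEPTS)]
     for t in range(N_TERMS)]
    for d in range(N_DOCS)
]


def term_slice(tensor, term_idx):
    """Fix the term index: returns a document-by-concept matrix."""
    return [list(doc[term_idx]) for doc in tensor]


def concept_slice(tensor, concept_idx):
    """Fix the concept index: returns a document-by-term matrix."""
    return [[term_row[concept_idx] for term_row in doc] for doc in tensor]


def shape(matrix):
    """Return (rows, columns) of a rectangular nested-list matrix."""
    return (len(matrix), len(matrix[0]))


term_matrix = term_slice(tensor, 2)        # one term -> documents x concepts
concept_matrix = concept_slice(tensor, 1)  # one concept -> documents x terms

print(shape(term_matrix))     # (3, 2)
print(shape(concept_matrix))  # (3, 4)
```

Both views read from the same underlying entries: `term_matrix[d][1]` and `concept_matrix[d][2]` are the same weight `tensor[d][2][1]` for every document `d`.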
