Word Sense Similarity Clustering Based on Vector Space Model and HAL

벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집

  • Kim Dong-Sung (김동성; Research Institute for Language and Information, Korea University)
  • Received : 2012.08.14
  • Accepted : 2012.09.17
  • Published : 2012.09.30

Abstract

In this paper, we cluster similar word senses by applying the vector space model and HAL (Hyperspace Analog to Language). HAL measures the correlation among words within a context window of fixed size (Lund and Burgess 1996). The similarity between a word pair is measured by cosine similarity based on the vector space model, which reduces the distortion of the space between high-frequency and low-frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce the large number of dimensions of the similarity matrix. For sense similarity clustering, we adopt both supervised and unsupervised learning methods. For the unsupervised method, we use clustering. For the supervised method, we use SVM (Support Vector Machine), the Naive Bayes classifier, and the maximum entropy method.
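
The first two steps of the abstract, HAL co-occurrence counting and cosine similarity, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (hal_counts, hal_vector, cosine), the window size of 5, the linear distance weighting, and the toy corpus are all assumptions made for illustration, and the PCA/SVD and clustering steps are omitted.

```python
from collections import defaultdict
from math import sqrt


def hal_counts(tokens, window=5):
    """Weighted co-occurrence counts in the style of HAL.

    counts[w][c] sums the weights of context word c occurring up to
    `window` tokens BEFORE w. The weight is window - distance + 1, so
    adjacent words weigh most (an assumed linear weighting scheme).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for dist in range(1, window + 1):
            j = i - dist
            if j < 0:
                break
            counts[word][tokens[j]] += window - dist + 1
    # Convert to plain dicts so later lookups never create new keys.
    return {w: dict(ctx) for w, ctx in counts.items()}


def hal_vector(counts, word, vocab):
    """Concatenate the row (preceding contexts) and column (following contexts)."""
    row = [counts.get(word, {}).get(c, 0.0) for c in vocab]
    col = [counts.get(c, {}).get(word, 0.0) for c in vocab]
    return row + col


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical data, for illustration only).
tokens = "the cat drinks milk the dog drinks water the cat chases the dog".split()
vocab = sorted(set(tokens))
counts = hal_counts(tokens, window=5)

cat = hal_vector(counts, "cat", vocab)
dog = hal_vector(counts, "dog", vocab)
milk = hal_vector(counts, "milk", vocab)

print("cat ~ dog :", round(cosine(cat, dog), 3))
print("cat ~ milk:", round(cosine(cat, milk), 3))
```

Because cosine similarity normalizes each vector by its length, a frequent word with large counts and a rare word with small counts remain comparable as long as their context distributions point in a similar direction.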

Keywords

Distributional Hypothesis; Vector Space Model; HAL; Supervised/Unsupervised Learning; Psycholinguistics; Clustering; Dimensionality Reduction; Corpus Linguistics