Search | Korea Science

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

Kadowaki, Natsuki;Kishida, Kazuaki
- Journal of Information Science Theory and Practice, v.8 no.2, pp.6-17, 2020
Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. Three types of measures were compared: (1) co-occurrence-based similarity measures (for which the co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and the Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). The Pearson correlation coefficient was highest for the pair consisting of the VSM-based similarity measures and the co-occurrence-based similarity measures counted by number of documents. Group-average agglomerative hierarchical clustering was also applied to the similarity matrices computed by the individual measures. An evaluation of the cluster sets against an answer set revealed that the VSM- and LDA-based similarity measures performed best.
https://doi.org/10.1633/JISTaP.2020.8.2.1
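The two simplest measure families named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy documents, the function names, and the choice of a Jaccard coefficient for the document co-occurrence measure are all assumptions made here.

```python
import math
from collections import Counter


def word_vectors(docs):
    """Map each word to its vector of tf-idf weights, one entry per document."""
    n = len(docs)
    counts = [Counter(doc.split()) for doc in docs]
    # Document frequency: number of documents containing each word.
    df = Counter(word for c in counts for word in c)
    return {w: [c[w] * math.log(n / df[w]) for c in counts] for w in df}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def doc_cooccurrence_jaccard(docs, w1, w2):
    """Jaccard coefficient over the sets of documents containing each word.

    Jaccard is an assumed choice; the abstract does not name the coefficient.
    """
    d1 = {i for i, d in enumerate(docs) if w1 in d.split()}
    d2 = {i for i, d in enumerate(docs) if w2 in d.split()}
    union = d1 | d2
    return len(d1 & d2) / len(union) if union else 0.0


# Hypothetical toy corpus, used only for illustration.
docs = [
    "oil price rises",
    "oil price falls",
    "bank rate rises",
    "bank rate cut",
]
vecs = word_vectors(docs)
vsm_sim = cosine(vecs["oil"], vecs["price"])
cooc_sim = doc_cooccurrence_jaccard(docs, "oil", "rises")
```

Words that always appear in the same documents ("oil" and "price" here) get near-identical tf-idf vectors, which may be one reason VSM-based and document co-occurrence measures tend to correlate.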

Customized Knowledge Creation Framework using Context- and intensity-based Similarity (상황과 정보 집적도를 고려한 유사도 기반의 맞춤형 지식 생성프레임워크)

Sohn, Mye M.;Lee, Hyun-Jung
- Journal of Internet Computing and Services, v.12 no.5, pp.113-125, 2011
As information resources have become more varied and numerous, knowledge customization on the social web has become more difficult. To reduce this burden, we offer a framework for context-based similarity calculation for knowledge customization using an ontology on case-based reasoning (CBR). We developed context- and intensity-based similarity calculation methods, which are applied to extracting the most similar case in terms of both semantic and syntactic similarity and to effectively creating user-tailored knowledge from the selected case. The process comprises conversion of unstructured web information into cases, extraction of an appropriate case according to the user requirements, and customization of the knowledge using the selected case. In the experimental section, the effectiveness of the developed similarity methods is compared with that of other edge-counting similarity methods using two classes that are compared with each other. The results show that our framework yields higher similarity values for conceptually close classes than the other methods.

A Granular Classifier By Means of Context-based Similarity Clustering

Huang, Wei;Wang, Jinsong;Liao, Jiping
- Journal of Electrical Engineering and Technology, v.11 no.5, pp.1383-1394, 2016
https://doi.org/10.5370/JEET.2016.11.5.1383

Gated Recurrent Unit Architecture for Context-Aware Recommendations with improved Similarity Measures

Kala, K.U.;Nandhini, M.
- KSII Transactions on Internet and Information Systems (TIIS), v.14 no.2, pp.538-561, 2020
Recommender Systems (RecSys) play a major role in e-commerce by recommending products that each user may like, thereby improving business outcomes. Although many types of RecSys exist in the research field, state-of-the-art RecSys have focused on finding user similarity through sequence analysis (e.g., purchase history, movie-watching history) and prediction techniques such as Recurrent Neural Networks in deep learning. That is, recommendation has been treated as a sequence prediction problem. However, evaluating similarities among customers is challenging when considering the temporal aspects, context, and multi-component ratings of the item records in customer sequences. To address this issue, we propose a deep learning based model that learns customer similarity directly from sequence-to-sequence similarity as well as item-to-item similarity, considering all features of the item, contexts, and rating components, using the Dynamic Temporal Warping (DTW) distance measure for dynamic temporal matching and a 2D-GRU (Two-Dimensional Gated Recurrent Unit) architecture. This overcomes the limitation of non-linearity in the time dimension when measuring similarity and finds patterns more accurately and quickly from temporal and spatial contexts. Experiments on the real-world movie data set LDOS-CoMoDa demonstrate the efficacy and promising utility of the proposed personalized RecSys architecture.
https://doi.org/10.3837/tiis.2020.02.004
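The DTW distance mentioned in this abstract (usually called dynamic time warping) can be sketched for plain numeric sequences as follows. This is a textbook formulation under assumed inputs, not the paper's model: the function name and the absolute-difference local cost are choices made here, and the 2D-GRU component is not reproduced.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two numeric sequences.

    Uses |x - y| as the local cost (an assumed choice) and allows the usual
    three moves: match, stretch `a`, or stretch `b`.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already-used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Two hypothetical rating sequences of different lengths but the same shape.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because one element may align with several elements of the other sequence, sequences that differ only in pacing come out close, which is the property the abstract relies on for temporal matching.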

Decision Tree Based Context Clustering with Cross Likelihood Ratio for HMM-based TTS (HMM 기반의 TTS를 위한 상호유사도 비율을 이용한 결정트리 기반의 문맥 군집화)

Jung, Chi-Sang;Kang, Hong-Goo
- The Journal of the Acoustical Society of Korea, v.32 no.2, pp.174-180, 2013
This paper proposes a decision tree based context clustering algorithm for HMM-based speech synthesis systems using the cross likelihood ratio with a hierarchical prior (CLRHP). Conventional algorithms tie the context-dependent HMM states that have similar statistical characteristics, but they do not consider the statistical similarity of split child nodes, which does not guarantee the statistical difference between the final leaf nodes. The proposed CLRHP algorithm improves the reliability of model parameters by taking a criterion of minimizing the statistical similarity of split child nodes. Experimental results verify the superiority of the proposed approach to conventional ones.
https://doi.org/10.7776/ASK.2013.32.2.174

Context-Weighted Metrics for Example Matching (문맥가중치가 반영된 문장 유사 척도)

Kim, Dong-Joo;Kim, Han-Woo
- Journal of the Institute of Electronics Engineers of Korea CI, v.43 no.6 s.312, pp.43-51, 2006
This paper proposes a metric for example matching in example-based machine translation for English-Korean machine translation. Our metric, which serves as a similarity measure, is based on the edit-distance algorithm and is employed to retrieve the example sentences most similar to a given query. It makes use of simple information such as the lemma and part-of-speech of typographically mismatched words. The edit-distance algorithm cannot fully reflect the context of matched word units. In other words, as long as the matched word units are in order, a fully matching context contributes to similarity exactly as much as a partially matching context in which mismatched word units intervene in the sequence of words. To overcome this drawback, we propose a context-weighting scheme that uses the contiguity information of matched word units to capture the full context. We normalize the metric in order to convert the edit-distance metric, which represents dissimilarity, into a similarity metric, to apply the context-weighted metric to the example matching problem, and to rank by similarity. In addition, we generalize previous methods that use some linguistic information into one representative system. To verify the correctness of the proposed context-weighted metric, we carry out an experiment comparing it with the generalized previous methods.
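The baseline this abstract builds on, a word-level edit distance normalized into a similarity score, can be sketched as below. This is only the plain baseline under assumed choices: the function names and the normalization by the longer sentence length are illustrative, and the paper's context-weighting scheme is not reproduced.

```python
def word_edit_distance(source, target):
    """Levenshtein distance over word units (insert, delete, substitute cost 1)."""
    prev = list(range(len(target) + 1))
    for i, s_word in enumerate(source, start=1):
        curr = [i]
        for j, t_word in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_word != t_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def edit_similarity(source, target):
    """Turn the distance into a similarity in [0, 1].

    Dividing by the longer length is an assumed normalization.
    """
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(source, target) / longest


# Hypothetical query and stored example sentence.
query = "i like green tea".split()
example = "i like black tea".split()
score = edit_similarity(query, example)
```

Note that this baseline scores a single mismatch the same wherever it falls, which is the limitation the abstract's contiguity-based weighting is meant to address.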

Context-based Social Network Configuration Method between Users (컨텍스트 기반 사용자 간 소셜 네트워크 구성 방법)

Han, Jong-Hyun;Woo, Woon-Tack
- Proceedings of the HCI Society of Korea Conference, 2009.02a, pp.11-14, 2009
In this paper, we propose a method for configuring social networks among users based on the users' context and profiles. Recently, many researchers have taken an interest in social networks related to collaborative systems. In existing research, however, it is difficult to configure social networks dynamically because they are based on static data, such as user logs and profiles. The proposed method uses not only user profiles but also context that dynamically reflects users' behavior. It computes the similarity among users' behavior contexts using the hierarchical structure of a context domain knowledge model, and it calculates relationships between contexts using given weight factors for the categories of the context model. To verify the usefulness of the method, we conduct an experiment on configuring a social network as user context changes. We expect that it makes dynamic analysis of relationships among users possible.

Systematic Elicitation of Proximity for Context Management

Kim Chang-Suk;Lee Sang-Yong;Son Dong-Cheul
- International Journal of Fuzzy Logic and Intelligent Systems, v.6 no.2, pp.167-172, 2006
As ubiquitous devices spread rapidly, the communication problem between humans and these devices is on the rise. The use of context is important in interactive applications such as handheld and ubiquitous computing. Context is not crisp data, so it is necessary to introduce the fuzzy concept. The proximity relation is represented by the degree of closeness or similarity between data objects of a scalar domain. A context manager of a context-awareness system evaluates imprecise queries with the proximity relations. In this paper, a systematic proximity elicitation method is proposed. The proposed generation method is simple and systematic. It is based on well-known fuzzy set theory and is applicable to real-world applications because it has a tuning parameter and a weighting factor. The proposed representation of the proximity relation is more efficient than the ordinary matrix representation since it exploits some properties of a proximity relation to save space. We present an experiment on the quantitative calculation of the proximity relation, and we analyze the time complexity and the space occupancy of the proposed representation method.
https://doi.org/10.5391/IJFIS.2006.6.2.167

Application of Euclidean Distance Similarity for Smartphone-Based Moving Context Determination (스마트폰 기반의 이동상황 판별을 위한 유클리디안 거리유사도의 응용)

Jang, Young-Wan;Kim, Byeong Man;Jang, Sung Bong;Shin, Yoon Sik
- Journal of Korea Society of Industrial Information Systems, v.19 no.4, pp.53-63, 2014
Moving context determination is an important issue to be resolved in a mobile computing environment. This paper presents a method for recognizing and classifying a mobile user's moving context by Euclidean distance similarity. In the proposed method, basic data are gathered using Global Positioning System (GPS) and accelerometer sensors, and from these data the system decides which moving situation the user is in. The decided situation is one of four categories: stopped, walking, running, and riding in a car. To evaluate the effectiveness and feasibility of the proposed scheme, we implemented applications using several variations of Euclidean distance similarity on the Android system and measured their accuracy. Experimental results show that the proposed system achieves more than 90% accuracy.
https://doi.org/10.9723/jksiis.2014.19.4.053
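Classification by Euclidean distance similarity, as described in this abstract, can be sketched as a nearest-prototype rule. Everything concrete here is hypothetical: the two features, the prototype values, the label names, and the 1 / (1 + d) similarity form are assumptions for illustration, not the paper's settings.

```python
import math

# Hypothetical prototypes: (mean GPS speed in m/s, accelerometer variance).
PROTOTYPES = {
    "stopped": (0.0, 0.05),
    "walking": (1.4, 1.0),
    "running": (3.5, 4.0),
    "in_car": (15.0, 0.3),
}


def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical points score 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def classify(sample):
    """Return the label whose prototype is most similar to the sample."""
    return max(
        PROTOTYPES,
        key=lambda label: euclidean_similarity(sample, PROTOTYPES[label]),
    )


# A hypothetical sensor reading: slow speed with moderate shaking.
label = classify((1.2, 0.9))
```

In practice the features would likely need scaling so that speed does not dominate the distance; the abstract's "several variations" of the similarity may address this, but that is not stated.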

A Semantic Representation Based-on Term Co-occurrence Network and Graph Kernel

Noh, Tae-Gil;Park, Seong-Bae;Lee, Sang-Jo
- International Journal of Fuzzy Logic and Intelligent Systems, v.11 no.4, pp.238-246, 2011
This paper proposes a new semantic representation and its associated similarity measure. The representation expresses the textual context observed around a certain term as a network in which nodes are terms and edges carry the number of co-occurrences between connected terms. To compare terms represented as networks, a graph kernel is adopted as the similarity measure. The proposed representation has two notable merits compared with previous semantic representations. First, it can process polysemous words better than a vector representation. The network of a polysemous term is regarded as a combination of sub-networks that represent senses, and the appropriate sub-network is identified by context before being compared by the kernel. Second, the representation permits not only words but also senses or contexts to be represented directly from the corresponding set of terms. The validity of the representation and its similarity measure is evaluated with two tasks: a synonym test and unsupervised word sense disambiguation. The method performed well and could compete with state-of-the-art unsupervised methods.
https://doi.org/10.5391/IJFIS.2011.11.4.238
