Search | Korea Science

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

Kadowaki, Natsuki;Kishida, Kazuaki
- Journal of Information Science Theory and Practice / v.8 no.2 / pp.6-17 / 2020
Word similarity is often measured to enhance system performance in information retrieval and related areas. This paper reports an experimental comparison of word similarity measures computed for 50 intentionally selected words from a Reuters corpus. Three groups of measures were compared: (1) co-occurrence-based similarity measures (for which co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and the Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). The Pearson correlation coefficient was highest for the pair consisting of the VSM-based similarity measure and the co-occurrence-based similarity measure counted by number of documents. Group-average agglomerative hierarchical clustering was also applied to the similarity matrices computed by the individual measures. An evaluation of the cluster sets against an answer set revealed that the VSM- and LDA-based similarity measures performed best.
https://doi.org/10.1633/JISTaP.2020.8.2.1
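As an illustration of the VSM-based measure mentioned in this abstract, the following is a minimal sketch, assuming each word is represented by its vector of tf-idf weights across documents and that two words are compared by cosine similarity. The toy corpus, the function names, and the plain `log(N/df)` idf variant are assumptions made for illustration; the abstract does not specify the exact weighting used in the paper.

```python
import math
from collections import Counter

def tfidf_word_vectors(docs):
    """Map each word to its list of tf-idf weights, one weight per document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*docs))
    vectors = {}
    for word in vocab:
        # Document frequency: number of documents containing the word.
        df = sum(1 for c in counts if word in c)
        idf = math.log(n_docs / df)  # assumed idf variant
        vectors[word] = [c[word] * idf for c in counts]
    return vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tokenized toy corpus, not taken from the Reuters data.
docs = [
    ["oil", "price", "crude"],
    ["oil", "crude", "barrel"],
    ["bank", "rate", "loan"],
    ["bank", "loan", "price"],
]
vectors = tfidf_word_vectors(docs)
sim_oil_crude = cosine(vectors["oil"], vectors["crude"])
sim_oil_bank = cosine(vectors["oil"], vectors["bank"])
sim_oil_price = cosine(vectors["oil"], vectors["price"])
```

In this toy corpus, words that appear in exactly the same documents get the highest similarity, and words that never share a document get zero.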

A Max-Flow-Based Similarity Measure for Spectral Clustering

Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
- ETRI Journal / v.35 no.2 / pp.311-320 / 2013
In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experiment results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.
https://doi.org/10.4218/etrij.13.0112.0520
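For context, the Gaussian kernel-based similarity that this abstract names as the usual baseline can be sketched as follows. This is only the baseline affinity, not the max-flow-based measure the paper proposes. The function name, the `sigma` parameter, and the choice of a zero diagonal are assumptions made for illustration.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Affinity matrix with entries exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    The diagonal is left at zero, a common convention in spectral
    clustering (an assumption here, not something the abstract states).
    """
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                affinity[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return affinity

# Hypothetical 2-D points: two close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
A = gaussian_affinity(points, sigma=1.0)
```

Because the kernel depends only on Euclidean distance, distant points on the same elongated structure receive near-zero affinity, which is the limitation the abstract describes.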

A NOTE ON APPROXIMATE SIMILARITY

Hadwin, Don
- Journal of the Korean Mathematical Society / v.38 no.6 / pp.1157-1166 / 2001
This paper answers some old questions about approximate similarity and raises new ones. We provide positive evidence and a technique for finding negative evidence on the question of whether approximate similarity is the equivalence relation generated by approximate equivalence and similarity.

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
- Journal of the Korea Institute of Information Security & Cryptology / v.29 no.2 / pp.347-363 / 2019
Malware has its own unique behavioral characteristics, like DNA in living things. To respond to APT (Advanced Persistent Threat) attacks in advance, behavioral characteristics must be extracted from malware, and each malware sample must be classified based on its behavioral similarity. In this paper, several similarity measures are estimated for Windows malware, and the malware family is predicted from these similarity values. The similarity measures used are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity', and 'Jaccard similarity'. We find that the prediction rate differs widely across similarity measures. Although no similarity measure can be applied to malware classification with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
https://doi.org/10.13089/JKIISC.2019.29.2.347
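Of the measures listed in this abstract, Jaccard similarity is the simplest to illustrate. The sketch below assumes each malware sample is reduced to a set of feature strings; the feature names and the convention for two empty sets are hypothetical, since the abstract does not say which features the paper compares.

```python
def jaccard(set_a, set_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty sets are identical
    return len(set_a & set_b) / len(union)

# Hypothetical feature sets (for example, names of imported functions).
sample_1 = {"CreateFileW", "WriteFile", "RegSetValueExW"}
sample_2 = {"WriteFile", "RegSetValueExW", "InternetOpenW"}
similarity = jaccard(sample_1, sample_2)
```

Here the two samples share two of four distinct features, so the similarity is 0.5.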

A Study on the Performance of Similarity Indices and its Relationship with Link Prediction: a Two-State Random Network Case

Ahn, Min-Woo;Jung, Woo-Sung
- Journal of the Korean Physical Society / v.73 no.10 / pp.1589-1595 / 2018
A similarity index measures the topological proximity of node pairs in a complex network. Numerous similarity indices have been defined and investigated, but the dependence of their performance on network structure has not been sufficiently investigated. In this study, we investigated the relationship between the performance of similarity indices and the structural properties of a network by employing a two-state random network. Each node in a two-state network is initially assigned one of two types, and the connection probability is determined by the states of the node pair. The performance of similarity indices is affected by the number of links and the ratio of intra-connections to inter-connections. Similarity indices have different characteristics depending on their type. Local indices perform well in small networks and do not depend on whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in large networks, and some of them do not perform well in an inter-dominant structure. We also found that link prediction performance and the performance of similarity indices are correlated in both model networks and empirical networks. This relationship implies that link prediction performance can be used as an approximation for the performance of a similarity index when information about node type is unavailable, which may help in finding an appropriate index for a given network.
https://doi.org/10.3938/jkps.73.1589
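The abstract does not name the specific indices it evaluates. As a minimal sketch of what a local similarity index looks like, the example below uses the common-neighbors index, a widely used local index, to score unlinked node pairs for link prediction. The graph and the function name are assumptions made for illustration.

```python
from itertools import combinations

def common_neighbors_scores(adjacency):
    """Score each unlinked node pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v not in adjacency[u]:  # only score pairs without an existing link
            scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores

# Hypothetical undirected graph given as node -> set of neighbors.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
scores = common_neighbors_scores(adjacency)
```

Pairs with higher scores are treated as more likely to be linked. The index is local because it looks only at the immediate neighbors of each node.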

Similarity Classifier based on Schweizer & Sklars t-norms

Luukka, P.;Sampo, J.
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.1053-1056 / 2004
In this article we apply Schweizer & Sklar's t-norm-based similarity measures to a classification task. We compare the results to classification based on a fuzzy similarity measure and show that these measures can sometimes give better results than the fuzzy similarity measure. We also show that classification results are less sensitive to p values with Schweizer & Sklar's measures than when fuzzy similarity is used. This is quite important when one does not have the luxury of tuning these kinds of parameters but needs good classification results fast.

Grouping DNA sequences with similarity measure and application

Lee, Sanghyuk
- Journal of the Korea Convergence Society / v.4 no.3 / pp.35-41 / 2013
The problem of grouping DNA sequences by similarity is studied. The similarity measure and the distance measure show complementary characteristics: a distance measure can be obtained by complementing a similarity measure, and vice versa. A similarity measure is derived and proved. The usefulness of the proposed similarity measure is demonstrated by applying it to the grouping of 25 cockroach DNA sequences. Based on the calculated DNA similarity, the 25 cockroaches are clustered into four groups, and the results are compared with those of the previous neighbor-joining method.
https://doi.org/10.15207/JKCS.2013.4.3.035
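The complement relation described in this abstract can be sketched as follows, assuming both measures are normalized to the range [0, 1] so that distance = 1 - similarity. The position-wise match similarity used here is a stand-in chosen for illustration; the abstract does not give the measure actually derived in the paper.

```python
def match_similarity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 1.0  # assumed convention for two empty sequences
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(seq_a)

def complement(value):
    """Turn a similarity in [0, 1] into a distance, or a distance into a similarity."""
    return 1.0 - value

# Hypothetical short sequences, not taken from the cockroach data.
similarity = match_similarity("ACGT", "ACGA")
distance = complement(similarity)
```

Applying `complement` twice returns the original value, which is the "vice versa" direction mentioned in the abstract.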

Transactions Clustering based on Item Similarity (아이템의 유사도를 고려한 트랜잭션 클러스터링)

이상욱;김재련
- Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.250-257 / 2002
Clustering is a data mining method that consists of discovering interesting data distributions in very large databases. In traditional data clustering, the similarity of a cluster of objects is measured by the pairwise similarity of the objects in that cluster. In view of the nature of clustering transactions, we devise in this paper a novel measurement called item similarity and use it to perform clustering. With this item similarity measurement, we develop an efficient clustering algorithm for target marketing in each group.

SIMILAR AND SELF-SIMILAR CURVES IN MINKOWSKI n-SPACE

OZDEMIR, MUSTAFA;SIMSEK, HAKAN
- Bulletin of the Korean Mathematical Society / v.52 no.6 / pp.2071-2093 / 2015
In this paper, we investigate the similarity transformations in the Minkowski n-space. We study the geometric invariants of non-null curves under the similarity transformations. Besides, we extend the fundamental theorem for a non-null curve according to a similarity motion of ${\mathbb{E}}_1^n$. We determine the parametrizations of non-null self-similar curves in ${\mathbb{E}}_1^n$.
https://doi.org/10.4134/BKMS.2015.52.6.2071

Similarity Measure Construction with Fuzzy Entropy and Distance Measure

Lee Sang-Hyuk
- International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.4 / pp.367-371 / 2005
The similarity measure is derived using fuzzy entropy and a distance measure. From the relations among fuzzy entropy, the distance measure, and the similarity measure, we first obtain the fuzzy entropy. Then, with both the fuzzy entropy and the distance measure, the similarity measure is obtained. We verify that the proposed measure is a similarity measure.
https://doi.org/10.5391/IJFIS.2005.5.4.367
