• Title/Summary/Keyword: Similarity Metrics

A Novel Similarity Measure for Sequence Data

  • Pandi, Mohammad. H.; Kashefi, Omid; Minaei, Behrouz
    • Journal of Information Processing Systems / v.7 no.3 / pp.413-424 / 2011
  • A variety of metrics have been introduced to measure the similarity of two given sequences, with applications ranging from spell correctors and categorizers to new sequence mining tasks. Different metrics consider different aspects of sequences, but the essence of any sequence lies in the ordering of its elements. In this paper, we propose a novel sequence similarity measure based on all ordered pairs of one sequence and a Hasse diagram built from the other sequence. In contrast with existing approaches, the idea behind the proposed metric is to extract all ordering features in order to capture sequence properties. We designed a clustering problem to evaluate the metric. Experimental results showed that the proposed metric achieved higher clustering purity than metrics such as d2, Smith-Waterman, Levenshtein, and Needleman-Wunsch. The limitations of those methods stem from sequence features they neglect, which the proposed metric takes into account.
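The idea of comparing sequences through the ordering of their elements can be illustrated with a small sketch. This is not the authors' Hasse-diagram method: as a simplifying assumption, it extracts every ordered pair (x, y) with x occurring before y from each sequence and scores the Jaccard overlap of the two pair sets. The function names `ordered_pairs` and `ordered_pair_similarity` are hypothetical.

```python
from itertools import combinations

def ordered_pairs(seq):
    """Return the set of ordered pairs (x, y) where x occurs before y in seq."""
    return {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}

def ordered_pair_similarity(a, b):
    """Jaccard overlap of the ordered-pair sets of two sequences, in [0, 1].

    Simplified illustration only; not the Hasse-diagram measure from the paper.
    """
    pa, pb = ordered_pairs(a), ordered_pairs(b)
    union = pa | pb
    if not union:
        # Both sequences have fewer than two elements: no ordering to compare.
        return 1.0 if list(a) == list(b) else 0.0
    return len(pa & pb) / len(union)

# "abcd" and "abdc" share 5 of their 7 distinct ordered pairs.
print(ordered_pair_similarity("abcd", "abdc"))
```

Levenshtein distance puts "abc" and "cba" only two substitutions apart, whereas this pair-based score finds no shared ordering between them at all, which is the kind of ordering feature the abstract argues other metrics neglect.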

Understanding of F2 Metrics Used to Evaluate Similarity of Dissolution Profiles (Korean title: A Review of Methods for Comparing Dissolution Profile Similarity Using the Similarity Factor)

  • Cho, Mi-Hyun; Kim, Jeong-Ho; Lee, Hyeon-Tae; Sah, Hong-Kee
    • Journal of Pharmaceutical Investigation / v.33 no.3 / pp.245-253 / 2003
  • Dissolution profiles can be compared by means of the similarity factor $(f_2)$, a logarithmic reciprocal square-root transformation of the mean squared difference in % dissolved between two profiles over several time points. It indicates the degree of similarity between the two profiles: an $f_2$ value between 50 and 100 suggests that the two dissolution curves being compared are similar or equivalent. The objective of this report was to examine the $f_2$ metric in detail. $f_2$ values exceeded 50 when the relative differences in % dissolved between two products were less than 15% at all time points. The similarity factor was also greater than 50 when the absolute differences in % dissolved were below 10% at all time points. Interestingly, the $f_2$ value changed with the number of time points selected for the calculation; in particular, $f_2$ tended to be higher when the calculation included many time points at which % dissolved had reached a plateau. Finally, because the similarity factor is a sample statistic, it cannot be used to infer type I/II errors or sampling error. Despite the limitations inherent in the $f_2$ metric, it offers an easy and convenient way to evaluate how similar two dissolution profiles are.
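A minimal sketch of the standard similarity factor, $f_2 = 50 \log_{10}\left(100 / \sqrt{1 + \frac{1}{n}\sum_{t=1}^{n}(R_t - T_t)^2}\right)$, where $R_t$ and $T_t$ are the % dissolved of the reference and test products at time point $t$. The function name `f2_similarity` and the sample profiles are illustrative assumptions; the usage lines reproduce two effects described in the abstract, the 10% absolute-difference boundary and the rise in $f_2$ when plateau points are added.

```python
import math

def f2_similarity(reference, test):
    """Similarity factor f2 for two dissolution profiles (% dissolved at the same time points)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and sampled at the same time points")
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50 * math.log10(100 / math.sqrt(1 + mean_sq_diff))

# Hypothetical profiles, not data from the paper.
reference = [30, 60, 85]
test = [22, 52, 80]
print(f2_similarity(reference, test))

# Appending plateau points where both products are at 95% lowers the mean
# squared difference, so f2 rises even though the early profiles are unchanged.
print(f2_similarity(reference + [95, 95, 95], test + [95, 95, 95]))
```

With a uniform absolute difference of exactly 10 points, $1 + 100 = 101$ under the square root, which gives an $f_2$ just below 50; any uniform difference below about 9.95 points gives $f_2 > 50$.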

An Effective Metric for Measuring the Degree of Web Page Changes (Korean title: An Effective Method for Measuring the Degree of Web Document Change)

  • Kwon, Shin-Young; Kim, Sung-Jin; Lee, Sang-Ho
    • Journal of KIISE:Databases / v.34 no.5 / pp.437-447 / 2007
  • A variety of similarity metrics have been used to measure the degree of web page changes. In this paper, we first define criteria, covering six important types of web page changes, for evaluating the effectiveness of similarity metrics. Second, we propose a new similarity metric suited to measuring the degree of web page changes. Using real and synthesized web pages, we analyze five existing metrics (the byte-wise comparison, the TF-IDF cosine distance, the word distance, the edit distance, and shingling) and our own metric under the proposed criteria. The analysis shows that our metric represents the changes more effectively than the other metrics. We expect our study to help users select an appropriate metric for particular web applications.
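Shingling, one of the five existing metrics compared above, can be sketched as the Jaccard resemblance of the sets of contiguous word w-shingles in two page versions. This is not the paper's proposed metric; whitespace tokenization and a shingle width of `w=3` are assumptions chosen for illustration, and `shingles` and `resemblance` are hypothetical names.

```python
def shingles(text, w=3):
    """Return the set of contiguous w-word shingles in text (whitespace tokenization)."""
    words = text.split()
    if len(words) < w:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def resemblance(doc_a, doc_b, w=3):
    """Jaccard resemblance of two documents' shingle sets, in [0, 1]."""
    a, b = shingles(doc_a, w), shingles(doc_b, w)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

old_page = "the cat sat on the mat"
new_page = "the cat sat on the rug"
# Changing the last word alters only the final shingle: 3 shared out of 5 distinct.
print(resemblance(old_page, new_page))
```

A degree-of-change score can then be taken as `1 - resemblance(...)`; larger `w` makes the measure more sensitive to small edits because each changed word touches up to `w` shingles.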

Context-Weighted Metrics for Example Matching (Korean title: A Sentence Similarity Measure Reflecting Context Weights)

  • Kim, Dong-Joo; Kim, Han-Woo
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.6 s.312 / pp.43-51 / 2006
  • This paper proposes a metric for example matching in example-based English-Korean machine translation. Our metric, which serves as a similarity measure, is based on the edit-distance algorithm and is used to retrieve the example sentences most similar to a given query. It makes use of simple information such as the lemma and part of speech of typographically mismatched words. However, the edit-distance algorithm cannot fully reflect the context of matched word units: as long as the matched word units appear in the same order, a fully matching context contributes as much to similarity as a partially matching context in which mismatched word units are interleaved. To overcome this drawback, we propose a context-weighting scheme that uses the contiguity of matched word units to capture the full context. We then normalize the context-weighted metric so that edit distance, a dissimilarity, becomes a similarity that can be applied to example matching and used to rank examples. In addition, we generalize previous methods that use linguistic information into a single representative system. To verify the correctness of the proposed context-weighted metric, we carry out experiments comparing it with the generalized previous methods.
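The baseline the authors build on can be sketched as a word-level Levenshtein distance normalized into a similarity. Normalizing by the longer sentence length is an assumption (the abstract does not give the formula), the function names are hypothetical, and the context-weighting scheme itself is not shown. The last lines reproduce the drawback described above: a contiguous match and an interleaved match receive the same score.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]

def edit_similarity(query, example):
    """Map word-level edit distance to a similarity in [0, 1] (assumed normalization)."""
    q, e = query.split(), example.split()
    longest = max(len(q), len(e))
    return 1.0 if longest == 0 else 1.0 - edit_distance(q, e) / longest

print(edit_similarity("I bought a book", "I bought a pen"))

# The matched words "a b c d" are contiguous in the first example and
# interleaved with mismatches in the second, yet both scores are equal.
print(edit_similarity("a b c d", "a b c d x y"))
print(edit_similarity("a b c d", "a x b y c d"))
```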

Improving Performance of Jaccard Coefficient for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.21 no.11 / pp.121-126 / 2016
  • In recommender systems based on collaborative filtering, measuring similarity is critical for determining which users serve as recommenders. Data sparsity is a fundamental problem in collaborative filtering systems; it is partly addressed by combining the Jaccard coefficient with traditional similarity measures. This study proposes a new coefficient that improves on the Jaccard coefficient by compensating for its drawbacks. We conducted experiments on datasets with various characteristics. Compared with the widely used Pearson correlation similarity, the proposed coefficient performed competitively on a dense dataset and much better on a sparser dataset. Compared with the Jaccard coefficient, the proposed coefficient performed increasingly better as the dataset became denser. Overall, the proposed coefficient demonstrated the best prediction and recommendation performance among the metrics tested.
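The traditional combination that the proposed coefficient improves on can be sketched as Pearson correlation over co-rated items scaled by the Jaccard coefficient of the two users' rated-item sets. The paper's new coefficient is not described in the abstract and is not reproduced here; computing the Pearson means over co-rated items only is one common variant and an assumption, and the function names are hypothetical.

```python
import math

def jaccard(items_u, items_v):
    """Jaccard coefficient of the sets of items two users have rated."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (ratings are dicts: item -> rating)."""
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ratings_u[i] for i in common) / len(common)
    mv = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu) * (ratings_v[i] - mv) for i in common)
    den = (math.sqrt(sum((ratings_u[i] - mu) ** 2 for i in common))
           * math.sqrt(sum((ratings_v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def jaccard_pearson(ratings_u, ratings_v):
    """Scale Pearson by the overlap of rated items, damping scores built on few co-ratings."""
    return jaccard(set(ratings_u), set(ratings_v)) * pearson(ratings_u, ratings_v)

u = {"a": 1, "b": 2, "c": 3}
v = {"a": 2, "b": 4, "c": 6, "d": 5}
# Perfect correlation on the 3 co-rated items, but only 3 of 4 items overlap.
print(pearson(u, v), jaccard_pearson(u, v))
```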

Using User Rating Patterns for Selecting Neighbors in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.77-82 / 2019
  • Collaborative filtering is a popular technique for recommender systems and is used in many commercial systems. Its basic principle is to select neighbors similar to the current user and to make recommendations for that user from the neighbors' past preferences for items. One of the major problems inherent in this type of system is the sparsity of rating data, whose impact stems mainly from the underlying similarity measures, which select neighbors based on rating records. This paper addresses the problem by proposing a new similarity measure that takes users' rating patterns into account when computing similarity, rather than relying only on commonly rated items as previous measures do. Experiments compare the proposed measure with various existing measures in terms of major performance metrics, and the proposed measure achieves better or comparable results on all of them.
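The basic principle stated above (select similar neighbors, then predict from their past ratings) can be sketched as user-based prediction with a mean-centred, similarity-weighted average. Cosine similarity over co-rated items, `k=2`, and the toy ratings are assumptions for illustration; this is a conventional baseline, not the paper's rating-pattern measure.

```python
import math

def cosine_similarity(r_u, r_v):
    """Cosine similarity over the items both users rated (0.0 if none in common)."""
    common = r_u.keys() & r_v.keys()
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(target, item, ratings, similarity=cosine_similarity, k=2):
    """Predict target's rating of item from the k most similar users who rated it."""
    def mean(r):
        return sum(r.values()) / len(r)

    # Candidate neighbours: every other user who has rated the item.
    candidates = [(similarity(ratings[target], ratings[u]), u)
                  for u in ratings if u != target and item in ratings[u]]
    neighbours = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]

    # Weighted average of the neighbours' deviations from their own mean rating.
    num = sum(s * (ratings[u][item] - mean(ratings[u])) for s, u in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    base = mean(ratings[target])
    return base if den == 0 else base + num / den

ratings = {
    "alice": {"x": 4, "y": 2},
    "bob": {"x": 5, "y": 3, "z": 4},
    "carol": {"x": 1, "z": 2},
}
print(predict("alice", "z", ratings))
```

Because `cosine_similarity` looks only at co-rated items, carol, who shares a single item with alice, gets a perfect similarity of 1.0. This illustrates the reliance on commonly rated items that the paper's measure is designed to avoid.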

Cohesion and Coupling Metrics for Component Design Model (Korean title: Cohesion and Coupling Metrics for Component Design)

  • Ko, Byung-Sun; Park, Jai-Nyun
    • The KIPS Transactions:PartD / v.10D no.5 / pp.745-752 / 2003
  • Component-based development has become popular as a reuse technology that promotes independence and productivity in software development. Component-based systems need component metrics, because software quality must be measurable before it can be improved. Hence, in this paper, we propose component cohesion and coupling metrics that reflect the characteristics of components. The operation use value is calculated from information about the class interfaces commonly used to provide the component's service, and the operation similarity value is calculated from the operation use values. Component cohesion and coupling are calculated from the operation similarity, based on information extracted in the analysis phase. Finally, we examine the necessity of component metrics by comparing them with object-oriented metrics.

Similarity Analysis Between SAR Target Images Based on Siamese Network (Korean title: Siamese Network-Based Similarity Analysis Between SAR Target Images)

  • Park, Ji-Hoon
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.5 / pp.462-475 / 2022
  • Unlike electro-optical (EO) image analysis, synthetic aperture radar (SAR) target imaging has seen less interest in similarity metrics between images. A reliable and objective similarity analysis for SAR target images is expected to enable verification of the SAR measurement process or to provide guidelines for target CAD modeling that can be used to simulate realistic SAR target images. For this purpose, this paper presents a similarity analysis method based on a Siamese network that quantifies subjective assessment by learning distances between similar and dissimilar SAR target image pairs. The proposed method is applied to MSTAR SAR target images with slightly different depression angles, and the resulting metrics are compared with a qualitative evaluation. Because image similarity is somewhat related to recognition performance, the method's capacity for target recognition is further examined experimentally using a confusion matrix.
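Distance learning on similar and dissimilar pairs is commonly trained with a contrastive loss applied to the embeddings that the two weight-sharing branches of a Siamese network produce. The abstract does not state which loss the paper uses, so the contrastive loss below, its `margin=1.0`, and the function name are assumptions, and the embedding network itself is omitted.

```python
import math

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings from the shared (Siamese) branch.

    same=True  -> squared distance: pulls similar pairs together.
    same=False -> squared hinge on (margin - distance): pushes dissimilar pairs
                  apart until they are at least `margin` away, then stops.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance between the two embeddings
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False))  # dissimilar but too close
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))  # already beyond the margin
```

After training, the embedding distance `math.dist(emb_a, emb_b)` itself can serve as the learned dissimilarity score between two SAR target images.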

Improved Collaborative Filtering Using Entropy Weighting

  • Kwon, Hyeong-Joon
    • International Journal of Advanced Culture Technology / v.1 no.2 / pp.1-6 / 2013
  • In this paper, we evaluate the performance of existing similarity metrics and propose a novel method that uses the information entropy of users' preferences to reduce the mean absolute error (MAE) in memory-based collaborative recommender systems. The proposed method incorporates the similarity of individual users' inclinations into traditional similarity measures. To verify the method, we experiment with various similarity metrics under different conditions, including the amount of data and significance weighting from n/10 to n/60. The results confirm that the proposed method is robust and efficient on sparse data sets when combined with various existing similarity measures and significance weighting.
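Two ingredients named above can be sketched independently: the Shannon entropy of a user's rating distribution, and significance weighting, which devalues a similarity computed from few co-rated items by the factor min(n, N)/N. How the paper folds entropy into the similarity is not specified in the abstract and is not shown. Reading "n/10 to n/60" as thresholds N from 10 to 60 is an assumption, as are the default `threshold=50` and the function names.

```python
import math
from collections import Counter

def rating_entropy(ratings):
    """Shannon entropy (bits) of the distribution of a user's rating values."""
    if not ratings:
        return 0.0
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def significance_weighted(sim, n_common, threshold=50):
    """Scale a similarity by min(n_common, threshold) / threshold."""
    return sim * min(n_common, threshold) / threshold

# A user who always gives 3 has zero entropy; one spreading ratings evenly
# over 1..5 has the maximum of log2(5) bits.
print(rating_entropy([3, 3, 3]), rating_entropy([1, 2, 3, 4, 5]))

# A similarity of 0.8 based on only 25 co-rated items is halved at threshold 50.
print(significance_weighted(0.8, 25))
```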

Underwater Optical Image Data Transmission in the Presence of Turbulence and Attenuation

  • Ramavath Prasad Naik; Maaz Salman; Wan-Young Chung
    • Journal of the Institute of Convergence Signal Processing / v.24 no.1 / pp.1-14 / 2023
  • Underwater images carry information that is useful in aquaculture, underwater military security, navigation, transportation, and other fields. In this research, we transmitted an underwater image through various underwater media in the presence of turbulence and beam attenuation using a high-speed visible optical carrier signal. Because of turbulence and attenuation, the optical beam undergoes scintillation, and distorted images were observed at the receiver. To characterize the communication medium, we obtained the bit error rate (BER) performance of the system as a function of the average signal-to-noise ratio (SNR), and we evaluated the structural similarity index (SSI) and peak SNR (PSNR) of the received image. Based on the received images, we applied suitable nonlinear filters to recover and further enhance the distorted images; the BER, SSI, and PSNR obtained with these filters were also evaluated and compared with the unfiltered values. These metrics were evaluated with on-off keying and binary phase-shift keying modulation over 50-m and 100-m links, with beam attenuation from pure seawater, clear ocean water, and coastal ocean water.
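Two of the metrics above have simple standard definitions: PSNR $= 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ in dB, and BER as the fraction of received bits that differ from those sent. The sketch assumes 8-bit grayscale images flattened to lists (`max_value=255`); the function names are hypothetical, and SSI and the channel model are not covered.

```python
import math

def psnr(reference, received, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(received) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: noise power is zero
    return 10 * math.log10(max_value ** 2 / mse)

def bit_error_rate(sent_bits, received_bits):
    """Fraction of received bits that differ from the transmitted bits."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit sequences must be non-empty and the same length")
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)

# One pixel off by 2 in a 4-pixel image gives MSE = 1, so PSNR = 10*log10(255**2).
print(psnr([0, 0, 0, 0], [0, 0, 0, 2]))
print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```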