• Title/Summary/Keyword: similarity comparison

Semantic Similarity-Based Contributable Task Identification for New Participating Developers

  • Kim, Jungil;Choi, Geunho;Lee, Eunjoo
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.228-234 / 2018
  • In software development, the quality of a product often depends on whether its developers can rapidly find and contribute to the proper tasks. Currently, the word data of projects to which newcomers have previously contributed are mainly utilized to find appropriate source files in an ongoing project. However, because of the vocabulary gap between software projects, the accuracy of source file identification based on information retrieval is not guaranteed. In this paper, we propose a novel source file identification method to reduce the vocabulary gap between software projects. The proposed method employs DBpedia Spotlight to identify proper source files based on the semantic similarity between source files of software projects. In an experiment based on the Spring Framework project, we evaluate the accuracy of the proposed method in the identification of contributable source files. The experimental results show that the proposed approach can achieve better accuracy than the existing method based on comparison of word vocabularies.

A Modified Steering Kernel Filter for AWGN Removal based on Kernel Similarity

  • Cheon, Bong-Won;Kim, Nam-Ho
    • Journal of information and communication convergence engineering / v.20 no.3 / pp.195-203 / 2022
  • Noise generated during image acquisition and transmission can negatively impact the results of image processing applications, and noise removal is typically a part of image preprocessing. Owing to the development of sophisticated hardware and image processing algorithms, denoising techniques combined with nonlocal techniques have received significant attention in recent years; however, this approach is relatively poor at preserving edges and fine image details. To address this limitation, the current study combined a steering kernel technique with adaptive masks whose size is adjusted according to the noise intensity of an image. The algorithm sets the steering weight based on a similarity comparison, allowing it to respond to edge components more effectively. The proposed algorithm was compared with existing denoising algorithms using quantitative evaluation and enlarged images. The proposed algorithm exhibited good general denoising performance and better performance in edge area processing than existing nonlocal techniques.

Study on the comparison result of Machine code Program (실행코드 비교 감정에서 주변장치 분석의 유효성)

  • Kim, Do-Hyeun;Lee, Kyu-Tae
    • Journal of Software Assessment and Valuation / v.16 no.1 / pp.37-44 / 2020
  • Software similarity is determined by comparing source code. Source code, written in a programming language, is the intellectual property of its developer, and in its text form it embodies the developer's expertise and ideas. Verification of suspected software copyright infringement is performed by comparing the structure and contents of the source files of the original with those of the suspected copy. In practice, however, a one-to-one comparison is often difficult because the suspected source code is not submitted, whether intentionally or unintentionally, and such cases are increasing. In these cases, a comparative evaluation of the executable code must be performed, applying indirect methods such as disassembly, reverse engineering, and analysis of the function execution sequence. In this paper, we analyze the effectiveness of indirect comparison results through a practical evaluation. We also propose a method of using the system and its executable code files in the verification results.
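
One way to picture the "analysis of the function execution sequence" mentioned above is to score how closely two ordered call traces match. The following is only a minimal sketch, not the authors' procedure: it assumes the traces have already been recovered as lists of function names, and the names `call_sequence_similarity`, `original`, and `suspect` (and the function names inside the traces) are hypothetical. It uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def call_sequence_similarity(original_calls, suspect_calls):
    """Return a similarity ratio in [0, 1] for two ordered lists of function names.

    The ratio is 2*M / T, where M is the number of matched elements and
    T is the total number of elements in both sequences.
    """
    matcher = SequenceMatcher(a=original_calls, b=suspect_calls, autojunk=False)
    return matcher.ratio()


# Hypothetical call traces recovered from two executables (illustrative only).
original = ["init_uart", "read_sensor", "filter", "write_lcd", "sleep"]
suspect = ["init_uart", "read_sensor", "write_lcd", "sleep"]

print(round(call_sequence_similarity(original, suspect), 3))
```

A high ratio only indicates that the call order is alike; it is not by itself evidence of copying, which is why the abstract treats such comparisons as indirect.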

Seabed Classification Using the K-L (Karhunen-Loève) Transform of Chirp Acoustic Profiling Data: An Effective Approach to Geoacoustic Modeling (광역주파수 음향반사자료의 K-L 변환을 이용한 해저면 분류: 지질음향 모델링을 위한 유용한 방법)

  • Chang, Jae-Kyeong;Kim, Han-Joon;Jou, Hyeong-Tae;Suk, Bong-Chool;Park, Gun-Tae;Yoo, Hai-Soo;Yang, Sung-Jin
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.3 no.3 / pp.158-164 / 1998
  • We introduce a statistical scheme for classifying the seabed from acoustic profiling data acquired with a Chirp sonar system. The classification is based on grouping signal traces by a similarity index, which is computed using the K-L (Karhunen-Loève) transform of the Chirp profiling data. The similarity index represents the degree of coherence of bottom-reflected signals in consecutive traces and hence indicates the acoustic roughness of the seabed. The results of this study show that the similarity index is a function of homogeneity, sediment grain size, and bottom hardness. The similarity index ranges from 0 to 1 for various types of seabed material. It increases with the homogeneity and softness of the bottom sediments, whereas it is inversely proportional to the grain size of the sediments. As a real data example, we classified the seabed off Cheju Island, Korea, based on the similarity index and compared the result with side-scan sonar data and sediment samples. The comparison shows that the classification of the seabed by the similarity index is in good agreement with the real sedimentary facies and can delineate the acoustic response of the seabed in more detail. Therefore, this study presents an effective method for geoacoustic modeling that classifies the seafloor directly from acoustic data.

Deep Learning Similarity-based 1:1 Matching Method for Real Product Image and Drawing Image

  • Han, Gi-Tae
    • Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.59-68 / 2022
  • This paper presents a method for 1:1 verification by comparing the similarity between a given real product image and a drawing image. The proposed method combines two existing CNN-based deep learning models to construct a Siamese Network. The feature vector of each image is extracted through the FC (Fully Connected) layer of each network and the similarity is compared; for training, the similarity is set to 1 if the real product image and the drawing image (front view, left and right side views, top view, etc.) show the same product, and to 0 if they show different products. The test (inference) model is a deep learning model that queries the real product image and the drawing image in pairs to determine whether the pair shows the same product. In the proposed model, if the similarity between the real product image and the drawing image is greater than or equal to a threshold value (threshold: 0.5), the pair is judged to be the same product, and if it is less than the threshold, the pair is judged to be different products. The proposed model showed an accuracy of about 71.8% for queries against a product with the same drawing as the real product (positive: positive), and an accuracy of about 83.1% for queries against a different product (positive: negative). In the future, we plan to improve the matching accuracy between the real product image and the drawing image by combining a parameter optimization study with the proposed model and adding processes such as data cleansing.
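
The threshold rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the abstract does not say which similarity function is applied to the feature vectors, so cosine similarity is assumed here, and the names `cosine_similarity` and `is_same_product` are hypothetical. Only the threshold of 0.5 and the "greater than or equal to" rule come from the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_same_product(photo_features, drawing_features, threshold=0.5):
    """Judge a pair as the same product when similarity >= threshold."""
    return cosine_similarity(photo_features, drawing_features) >= threshold


# Hypothetical feature vectors standing in for the FC-layer outputs.
print(is_same_product([0.9, 0.1, 0.3], [0.8, 0.2, 0.4]))  # similar directions
print(is_same_product([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```

In a trained Siamese network the two vectors would come from the shared-weight branches; the plain lists here only stand in for those outputs.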

Cloning and Phylogenetic Analysis of Two Different bphC Genes and bphD Gene From PCB-Degrading Bacterium, Pseudomonas sp. Strain SY5

  • Na, Kyung-Su;Kim, Seong-Jun;Kubo, Motoki;Chung, Seon-Yong
    • Journal of Microbiology and Biotechnology / v.11 no.4 / pp.668-676 / 2001
  • Pseudomonas sp. strain SY5 is a PCB-degrading bacterium [24] that carries two different genes (bphC1 and bphC2) encoding 2,3-dihydroxybiphenyl 1,2-dioxygenase and a bphD gene encoding 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoate hydrolase. The bphC1 and bphC2 genes were found to consist of 897 bases encoding 299 amino acids and 882 bases encoding 294 amino acids, respectively, whereas the bphD gene consisted of 861 bases encoding 287 amino acids. A homology search showed 50% and 39% similarity between the bphC1 and bphC2 genes at the nucleotide and amino acid levels, respectively. The bphC1 gene showed 38% and 45% similarity at the amino acid level to Alcaligenes eutrophus A5 and Rhodococcus rhodochrous, respectively, whereas bphC2 showed 95% and 43% similarity, respectively. A comparison of the deduced amino acid sequence of the bphD product of Pseudomonas sp. SY5 with those of A. eutrophus A5, Pseudomonas sp. KKS102, and LB400 showed sequence identities of 92, 92, and 79%, respectively. Strain SY5 was originally isolated from municipal sewage containing recalcitrant organic compounds and found to have a high degradability of various aromatic compounds [23]. The current study found that strain SY5 had two extradiol-type dioxygenases, which did not hybridize with each other because of their low similarity; nevertheless, BphC1 and BphC2 shared a similar structure of evolutionarily conserved amino acid residues for catalytic activity.

A Dispersion Mean Algorithm based on Similarity Measure for Evaluation of Port Competitiveness (항만 경쟁력 평가를 위한 유사도 기반의 이산형 평균 알고리즘)

  • Chw, Bong-Sung;Lee, Cheol-Yeong
    • Journal of Navigation and Port Research / v.28 no.3 / pp.185-191 / 2004
  • The mean and clustering are important data mining methods that are now widely applied to various multi-attribute problems. However, feature weighting and feature selection are important in these methods because features may differ in importance, and such differences need to be considered in data mining for multi-attribute problems. In addition, the arithmetic mean is inadequate for obtaining the best-fitting result in an evaluation structure whose attributes are weighted and ranked. Moreover, it is hard to capture the specific characteristics of each user group. In this paper, we propose a dispersion mean algorithm for evaluating a similarity measure based on a geometrical figure, and we apply it to the mean classified by user group. One of the key issues in evaluating the similarity measure is how to achieve objectivity, so that the result does not change with the item ranking during the evaluation process.

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD / v.10D no.2 / pp.283-292 / 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDSs) are becoming increasingly important in addition to one-dimensional time-series data. For example, a video stream can be modeled as an MDS in a multidimensional space with respect to color and texture attributes. In this paper, we propose effective similarity measures on which similar pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, segments irrelevant to a given query are pruned from the database. Both data sequences and query sequences are partitioned into segments, and query processing is based on comparing the features of data and query segments instead of scanning all data elements of entire sequences.

Structural Alignment: Conceptual Implications and Limitations (구조적 정렬: 개념적 시사점과 한계)

  • Lee Tae-Yeon
    • Korean Journal of Cognitive Science / v.17 no.1 / pp.53-74 / 2006
  • Similarity has been considered one of the basic concepts of cognitive psychology, useful for explaining cognitive structure and processes. MDS models (Shepard, 1964; Nosofsky, 1991) and the Contrast model (Tversky, 1977) were proposed as early models of the similarity comparison process. However, empirical findings that the early models could not explain have raised many theoretical doubts about the conceptual validity of similarity. Goldstone (1994) assumed that similarity could be defined by alignment processes and suggested structural alignment as a promising alternative for resolving these conceptual controversies. In this study, the basic assumptions and algorithms of MDS models (Shepard, 1964; Nosofsky, 1991) and the Contrast model (Tversky, 1977) are described briefly, and theoretical limitations such as the arbitrariness of selective attention and correlated structures are discussed. The conceptual characteristics and algorithms of SIAM (Goldstone, 1994) are described, and its applications to areas of cognitive psychology such as categorization, conceptual combination, and analogical reasoning are reviewed. Finally, theoretical limitations related to data-driven processing and alternative processing, as well as possible directions for structural alignment, are discussed.
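
For readers unfamiliar with the Contrast model (Tversky, 1977) named above, it scores similarity as a weighted difference between shared and distinctive features: sim(A, B) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A). The sketch below is illustrative only and is not taken from the reviewed paper: it assumes that the salience function f is simply the number of features, and the function name, default weights, and example features are hypothetical.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model with f taken as set size (an assumption).

    theta weights the common features, alpha the features unique to `a`,
    and beta the features unique to `b`.
    """
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


# Hypothetical feature sets.
robin = {"wings", "beak", "flies"}
penguin = {"wings", "beak", "swims"}
print(contrast_similarity(robin, penguin))

# With alpha != beta the measure is asymmetric: sim(a, b) != sim(b, a).
rich = {"f1", "f2", "f3"}
sparse = {"f1"}
print(contrast_similarity(rich, sparse, alpha=0.8, beta=0.2))
print(contrast_similarity(sparse, rich, alpha=0.8, beta=0.2))
```

The asymmetry under unequal weights is one of the properties that distinguishes the Contrast model from distance-based MDS models, in which similarity is symmetric by construction.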

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.507-520 / 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set that groups multiple data items. In this method, the similarity of two sets is represented as the size of the intersection between them. However, the method has two critical performance issues: 1) calculating the intersection size is a time-consuming process, and 2) the number of set pairs whose intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching by solving these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with existing methods. We also show that the proposed method scales well, since the performance gap becomes larger as the number of data sequences increases.
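
To illustrate why an index can speed up intersection size calculation, the sketch below uses a plain inverted index (item to set IDs). This is an assumption made for illustration: the abstract does not describe the paper's actual index structure, and the function names and sample data are hypothetical. The idea shown is only that intersection sizes with all stored sets can be accumulated in one pass over the query items, without comparing the query against every set.

```python
from collections import Counter, defaultdict


def build_inverted_index(sets_by_id):
    """Map each item to the IDs of the stored sets that contain it."""
    index = defaultdict(set)
    for set_id, items in sets_by_id.items():
        for item in items:
            index[item].add(set_id)
    return index


def intersection_sizes(query_items, index):
    """Count, for every stored set sharing an item with the query, the overlap size."""
    counts = Counter()
    for item in set(query_items):
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    return counts


def search(query_items, index, min_overlap):
    """Return IDs of stored sets whose intersection with the query is >= min_overlap."""
    sizes = intersection_sizes(query_items, index)
    return sorted(set_id for set_id, n in sizes.items() if n >= min_overlap)


# Hypothetical data: three stored sets and one query set.
stored = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {5}}
index = build_inverted_index(stored)
print(search({2, 3, 4}, index, min_overlap=2))
```

Sets that share no item with the query are never touched, which is where the savings come from when most set pairs have an empty intersection.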