• Title/Summary/Keyword: Similarity Measures

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.2
    • /
    • pp.347-363
    • /
    • 2019
  • Malware has its own unique behavioral characteristics, like DNA in living things. To respond to APT (Advanced Persistent Threat) attacks in advance, behavioral characteristics must be extracted from malware. To this end, each malware sample must be classified based on its behavioral similarity. In this paper, several kinds of similarity between Windows malware samples are estimated, and the malware family is predicted from these similarity values. The similarity measures used in this paper are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity', and 'Jaccard similarity'. We find that the prediction rate differs widely across similarity measures. Although no single similarity measure can be applied to malware classification with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
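
As a rough illustration of the first measure named in the abstract above, the following sketch computes TF-IDF cosine similarity between token sequences. The API-call traces, the raw-count TF, and the `log(N/df)` IDF weighting are assumptions chosen for illustration; the abstract does not say which features or weighting the authors used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {token: weight} dict per document (raw TF times log(N / df))."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return [
        {tok: count * math.log(n_docs / doc_freq[tok])
         for tok, count in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)

# Hypothetical behavior traces (token names are illustrative only).
traces = [
    ["CreateFile", "WriteFile", "RegSetValue"],
    ["CreateFile", "WriteFile", "Connect"],
    ["Connect", "Send", "Recv"],
]
vectors = tfidf_vectors(traces)
sim_01 = cosine(vectors[0], vectors[1])  # traces share two tokens
sim_02 = cosine(vectors[0], vectors[2])  # traces share no tokens
print(sim_01, sim_02)
```

Traces that share tokens score above zero, while traces with no tokens in common score exactly zero.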

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10b
    • /
    • pp.233-237
    • /
    • 2006
  • This study proposes an innovative measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Using labeled documents, the result of text clustering with the K-means algorithm or a Kohonen Network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and using the evaluation measures of text. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, which is called CI (Clustering Index) in this article.
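
The two quantities the abstract above builds on can be sketched as follows. The abstract does not state how CI combines them, so this sketch stops at computing intra-cluster and inter-cluster similarity; the use of cosine similarity, the averaging, and the toy vectors are assumptions.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def intra_cluster_similarity(cluster):
    """Average pairwise similarity inside one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # assumed convention for a singleton cluster
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def inter_cluster_similarity(cluster_a, cluster_b):
    """Average similarity over all cross-cluster pairs."""
    total = sum(cosine(u, v) for u in cluster_a for v in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Toy document vectors: two tight clusters pointing in different directions.
cluster_1 = [[1.0, 0.0], [1.0, 0.1]]
cluster_2 = [[0.0, 1.0], [0.1, 1.0]]
intra_1 = intra_cluster_similarity(cluster_1)
inter_12 = inter_cluster_similarity(cluster_1, cluster_2)
print(intra_1, inter_12)
```

Neither quantity depends on the number of target categories, which is why such a measure stays usable when the cluster count is not known in advance.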

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems
    • /
    • v.14 no.6
    • /
    • pp.1438-1444
    • /
    • 2018
  • Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been rapidly increasing. Many users check for new information on microblogs because the content on their timelines is continually updated. Therefore, clustering algorithms are needed to organize microblog content into groups for users who want the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model, but also the semantic similarity between the data by exploiting the TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.

Workflow Clustering Methodology Using Structural Similarity Metrics (프로세스 유사성을 이용한 워크플로우 클러스터링)

  • Jung, Jae-Yoon;Bae, Joonsoo;Kang, Suk-Ho
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.33 no.1
    • /
    • pp.99-109
    • /
    • 2007
  • To realize process-driven management, many companies have been launching business process management systems. A business process is a collection of standardized and structured tasks that create value for a company. Moreover, it is recognized as a significant intangible business asset for achieving competitive advantage. This research introduces a novel approach to workflow process analysis, which is gaining significance as process-aware information systems spread widely across companies. In this paper, a methodology of workflow clustering based on process similarity is proposed. The purpose of workflow clustering is to analyze accumulated process definitions in order to assist the design of new processes and the improvement of existing ones. The proposed methodology exploits measures of structural similarity of workflow processes. The methodology has been tested with synthetic process models to illustrate the implications of workflow clustering.

Application of Similarity Measure for Fuzzy C-Means Clustering to Power System Management

  • Park, Dong-Hyuk;Ryu, Soo-Rok;Park, Hyun-Jeong;Lee, Sang-H.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.8 no.1
    • /
    • pp.18-23
    • /
    • 2008
  • An FCM (fuzzy c-means) method using locational price and regional information between locations is proposed in this paper. Any point in a networked system has its own values indicating the physical characteristics of that networked system and regional information at the same time. The similarity measure used for FCM in this paper is defined through the system-wide characteristic values at each point. To avoid grouping geometrically distant locations that have similar measures, the locational information is properly considered and incorporated in the proposed similarity measure. We have verified that the proposed measure produces a proper classification of a networked system, and illustrate this with an example of a networked electricity system.
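
To make the idea above concrete, here is a minimal sketch of the standard fuzzy c-means membership update driven by a dissimilarity that mixes a characteristic value with geographic position. The `combined_distance` function, its `alpha` weight, and the sample data are hypothetical; the abstract does not give the authors' actual similarity measure.

```python
import math

def combined_distance(point, center, alpha=0.5):
    """Hypothetical dissimilarity: mix of value gap and geographic gap."""
    value_gap = abs(point["value"] - center["value"])
    geo_gap = math.dist(point["xy"], center["xy"])
    return alpha * value_gap + (1.0 - alpha) * geo_gap

def fcm_memberships(point, centers, m=2.0, alpha=0.5):
    """Standard FCM update: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))."""
    dists = [combined_distance(point, c, alpha) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Two centers with the same value (e.g. a price) but far apart geographically.
centers = [{"value": 10.0, "xy": (0.0, 0.0)}, {"value": 10.0, "xy": (10.0, 0.0)}]
location = {"value": 10.0, "xy": (1.0, 0.0)}
memberships = fcm_memberships(location, centers)
print(memberships)
```

Because position enters the dissimilarity, the location is assigned almost entirely to the nearby center even though both centers carry the same value.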

Relevance Feedback for Content Based Retrieval Using Fuzzy Integral (퍼지적분을 이용한 내용기반 검색 사용자 의견 반영시스템)

  • Young Sik Choi
    • Journal of Internet Computing and Services
    • /
    • v.1 no.2
    • /
    • pp.89-96
    • /
    • 2000
  • Relevance feedback is a technique for learning the user's subjective perception of similarity between images, and has recently gained attention in Content Based Image Retrieval. Most relevance feedback methods assume that the individual features used in similarity judgments do not interact with each other. However, this assumption severely limits the types of similarity judgments that can be modeled. In this paper, we explore a more sophisticated model for similarity judgments based on fuzzy measures and the Choquet Integral, and propose a suitable algorithm for relevance feedback. Experimental results show that the proposed method is preferable to traditional weighted-average techniques.
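
The contrast between a weighted average and the Choquet integral mentioned above can be sketched with the discrete Choquet integral. The feature names, per-feature similarity scores, and fuzzy-measure values are illustrative assumptions, not taken from the paper.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral.

    scores:  {feature: similarity score in [0, 1]}
    measure: {frozenset of features: weight}, with the full set mapped to 1.0
    """
    ordered = sorted(scores, key=scores.get)  # features by ascending score
    total, previous = 0.0, 0.0
    for i, feature in enumerate(ordered):
        # Coalition of all features scoring at least as high as this one.
        coalition = frozenset(ordered[i:])
        total += (scores[feature] - previous) * measure[coalition]
        previous = scores[feature]
    return total

scores = {"color": 0.8, "texture": 0.4}

# Additive measure: reduces to an ordinary weighted average (0.5 / 0.5).
additive = {
    frozenset({"color", "texture"}): 1.0,
    frozenset({"color"}): 0.5,
    frozenset({"texture"}): 0.5,
}
# Non-additive measure: each feature alone counts for little, so a match
# is rewarded mainly when both features agree (feature interaction).
interacting = {
    frozenset({"color", "texture"}): 1.0,
    frozenset({"color"}): 0.3,
    frozenset({"texture"}): 0.3,
}

avg_like = choquet_integral(scores, additive)        # about 0.6
interactive = choquet_integral(scores, interacting)  # about 0.52
print(avg_like, interactive)
```

With an additive measure the result equals the plain weighted average, which is the no-interaction assumption the abstract criticizes. A non-additive measure lets the aggregate depend on which features agree.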

Entropy-based Similarity Measures for Memory-based Collaborative Filtering

  • Kwon, Hyeong-Joon;Latchman, Haniph
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.5 no.2
    • /
    • pp.5-10
    • /
    • 2013
  • We propose a novel similarity measure using weighted difference entropy (WDE) to improve the performance of collaborative filtering (CF) systems. The proposed similarity metric evaluates the entropy of the preference score differences between the commonly rated items of two users, and normalizes it using the Gaussian, tanh, and sigmoid functions. Experiments showed significant improvement under varied experimental environments. These experiments involved changing the number of nearest neighbors, and we present results for two data sets with different characteristics, as well as results for the quality of recommendation.

Comparison Study for similarities based on Distance Measure and Fuzzy Number (거리측도를 이용한 유사도의 구성과 퍼지 넘버를 이용한 유사도와의 비교연구)

  • Lee, Sang-Hyuk
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.1
    • /
    • pp.1-6
    • /
    • 2007
  • A similarity measure is derived from a distance measure, and the proposed similarity measure is proved in order to verify its usefulness. A conventional similarity measure constructed from fuzzy numbers and the Center of Gravity (COG) is introduced; furthermore, the two similarity measures are compared across various types of membership functions.
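
One common way to derive a similarity from a distance, shown here only as an assumed illustration, is to take the complement of a normalized distance between membership vectors. The normalized Hamming distance and the sample membership values are assumptions; the abstract does not specify the authors' construction.

```python
def normalized_hamming_distance(a, b):
    """Mean absolute difference between two membership vectors in [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("membership vectors must be non-empty and equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def similarity_from_distance(a, b):
    """Assumed construction: similarity = 1 - normalized distance."""
    return 1.0 - normalized_hamming_distance(a, b)

# Membership degrees of two fuzzy sets over the same three elements.
fuzzy_a = [0.2, 0.8, 1.0]
fuzzy_b = [0.4, 0.6, 1.0]
sim_ab = similarity_from_distance(fuzzy_a, fuzzy_b)
print(sim_ab)
```

Under this construction identical fuzzy sets have similarity 1, and a crisp set compared with its complement has similarity 0.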

Acceleration sensor, and embedded system using location-aware

  • He, Wei;Nayel, Mohamed
    • Journal of Convergence Society for SMB
    • /
    • v.3 no.1
    • /
    • pp.23-30
    • /
    • 2013
  • In this paper, fuzzy entropy and a similarity measure are introduced to quantify the uncertainty and similarity of data as real values. The design of the fuzzy entropy and similarity measure is illustrated and proved. The obtained measures are applied to the calculation process and discussed. Extensions of the data quantification results to areas such as decision making and fuzzy game theory are also discussed.

Improving Performance of Jaccard Coefficient for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.11
    • /
    • pp.121-126
    • /
    • 2016
  • In recommender systems based on collaborative filtering, measuring similarity is critical for determining the range of recommenders. The data sparsity problem is fundamental in collaborative filtering systems, and is partly solved by combining the Jaccard coefficient with traditional similarity measures. This study proposes a new coefficient that improves on the Jaccard coefficient by compensating for its drawbacks. We conducted experiments using datasets of various characteristics for performance analysis. Comparing the proposed coefficient with the widely used Pearson correlation similarity metric, we found that the two yielded competitive performance on a dense dataset, while the proposed coefficient performed much better on a sparser dataset. In addition, comparison with the Jaccard coefficient showed that the proposed coefficient yielded far better performance as the dataset became denser. Overall, the proposed coefficient demonstrated the best prediction and recommendation performance among the metrics tested.
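
The baseline the abstract above starts from, a Jaccard coefficient combined with a traditional measure, can be sketched as follows. Multiplying Jaccard by Pearson correlation is one common combination and is an assumption here, as are the user names and ratings. The paper's new coefficient is not described in the abstract and is not reproduced.

```python
import math

def jaccard(ratings_a, ratings_b):
    """Overlap of the two users' rated-item sets (rating values are ignored)."""
    items_a, items_b = set(ratings_a), set(ratings_b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items, with means taken over those items."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too few co-rated items to correlate
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant rating vector has undefined correlation
    return cov / math.sqrt(var_a * var_b)

def jaccard_weighted_pearson(ratings_a, ratings_b):
    """Assumed combination: scale the correlation by the item-set overlap."""
    return jaccard(ratings_a, ratings_b) * pearson(ratings_a, ratings_b)

# Hypothetical users and ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
overlap = jaccard(alice, bob)
correlation = pearson(alice, bob)
combined = jaccard_weighted_pearson(alice, bob)
print(overlap, correlation, combined)
```

Scaling by the overlap discounts correlations that rest on few co-rated items, which is how the combination addresses sparsity. Plain Jaccard ignores the rating values themselves, a known drawback of the coefficient.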