• Title/Summary/Keyword: Binary Similarity

Search Result 93, Processing Time 0.023 seconds

Cross-architecture Binary Function Similarity Detection based on Composite Feature Model

  • Xiaonan Li;Guimin Zhang;Qingbao Li;Ping Zhang;Zhifeng Chen;Jinjin Liu;Shudan Yue
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.8
    • /
    • pp.2101-2123
    • /
    • 2023
  • Recent studies have shown that the neural network-based binary code similarity detection technology performs well in vulnerability mining, plagiarism detection, and malicious code analysis. However, existing cross-architecture methods still suffer from insufficient feature characterization and low discrimination accuracy. To address these issues, this paper proposes a cross-architecture binary function similarity detection method based on composite feature model (SDCFM). Firstly, the binary function is converted into vector representation according to the proposed composite feature model, which is composed of instruction statistical features, control flow graph structural features, and application program interface calling behavioral features. Then, the composite features are embedded by the proposed hierarchical embedding network based on a graph neural network. In which, the block-level features and the function-level features are processed separately and finally fused into the embedding. In addition, to make the trained model more accurate and stable, our method utilizes the embeddings of predecessor nodes to modify the node embedding in the iterative updating process of the graph neural network. To assess the effectiveness of composite feature model, we contrast SDCFM with the state of art method on benchmark datasets. The experimental results show that SDCFM has good performance both on the area under the curve in the binary function similarity detection task and the vulnerable candidate function ranking in vulnerability search task.

Efficient Similarity Analysis Methods for Same Open Source Functions in Different Versions (서로 다른 버전의 동일 오픈소스 함수 간 효율적인 유사도 분석 기법)

  • Kim, Yeongcheol;Cho, Eun-Sun
    • Journal of KIISE
    • /
    • v.44 no.10
    • /
    • pp.1019-1025
    • /
    • 2017
  • Binary similarity analysis is used in vulnerability analysis, malicious code analysis, and plagiarism detection. Proving that a function is equal to a well-known safe functions of different versions through similarity analysis can help to improve the efficiency of the binary code analysis of malicious behavior as well as the efficiency of vulnerability analysis. However, few studies have been carried out on similarity analysis of the same function of different versions. In this paper, we analyze the similarity of function units through various methods based on extractable function information from binary code, and find a way to analyze efficiently with less time. In particular, we perform a comparative analysis of the different versions of the OpenSSL library to determine the way in which similar functions are detected even when the versions differ.

A Study on Decision Tree for Multiple Binary Responses

  • Lee, Seong-Keon
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.971-980
    • /
    • 2003
  • The tree method can be extended to multivariate responses, such as repeated measure and longitudinal data, by modifying the split function so as to accommodate multiple responses. Recently, some decision trees for multiple responses have been constructed by Segal (1992) and Zhang (1998). Segal suggested a tree can analyze continuous longitudinal response using Mahalanobis distance for within node homogeneity measures and Zhang suggested a tree can analyze multiple binary responses using generalized entropy criterion which is proportional to maximum likelihood of joint distribution of multiple binary responses. In this paper, we will modify CART procedure and suggest a new tree-based method that can analyze multiple binary responses using similarity measures.

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.3
    • /
    • pp.719-732
    • /
    • 2006
  • A set of clustering algorithms with proper weight on the formulation of distance which extend to mixed numeric and multiple binary values is presented. A simple matching and Jaccard coefficients are used to measure similarity between objects for multiple binary attributes. Similarities are converted to dissimilarities between i th and j th objects. The performance of clustering algorithms with balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms with application of proper weight give competitive recovery level when a set of data with mixed numeric and multiple binary attributes is clustered.

A Similarity Solution for the Directional Casting of Peritectic Alloys in the Presence of Shrinkage-Induced Flow (체적수축유동이 있는 포정합금의 방향성주조에 대한 상사해)

  • Yu, Ho-Seon;Jeong, Jae-Dong;Lee, Jun-Sik
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.25 no.4
    • /
    • pp.485-495
    • /
    • 2001
  • This paper presents a similarity solution for the directional casting of binary peritectic alloys in the presence of shrinkage-induced flow. The present model retains essential ingredients of alloy solidification, such as temperature-solute coupling, macrosegregation, solid-liquid property differences, and finite back diffusion in the primary phase. An algorithm for simultaneously determining the peritectic and liquidus positions is newly developed, which proves to be more efficient and stable than the existing scheme. Sample calculations are performed for both hypo- and hyper-peritectic compositions. The results show that the present analysis is capable of properly resolving the solidification characteristics of peritectic alloys so that it can be used for validating numerical models as a test solution.

Effect of Radiation on Laminar Film Boiling of Binary Mixtures (2성분 혼합물질의 층류 막비등에서 복사열전달의 효과)

  • Seong Hyeon-Chan;Kim Kyoung-Hoon
    • Korean Journal of Air-Conditioning and Refrigeration Engineering
    • /
    • v.16 no.10
    • /
    • pp.942-951
    • /
    • 2004
  • This paper presents the results of a theoretical study of the effect of radiation during free convective laminar film boiling for methanol/water binary mixtures on an isothermal vertical wall at atmospheric pressure. With the well-known boundary layer theory as a basis, a theoretical model has been formulated into consideration for mass diffusion at liquid phase. The equations are numerically solved by a similarity method to investigate the effects of radiation emissivity on the surface with various parameters such as wall superheat and composition of more volatile component at liquid phase far from the wall. From the results, the distributions of the physical quantifies are investigated in both phases. New correlations are proposed to predict the heat transfer coefficient of binary mixtures. It is shown that the proposed correlations are in good agreement with numerical results and with Bromley's correlation within maximum $11\%$ errors. It is also found that as the wall superheat is increased, radiation effect becomes more important.

Detecting Software Similarity Using API Sequences on Static Major Paths (정적 주요 경로 API 시퀀스를 이용한 소프트웨어 유사성 검사)

  • Park, Seongsoo;Han, Hwansoo
    • Journal of KIISE
    • /
    • v.41 no.12
    • /
    • pp.1007-1012
    • /
    • 2014
  • Software birthmarks are used to detect software plagiarism. For binaries, however, only a few birthmarks have been developed. In this paper, we propose a static approach to generate API sequences along major paths, which are analyzed from control flow graphs of the binaries. Since our API sequences are extracted along the most plausible paths of the binary codes, they can represent actual API sequences produced from binary executions, but in a more concise form. Our similarity measures use the Smith-Waterman algorithm that is one of the popular sequence alignment algorithms for DNA sequence analysis. We evaluate our static path-based API sequence with multiple versions of five applications. Our experiment indicates that our proposed method provides a quite reliable similarity birthmark for binaries.

A Study on the Performance of Similarity Indices and its Relationship with Link Prediction: a Two-State Random Network Case

  • Ahn, Min-Woo;Jung, Woo-Sung
    • Journal of the Korean Physical Society
    • /
    • v.73 no.10
    • /
    • pp.1589-1595
    • /
    • 2018
  • Similarity index measures the topological proximity of node pairs in a complex network. Numerous similarity indices have been defined and investigated, but the dependency of structure on the performance of similarity indices has not been sufficiently investigated. In this study, we investigated the relationship between the performance of similarity indices and structural properties of a network by employing a two-state random network. A node in a two-state network has binary types that are initially given, and a connection probability is determined from the state of the node pair. The performances of similarity indices are affected by the number of links and the ratio of intra-connections to inter-connections. Similarity indices have different characteristics depending on their type. Local indices perform well in small-size networks and do not depend on whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in large-size networks, and some such indices do not perform well in an inter-dominant structure. We also found that link prediction performance and the performance of similarity are correlated in both model networks and empirical networks. This relationship implies that link prediction performance can be used as an approximation for the performance of the similarity index when information about node type is unavailable. This relationship may help to find the appropriate index for given networks.

A GA-based Binary Classification Method for Bankruptcy Prediction (도산예측을 위한 유전 알고리듬 기반 이진분류기법의 개발)

  • Min, Jae-H.;Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.33 no.2
    • /
    • pp.1-16
    • /
    • 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on genetic algorithm, and to validate its prediction power through empirical analysis. Establishing virtual companies representing bankrupt companies and non-bankrupt ones respectively, the proposed method measures the similarity between the virtual companies and the subject for prediction, and classifies the subject into either bankrupt or non-bankrupt one. The values of the classification variables of the virtual companies and the weights of the variables are determined by the proper model to maximize the hit ratio of training data set using genetic algorithm. In order to test the validity of the proposed method, we compare its prediction accuracy with ones of other existing methods such as multi-discriminant analysis, logistic regression, decision tree, and artificial neural network, and it is shown that the binary classification method we propose in this paper can serve as a premising alternative to the existing methods for bankruptcy prediction.

On the Categorical Variable Clustering

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.7 no.2
    • /
    • pp.219-226
    • /
    • 1996
  • Basic objective in cluster analysis is to discover natural groupings of items or variables. In general, variable clustering was conducted based on some similarity measures between variables which have binary characteristics. We propose a variable clustering method when variables have more categories ordered in some sense. We also consider some measures of association as a similarity between variables. Numerical example is included.

  • PDF