• Title/Abstract/Keywords: Information Similarity

Comparison Analysis of Co-authorship Network and Citation Based Network for Author Research Similarity Exploration

  • 윤지영;송민
    • 한국문헌정보학회지 / Vol. 56, No. 4 / pp. 269-284 / 2022
  • Exploring the research similarity of researchers offers insight into research communities and potential interactions among scholars. While co-authorship is a popular measure for studying research similarity, it cannot provide insight into authors who have not yet collaborated. In this work, we present a novel approach that captures the research similarity of authors using citation information. An extensive study is conducted on DATA & KNOWLEDGE ENGINEERING (DKE) publications to demonstrate the suggested approach and compare it with a co-authorship-based approach. The analysis shows that the proposed approach distinguishes author relationships that are not shown in the co-authorship network.
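The abstract does not say how citation information is turned into a similarity score. As a minimal sketch, assuming each author is described by counts of the works they cite and that cosine similarity is used (both are assumptions, not the authors' stated method), pairs that look similar by citations but have never co-authored could be found like this. All author names, paper names, and the threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_non_coauthors(citations, coauthors, threshold=0.5):
    """Return (author, author, score) for pairs that are similar by
    citations but do not appear in the co-authorship set."""
    pairs = []
    for x, y in combinations(sorted(citations), 2):
        score = cosine(citations[x], citations[y])
        if score >= threshold and frozenset({x, y}) not in coauthors:
            pairs.append((x, y, score))
    return pairs


# Hypothetical data: which works each author cites, and how often.
citations = {
    "author_a": Counter({"paper_1": 2, "paper_2": 1}),
    "author_b": Counter({"paper_1": 2, "paper_2": 1}),
    "author_c": Counter({"paper_3": 1}),
}
# Hypothetical co-authorship edges.
coauthors = {frozenset({"author_a", "author_c"})}

result = similar_non_coauthors(citations, coauthors)
```

Here `author_a` and `author_b` cite the same works and have never collaborated, so they are the only pair returned.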

A Novel Similarity Measure for Sequence Data

  • Pandi, Mohammad. H.;Kashefi, Omid;Minaei, Behrouz
    • Journal of Information Processing Systems / Vol. 7, No. 3 / pp. 413-424 / 2011
  • A variety of metrics have been introduced to measure the similarity of two given sequences. These metrics are widely used in applications ranging from spell correctors and categorizers to new sequence mining applications. Different metrics consider different aspects of sequences, but the essence of any sequence is extracted from the ordering of its elements. In this paper, we propose a novel sequence similarity measure that is based on all ordered pairs of one sequence and where a Hasse diagram is built in the other sequence. In contrast with existing approaches, the idea behind the proposed sequence similarity metric is to extract all ordering features to capture sequence properties. We designed a clustering problem to evaluate our sequence similarity metric. Experimental results showed the superiority of our proposed metric in maximizing the purity of clustering compared to metrics such as d2, Smith-Waterman, Levenshtein, and Needleman-Wunsch. The limitation of those methods originates from some neglected sequence features, which are considered in our proposed sequence similarity metric.
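The abstract names Levenshtein distance as one of the baseline metrics. For orientation, here is a minimal sketch of that baseline using the standard dynamic-programming formulation. It is not the proposed Hasse-diagram measure, which the abstract does not describe in enough detail to reproduce.

```python
def levenshtein(s, t):
    """Edit distance between sequences s and t, counting insertions,
    deletions, and substitutions, each with cost 1."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, item_s in enumerate(s, start=1):
        curr = [i]
        for j, item_t in enumerate(t, start=1):
            cost = 0 if item_s == item_t else 1
            curr.append(min(
                prev[j] + 1,         # delete item_s
                curr[j - 1] + 1,     # insert item_t
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


distance = levenshtein("kitten", "sitting")
```

The function accepts any indexable sequences, so it works on lists of tokens as well as on strings.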

A Method of Service Refinement for Network-Centric Operational Environment

  • Lee, Haejin;Kang, Dongsu
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 12 / pp. 97-105 / 2016
  • Network-Centric Operational Environment (NCOE) services have become critical in today's military network environment, because service reusability and interaction are increasingly important, as they are in business processes. However, the refinement of services by semantic similarity and functional similarity at the business process level has not yet been detailed. To enhance the accuracy of business service refinement, this study introduces a method for refining services by semantic similarity and functional similarity in a BPMN model. The business processes are designed in a BPMN model. In this model, candidate services are refined by binding related activities according to an analysis of semantic similarity based on WordNet and functional similarity based on property specifications between activities. The services are then identified by refining the candidate services. The proposed method is expected to improve the accuracy and modularity of service identification. It can also accelerate the development of more standardized service refinement.

Design of Solving Similarity Recognition for Cloth Products Based on Fuzzy Logic and Particle Swarm Optimization Algorithm

  • Chang, Bae-Muu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 10 / pp. 4987-5005 / 2017
  • This paper introduces a new method for similarity recognition of cloth products based on Fuzzy Logic (FL) and the Particle Swarm Optimization (PSO) algorithm, hereafter called the SRCPFP method. First, it establishes three features for each cloth product: length, thickness, and temperature resistance. These three features are then used to construct a Fuzzy Inference System (FIS) that determines the similarity between a query cloth and each sample cloth in the cloth database D. At the same time, the FIS integrated with the PSO algorithm can effectively search for near-optimal parameters of the membership functions in the eight fuzzy rules of the FIS. Finally, experimental results show that the SRCPFP method achieves satisfactory recognition performance and outperforms the other well-known similarity recognition methods considered here.

SSF: Sentence Similar Function Based on word2vector Similar Elements

  • Yuan, Xinpan;Wang, Songlin;Wan, Lanjun;Zhang, Chengyuan
    • Journal of Information Processing Systems / Vol. 15, No. 6 / pp. 1503-1516 / 2019
  • In this paper, to improve the accuracy of long sentence similarity calculation, we propose a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector as the system elements to calculate sentence similarity. The higher accuracy of our algorithm derives from two characteristics: one is the negative effect of the penalty item, and the other is that the sentence similar function (SSF) based on word2vector similar elements does not satisfy the exchange rule. In later studies, we found that the time complexity of our algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements while training the word vectors. Finally, the experimental results show that our algorithm has higher accuracy than the word mover's distance (WMD) and the lowest query time of the three SSF calculation methods.

Collaborative Similarity Metric Learning for Semantic Image Annotation and Retrieval

  • Wang, Bin;Liu, Yuncai
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 5 / pp. 1252-1271 / 2013
  • Automatic image annotation has become an increasingly important research topic owing to its key role in image retrieval. At the same time, it is highly challenging when facing large-scale datasets with large variance. Practical approaches generally rely on similarity measures defined over images and multi-label prediction methods. More specifically, those approaches usually 1) leverage similarity measures that are predefined or learned by optimizing for ranking or annotation, which might not be adaptive enough to the datasets; and 2) predict labels separately without taking the correlation of labels into account. In this paper, we propose a method for image annotation through collaborative similarity metric learning from the dataset and modeling of the label correlation of the dataset. The similarity metric is learned by simultaneously optimizing 1) image ranking using structural SVM (SSVM), and 2) image annotation using correlated label propagation, with respect to the similarity metric. The learned similarity metric, fully exploiting the available information in the datasets, would improve the two collaborative components, ranking and annotation, and in turn the retrieval system itself. We evaluated the proposed method on the Corel5k, Corel30k, and EspGame databases. The results for annotation and retrieval show the competitive performance of the proposed method.

Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong;Yang, Ho-Kyung;Song, You-Jin
    • International Journal of Advanced Culture Technology / Vol. 10, No. 2 / pp. 240-245 / 2022
  • Because of the importance of information, encryption algorithms are heavily used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without the use of a key required for encryption has been attracting attention, as has data clustering technology using encryption. In this paper, we try to reduce key exposure by using homomorphic encryption. In addition, we want to maintain privacy through similarity measurement. However, holistic similarity measurements become time-consuming and expensive as the data size and scope increase. Therefore, Min-Hash has been studied to efficiently estimate the similarity between two signatures. Methods of measuring similarity studied in the past are time-consuming and expensive as the size and area of the data increase, whereas Min-Hash allowed us to efficiently infer the similarity between two sets. Min-Hash is widely used for anti-plagiarism, graph and image analysis, and genetic analysis. This paper therefore addresses privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
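To illustrate how Min-Hash estimates the similarity between two sets, here is a minimal sketch in Python. It covers only the Min-Hash estimate of Jaccard similarity, not the homomorphic-encryption part of the paper's model. The seeded SHA-256 hash family, the function names, and the choice of 128 hash functions are assumptions made for this example.

```python
import hashlib


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Signature = minimum hash value of the set under each hash function.
    The set must be non-empty."""
    if not items:
        raise ValueError("items must be non-empty")
    return [min(_seeded_hash(seed, x) for x in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates the
    Jaccard similarity of the underlying sets."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical sets with a true Jaccard similarity of 50 / 150.
set_a = set(range(0, 100))
set_b = set(range(50, 150))
sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)
estimate = estimate_jaccard(sig_a, sig_b)
```

Comparing two fixed-length signatures costs the same regardless of set size, which is why the estimate scales better than computing the exact set overlap. More hash functions give a tighter estimate at the cost of longer signatures.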

A Table Integration Technique Using Query Similarity Analysis

  • Choi, Go-Bong;Woo, Yong-Tae
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 3 / pp. 105-112 / 2019
  • In this paper, we propose a technique to analyze the similarity between SQL queries and to assist in integrating similar tables. First, table information is extracted from the SQL queries by a query structure analyzer, and the similarity between tables is measured using the Jaccard index. Then, similar table clusters are generated through hierarchical cluster analysis, and the co-occurrence probability of the tables used in the queries is calculated. Using the co-occurrence probability between similar tables, the clusters are classified by integration possibility into an integrable cluster, a cluster requiring expert review, and a cluster with low integration possibility. This technique analyzes the SQL queries actually in use and assesses the possibility of table integration independently of the existing business, so that the existing schema can be effectively reconstructed without interruption of work or additional cost.
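The two quantities named in the abstract, the Jaccard index between tables and the co-occurrence probability of tables in queries, can be sketched as follows. The abstract does not say which sets the Jaccard index is computed over, so this example assumes each table is described by its set of column names. It also assumes co-occurrence is the share of queries touching either table that touch both. All table names, column names, and queries are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence(queries, table_1, table_2):
    """Share of queries using either table that use both tables.
    Each query is given as the set of table names it references."""
    with_either = [q for q in queries if table_1 in q or table_2 in q]
    if not with_either:
        return 0.0
    with_both = sum(1 for q in with_either if table_1 in q and table_2 in q)
    return with_both / len(with_either)


# Hypothetical table descriptions: table name -> set of column names.
table_columns = {
    "customer": {"id", "name", "email"},
    "client": {"id", "name", "phone"},
}
# Hypothetical analyzer output: tables referenced by each SQL query.
queries = [
    {"customer", "orders"},
    {"client", "orders"},
    {"customer", "client"},
]

similarity = jaccard(table_columns["customer"], table_columns["client"])
together = cooccurrence(queries, "customer", "client")
```

In this toy data the two tables share two of four distinct columns, and they appear together in one of the three queries that reference either of them.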

Collaborative Filtering Algorithm Based on User-Item Attribute Preference

  • Ji, JiaQi;Chung, Yeongjee
    • Journal of information and communication convergence engineering / Vol. 17, No. 2 / pp. 135-141 / 2019
  • Collaborative filtering algorithms often encounter data sparsity issues. To overcome this issue, auxiliary information about relevant items is analyzed and an item attribute matrix is derived. In this study, we combine the user-item attribute preference with the traditional similarity calculation method to develop an improved similarity calculation approach, and use weights to control the importance of these two elements. A collaborative filtering algorithm based on user-item attribute preference is proposed. The experimental results show that the recommender system performs best when the weight of the traditional similarity is equal to that of the user-item attribute preference similarity. Even when the rating matrix is sparse, better recommendation results can be obtained by adding a suitable proportion of user-item attribute preference similarity. Moreover, the mean absolute error of the proposed approach is lower than that of two traditional collaborative filtering algorithms.
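The weighting described in the abstract can be sketched as a convex combination of the two similarity scores, together with the mean absolute error used for evaluation. The linear form, the function names, and the example values are assumptions made for illustration. The abstract states only that weights control the importance of the two elements.

```python
def combined_similarity(traditional, attribute, weight=0.5):
    """Blend a traditional (rating-based) similarity with an attribute
    preference similarity. `weight` is the share given to the traditional
    score. The linear blend is an assumed form."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * traditional + (1.0 - weight) * attribute


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical scores for one user pair, blended with equal weights.
blended = combined_similarity(traditional=0.8, attribute=0.4, weight=0.5)
# Hypothetical predicted and actual ratings.
mae = mean_absolute_error([4.0, 3.0], [5.0, 3.0])
```

Setting `weight=0.5` corresponds to the equal-weight setting the abstract reports as performing best.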

Similarity Measure Design on High Dimensional Data

  • Nipon, Theera-Umpon;Lee, Sanghyuk
    • 한국융합학회논문지 / Vol. 4, No. 1 / pp. 43-48 / 2013
  • A similarity measure for high-dimensional data was designed. The similarity between high-dimensional data was derived by analyzing neighbor information with respect to the data sets. The obtained result could be applied to big data, because big data has multiple characteristics compared to a simple data set. Indeed, the analysis of high-dimensional data could serve as a preliminary study for big data. The high-dimensional data analysis was also compared with the conventional similarity measure. The traditional similarity measure on overlapped data was illustrated, and its application to non-overlapped data was carried out. Its usefulness was established by mathematical proof and verified by calculating similarity on an artificial data example.