• Title/Summary/Keyword: Jaccard coefficient (자카드 계수)

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.163-170 / 2023
  • User-based collaborative filtering, one way of implementing a recommender system, recommends items preferred by neighboring users, that is, users with similar rating histories. However, it suffers from a fundamental data scarcity problem: recommendation quality drops significantly when users share little common rating history. To address this problem, many existing studies have proposed combining the Jaccard index with a similarity measure. This study introduces a time-aware concept into the Jaccard index and proposes a method that assigns different weights to common items depending on their rating times. Experiments using various performance metrics and time intervals confirm that the proposed method outperforms the original Jaccard index on most metrics, and that the optimal time interval differs depending on the performance metric.

Applying Different Similarity Measures based on Jaccard Index in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.47-53 / 2021
  • Sparse rating data hinder reliable similarity computation between users, which degrades the performance of memory-based collaborative filtering techniques for recommender systems. Many methods have been developed to address this data sparsity problem; the simplest and most representative ones use the Jaccard index. This index reflects the number of items commonly rated by two users and is usually integrated into traditional similarity measures to compute similarity between users more accurately. However, such integration is straightforward and takes no account of the degree of data sparsity. This study proposes applying different similarity measures depending on the value of the Jaccard index between two users. Experiments are conducted to find optimal values of the parameters used by the proposed method and to evaluate it against other relevant methods. The results show that the proposed method achieves the best or comparable performance in prediction and recommendation accuracy.
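
As a rough illustration of the "straightforward integration" described in the abstract above, the following sketch multiplies the Jaccard index of two users' rated-item sets by a cosine similarity computed over their co-rated items. The function names, the choice of cosine as the traditional measure, and the sample ratings are assumptions made for illustration; this is not the method proposed in the paper.

```python
from math import sqrt


def jaccard(items_u, items_v):
    """Jaccard index of two sets of rated item ids."""
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


def cosine_on_common(ratings_u, ratings_v):
    """Cosine similarity restricted to items rated by both users."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_weighted_similarity(ratings_u, ratings_v):
    """Hypothetical combined measure: Jaccard index times cosine similarity."""
    j = jaccard(set(ratings_u), set(ratings_v))
    return j * cosine_on_common(ratings_u, ratings_v)


# Illustrative ratings (item id -> rating); not taken from any dataset.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"b": 3, "c": 4, "d": 1}
print(jaccard(set(alice), set(bob)))
print(jaccard_weighted_similarity(alice, bob))
```

With few co-rated items the Jaccard factor shrinks the combined score, which is the effect such integrations rely on when ratings are sparse.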

Hierarchic Document Clustering in OPAC (OPAC에서 자동분류 열람을 위한 계층 클러스터링 연구)

  • 노정순
    • Journal of the Korean Society for Information Management / v.21 no.1 / pp.93-117 / 2004
  • This study develops a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to binary vectors from automatic indexing with controlled terms. The clusters from Between Average Linkage with the Jaccard coefficient were more likely to follow a decimal classification structure.

Comparing the Performance of Global Query Expansion according to Similarity Measures (유사계수에 따른 전역적 질의확장 검색 성능 비교)

  • 이재윤
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.526-528 / 2003
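  • This study compares query expansion performance according to the characteristics of the similarity coefficients used to judge co-occurrence similarity in global query expansion retrieval based on co-occurrence frequency. First, the statistical characteristics of each similarity coefficient were examined on a corpus and on a retrieval test collection; the cosine and Jaccard coefficients tended to favor high-frequency terms, whereas mutual information and Yule's Y tended to favor low-frequency terms. In the query expansion retrieval experiments, similarity coefficients that favor low-frequency terms performed better than those that favor high-frequency terms. In particular, when the DF of a query term is very low, close to 1, Yule's Y, unlike the other similarity coefficients, favors high-frequency terms, which makes it more advantageous for query expansion retrieval than mutual information, which always favors low-frequency terms.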
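
The four coefficients compared in this abstract can all be computed from document frequencies. The sketch below uses the standard textbook definitions over a 2x2 co-occurrence table; the function name, the use of pointwise mutual information for "mutual information", and the sample counts are assumptions, and the paper's exact formulations may differ.

```python
from math import log2, sqrt


def cooccurrence_scores(n_docs, df_x, df_y, df_xy):
    """Similarity of terms x and y from document frequencies.

    n_docs: collection size; df_x, df_y: document frequencies of each term;
    df_xy: number of documents containing both terms.
    """
    # 2x2 contingency table: a = both, b = x only, c = y only, d = neither.
    a = df_xy
    b = df_x - df_xy
    c = df_y - df_xy
    d = n_docs - df_x - df_y + df_xy

    cosine = a / sqrt(df_x * df_y)
    jaccard = a / (df_x + df_y - a)
    # Pointwise mutual information; undefined (minus infinity) without co-occurrence.
    pmi = log2(n_docs * a / (df_x * df_y)) if a > 0 else float("-inf")
    # Yule's Y (coefficient of colligation).
    denom = sqrt(a * d) + sqrt(b * c)
    yule_y = (sqrt(a * d) - sqrt(b * c)) / denom if denom > 0 else 0.0
    return {"cosine": cosine, "jaccard": jaccard, "pmi": pmi, "yule_y": yule_y}


# Illustrative counts only; not taken from the paper's test collection.
scores = cooccurrence_scores(n_docs=1000, df_x=10, df_y=20, df_xy=5)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Varying `df_y` while holding the other counts fixed is one simple way to see how each coefficient responds to term frequency.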

The Effectiveness of Hierarchic Clustering on Query Results in OPAC (OPAC에서 탐색결과의 클러스터링에 관한 연구)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Library and Information Science / v.38 no.1 / pp.35-50 / 2004
  • This study evaluated the applicability of the static hierarchic clustering model to clustering query results in OPAC. Two clustering methods (Between Average Linkage (BAL) and Complete Linkage (CL)) and two similarity coefficients (Dice and Jaccard) were tested on the query results retrieved from 16 title-based keyword searches. The precision of the optimal clusters improved by more than 100% compared with title-word searching. Optimal cluster effectiveness differed between clustering methods but not between similarity coefficients. The CL method is better in precision ratio, whereas BAL is better in recall ratio, at the optimal top-level and bottom-level clusters. However, the differences are not significant except for the higher recall ratio of BAL at the top-level cluster. The small number of clusters and the long hierarchy chains that BAL produced for the optimal clusters may be neither desirable nor efficient.
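
One general property is worth keeping in mind when comparing the two coefficients tested above: for binary term sets, the Dice coefficient is a monotone function of the Jaccard coefficient, D = 2J / (1 + J), so both rank document pairs in the same order. The sketch below checks this on made-up term sets; the function names and documents are illustrative assumptions, and the identity alone does not explain the paper's result for every linkage method.

```python
def jaccard(a, b):
    """Jaccard coefficient of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient of two term sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Illustrative title-word sets; not taken from the paper's test queries.
doc1 = {"library", "catalog", "search"}
doc2 = {"library", "catalog", "index", "cluster"}

j = jaccard(doc1, doc2)
d = dice(doc1, doc2)
print(j, d, 2 * j / (1 + j))
```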

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management / v.18 no.2 / pp.203-230 / 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and the cosine and Jaccard coefficients are used as similarity measures. The performances of the complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm, and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.

Road Tracking based on Prior Information in Video Sequences (비디오 영상에서 사전정보 기반의 도로 추적)

  • Lee, Chang Woo
    • Journal of Korea Society of Industrial Information Systems / v.18 no.2 / pp.19-25 / 2013
  • In this paper, we propose an approach to tracking road regions in video sequences. The proposed method segments and tracks road regions by using prior information from the result of the previous frame. For efficiency, we make the simple assumption that the road region usually appears in the lower part of the input image, so the lower 60% of each input image is set as the region of interest (ROI). After an initial segmentation using a flood-fill algorithm, we merge neighboring regions based on a color similarity measure. The previous segmentation result, from which seed points for the successive frame are extracted, is used as prior information to segment the current frame. The similarity between the road region of the previous frame and that of the current frame is measured by a modified Jaccard coefficient. According to this similarity, we refine and track the detected road regions. The experimental results show that the proposed method is effective at segmenting and tracking road regions in both noisy and non-noisy environments.
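
The abstract does not say how the Jaccard coefficient is modified, so the sketch below shows only the standard, unmodified Jaccard coefficient (intersection over union) between two binary road masks from consecutive frames. The function name and the tiny masks are assumptions made for illustration.

```python
def mask_jaccard(prev_mask, curr_mask):
    """Standard Jaccard coefficient between two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_prev, row_curr in zip(prev_mask, curr_mask):
        for p, c in zip(row_prev, row_curr):
            if p and c:
                intersection += 1
            if p or c:
                union += 1
    # Two empty masks are treated as identical.
    return intersection / union if union else 1.0


# Illustrative 3x3 masks (1 = road pixel); the road sits in the lower rows.
previous_frame = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
current_frame = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
print(mask_jaccard(previous_frame, current_frame))
```

A tracker could compare this value against a threshold to decide whether the current segmentation is consistent with the previous frame; any such threshold would be a design choice not stated in the abstract.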

Application of Indicator Geostatistics for Probabilistic Uncertainty and Risk Analyses of Geochemical Data (지화학 자료의 확률론적 불확실성 및 위험성 분석을 위한 지시자 지구통계학의 응용)

  • Park, No-Wook
    • Journal of the Korean Earth Science Society / v.31 no.4 / pp.301-312 / 2010
  • Geochemical data have been regarded as one of the important environmental variables in environmental management. Since they are often sampled at sparse locations, it is important not only to predict attribute values at unsampled locations, but also to assess the uncertainty attached to the prediction for further analysis. The main objective of this paper is to show how indicator geostatistics can be effectively applied to geochemical data processing to provide decision-supporting information as well as the spatial distribution of the geochemical data. A complete geostatistical analysis framework, which includes probabilistic uncertainty modeling, classification, and risk analysis, is illustrated through a case study of cadmium mapping. A conditional cumulative distribution function (ccdf) was first modeled by indicator kriging, and then e-type estimates and conditional variance were computed for the spatial distribution of cadmium and for quantitative uncertainty measures, respectively. Two classification criteria, probability thresholding and attribute thresholding, were applied to delineate contaminated and safe areas. Finally, additional sampling locations were extracted from the coefficient of variation, which accounts for both the conditional variance and the difference between attribute values and threshold values. The indicator geostatistical framework illustrated in this study is suggested as a useful tool for analyzing any environmental variable, including geochemical data, for decision-making in the presence of uncertainty.

Applicability for Detecting Trails by Using KOMPSAT Imagery (등산로 탐지를 위한 KOMPSAT 영상의 활용가능성)

  • Bae, Jinsu;Yim, Jongseo;Shin, Young Ho
    • Journal of the Korean Geographical Society / v.50 no.6 / pp.607-619 / 2015
  • Detecting trails accurately is important for managing them properly. We examined the applicability of KOMPSAT imagery to trail detection and found that it could be an efficient alternative for tracking trails correctly. We selected K2 and K3 imagery with different spatial resolutions and processed each image to obtain NDVI, SAVI, and SC data. We then identified trails by object-based analysis and network analysis. Finally, we evaluated the potential trails with the F-measure and the Jaccard coefficient, which are based on correctness and completeness. The results show that the applicability differs considerably from case to case. The SC data with K3 gives the highest values: the correctness of detecting legal trails is 0.44 and the completeness is 0.54, and the F-measure and Jaccard coefficient are 0.49 and 0.32, respectively. In general, although detecting trails using only KOMPSAT imagery has its limits, the imagery becomes more useful when its cost efficiency and the availability of periodic data are considered.
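
The reported figures can be roughly cross-checked. The sketch below assumes that correctness and completeness play the roles of precision and recall, that the F-measure is their harmonic mean, and that the Jaccard coefficient is TP / (TP + FP + FN); the abstract does not state these definitions, and the function names are illustrative. Under these assumptions the rounded inputs give a Jaccard value of about 0.32 and an F-measure of about 0.48, close to the reported 0.49 (the gap may simply come from rounding of the published inputs).

```python
def f_measure(correctness, completeness):
    """Harmonic mean of correctness (precision) and completeness (recall)."""
    return 2 * correctness * completeness / (correctness + completeness)


def jaccard_from_rates(correctness, completeness):
    """Jaccard coefficient TP / (TP + FP + FN), expressed via the two rates."""
    product = correctness * completeness
    return product / (correctness + completeness - product)


# Values reported in the abstract for the SC data with K3 imagery.
correctness, completeness = 0.44, 0.54
f = f_measure(correctness, completeness)
j = jaccard_from_rates(correctness, completeness)
print(round(f, 3), round(j, 3))
# Under these definitions the two metrics are linked by J = F / (2 - F).
print(abs(j - f / (2 - f)) < 1e-12)
```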

Patent data analysis using clique analysis in a keyword network (키워드 네트워크의 클릭 분석을 이용한 특허 데이터 분석)

  • Kim, Hyon Hee;Kim, Donggeon;Jo, Jinnam
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1273-1284 / 2016
  • In this paper, we analyzed patents on machine learning using keyword network analysis and clique analysis. To construct a keyword network, important keywords were extracted based on their TF-IDF weights and their associations, and network structure analysis and clique analysis were performed. The density and clustering coefficient of the patent keyword network are low, which shows that patent keywords on machine learning are weakly connected with each other. This is because the important patents on machine learning are mainly registered for application systems of machine learning rather than for machine learning techniques. Our clique analysis also showed that the keywords found by cliques in 2005 patents cover subjects such as newsmaker verification, product forecasting, virus detection, biomarkers, and workflow management, while those in 2015 patents cover subjects such as digital imaging, payment cards, calling systems, mammogram systems, and price prediction. Clique analysis can be used not only for identifying specialized subjects, but also for finding search keywords in patent search systems.