• Title/Abstract/Keywords: Content Similarity

Search results: 530 items (processing time: 0.021 seconds)

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / Vol. 22, No. 8 / pp. 175-186 / 2022
  • This article develops a content-similarity-based recommender system for Arabic books that helps users find the right book and predicts books suited to their literary style. The system directs users toward books that meet their needs from a large dataset. It bases its predictions on data gathered from different books, which it converts to vectors using TF-IDF. Recommendation algorithms, namely cosine similarity, sequence matcher similarity, and semantic similarity, then aggregate the data to produce efficient and effective recommendations. This approach is advantageous for recommending previously unrated books to users with unique interests. The results show that cosine similarity on the full content of the books, sequence matcher similarity on the Arabic titles, and semantic similarity on the English titles give the best results and are extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the Arabic books recommended by the cosine similarity, sequence matcher similarity, and semantic similarity algorithms across all the experiments conducted.
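
The pipeline described in this abstract (TF-IDF vectors, cosine similarity on content, sequence matching on titles) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the names `tfidf_vectors`, `title_similarity`, and `recommend` are all assumptions, and the semantic-similarity component is omitted. Real Arabic text would need a proper tokenizer and normalization.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: naive whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def title_similarity(a, b):
    """Character-level similarity of two titles via difflib."""
    return SequenceMatcher(None, a, b).ratio()


def recommend(query_index, docs, top_k=2):
    """Indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]
```

Calling `recommend(0, books, top_k=1)` on a small list of book texts returns the index of the book whose content is closest to the first one under these assumptions.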

Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization

  • 김성탁;김상호;김회린
    • The Journal of the Acoustical Society of Korea / Vol. 26, No. 2E / pp. 39-43 / 2007
  • Music summarization refers to a technique that automatically extracts the most important and representative segments of music content. In this paper, we propose and evaluate a technique that provides the repeated part of the music content as the music summary. To extract a repeated segment, the proposed algorithm uses the weighted sum of similarity measures based on multi-level vector quantization for a fixed-length or optimal-length summary. Two similarity measures are proposed: a count-based measure and a distance-based measure. The number of identical codewords and the Mahalanobis distance of features that have the same codeword at the same position in segments are used for the count-based and distance-based similarity measures, respectively. The fixed-length music summary is evaluated by measuring the overlapping ratio between hand-made repeated parts and automatically generated ones. The optimal-length music summary is evaluated by calculating how much of the repeated parts of the music content the automatically generated summary includes. From experiments we observed that the optimal-length summary captures the repeated parts in music content more effectively, in terms of summary length, than the fixed-length summary.
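
The count-based measure mentioned in this abstract (counting identical codewords at the same position in two segments) might look like the sketch below. It is illustrative only: the normalization by segment length, the exhaustive search in `most_repeated_segment`, and both function names are assumptions, and the distance-based (Mahalanobis) measure and the multi-level weighting are left out.

```python
def count_based_similarity(seg_a, seg_b):
    """Fraction of positions at which two equal-length codeword
    sequences carry the same codeword."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have the same length")
    if not seg_a:
        return 0.0
    matches = sum(1 for a, b in zip(seg_a, seg_b) if a == b)
    return matches / len(seg_a)


def most_repeated_segment(codewords, seg_len):
    """Start index of the fixed-length segment that is most similar,
    in total, to all non-overlapping segments elsewhere in the sequence.
    Hypothetical selection rule, used here only for illustration."""
    n_starts = len(codewords) - seg_len + 1
    best_start, best_score = 0, -1.0
    for i in range(n_starts):
        segment = codewords[i:i + seg_len]
        score = sum(
            count_based_similarity(segment, codewords[j:j + seg_len])
            for j in range(n_starts)
            if abs(i - j) >= seg_len  # skip overlapping candidates
        )
        if score > best_score:  # strict '>' keeps the earliest best start
            best_start, best_score = i, score
    return best_start
```

Here `codewords` stands for the sequence of vector-quantization indices assigned to successive audio frames.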

Learning Free Energy Kernel for Image Retrieval

  • Wang, Cungang;Wang, Bin;Zheng, Liping
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 8 / pp. 2895-2912 / 2014
  • Content-based image retrieval has been the most important technique for managing huge numbers of images. The fundamental yet highly challenging problem in this field is how to measure content-level similarity based on low-level image features. The primary difficulties lie in the great variance within images, e.g., background, illumination, viewpoint, and pose. Intuitively, an ideal similarity measure should be able to adapt to the data distribution, discover and highlight the content-level information, and be robust to those variances. Motivated by these observations, we propose a probabilistic similarity learning approach. We first model the distribution of low-level image features and derive the free energy kernel (FEK), i.e., the similarity measure, from that distribution. Then, we propose a learning approach for the derived kernel, under the criterion that the kernel outputs high similarity for images sharing the same class label and low similarity for images with different labels. The advantages of the proposed approach, in comparison with previous approaches, are threefold. (1) With the ability inherited from probabilistic models, the similarity measure adapts well to the data distribution. (2) Benefiting from the content-level hidden variables within the probabilistic models, the similarity measure is able to capture content-level cues. (3) It fully exploits class labels in the supervised learning procedure. The proposed approach is extensively evaluated on two well-known databases. It achieves highly competitive performance in most experiments, which validates its advantages.

Retrieval of Scholarly Articles with Similar Core Contents

  • Liu, Rey-Long
    • International Journal of Knowledge Content Development & Technology / Vol. 7, No. 3 / pp. 5-27 / 2017
  • Retrieval of scholarly articles about a specific research issue is a routine job of researchers to cross-validate the evidence about the issue. Two articles that focus on a research issue should share similar terms in their core contents, including their goals, backgrounds, and conclusions. In this paper, we present a technique CCSE (Core Content Similarity Estimation) that, given an article a, recommends those articles that share similar core content terms with a. CCSE works on titles and abstracts of articles, which are publicly available. It estimates and integrates three kinds of similarity: goal similarity, background similarity, and conclusion similarity. Empirical evaluation shows that CCSE performs significantly better than several state-of-the-art techniques in recommending those biomedical articles that are judged (by domain experts) to be the ones whose core contents focus on the same research issues. CCSE works for those articles that present research background followed by main results and discussion, and hence it may be used to support the identification of the closely related evidence already published in these articles, even when only titles and abstracts of the articles are available.

An Effective Similarity Measure for Content-Based Image Retrieval using MPEG-7 Dominant Color Descriptor

  • 이종원;낭종호
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 16, No. 8 / pp. 837-841 / 2010
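  • This paper proposes a similarity measure suited to content-based image retrieval using the MPEG-7 Dominant Color Descriptor (DCD). The proposed method measures similarity according to the proportions of the dominant colors extracted from an image. In experiments, compared with retrieval using the QHDM [1] of MPEG-7 DCD, the proposed method improved ANMRR by 18.9% when a global DCD was used and by 47.2% when block-wise DCDs were used. This shows that the proposed method is an effective similarity measure for content-based image retrieval with DCD. In particular, it appears to be well suited to region-based image retrieval methods.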

A Similarity Ranking Algorithm for Image Databases

  • 차광호
    • 한국정보과학회논문지:데이타베이스 / Vol. 36, No. 5 / pp. 366-373 / 2009
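  • This paper presents a similarity ranking algorithm for image databases. One of the problems in image retrieval is the semantic gap between low-level features computed automatically from images and human perception, and many algorithms use the Minkowski metric ($L_p$-norm) to measure image similarity at retrieval time. However, the Minkowski metric does not reflect the nonlinear nature of the human visual system or contextual information. The proposed algorithm addresses this problem with a similarity measure and a search algorithm that reflect the nonlinearity of human perception and contextual information. Applying the algorithm to a database of handwritten digit images demonstrated its superior performance and effectiveness.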
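
For reference, the Minkowski ($L_p$-norm) baseline that this abstract argues against can be sketched as below. This shows only the conventional metric, not the paper's perception-aware measure; the function names and the plain-list feature vectors are assumptions made for illustration.

```python
def minkowski_distance(x, y, p=2.0):
    """L_p distance between two equal-length feature vectors.
    p=1 gives the Manhattan distance, p=2 the Euclidean distance."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_distance(query, database, p=2.0):
    """Indices of database vectors ordered from nearest to farthest."""
    return sorted(
        range(len(database)),
        key=lambda i: minkowski_distance(query, database[i], p),
    )
```

Because every feature dimension contributes independently and symmetrically to the sum, this metric cannot express context or nonlinear perceptual effects, which is the limitation the abstract points to.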

Relevance Feedback for Content Based Retrieval Using Fuzzy Integral

  • Young Sik Choi
    • 인터넷정보학회논문지 / Vol. 1, No. 2 / pp. 89-96 / 2000
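  • Relevance feedback is used to learn a user's subjective perception of image similarity, and interest in it has been growing recently. Most relevance feedback techniques assume that the features used to measure image similarity are independent of one another, but this assumption places considerable restrictions on modeling similarity judgments. This paper proposes a better way to model similarity judgments using fuzzy measures and the Choquet integral, and proposes a relevance feedback algorithm based on it. Experimental results show that the proposed method outperforms existing relevance feedback based on weighted averaging.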
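
The discrete Choquet integral that this abstract builds on can be sketched as follows. It illustrates only the aggregation step, under the assumption that per-feature similarity scores lie in [0, 1] and that the fuzzy measure is supplied as a callable on feature subsets; how the paper learns the measure from user feedback is not reproduced, and the names are hypothetical.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of per-feature scores.

    values  : dict mapping feature name -> similarity score
    measure : callable mapping a frozenset of feature names to its
              fuzzy-measure value (monotone, full set -> 1.0)
    """
    # Visit features from the lowest score to the highest.
    ordered = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, feature in enumerate(ordered):
        # Coalition of all features scoring at least as high as this one.
        coalition = frozenset(ordered[i:])
        total += (values[feature] - previous) * measure(coalition)
        previous = values[feature]
    return total


def additive_measure(weights):
    """Fuzzy measure that just sums per-feature weights. With this
    measure the Choquet integral reduces to a weighted average."""
    return lambda subset: sum(weights[f] for f in subset)
```

With a non-additive measure, for example one that assigns 0.4 to each of two features alone but 1.0 to the pair, the integral can reward or penalize feature interactions, which a weighted average cannot do.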

Video Content Indexing using Kullback-Leibler Distance

  • Kim, Sang-Hyun
    • International Journal of Contents / Vol. 5, No. 4 / pp. 51-54 / 2009
  • In huge video databases, an effective video content indexing method is required. While manual indexing is the most effective approach to this goal, it is slow and expensive. Automatic indexing is therefore desirable, and various indexing tools for video databases have recently been developed. For efficient video content indexing, the similarity measure is an important factor. This paper presents new similarity measures between frames and proposes a new algorithm to index video content using the Kullback-Leibler distance defined between two histograms. Experimental results show that the proposed algorithm using the Kullback-Leibler distance gives remarkably high accuracy ratios compared with several conventional algorithms for indexing video content.
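
A Kullback-Leibler distance between two frame histograms, as mentioned in this abstract, might be computed along the following lines. The additive smoothing constant, the symmetrized form, and the threshold-based `detect_shot_boundaries` helper are assumptions for illustration; the abstract does not specify which variant the paper uses.

```python
import math


def normalize(hist, eps=1e-9):
    """Turn raw bin counts into a probability distribution.
    eps is an assumed smoothing term that avoids log(0)."""
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p_hist, q_hist, eps=1e-9):
    """KL divergence D(P || Q) between two histograms."""
    if len(p_hist) != len(q_hist):
        raise ValueError("histograms must have the same number of bins")
    p = normalize(p_hist, eps)
    q = normalize(q_hist, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p_hist, q_hist):
    """Symmetrized variant, D(P || Q) + D(Q || P)."""
    return kl_divergence(p_hist, q_hist) + kl_divergence(q_hist, p_hist)


def detect_shot_boundaries(histograms, threshold):
    """Indices of frames whose histogram differs from the previous
    frame's by more than the threshold (hypothetical indexing rule)."""
    return [
        i
        for i in range(1, len(histograms))
        if symmetric_kl(histograms[i - 1], histograms[i]) > threshold
    ]
```

The threshold depends on the histogram resolution and the footage, so it would need tuning on real data.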

Moving Objects Modeling for Supporting Content and Similarity Searches

  • 복경수;김미희;신재룡;유재수;조기형
    • 한국멀티미디어학회논문지 / Vol. 7, No. 5 / pp. 617-632 / 2004
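  • Video data contains moving objects whose spatial positions change over time. This paper proposes a new modeling method for moving objects in video data. To retrieve moving objects effectively, the proposed model represents changes in spatial position and size over time. It also represents changes in an object's visual features over time, as well as a trajectory that accounts for the object's direction, distance, and speed. As a result, visual feature similarity searches, distance similarity searches, and trajectory similarity searches can be performed on moving objects. The model also supports weighted searches that combine these.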

An approach for improving the performance of the Content-Based Image Retrieval (CBIR)

  • Jeong, Inseong
    • 한국측량학회지 / Vol. 30, No. 6_2 / pp. 665-672 / 2012
  • Amid rapidly increasing imagery inputs and their volume in remote sensing imagery databases, Content-Based Image Retrieval (CBIR) is an effective tool for searching for an image feature or image content that a user wants to retrieve. It seeks to capture salient features from a 'query' image and then to locate other image regions with similar features elsewhere in the image database. For a CBIR approach that uses texture as its primary feature primitive, designing a texture descriptor that better represents image contents is key to improving CBIR results. For this purpose, an extended feature vector combining the Gabor filter and the co-occurrence histogram method is suggested and evaluated against quantity-wise and quality-wise retrieval performance criteria. For better CBIR performance, assessing similarity between high-dimensional feature vectors is also a challenging issue. Therefore a number of distance metrics (i.e. L1 and L2 norm) are tried to measure closeness between two feature vectors, and their impact on the retrieval result is analyzed. In this paper, experimental results are presented with several CBIR samples. The current results show that 1) the overall retrieval quantity and quality are improved by combining two types of feature vectors, 2) some features are better retrieved by a specific feature vector, and 3) retrieval result quality (i.e. ranking of retrieved image tiles) is sensitive to the adopted similarity metric when the extended feature vector is employed.