• Title/Summary/Keyword: Content Similarity

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.175-186 / 2022
  • This article develops a content-based recommender system for Arabic books that helps users search for the right book and predicts books suited to their literary style. The system directs users toward books that meet their needs from within a large dataset. It bases its predictions on data gathered from different books, which it converts to vectors using TF-IDF. Recommendation algorithms, namely cosine similarity, sequence matcher similarity, and semantic similarity, then aggregate the data to produce recommendations. This approach is advantageous for recommending previously unrated books to users with unique interests. The results show that cosine similarity on the full content of the books, sequence matcher similarity on the Arabic titles, and semantic similarity on the English titles perform best and come extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the Arabic books recommended by the cosine similarity, sequence matcher similarity, and semantic similarity algorithms across all the experiments conducted.
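
The TF-IDF and similarity steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes pre-tokenized text, a smoothed IDF formula chosen here for convenience, and Python's standard `difflib.SequenceMatcher` as a stand-in for the "sequence matcher similarity" (the abstract does not name a library). The function names `tfidf_vectors`, `cosine`, and `title_similarity` are hypothetical, and semantic similarity is not covered.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF, so terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: (cnt / len(doc)) * idf[t] for t, cnt in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def title_similarity(a, b):
    """Character-level similarity ratio in [0, 1] between two titles."""
    return SequenceMatcher(None, a, b).ratio()


# Toy corpus (made-up tokens, not data from the paper).
docs = [
    ["history", "of", "arabic", "poetry"],
    ["arabic", "poetry", "anthology"],
    ["linear", "algebra", "textbook"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
print(title_similarity("Arabic Poetry", "Arabic Poems"))
```

Cosine similarity operates on the full-content vectors, while the sequence matcher compares raw title strings, which mirrors the split between content-level and title-level comparison that the abstract reports.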

Automatic Music Summarization Using Similarity Measure Based on Multi-Level Vector Quantization (다중레벨 벡터양자화 기반의 유사도를 이용한 자동 음악요약)

  • Kim, Sung-Tak;Kim, Sang-Ho;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / v.26 no.2E / pp.39-43 / 2007
  • Music summarization refers to a technique that automatically extracts the most important and representative segments of music content. In this paper, we propose and evaluate a technique that provides the repeated part of the music content as the music summary. To extract a repeated segment, the proposed algorithm uses a weighted sum of similarity measures based on multi-level vector quantization, for either a fixed-length or an optimal-length summary. Two similarity measures are proposed: a count-based measure and a distance-based measure. The count-based measure uses the number of identical codewords, and the distance-based measure uses the Mahalanobis distance between features that have the same codeword at the same position in the segments. The fixed-length summary is evaluated by measuring the overlap ratio between hand-made repeated parts and automatically generated ones. The optimal-length summary is evaluated by calculating how much of the repeated parts of the music content the automatically generated summary includes. The experiments show that, for a given summary length, the optimal-length summary captures the repeated parts of the music content more effectively than the fixed-length summary.
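
The count-based measure and the weighted sum over quantization levels can be illustrated with a small sketch. It assumes each segment has already been quantized into one codeword sequence per level. Normalizing the match count to a fraction, the function names, and the example weights are assumptions made here, and the distance-based (Mahalanobis) measure is omitted.

```python
def count_similarity(codes_a, codes_b):
    """Fraction of positions at which two codeword sequences carry the same codeword."""
    if len(codes_a) != len(codes_b):
        raise ValueError("segments must have the same number of frames")
    if not codes_a:
        return 0.0
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def multilevel_similarity(levels_a, levels_b, weights):
    """Weighted sum of count-based similarities, one term per quantization level."""
    return sum(
        w * count_similarity(a, b)
        for w, a, b in zip(weights, levels_a, levels_b)
    )


# Two segments, each quantized at two hypothetical codebook levels.
segment_a = [[1, 2, 3, 4], [7, 7]]
segment_b = [[1, 2, 0, 4], [7, 8]]
print(multilevel_similarity(segment_a, segment_b, weights=[0.5, 0.5]))
```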

Learning Free Energy Kernel for Image Retrieval

  • Wang, Cungang;Wang, Bin;Zheng, Liping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.8 / pp.2895-2912 / 2014
  • Content-based image retrieval has been the most important technique for managing huge numbers of images. The fundamental yet highly challenging problem in this field is how to measure content-level similarity based on low-level image features. The primary difficulties lie in the great variance within images, e.g., in background, illumination, viewpoint, and pose. Intuitively, an ideal similarity measure should be able to adapt to the data distribution, discover and highlight content-level information, and be robust to those variances. Motivated by these observations, in this paper we propose a probabilistic similarity learning approach. We first model the distribution of low-level image features and derive the free energy kernel (FEK), i.e., the similarity measure, from that distribution. Then, we propose a learning approach for the derived kernel, under the criterion that the kernel outputs high similarity for images sharing the same class label and low similarity for images with different labels. The advantages of the proposed approach over previous approaches are threefold. (1) With the ability inherited from probabilistic models, the similarity measure adapts well to the data distribution. (2) Benefiting from the content-level hidden variables within the probabilistic models, the similarity measure is able to capture content-level cues. (3) It fully exploits class labels in the supervised learning procedure. The proposed approach is extensively evaluated on two well-known databases. It achieves highly competitive performance in most experiments, which validates its advantages.

Retrieval of Scholarly Articles with Similar Core Contents

  • Liu, Rey-Long
    • International Journal of Knowledge Content Development & Technology / v.7 no.3 / pp.5-27 / 2017
  • Retrieval of scholarly articles about a specific research issue is a routine job of researchers to cross-validate the evidence about the issue. Two articles that focus on a research issue should share similar terms in their core contents, including their goals, backgrounds, and conclusions. In this paper, we present a technique, CCSE (Core Content Similarity Estimation), that, given an article a, recommends those articles that share similar core content terms with a. CCSE works on titles and abstracts of articles, which are publicly available. It estimates and integrates three kinds of similarity: goal similarity, background similarity, and conclusion similarity. Empirical evaluation shows that CCSE performs significantly better than several state-of-the-art techniques in recommending those biomedical articles that are judged (by domain experts) to be the ones whose core contents focus on the same research issues. CCSE works for those articles that present research background followed by main results and discussion, and hence it may be used to support the identification of the closely related evidence already published in these articles, even when only titles and abstracts of the articles are available.

An Effective Similarity Measure for Content-Based Image Retrieval using MPEG-7 Dominant Color Descriptor (내용기반 이미지 검색을 위한 MPEG-7 우위컬러 기술자의 효과적인 유사도)

  • Lee, Jong-Won;Nang, Jong-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.8 / pp.837-841 / 2010
  • This paper proposes an effective similarity measure for content-based image retrieval using the MPEG-7 Dominant Color Descriptor (DCD). The proposed method measures the similarity of images using the percentages of the dominant colors extracted from them. In experiments, the proposed method improved ANMRR by 18.92% with global DCD and by 47.22% with local DCD relative to QHDM. This result shows that the proposed method is an effective similarity measure for content-based image retrieval. In particular, the method is useful for region-based image retrieval.

A Similarity Ranking Algorithm for Image Databases (이미지 데이터베이스 유사도 순위 매김 알고리즘)

  • Cha, Guang-Ho
    • Journal of KIISE:Databases / v.36 no.5 / pp.366-373 / 2009
  • In this paper, we propose a similarity search algorithm for image databases. One of the central problems in content-based image retrieval (CBIR) is the semantic gap between the low-level features computed automatically from images and the human interpretation of image content. Many search algorithms used in CBIR rely on the Minkowski metric (or $L_p$-norm) to measure similarity between image pairs. However, those functions cannot adequately capture the characteristics of the human visual system or the nonlinear relationships in contextual information. Our new search algorithm tackles this problem by employing new similarity measures and ranking strategies that reflect the nonlinearity of human perception and contextual information. The algorithm yields superior experimental results on a real handwritten digit image database, which demonstrates its effectiveness.
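
For reference, the Minkowski ($L_p$) metric that this abstract treats as the conventional baseline can be sketched as below. This shows the baseline only, not the proposed algorithm; the helper names and the toy feature vectors are hypothetical.

```python
def minkowski(x, y, p=2.0):
    """Minkowski (L_p) distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_distance(query, database, p=2.0):
    """Indices of database vectors ordered from nearest to farthest."""
    return sorted(
        range(len(database)),
        key=lambda i: minkowski(query, database[i], p),
    )


query = [0.0, 0.0]
database = [[3.0, 3.0], [0.0, 5.0]]
print(rank_by_distance(query, database, p=1))  # L1 ranks the second vector first
print(rank_by_distance(query, database, p=2))  # L2 ranks the first vector first
```

Even in this toy case the ranking flips between p = 1 and p = 2, so the choice of p is itself a modeling decision.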

Relevance Feedback for Content Based Retrieval Using Fuzzy Integral (퍼지적분을 이용한 내용기반 검색 사용자 의견 반영시스템)

  • Young Sik Choi
    • Journal of Internet Computing and Services / v.1 no.2 / pp.89-96 / 2000
  • Relevance feedback is a technique for learning the user's subjective perception of similarity between images, and it has recently gained attention in content-based image retrieval. Most relevance feedback methods assume that the individual features used in similarity judgments do not interact with each other. However, this assumption severely limits the types of similarity judgments that can be modeled. In this paper, we explore a more sophisticated model for similarity judgments based on fuzzy measures and the Choquet integral, and propose a suitable algorithm for relevance feedback. Experimental results show that the proposed method is preferable to traditional weighted-average techniques.
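
The contrast between a weighted average and the Choquet integral can be shown with a short sketch of the discrete Choquet integral. It assumes per-feature similarity scores in [0, 1] and a fuzzy measure given explicitly as a dictionary from feature subsets to weights. The feature names and measure values are invented for illustration, and the paper's procedure for learning the measure from feedback is not reproduced here.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-feature scores w.r.t. a fuzzy measure.

    scores:  dict mapping feature name -> similarity score
    measure: dict mapping frozenset of feature names -> weight,
             with the full feature set mapped to 1.0
    """
    ordered = sorted(scores, key=scores.get)  # features by ascending score
    total, previous = 0.0, 0.0
    for i, feature in enumerate(ordered):
        # Coalition of all features whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[feature] - previous) * measure[coalition]
        previous = scores[feature]
    return total


scores = {"color": 0.8, "texture": 0.5}

# Additive measure: the integral reduces to a plain weighted average.
additive = {
    frozenset({"color"}): 0.6,
    frozenset({"texture"}): 0.4,
    frozenset({"color", "texture"}): 1.0,
}
# Non-additive measure: each feature alone counts for little, together they count fully.
interacting = {
    frozenset({"color"}): 0.3,
    frozenset({"texture"}): 0.3,
    frozenset({"color", "texture"}): 1.0,
}
print(choquet_integral(scores, additive))
print(choquet_integral(scores, interacting))
```

With the additive measure the result equals 0.6 × 0.8 + 0.4 × 0.5, the weighted average. The non-additive measure lets the combined weight of the two features exceed the sum of their individual weights, which a weighted average cannot express.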

Video Content Indexing using Kullback-Leibler Distance

  • Kim, Sang-Hyun
    • International Journal of Contents / v.5 no.4 / pp.51-54 / 2009
  • Huge video databases require an effective method for indexing video content. While manual indexing is the most effective approach to this goal, it is slow and expensive. Automatic indexing is therefore desirable, and various indexing tools for video databases have recently been developed. For efficient video content indexing, the similarity measure is an important factor. This paper presents new similarity measures between frames and proposes a new algorithm that indexes video content using the Kullback-Leibler distance defined between two histograms. Experimental results show that the proposed algorithm achieves remarkably high accuracy ratios compared with several conventional video indexing algorithms.
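
A Kullback-Leibler distance between two frame histograms might look like the sketch below. The abstract does not say which variant is used, so this assumes a symmetrized form with a small smoothing constant to avoid division by zero. The `content_change_points` helper, the threshold, and the histogram values are hypothetical and only show how such a distance could flag changes between consecutive frames.

```python
import math


def normalize(hist, eps=1e-9):
    """Convert raw bin counts to probabilities, smoothing empty bins with eps."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]


def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(hist_a, hist_b):
    """Symmetrized KL distance: D(p || q) + D(q || p)."""
    p, q = normalize(hist_a), normalize(hist_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


def content_change_points(histograms, threshold):
    """Indices of frames whose histogram differs strongly from the previous frame."""
    return [
        i
        for i in range(1, len(histograms))
        if symmetric_kl(histograms[i - 1], histograms[i]) > threshold
    ]


# Four toy frame histograms with four bins each.
frames = [[8, 2, 1, 1], [7, 3, 1, 1], [1, 1, 6, 4], [1, 2, 5, 4]]
print(content_change_points(frames, threshold=0.5))
```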

Moving Objects Modeling for Supporting Content and Similarity Searches (내용 및 유사도 검색을 위한 움직임 객체 모델링)

  • 복경수;김미희;신재룡;유재수;조기형
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.617-632 / 2004
  • Video data includes moving objects whose spatial positions change over time. In this paper, we propose a new modeling method for moving objects contained in video data. To retrieve moving objects effectively, the proposed method represents the spatial position and size of each moving object. It also represents the object's visual features and its trajectory, taking into account the direction, distance, and speed of the moving object over time. It therefore supports various types of retrieval, such as visual-feature-based similarity retrieval, distance-based similarity retrieval, trajectory-based similarity retrieval, and weighted retrieval that mixes these types.

An approach for improving the performance of the Content-Based Image Retrieval (CBIR)

  • Jeong, Inseong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.6_2 / pp.665-672 / 2012
  • As the number and volume of images in remote sensing imagery databases grow rapidly, Content-Based Image Retrieval (CBIR) is an effective tool for finding an image feature or image content that a user wants to retrieve. It captures salient features from a 'query' image and then locates other image regions with similar features elsewhere in the image database. For a CBIR approach that uses texture as its primary feature primitive, designing a texture descriptor that better represents image content is key to improving CBIR results. For this purpose, an extended feature vector combining the Gabor filter and the co-occurrence histogram method is proposed and evaluated on retrieval quantity and retrieval quality. Assessing similarity between high-dimensional feature vectors is also a challenging issue for CBIR performance. Therefore, several distance metrics (i.e., the L1 and L2 norms) are used to measure the closeness between two feature vectors, and their impact on the retrieval result is analyzed. Experimental results are presented for several CBIR samples. The current results show that 1) the overall retrieval quantity and quality are improved by combining the two types of feature vectors, 2) some features are better retrieved by a specific feature vector, and 3) retrieval result quality (i.e., the ranking of retrieved image tiles) is sensitive to the adopted similarity metric when the extended feature vector is employed.