• Title/Summary/Keyword: retrieval effectiveness

Construction of Theme Melody Index by Transforming Melody to Time-series Data for Content-based Music Information Retrieval (내용기반 음악정보 검색을 위한 선율의 시계열 데이터 변환을 이용한 주제선율색인 구성)

  • Ha, Jin-Seok;Ku, Kyong-I;Park, Jae-Hyun;Kim, Yoo-Sung
    • The KIPS Transactions:PartD
    • /
    • v.10D no.3
    • /
    • pp.547-558
    • /
    • 2003
  • Observing that a music melody has features similar to time-series data, we transform each melody into time-series data with normalization and corrections, and define the similarity between melodies as the Euclidean distance between the transformed time series. Then, based on the similarity between the melodies of a music object, the melodies are clustered and the representative of each cluster is extracted as one of the theme melodies for the music. To construct the theme melody index, each theme melody is represented as a point in the multidimensional metric space of an M-tree. To process a user's query melody, the query melody is transformed into time-series data in the same way as in the indexing phase. A range query search algorithm then retrieves the melodies similar to the query melody from the theme melody index. We show the effectiveness of the proposed methods by implementing a prototype system that uses the proposed theme melody index.
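The transform-then-search idea in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the abstract: melodies are lists of MIDI pitch numbers, normalization means subtracting the mean pitch and resampling to a fixed length, and a linear scan stands in for the M-tree. The function names are hypothetical.

```python
import math

def melody_to_series(pitches, length=8):
    """Center a pitch sequence on its mean and resample it to a fixed length.

    Assumed normalization; the paper's own normalization and corrections
    are not described in the abstract.
    """
    mean = sum(pitches) / len(pitches)
    centered = [p - mean for p in pitches]
    n = len(centered)
    # Nearest-neighbour resampling to `length` points.
    return [centered[min(n - 1, i * n // length)] for i in range(length)]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(index, query_pitches, radius):
    """Return (name, distance) pairs within `radius`, nearest first.

    A linear scan is used here as a stand-in for the M-tree range search.
    """
    q = melody_to_series(query_pitches)
    hits = [(name, euclidean(series, q)) for name, series in index.items()]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])

# Toy index of two theme melodies (hypothetical data).
index = {
    "ascending": melody_to_series([60, 62, 64, 65, 67, 69, 71, 72]),
    "descending": melody_to_series([72, 71, 69, 67, 65, 64, 62, 60]),
}
# The query is the ascending theme transposed up two semitones.
print(range_query(index, [62, 64, 66, 67, 69, 71, 73, 74], radius=1.0))
```

Because the series are mean-centered, a transposed query still matches its theme at distance zero, which is one plausible reason to normalize before indexing.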

Multi-Shape Retrieval Using Multi Curvature-Scale Space Descriptor (다중 곡률-단계 공간 기술자를 이용한 다중형상 검색)

  • Park, Sang Hyun;Lee, Soo-Chahn;Yun, Il-Dong
    • Journal of Broadcast Engineering
    • /
    • v.13 no.6
    • /
    • pp.962-965
    • /
    • 2008
  • 2-D shape descriptors, which are vectors representing the characteristics of shapes, enable comparison and classification of shapes and are mainly applied to image and 3-D model retrieval. Existing descriptors either describe only shapes with a single closed contour or lack precision, which makes them difficult to apply to shapes with multiple contours. Therefore, in this paper, we propose a new shape descriptor called Multi-Curvature-Scale Space that can be applied to shapes with multiple contours. Specifically, we represent the topology of the sub-contours in the multi-contour, along with Curvature-Scale Space descriptors that represent the shape of each sub-contour. Also, by allowing the weight of each component to be controlled when computing the distance between descriptors, we deal with ambiguities in measuring similarity between shapes. Results of various experiments that demonstrate the effectiveness of the proposed descriptor are presented.

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.129-139
    • /
    • 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into clusters of closely related terms, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of the terms' co-occurrence in the same query. Also, in this paper, we propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.
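As a rough illustration of interleaved distribution with duplicates, the following Python sketch assigns the terms of each cluster to nodes in round-robin order, so that terms likely to appear in the same query land on different nodes. The function names and the "replica on the next node" policy are assumptions made for this example; the abstract does not specify the actual placement or duplication scheme.

```python
def distribute_interleaved(clusters, num_nodes):
    """Map each term to a node index, walking the clusters round-robin.

    Terms of one cluster (assumed to co-occur in queries) get consecutive
    slots, so a cluster no larger than `num_nodes` is spread over distinct nodes.
    """
    placement = {}
    slot = 0
    for cluster in clusters:
        for term in cluster:
            placement[term] = slot % num_nodes
            slot += 1
    return placement

def with_replicas(placement, num_nodes):
    """Add one duplicate per term on the next node (hypothetical policy)."""
    return {term: (node, (node + 1) % num_nodes) for term, node in placement.items()}

# Toy term clusters (hypothetical data) spread over three nodes.
clusters = [["information", "retrieval", "index"], ["parallel", "cluster"]]
placement = distribute_interleaved(clusters, num_nodes=3)
print(placement)
print(with_replicas(placement, num_nodes=3))
```

With a duplicate of every record on a second node, a query can still be served if one node fails, and the scheduler can pick the less loaded of the two copies.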

An Investigation on Non-Relevance Criteria for Image in Failed Image Search (이미지 검색 실패에 나타난 비적합성 평가요소 규명에 관한 연구)

  • Chung, EunKyung
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.1
    • /
    • pp.417-435
    • /
    • 2016
  • Relevance judgment is important for improving the effectiveness of information retrieval systems, and searching for and using images through the internet and digital technologies has become commonplace. However, in the field of image retrieval, only a few studies have identified relevance criteria. This study aims to identify and characterize non-relevance criteria from failed image searches. To this end, a total of 135 participants were recruited and a total of 1,452 criteria items were collected. Analysis of the data set identified thirteen criteria: 'topicality', 'visual content', 'accuracy', 'visual feature', 'completeness', 'appeal to user', 'focal point', 'bibliographic information', 'impression', 'posture', 'face feature', 'novelty', and 'time frame'. Among these criteria, 'visual content' and 'focal point' were newly introduced in the current study, while the 'action' criterion identified in previous studies did not appear. When image needs and image uses are analyzed with these criteria, distinctive differences emerge depending on the image needs and uses.

Collection Fusion Algorithm in Distributed Multimedia Databases (분산 멀티미디어 데이터베이스에 대한 수집 융합 알고리즘)

  • Kim, Deok-Hwan;Lee, Ju-Hong;Lee, Seok-Lyong;Chung, Chin-Wan
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.406-417
    • /
    • 2001
  • With the advances in multimedia databases on the World Wide Web, it becomes more important to provide users with the capability to search distributed multimedia data. There have been many studies on database selection and collection fusion for text databases. Multimedia databases on the Web, however, are autonomous and heterogeneous, and they mainly use content-based retrieval. The collection fusion problem of multimedia databases is concerned with merging the results retrieved by content-based retrieval from heterogeneous multimedia databases on the Web. This problem is crucial for search in distributed multimedia databases; however, it has not been studied yet. This paper provides novel algorithms for processing the collection fusion of heterogeneous multimedia databases on the Web. We propose two heuristic algorithms for estimating the number of objects to be retrieved from local databases, and an algorithm using linear regression. Extensive experiments show the effectiveness and efficiency of these algorithms. These algorithms can provide the basis for distributed content-based retrieval algorithms for multimedia databases on the Web.

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science
    • /
    • v.4 no.1
    • /
    • pp.213-242
    • /
    • 1993
  • During the past decades, several automatic indexing systems have been developed, such as single-term indexing, phrase indexing, and thesaurus-based indexing systems. Among these systems, single-term indexing has been known to be superior to the others despite its simplicity in extracting meaningful terms. On the other hand, thesaurus-based indexing has been regarded as producing a low retrieval rate, mainly because thesauri do not usually have enough index terms, so that much of the text data fails to be indexed if it does not match any of the index terms in the thesauri. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by analyzing the text data syntactically and partially matching it with index terms in thesauri. First, the system analyzes the input text syntactically by using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index these with similar index terms in the thesaurus as much as possible. We conduct an experiment with the CACM data set that compares the retrieval effectiveness of THINS with that of the single-term-based system under HYKIS, a thesaurus-based information retrieval system. It turns out that THINS yields about 10 percent higher precision than the single-term-based system, while showing 8 to 9 percent lower recall. This result shows that THINS is a considerable improvement over previous thesaurus-based systems, which yield 25 or 30 percent lower precision than the single-term-based system. We also argue that the relatively lower recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only a little more than one thousand terms; thus THINS is expected to produce a much higher rate if it is combined with a currently available large thesaurus.

Emotion-based Video Scene Retrieval using Interactive Genetic Algorithm (대화형 유전자 알고리즘을 이용한 감성기반 비디오 장면 검색)

  • Yoo Hun-Woo;Cho Sung-Bae
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.6
    • /
    • pp.514-528
    • /
    • 2004
  • An emotion-based video scene retrieval algorithm is proposed in this paper. First, abrupt and gradual shot boundaries are detected in the video clip representing a specific story. Then, five video features, namely 'average color histogram', 'average brightness', 'average edge histogram', 'average shot duration', and 'gradual change rate', are extracted from each of the videos, and the mapping between these features and the emotional space that the user has in mind is achieved by an interactive genetic algorithm. Once the user has selected the videos that contain the corresponding emotion from the initial population of videos, the feature vectors of the selected videos are regarded as chromosomes and a genetic crossover is applied to them. Next, the new chromosomes produced by the crossover are compared with the feature vectors of the database videos using a similarity function, and the most similar videos are taken as the solutions of the next generation. By iterating the above procedure, a new population of videos matching what the user has in mind is retrieved. To show the validity of the proposed method, six example categories, 'action', 'excitement', 'suspense', 'quietness', 'relaxation', and 'happiness', are used as emotions in the experiments. Over 300 commercial videos, retrieval results show 70% effectiveness on average.
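One generation of the crossover-and-compare loop described in this abstract might look like the Python sketch below. It is an illustration under assumptions: one-point crossover, a negative squared Euclidean distance as the similarity function, and five-element feature vectors standing in for the five video features. The abstract does not state which crossover operator or similarity function the authors used, and all names here are hypothetical.

```python
import random

def crossover(parent_a, parent_b, rng):
    """One-point crossover over two equal-length feature vectors (assumed operator)."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def similarity(u, v):
    """Negative squared Euclidean distance; higher means more similar (assumed measure)."""
    return -sum((x - y) ** 2 for x, y in zip(u, v))

def next_generation(selected, database, size, rng):
    """Cross the user-selected vectors, then return the names of the `size`
    database videos that are most similar to any offspring."""
    offspring = []
    for _ in range(size):
        a, b = rng.sample(selected, 2)
        offspring.append(crossover(a, b, rng))
    scored = {
        name: max(similarity(vec, child) for child in offspring)
        for name, vec in database.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:size]

# Toy data: each vector holds five normalized feature values (hypothetical).
rng = random.Random(0)
database = {
    "calm_clip": [0.1, 0.2, 0.1, 0.9, 0.8],
    "action_clip": [0.9, 0.8, 0.9, 0.1, 0.2],
    "mixed_clip": [0.5, 0.5, 0.5, 0.5, 0.5],
}
selected = [[0.2, 0.2, 0.1, 0.8, 0.9], [0.1, 0.3, 0.2, 0.9, 0.7]]
print(next_generation(selected, database, size=2, rng=rng))
```

In the interactive setting, the user would then pick again from the returned videos, and the loop would repeat with those picks as the new `selected` list.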

Viewpoint Invariant Person Re-Identification for Global Multi-Object Tracking with Non-Overlapping Cameras

  • Gwak, Jeonghwan;Park, Geunpyo;Jeon, Moongu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.4
    • /
    • pp.2075-2092
    • /
    • 2017
  • Person re-identification is the task of matching pedestrians observed from non-overlapping camera views. It has important applications in video surveillance, such as person retrieval, person tracking, and activity analysis. However, it is a very challenging problem due to illumination, pose, and viewpoint variations between non-overlapping camera views. In this work, we propose a viewpoint-invariant method for matching pedestrian images using the orientation of the pedestrian. First, the proposed method divides a pedestrian image into patches and assigns an angle to each patch using the orientation of the pedestrian, under the assumption that a person's body has a cylindrical shape. The difference between angles is then used to compute the similarity between patches. We applied the proposed method to real-time global multi-object tracking across multiple disjoint cameras with non-overlapping fields of view. The re-identification algorithm builds global trajectories by connecting local trajectories obtained by different local trackers. The effectiveness of the viewpoint-invariant method for person re-identification was validated on the VIPeR dataset. In addition, we demonstrated the effectiveness of the proposed approach for inter-camera multiple object tracking on the MCT dataset with ground truth data for local tracking.

Determining the optimal number of cases to combine in a case-based reasoning system for eCRM

  • Hyunchul Ahn;Kim, Kyoung-jae;Ingoo Han
    • Proceedings of the KAIS Fall Conference
    • /
    • 2003.11a
    • /
    • pp.178-184
    • /
    • 2003
  • Case-based reasoning (CBR) often shows significant promise for improving the effectiveness of complex and unstructured decision making. Consequently, it has been applied to various problem-solving areas including manufacturing, finance, and marketing. However, the design of appropriate case indexing and retrieval mechanisms to improve the performance of CBR is still a challenging issue. Most previous studies on improving the effectiveness of CBR have focused on the similarity function or on the optimization of case features and their weights. However, according to some prior research, finding the optimal k parameter for k-nearest neighbor (k-NN) is also crucial for improving the performance of a CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine. This study applies the new model to a real-world case provided by an online shopping mall in Korea. Experimental results show that a GA-optimized k-NN approach outperforms other AI techniques for purchasing behavior forecasting.
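To make the idea of searching for k with a genetic algorithm concrete, here is a small Python sketch. Everything beyond "a GA chooses the number of neighbors" is an assumption for illustration: the fitness is leave-one-out accuracy, selection is simple truncation, crossover averages two parent values, and mutation shifts k by at most one. The abstract does not describe the authors' encoding or operators, and the toy cases are made up.

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest cases (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fitness(cases, k):
    """Leave-one-out accuracy of k-NN over the case base."""
    hits = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(cases)

def ga_optimize_k(cases, k_max, generations=10, pop_size=6, seed=0):
    """Evolve a population of candidate k values and return the fittest one."""
    rng = random.Random(seed)
    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda k: fitness(cases, k), reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = (a + b) // 2 + rng.choice([-1, 0, 1])  # crossover, then mutation
            children.append(min(k_max, max(1, child)))     # keep k in [1, k_max]
        population = parents + children
    return max(population, key=lambda k: fitness(cases, k))

# Toy case base: (features, label) pairs, hypothetical data.
cases = [
    ([0.0, 0.0], "no"), ([0.1, 0.0], "no"), ([0.0, 0.1], "no"),
    ([1.0, 1.0], "buy"), ([0.9, 1.0], "buy"), ([1.0, 0.9], "buy"),
]
best_k = ga_optimize_k(cases, k_max=5)
print(best_k, fitness(cases, best_k))
```

For a single integer parameter an exhaustive scan over k would be just as easy; a GA becomes more attractive when k is optimized jointly with feature weights or other settings.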

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.65-74
    • /
    • 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. The removal of stop-words from Arabic text has a significant impact in terms of reducing the size of a corpus text, which leads to an improvement in the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach in reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452,930 and a compression rate of 30%.
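The combined stop-list idea can be sketched in Python as follows. The specific choices are assumptions for illustration only: normalization here strips Arabic diacritics and unifies three alef variants, the frequency-based list simply takes the `top_n` most frequent tokens (a crude stand-in for a Zipf-based cutoff), and the compression rate is the fraction of tokens removed. The abstract does not give the study's normalization rules or thresholds.

```python
import re
from collections import Counter

DIACRITICS = re.compile("[\u064B-\u0652]")           # Arabic tashkeel marks
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")   # alef with madda / hamza above / hamza below

def normalize(token):
    """Strip diacritics and map alef variants to bare alef (assumed rules)."""
    token = DIACRITICS.sub("", token)
    return ALEF_VARIANTS.sub("\u0627", token)

def frequency_stoplist(tokens, top_n):
    """Treat the `top_n` most frequent tokens as stop-words."""
    return {word for word, _ in Counter(tokens).most_common(top_n)}

def reduction(tokens, stoplist):
    """Return (number of tokens removed, fraction of tokens removed)."""
    kept = [t for t in tokens if t not in stoplist]
    removed = len(tokens) - len(kept)
    return removed, removed / len(tokens)

def combined_stoplist(tokens, lookup_list, top_n):
    """Union of a hand-made lookup list and the frequency-based list."""
    return set(lookup_list) | frequency_stoplist(tokens, top_n)
```

A typical use would normalize every token first, for example `tokens = [normalize(t) for t in raw_tokens]`, then call `reduction(tokens, combined_stoplist(tokens, lookup_list, top_n))` and compare the result against each list used alone.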