• Title/Abstract/Keyword: Bag of Visual Words

Search results: 21 items (processing time 0.023 s)

Bag of Visual Words Method based on PLSA and Chi-Square Model for Object Category

  • Zhao, Yongwei;Peng, Tianqiang;Li, Bicheng;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9 No. 7 / pp.2633-2648 / 2015
  • The problems of visual-word synonymy and ambiguity are inherent in conventional bag of visual words (BoVW) based object category methods. In addition, noisy visual words, so-called "visual stop-words", degrade the semantic resolution of the visual dictionary. In view of this, a novel bag of visual words method based on PLSA and the chi-square model for object category is proposed. Firstly, Probabilistic Latent Semantic Analysis (PLSA) is used to analyze the semantic co-occurrence probability of visual words, infer the latent semantic topics in images, and obtain the latent topic distributions induced by the words. Secondly, the KL divergence is adopted to measure the semantic distance between visual words, yielding sets of semantically related near-synonyms. Then, an adaptive soft-assignment strategy is applied to realize a soft mapping between SIFT features and these near-synonyms. Finally, the chi-square model is introduced to eliminate the "visual stop-words" and reconstruct the visual vocabulary histograms. Moreover, an SVM (Support Vector Machine) is applied to accomplish object classification. Experimental results indicate that the synonymy and ambiguity problems of visual words are overcome effectively, and that both the semantic resolution of the visual dictionary and the object classification performance are substantially boosted compared with traditional methods.
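The chi-square filtering step this abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's exact procedure: it assumes binary word presence per image, and the `keep_ratio` cutoff is a hypothetical parameter.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Chi-square statistic of each visual word against the class labels.

    docs: list of sets of visual-word ids present in each image.
    labels: class label of each image.
    Low-scoring words are class-independent "visual stop-words".
    """
    n = len(docs)
    class_count = Counter(labels)
    scores = {}
    for w in set().union(*docs):
        present = [w in d for d in docs]
        n_present = sum(present)
        chi2 = 0.0
        for c, n_c in class_count.items():
            for has_w, n_w in ((True, n_present), (False, n - n_present)):
                observed = sum(1 for p, l in zip(present, labels)
                               if p == has_w and l == c)
                expected = n_w * n_c / n
                if expected:
                    chi2 += (observed - expected) ** 2 / expected
        scores[w] = chi2
    return scores

def drop_stop_words(docs, labels, keep_ratio=0.8):
    """Keep only the most class-discriminative visual words."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = set(ranked[:max(1, int(len(ranked) * keep_ratio))])
    return [d & kept for d in docs]
```

A word that appears uniformly across all classes (like 'b' below) scores zero and is the first to be dropped.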

시계열 스트리트뷰 데이터베이스를 이용한 시각적 위치 인식 알고리즘 (Visual Location Recognition Using Time-Series Streetview Database)

  • 박천수;최준연
    • 반도체디스플레이기술학회지 / Vol. 18 No. 4 / pp.57-61 / 2019
  • Nowadays, portable digital cameras such as smartphone cameras are widely used for entertainment and for recording visual information. Given a database of geo-tagged images, a visual location recognition system can determine the place depicted in a query photo. One of the most common visual location recognition approaches is the bag-of-words method, where local image features are clustered into visual words. In this paper, we propose a new bag-of-words-based visual location recognition algorithm using a time-series streetview database. The proposed algorithm selects only a small subset of the image features to be used in the image retrieval process. By reducing the number of features used, the proposed algorithm reduces the memory requirement of the image database and accelerates the retrieval process.
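The feature-subset idea above can be sketched as a small inverted index. This is an illustrative sketch only: the per-feature usefulness `score` is a hypothetical stand-in for whatever selection criterion the paper uses.

```python
from collections import Counter, defaultdict

def build_index(database, keep):
    """Inverted index over only the top-`keep` scored features per image.

    database: {image_id: [(visual_word, score), ...]}.
    Discarding low-scored features shrinks the index and speeds up lookups.
    """
    index = defaultdict(set)
    for image, feats in database.items():
        for word, _ in sorted(feats, key=lambda f: f[1], reverse=True)[:keep]:
            index[word].add(image)
    return index

def retrieve(index, query_words):
    """Rank database images by how many query words vote for them."""
    votes = Counter()
    for w in query_words:
        votes.update(index.get(w, ()))
    return [image for image, _ in votes.most_common()]
```

Features ranked below the cutoff (words 3 and 5 in the test) never enter the index, so they cost neither memory nor lookup time.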

영상 기반 위치 인식을 위한 대규모 언어-이미지 모델 기반의 Bag-of-Objects 표현 (Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition)

  • 정승운;박병재
    • 센서학회지 / Vol. 33 No. 2 / pp.78-85 / 2024
  • We propose a method for visual place recognition that represents images using objects as visual words, where the visual words represent the various objects present in urban environments. To detect these objects within images, we implemented and used a zero-shot detector based on a large-scale language-image model; this zero-shot detector enables the detection of various objects in urban environments without additional training. When creating histograms with the proposed method, frequency-based weighting is applied to reflect the importance of each object. Through experiments on open datasets, the potential of the proposed method was demonstrated by comparison with another method, including under environmental and viewpoint changes.
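A bag-of-objects image representation with frequency-based weighting can be sketched as follows. The sketch assumes the detector output is simply a list of class names, and the idf-style per-object weights are illustrative, not the paper's scheme.

```python
import math
from collections import Counter

def object_histogram(detections, idf):
    """Weighted bag-of-objects histogram for one image.

    detections: object class names, standing in for zero-shot detector output.
    idf: per-class weight emphasising rare, place-specific objects.
    """
    counts = Counter(detections)
    return {obj: n * idf.get(obj, 1.0) for obj, n in counts.items()}

def cosine(h1, h2):
    """Cosine similarity between two sparse histograms."""
    dot = sum(v * h2.get(k, 0.0) for k, v in h1.items())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Two images of the same place yield near-identical histograms and a similarity close to 1, while disjoint object sets score 0.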

Object Classification based on Weakly Supervised E2LSH and Saliency map Weighting

  • Zhao, Yongwei;Li, Bicheng;Liu, Xin;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10 No. 1 / pp.364-380 / 2016
  • The most popular approach to object classification is based on the bag of visual words model, which has several fundamental problems that restrict its performance, such as low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information between visual words. In view of this, an object classification method based on weakly supervised E2LSH and saliency map weighting is proposed. Firstly, E2LSH (Exact Euclidean Locality Sensitive Hashing) is employed to generate a group of weakly randomized visual dictionaries by clustering SIFT features of the training dataset, and the selection of hash functions is effectively supervised, inspired by random forest ideas, to reduce the randomness of E2LSH. Secondly, the graph-based visual saliency (GBVS) algorithm is applied to compute the saliency map of each image and weight the visual words according to this saliency prior. Finally, a saliency-map-weighted visual language model is used to accomplish object classification. Experimental results on the Pascal VOC 2007 and Caltech-256 datasets indicate that the distinguishability of objects is effectively improved and that our method is superior to state-of-the-art object classification methods.
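The saliency-weighting step can be sketched directly: instead of each keypoint adding 1 to its word's bin, it adds the saliency value at its location. The grid-of-floats saliency map here is a stand-in for an actual GBVS output.

```python
def saliency_weighted_histogram(assignments, saliency, n_words):
    """Accumulate visual-word counts weighted by saliency at each keypoint.

    assignments: [(word_id, (x, y)), ...], one word assignment per keypoint.
    saliency: 2D grid (rows of floats in [0, 1]), e.g. a GBVS map.
    """
    hist = [0.0] * n_words
    for word, (x, y) in assignments:
        hist[word] += saliency[y][x]
    return hist
```

Keypoints falling on salient objects dominate the histogram, while background keypoints contribute almost nothing.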

이미지 단어집과 관심영역 자동추출을 사용한 이미지 분류 (Image Classification Using Bag of Visual Words and Visual Saliency Model)

  • 장현웅;조수선
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 3 No. 12 / pp.547-552 / 2014
  • With the growth of large-scale social media sharing sites such as Flickr and Facebook, the amount of image data is increasing very rapidly. Accordingly, a variety of studies aim to retrieve social images accurately: some seek to improve the accuracy of tag-based image retrieval by exploiting the semantic relatedness of image tags, while others classify web images based on a Bag of Visual Words (BoVW). In this paper, we propose applying the GBVS (Graph Based Visual Saliency) model, which finds the important parts of an image by discarding less important information such as the background, to this earlier work. The proposed method first builds a bag of visual words (BoVW) with the SIFT algorithm over a database that has been pre-classified using the semantic relatedness of image tags. Second, for each test image, a region of interest is selected via GBVS and testing is performed on it. Applying GBVS to the existing method based on semantically related tags and a SIFT-based bag of visual words yielded higher accuracy.

특징, 색상 및 텍스처 정보의 가공을 이용한 Bag of Visual Words 이미지 자동 분류 (Improved Bag of Visual Words Image Classification Using the Process of Feature, Color and Texture Information)

  • 박찬혁;권혁신;강석훈
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2015년도 추계학술대회 / pp.79-82 / 2015
  • Bag of Visual Words (BoVW), one of the techniques for classifying and retrieving images (image retrieval), is a feature-point-based method that automatically classifies and retrieves query images using the distribution of the feature vectors of the images in a database. The conventional approach, which uses only feature vectors to build the words, may retrieve or classify images the user does not want. To address this drawback, we include in the words not only the feature vectors but also color information, which can express the overall mood of an image, and texture information, which can express repeating patterns, enabling more diverse retrieval. In the experiments, we compared classification results using words built from feature information alone against results using words augmented with color and texture information; the new method achieved an accuracy of 80-90%.
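Combining feature, color, and texture words can be sketched as a weighted concatenation of the three histograms into one descriptor. The relative weights below are hypothetical values, not the paper's.

```python
def combined_words(feature_hist, color_hist, texture_hist,
                   weights=(1.0, 0.5, 0.5)):
    """Concatenate the three histograms into one descriptor.

    The weights control how much the overall colour mood and repeated-
    pattern texture influence matching alongside the feature-point words.
    """
    wf, wc, wt = weights
    return ([wf * v for v in feature_hist]
            + [wc * v for v in color_hist]
            + [wt * v for v in texture_hist])
```

Any histogram-based classifier can then consume the concatenated vector unchanged, which is what makes this augmentation cheap to adopt.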


VILODE : 키 프레임 영상과 시각 단어들을 이용한 실시간 시각 루프 결합 탐지기 (VILODE : A Real-Time Visual Loop Closure Detector Using Key Frames and Bag of Words)

  • 김혜숙;김인철
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 4 No. 5 / pp.225-230 / 2015
  • In this paper, we propose VILODE, an effective real-time visual loop closure detector based on key frame images and SURF-feature visual words. To decide whether the system has revisited one of its previously traversed locations, a loop closure detector must compare every new input image against all past images collected at the places already visited. Because the number of images to compare keeps growing as new locations are visited, a loop closure detector generally struggles to satisfy real-time constraints and high detection accuracy at the same time. To overcome this problem, the system adopts an effective key frame selection method that picks out only meaningful input images and compares only those, greatly reducing the number of image comparisons needed for loop detection. In addition, to improve the accuracy and efficiency of loop closure detection, the system represents key frame images as visual words and indexes them with the DBoW database system. Experiments on benchmark data from TUM demonstrated the high performance of the proposed visual loop closure detector.
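The key-frame strategy above can be sketched abstractly: keep a frame only when it differs enough from the last key frame, then search loops over key frames alone. The `similarity` callable is an abstract stand-in for the visual-word histogram comparison VILODE would perform, and the thresholds are illustrative.

```python
def select_key_frames(frames, similarity, new_key_thresh):
    """Keep a frame as a key frame only if it differs enough from the
    previous key frame; loop detection then compares against keys only."""
    keys = []
    for frame in frames:
        if not keys or similarity(frame, keys[-1]) < new_key_thresh:
            keys.append(frame)
    return keys

def detect_loop(query, keys, similarity, loop_thresh):
    """Return the best-matching past key frame, or None if no loop."""
    best = max(keys, key=lambda k: similarity(query, k), default=None)
    if best is not None and similarity(query, best) >= loop_thresh:
        return best
    return None
```

In the test, frames are plain numbers and similarity decays with distance, which is enough to show how redundant frames are skipped and a revisit is matched to the right key frame.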

Nearest-Neighbors Based Weighted Method for the BOVW Applied to Image Classification

  • Xu, Mengxi;Sun, Quansen;Lu, Yingshu;Shen, Chenming
    • Journal of Electrical Engineering and Technology / Vol. 10 No. 4 / pp.1877-1885 / 2015
  • This paper presents a new nearest-neighbors-based weighted representation for images and a weighted K-Nearest-Neighbors (WKNN) classifier to improve the precision of image classification with Bag of Visual Words (BOVW) based models. Scale-invariant feature transform (SIFT) features are first extracted from the images. Then, the K-means++ algorithm is adopted in place of the conventional K-means algorithm to generate a more effective visual dictionary. Furthermore, the histogram of visual words is made more expressive by the proposed weighted vector quantization (WVQ). Finally, the WKNN classifier is applied to improve classification between images containing similar levels of background noise. Average precision and absolute change degree are calculated to assess the classification performance and the stability of the K-means++ algorithm, respectively. Experimental results on three diverse datasets, Caltech-101, Caltech-256 and PASCAL VOC 2011, show that the proposed WVQ and WKNN methods further improve classification performance.
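A weighted k-NN classifier of the kind described can be sketched with inverse-distance voting: each of the k nearest training histograms votes for its label with weight 1/distance. The Euclidean metric and 1/d weighting are generic choices here, not necessarily the paper's exact formulation.

```python
import math
from collections import defaultdict

def wknn_predict(train, query, k=3):
    """Weighted k-NN: neighbours vote with weight 1/distance, so closer
    histograms count more than farther ones.

    train: [(histogram, label), ...]; query: histogram of equal length.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbours = sorted((euclidean(h, query), label) for h, label in train)[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / (d + 1e-9)  # epsilon guards exact matches
    return max(votes, key=votes.get)
```

Because votes are distance-weighted, a single far-away neighbour of the wrong class cannot outvote two nearby neighbours of the right one.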

A Salient Based Bag of Visual Word Model (SBBoVW): Improvements toward Difficult Object Recognition and Object Location in Image Retrieval

  • Mansourian, Leila;Abdullah, Muhamad Taufik;Abdullah, Lilli Nurliyana;Azman, Azreen;Mustaffa, Mas Rina
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10 No. 2 / pp.769-786 / 2016
  • Object recognition and object localization have always drawn much interest, and various computational models have recently been designed. One of the big issues in this domain is the lack of an appropriate model for extracting the important parts of a picture and estimating the object's location, which has led to low accuracy. To solve this problem, a new Salient Based Bag of Visual Word (SBBoVW) model for object recognition and object location estimation is presented. The contributions of the present study are two-fold. The first is a new approach, the SBBoVW model, for recognizing difficult objects on which previous methods achieved low accuracy: it extracts SIFT features from both the original image and its salient parts and fuses them to generate better codebooks with the bag of visual words method. The second is a new algorithm for locating objects automatically based on the saliency map. Performance evaluation on several datasets proves that the new approach outperforms other state-of-the-art methods.
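The codebook step underlying all of these BoVW variants is vector quantization: each descriptor is hard-assigned to its nearest codeword and the counts form the image histogram. This is a generic sketch with toy 2-D vectors; in SBBoVW the pool would contain 128-D SIFT descriptors from both the original and the salient image regions, fused before coding.

```python
def quantize(features, codebook):
    """Hard-assign each descriptor to its nearest codeword and count.

    features / codebook entries are equal-length numeric vectors.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hist = [0] * len(codebook)
    for f in features:
        hist[min(range(len(codebook)),
                 key=lambda i: sq_dist(f, codebook[i]))] += 1
    return hist
```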

Crowd Activity Classification Using Category Constrained Correlated Topic Model

  • Huang, Xianping;Wang, Wanliang;Shen, Guojiang;Feng, Xiaoqing;Kong, Xiangjie
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10 No. 11 / pp.5530-5546 / 2016
  • Automatic analysis and understanding of human activities is a challenging task in computer vision, especially in surveillance scenarios, which typically contain crowds, complex motions, and occlusions. To address these issues, a bag-of-words representation of videos is developed by leveraging information including crowd positions, motion directions, and velocities. We infer the crowd activity in a motion field using a Category Constrained Correlated Topic Model (CC-CTM) with latent topics, represent each video as a mixture of learned motion patterns, and predict the associated activity by training an SVM classifier. The experimental dataset we constructed is drawn from the Crowd_PETS09 benchmark dataset and the UCF_Crowds dataset and comprises 2,000 documents. Experimental results demonstrate that accuracy reaches 90% and that the proposed approach outperforms the state-of-the-art by a large margin.
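Turning motion directions and velocities into discrete "words", as the representation above requires, can be sketched by jointly quantizing each flow vector's angle and speed. The sector count and speed-bin edges below are illustrative values, not the paper's.

```python
import math

def motion_word(dx, dy, n_directions=8, speed_bins=(1.0, 5.0)):
    """Map one flow vector to a discrete motion word.

    The word jointly encodes the quantised direction (n_directions
    sectors) and a coarse speed bin, so a video becomes a bag of such
    words over its motion field.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    direction = int(angle / (2 * math.pi) * n_directions) % n_directions
    speed = math.hypot(dx, dy)
    speed_bin = sum(speed > edge for edge in speed_bins)
    return direction * (len(speed_bins) + 1) + speed_bin
```

With 8 directions and 3 speed bins this yields a 24-word motion vocabulary; histogramming the words over a clip gives the document fed to the topic model.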