• Title/Summary/Keyword: Local Descriptors

Efficient Use of MPEG-7 Edge Histogram Descriptor

  • Won, Chee-Sun;Park, Dong-Kwon;Park, Soo-Jun
    • ETRI Journal / v.24 no.1 / pp.23-30 / 2002
  • The MPEG-7 Visual Standard specifies a set of descriptors that can be used to measure similarity in images or video. Among them, the Edge Histogram Descriptor describes edge distribution with a histogram based on local edge distribution in an image. Since the Edge Histogram Descriptor recommended for the MPEG-7 standard represents only the local edge distribution in the image, its matching performance for image retrieval may not be satisfactory. This paper proposes the use of global and semi-global edge histograms generated directly from the local histogram bins to increase the matching performance. The global, semi-global, and local histograms of images are then combined to measure image similarity and are compared with the MPEG-7 descriptor, which uses the local histogram only. Since we exploit the absolute location of the edge in the image as well as its global composition, the proposed matching method can retrieve semantically similar images. Experiments on MPEG-7 test images show that the proposed method improves retrieval performance by 0.04 in ANMRR, a difference that is significant on visual inspection.
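
The pooling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard EHD layout of 4x4 sub-images with 5 edge bins each, and the 13 semi-global groupings (rows, columns, 2x2 blocks), the use of means, and the `global_weight` value are assumptions made for the example.

```python
# Sketch only: local[r][c] is the 5-bin edge histogram of sub-image (r, c)
# in a 4x4 grid. Groupings and weight below are illustrative assumptions.

def mean_hist(hists):
    """Element-wise mean of equal-length histograms."""
    n = len(hists)
    return [sum(h[k] for h in hists) / n for k in range(len(hists[0]))]

def global_hist(local):
    """One 5-bin histogram pooled over all 16 sub-images."""
    return mean_hist([local[r][c] for r in range(4) for c in range(4)])

def semi_global_hists(local):
    """13 pooled histograms: 4 rows, 4 columns, 5 blocks of 2x2 sub-images."""
    groups = [[(r, c) for c in range(4)] for r in range(4)]
    groups += [[(r, c) for r in range(4)] for c in range(4)]
    for r0, c0 in [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]:
        groups.append([(r0 + dr, c0 + dc) for dr in (0, 1) for dc in (0, 1)])
    return [mean_hist([local[r][c] for r, c in g]) for g in groups]

def l1(a, b):
    """Sum of absolute bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ehd_distance(local_a, local_b, global_weight=5.0):
    """L1 distance over local, weighted global, and semi-global bins."""
    d_local = sum(l1(local_a[r][c], local_b[r][c])
                  for r in range(4) for c in range(4))
    d_global = l1(global_hist(local_a), global_hist(local_b))
    d_semi = sum(l1(x, y) for x, y in zip(semi_global_hists(local_a),
                                          semi_global_hists(local_b)))
    return d_local + global_weight * d_global + d_semi
```

Because the pooled histograms are computed from the local bins alone, nothing beyond the local descriptor needs to be stored per image.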

Multi-scale Local Difference Directional Number Pattern for Group-housed Pigs Recognition

  • Huang, Weijia;Zhu, Weixing;Zhang, Zhengyan;Guo, Yizheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3186-3203 / 2021
  • In this paper, a multi-scale local difference directional number (MLDDN) pattern is proposed for pig identification. First, the color images of individual pigs are converted into grey images by most significant bits (MSB) quantization, which makes the grey values more discriminative. Then, Gabor amplitude and phase responses at different scales are obtained by convolving the grey images with Gabor masks. Next, the directional numbers of the Gabor amplitude and phase responses are encoded by calculating the main difference of local edge directions instead of traditional edge information. Finally, the block histograms of the encoded images are concatenated at each scale, and maximum pooling is applied across scales to avoid a high feature dimension. Experimental results on two pigsties show that MLDDN substantially outperforms other widely used local descriptors.
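
The final step of this pipeline (block histograms concatenated per scale, then max pooling across scales) can be sketched as below. The encoding that produces the code maps is not reproduced here; `code_map`, `block_size`, and `n_codes` are hypothetical names, and the code maps are assumed to be already computed as 2D lists of integer codes.

```python
# Sketch only: each code map is a 2D list of integer codes in [0, n_codes).

def block_histogram(block, n_codes):
    """Count how often each code occurs inside one block."""
    hist = [0] * n_codes
    for row in block:
        for code in row:
            hist[code] += 1
    return hist

def scale_feature(code_map, block_size, n_codes):
    """Concatenate the histograms of all blocks of one scale."""
    feature = []
    for r in range(0, len(code_map), block_size):
        for c in range(0, len(code_map[0]), block_size):
            block = [row[c:c + block_size]
                     for row in code_map[r:r + block_size]]
            feature.extend(block_histogram(block, n_codes))
    return feature

def max_pool_scales(features):
    """Element-wise maximum over the per-scale feature vectors."""
    return [max(values) for values in zip(*features)]
```

Pooling keeps the feature length equal to that of a single scale, whereas concatenating all scales would multiply it by the number of scales.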

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2407-2424 / 2022
  • Recently, the Transformer has made great progress in video retrieval tasks due to its high representation capability. In the Transformer structure, the cascaded self-attention modules are capable of capturing long-distance feature dependencies. However, local feature details are likely to deteriorate. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate the learning bias. The proposed model is tested on the MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.

Detection of Faces with Partial Occlusions using Statistical Face Model (통계적 얼굴 모델을 이용한 부분적으로 가려진 얼굴 검출)

  • Seo, Jeongin;Park, Hyeyoung
    • Journal of KIISE / v.41 no.11 / pp.921-926 / 2014
  • Face detection refers to the process of extracting facial regions from an input image. It can improve the speed and accuracy of recognition or authorization systems and has diverse applications. Since conventional methods detect faces based on the whole shape of the face, their detection performance can be degraded by occlusions caused by accessories or body parts. In this paper, we propose a method that combines local feature descriptors and probability modeling in order to detect partially occluded faces effectively. In the training stage, we represent an image as a set of local feature descriptors and estimate a statistical model of normal faces. When a test image is given, we find the region that is most similar to a face using the face model constructed in the training stage. Experimental results on a benchmark data set confirm the effectiveness of the proposed method in detecting partially occluded faces.

PPD: A Robust Low-computation Local Descriptor for Mobile Image Retrieval

  • Liu, Congxin;Yang, Jie;Feng, Deying
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.3 / pp.305-323 / 2010
  • This paper proposes an efficient yet powerful local descriptor called the phase-space partition based descriptor (PPD). This descriptor is designed for mobile image matching and retrieval. PPD, which is inspired by SIFT, also encodes the salient aspects of the image gradient in the neighborhood around an interest point. However, instead of employing SIFT's smoothed gradient orientation histogram, we apply region-based gradient statistics in phase space to construct the feature representation, which substantially reduces the computational requirements. The feature matching experiments demonstrate that PPD achieves performance close to that of SIFT with faster descriptor construction and matching. We also present results showing that using PPD descriptors in a mobile image retrieval application yields performance comparable to SIFT.

Image Retrieval Using a Composite of MPEG-7 Visual Descriptors (MPEG-7 디스크립터들의 조합을 이용한 영상 검색)

  • 강희범;원치선
    • Journal of Broadcast Engineering / v.8 no.1 / pp.91-100 / 2003
  • In this paper, to improve retrieval performance, an efficient combination of MPEG-7 visual descriptors, namely the edge histogram descriptor (EHD), the color layout descriptor (CLD), and the homogeneous texture descriptor (HTD), is proposed in the framework of the relevance feedback approach. The EHD represents the spatial distribution of edges in local image regions and is considered an important feature for representing the content of the image. The CLD specifies the spatial distribution of colors and is widely used in image retrieval due to its simplicity and fast operation speed. The HTD describes the precise statistical distribution of the image texture. Both the feature vector for the query image and the weighting factors among the combined descriptors are adaptively determined during the relevance feedback. Experimental results show that the proposed method improves the retrieval performance significantly for natural images.
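
A weighted combination of per-descriptor distances, as described in this abstract, might look like the sketch below. It is only an illustration: the L1 distance stands in for each descriptor's own matching function, the weights are fixed inputs rather than being learned from relevance feedback, and all names (`combined_distance`, `rank`, the dictionary keys) are hypothetical.

```python
# Sketch only: each image is a dict mapping a descriptor name to a vector.

def l1(a, b):
    """Placeholder distance; real descriptors define their own measures."""
    return sum(abs(x - y) for x, y in zip(a, b))

def combined_distance(query, candidate, weights):
    """Weighted sum of the per-descriptor distances."""
    return sum(w * l1(query[name], candidate[name])
               for name, w in weights.items())

def rank(query, database, weights):
    """Return image names sorted from most to least similar."""
    ordered = sorted(database,
                     key=lambda item: combined_distance(query, item[1], weights))
    return [name for name, _ in ordered]
```

Changing the weights changes the ranking, which is the lever that a relevance feedback loop would adjust between rounds.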

Spatio-temporal Semantic Features for Human Action Recognition

  • Liu, Jia;Wang, Xiaonian;Li, Tianyu;Yang, Jie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.10 / pp.2632-2649 / 2012
  • Most approaches to human action recognition are limited because they use simple action datasets captured in controlled environments or focus on excessively localized features without sufficiently exploring the spatio-temporal information. This paper proposes a framework for recognizing realistic human actions. Specifically, a new action representation is proposed based on computing a rich set of descriptors from keypoint trajectories. To obtain efficient and compact representations of actions, we develop a feature fusion method that combines spatio-temporal local motion descriptors according to the movement of the camera, which is detected from the distribution of spatio-temporal interest points in the clips. A new topic model called the Markov Semantic Model is proposed for semantic feature selection; it relies on the different kinds of dependencies between words produced by "syntactic" and "semantic" constraints. The informative features are selected collaboratively based on the different types of dependencies between words produced by short-range and long-range constraints. Building on nonlinear SVMs, we validate the proposed hierarchical framework on several realistic action datasets.

An Efficient Feature Point Detection for Interactive Pen-Input Display Applications (인터액티브 펜-입력 디스플레이 애플리케이션을 위한 효과적인 특징점 추출법)

  • Kim Dae-Hyun;Kim Myoung-Jun
    • Journal of KIISE: Computer Systems and Theory / v.32 no.11_12 / pp.705-716 / 2005
  • Many feature point detection algorithms have been developed in pattern recognition research. However, interactive applications for pen-input displays such as Tablet PCs and LCD tablets have different goals: reliable segmentation for different drawing styles and real-time, on-the-fly feature point detection. This paper presents a curvature estimation method that is crucial for segmenting freehand pen input. It considers only local shape descriptors and thus performs a novel curvature estimation on the fly while the user draws on a pen-input display. The method has been used for pen marking recognition to build a 3D sketch-based modeling application.
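
As a rough illustration of curvature estimation from local shape only, the sketch below measures the turning angle at each stroke point from its neighbors `k` samples away. This is a generic technique chosen for the example, not the estimator proposed in the paper; `turning_angle`, `corner_indices`, `k`, and `threshold` are hypothetical names and values.

```python
import math

# Sketch only: a stroke is a list of (x, y) samples in drawing order.

def turning_angle(points, i, k):
    """Signed angle in radians between the segments entering and leaving points[i]."""
    (x0, y0), (x1, y1), (x2, y2) = points[i - k], points[i], points[i + k]
    ax, ay = x1 - x0, y1 - y0   # incoming direction
    bx, by = x2 - x1, y2 - y1   # outgoing direction
    return math.atan2(ax * by - ay * bx, ax * bx + ay * by)

def corner_indices(points, k=1, threshold=math.pi / 4):
    """Indices whose absolute turning angle exceeds the threshold."""
    return [i for i in range(k, len(points) - k)
            if abs(turning_angle(points, i, k)) > threshold]
```

Each angle depends only on samples within `k` positions of the point, so it can be evaluated while the stroke is still being drawn, with a delay of `k` samples.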

Enhanced VLAD

  • Wei, Benchang;Guan, Tao;Luo, Yawei;Duan, Liya;Yu, Junqing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.7 / pp.3272-3285 / 2016
  • Recently, the Vector of Locally Aggregated Descriptors (VLAD) has been proposed to index images by compact representations. It encodes powerful local descriptors and significantly improves search performance with less memory than the state of the art. However, its performance relies heavily on the size of the codebook used to generate the VLAD representation: better accuracy requires a higher-dimensional representation and therefore more memory. In this paper, we enhance the VLAD image representation by using two-level hierarchical codebooks, which provide more accurate search while keeping the VLAD size unchanged. In addition, the hierarchical codebooks are used to construct multiple inverted files for more accurate non-exhaustive search. Experimental results show that our method significantly improves both the VLAD image representation and non-exhaustive search.
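
For context, basic VLAD aggregation can be sketched as follows. This shows the standard single-codebook form only, not the two-level hierarchical codebooks proposed in the paper; the codebook is assumed to be given (for example from k-means), and the function names are hypothetical.

```python
import math

# Sketch only: descriptors and codebook entries are equal-length lists of floats.

def nearest(desc, codebook):
    """Index of the codeword closest to desc (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))

def vlad(descriptors, codebook):
    """Sum residuals per codeword, flatten, then L2-normalize."""
    dim = len(codebook[0])
    acc = [[0.0] * dim for _ in codebook]
    for desc in descriptors:
        k = nearest(desc, codebook)
        for j in range(dim):
            acc[k][j] += desc[j] - codebook[k][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

The output length is the number of codewords times the descriptor dimension, which is why a larger codebook directly increases the memory needed per image.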

Illumination Robust Feature Descriptor Based on Exact Order (조명 변화에 강인한 엄격한 순차 기반의 특징점 기술자)

  • Kim, Bongjoe;Sohn, Kwanghoon
    • Journal of Broadcast Engineering / v.18 no.1 / pp.77-87 / 2013
  • In this paper, we present a novel local image descriptor called the exact order based descriptor (EOD), which is robust to illumination changes and Gaussian noise. The exact order of an image patch is induced by converting each discrete intensity value into a k-dimensional continuous vector, which resolves the ordering ambiguity among pixels with the same intensity. The EOD is generated from the overall distribution of exact orders in the patch. The proposed local descriptor is compared with several state-of-the-art descriptors over a number of images. Experimental results show that the proposed method outperforms many state-of-the-art descriptors in the presence of illumination changes, blur and viewpoint change. The proposed method can also be used for many computer vision applications such as face recognition, texture recognition and image analysis.
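
The idea of a strict pixel ordering with ties resolved can be sketched as below. The abstract does not specify how the k-dimensional vector is built, so this example assumes a two-component key (the intensity, then the mean of the clipped 3x3 neighborhood), with the pixel index as a last-resort tie-breaker; `exact_order` is a hypothetical name and the descriptor construction itself is not shown.

```python
# Sketch only: patch is a 2D list of intensities; the key design is an assumption.

def exact_order(patch):
    """Return a strict rank (0 = darkest) for every pixel, in row-major order."""
    h, w = len(patch), len(patch[0])
    keys = []
    for r in range(h):
        for c in range(w):
            neigh = [patch[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))]
            # Intensity first, neighborhood mean second, index as final tie-break.
            keys.append((patch[r][c], sum(neigh) / len(neigh), r * w + c))
    order = sorted(range(h * w), key=lambda i: keys[i])
    ranks = [0] * (h * w)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks
```

Because only the relative order of the keys matters, the ranks are unchanged by any increasing affine change of intensity, which is one reason order-based descriptors tolerate illumination changes.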