• Title/Summary/Keyword: spatial pyramid

Search Result 53, Processing Time 0.019 seconds

CNN-based Human Parsing Technique Using Pyramid Pooling (Pyramid pooling을 이용한 CNN 기반의 Human Parsing 기법)

  • Choi, Inkyu;Ko, Min-soo;Song, Hyok
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2018.11a
    • /
    • pp.97-98
    • /
    • 2018
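  • With recent advances in deep learning, CNN-based segmentation techniques have been developed alongside image classification and object detection, making it possible to separate objects along their boundaries rather than detecting rectangular regions that also contain other elements. In addition, human parsing, which divides a person region into fine-grained regions such as body parts and clothing items, is being studied. Human parsing can also be applied to fields such as clothing style analysis and retrieval, and human action recognition and tracking. In this paper, we propose a human parsing technique that uses a spatial pyramid pooling layer to take into account the spatial distribution and characteristics of the entire image. Using the Look into Person (LIP) dataset, we compared the proposed technique with existing segmentation and human parsing techniques and confirmed that the proposed technique yields more precise parsing results.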

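
The abstract above does not describe the layer's internals, so the following is only a minimal sketch of generic spatial pyramid pooling, not the authors' network. It assumes max pooling over 1 x 1, 2 x 2, and 4 x 4 grids on a single-channel feature map; the function name and the grid sizes are illustrative choices.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    For each level n the map is split into an n x n grid of bins and the
    maximum of each bin is kept, so the output length is sum(n * n over
    levels) regardless of the input height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin boundaries: floor for the start, ceil for the end,
            # so every bin contains at least one row and one column.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


small = [[r * 6 + c for c in range(6)] for r in range(4)]   # 4 x 6 map
large = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7 map
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both inputs produce a vector of 1 + 4 + 16 = 21 values, which is what lets a pooled representation summarize the whole image at several spatial granularities independent of input size.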

On the Hybrid Prediction Pyramid Compatible Coding Technique (혼성 예측 피라미드 호환 부호화 기법)

  • 이준서;이상욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.1
    • /
    • pp.33-46
    • /
    • 1996
  • In this paper, we investigate the compatible coding technique, which has received much interest ever since the introduction of HDTV. First, we analyze the theoretical transform coding gains of various hierarchical decomposition techniques, namely subband, pyramid, and DCT-based decomposition. It is shown that the spatial-domain techniques provide higher transform coding gains than the DCT-based coding technique. Second, we compare the performance of these spatial-domain techniques in terms of PSNR versus various rate allocations to each layer. Based on these analyses, pyramid decomposition is believed to be more appropriate for compatible coding. We also propose a hybrid prediction pyramid coding technique that combines the spatio-temporal prediction in MPEG-2 [3] with adaptive MC (motion compensation) [1]. The proposed coding technique also employs an adaptive DCT coefficient scanning technique to exploit the direction information of the 2nd-layer signal. In computer simulations, the proposed hybrid prediction with adaptive scanning improves PSNR by about 0.46-1.78 dB at a low 1st-layer rate (about 0.1 bpp) over the adaptive MC [1], and by about 0.33-0.63 dB at a high 1st-layer rate (about 0.32-0.43 bpp) over the spatio-temporal prediction [3].

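
The gains quoted in the abstract above are PSNR differences in dB. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition, assuming 8-bit images (peak value 255) stored as lists of rows; the function name is illustrative and the code is unrelated to the paper's codec.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, dec_row in zip(reference, decoded)
        for a, b in zip(ref_row, dec_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


original = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]
print(round(psnr(original, off_by_one), 2))  # MSE of 1 gives about 48.13 dB
```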

Spatial-Temporal Scale-Invariant Human Action Recognition using Motion Gradient Histogram (모션 그래디언트 히스토그램 기반의 시공간 크기 변화에 강인한 동작 인식)

  • Kim, Kwang-Soo;Kim, Tae-Hyoung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1075-1082
    • /
    • 2007
  • In this paper, we propose a method for recognizing multiple human actions in video clips. To be invariant to changes in the speed or size of actions, a Spatial-Temporal Pyramid method is applied. The proposed method keeps the procedure simple by adopting the Motion Gradient Histogram (MGH), a statistical feature, to represent actions. For multiple action detection, a Motion Energy Image (MEI) built from accumulated binary frame differences is used, and each detected action area is then represented by an MGH. The action MGH is compared with pre-learned MGHs built with the pyramid method, and recognition is performed by comparing the two. Ten video clips are used to evaluate the proposed method. We conducted various experiments covering single actions, multiple actions, changes in speed and size scale, and comparison with a previous method. The results show that the proposed method is simple and efficient at recognizing multiple human actions under scale variations.
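
The Motion Energy Image mentioned above can be sketched as an accumulation of thresholded frame differences. The following is a minimal illustration under assumptions not stated in the abstract: grayscale frames stored as lists of rows and a hypothetical difference threshold of 20. It does not compute the MGH itself.

```python
def motion_energy_image(frames, threshold=20):
    """Accumulate binary frame differences into a motion energy image.

    frames: list of equally sized grayscale frames (lists of rows).
    Returns a binary image in which 1 marks every pixel that changed by
    more than `threshold` between any pair of consecutive frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mei = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(height):
            for c in range(width):
                if abs(curr[r][c] - prev[r][c]) > threshold:
                    mei[r][c] = 1
    return mei


# A bright pixel moving left to right along the top row.
f0 = [[255, 0, 0], [0, 0, 0]]
f1 = [[0, 255, 0], [0, 0, 0]]
f2 = [[0, 0, 255], [0, 0, 0]]
print(motion_energy_image([f0, f1, f2]))  # [[1, 1, 1], [0, 0, 0]]
```

The nonzero region of the result marks where motion occurred, which is the area a detector would then describe with a per-action feature.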

Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees (가중치 기반 Bag-of-Feature와 앙상블 결정 트리를 이용한 정지 영상에서의 인간 행동 인식)

  • Hong, June-Hyeok;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.1
    • /
    • pp.1-9
    • /
    • 2013
  • This paper proposes a human action recognition method that uses bag-of-features (BoF) based on CS-LBP (center-symmetric local binary pattern) and a spatial pyramid, together with a random forest classifier. To construct the BoF, an image is divided into dense regular grids and features are extracted from each patch. A codebook, which serves as the visual vocabulary, is formed by k-means clustering of a random subset of patches. For enhanced action discrimination, local BoF histograms are estimated from three subdivided levels of a spatial pyramid, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to the Stanford 40 Actions dataset, which includes various human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real time.
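
The weighted histogram construction described above can be sketched as follows. This is an illustration only: it assumes each patch has already been assigned a codeword index, that level l of the pyramid splits the image into 2^l x 2^l cells, and that per-level weights are supplied by the caller. The abstract does not give the actual weighting scheme, so the weights used here are placeholders.

```python
def spatial_pyramid_histogram(patches, vocab_size, levels=3, weights=None):
    """Concatenate weighted per-cell codeword histograms over a pyramid.

    patches: list of (x, y, codeword) with x, y normalized to [0, 1).
    Level l splits the image into 2**l x 2**l cells; each cell gets its
    own histogram, scaled by that level's weight.
    """
    if weights is None:
        weights = [1.0] * levels  # placeholder weights, not the paper's
    descriptor = []
    for level in range(levels):
        cells = 2 ** level
        hists = [[0.0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in patches:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += weights[level]
        for hist in hists:
            descriptor.extend(hist)
    return descriptor


patches = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 1)]
descriptor = spatial_pyramid_histogram(patches, vocab_size=2)
print(len(descriptor))  # 2 * (1 + 4 + 16) = 42
```

The resulting vector keeps coarse layout information that a single global histogram discards, and it can be fed to any classifier, including a random forest.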

Dual Attention Based Image Pyramid Network for Object Detection

  • Dong, Xiang;Li, Feng;Bai, Huihui;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4439-4455
    • /
    • 2021
  • Compared with two-stage object detection algorithms, one-stage algorithms provide a better trade-off between real-time performance and accuracy. However, these methods treat the intermediate features equally, which lacks the flexibility to emphasize meaningful information for classification and localization. Besides, they ignore the interaction of contextual information from different scales, which is important for detecting medium and small objects. To tackle these problems, we propose an image pyramid network based on a dual attention mechanism (DAIPNet), which builds an image pyramid to enrich the spatial information while emphasizing multi-scale informative features based on dual attention mechanisms for one-stage object detection. Our framework utilizes a pre-trained backbone as the standard detection network, where the designed image pyramid network (IPN) is used as an auxiliary network to provide complementary information. Here, the dual attention mechanism is composed of the adaptive feature fusion module (AFFM) and the progressive attention fusion module (PAFM). AFFM is designed to automatically pay attention to the feature maps with different importance from the backbone and auxiliary network, while PAFM is utilized to adaptively learn the channel attentive information in the context transfer process. Furthermore, in the IPN, we build an image pyramid to extract scale-wise features from downsampled images of different scales, where the features are further fused at different states to enrich scale-wise information and learn more comprehensive feature representations. Experimental results are reported on the MS COCO dataset. Our proposed detector with a 300 × 300 input achieves superior performance of 32.6% mAP on the MS COCO test-dev set compared with state-of-the-art methods.

Seamline Detection for Image Mosaicking with Image Pyramid (영상 피라미드 기반 영상 모자이크를 위한 접합선 추출)

  • Eun-Jin Yoo
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.3
    • /
    • pp.268-274
    • /
    • 2023
  • Image mosaicking is one of the basic and important technologies in image-based applications. The key to image mosaicking is extracting seamlines from the joined image. The method proposed in this paper works as follows. Feature points of the images to be joined are extracted, and the joining form between the two images is identified. A reference position for detecting the seamlines is selected according to the joining form, and an image pyramid is created for efficient image processing. The outlines of image content, including buildings and roads, are extracted from the overlapping area at low resolution, and the seamlines are determined by considering the components of the outlines. Based on this, the seamlines are searched again in the high-resolution image, and the final seamline for image mosaicking is determined. In addition, to minimize color distortion along the determined seamline, fine correction of the mosaic area is applied to improve the quality of the mosaic image. It was confirmed that the quality of the seamline extraction results obtained with the proposed method was reasonable.
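
The coarse-to-fine idea behind an image pyramid can be sketched in a few lines. This example assumes a grayscale image stored as a list of rows and uses plain 2 x 2 block averaging for downsampling; the abstract does not say which filter the author used, and both function names are hypothetical.

```python
def build_image_pyramid(image, num_levels):
    """Return [full resolution, half, quarter, ...] via 2 x 2 averaging."""
    pyramid = [image]
    for _ in range(num_levels - 1):
        src = pyramid[-1]
        h, w = len(src) // 2, len(src[0]) // 2
        if h == 0 or w == 0:
            break  # cannot halve any further
        pyramid.append([
            [(src[2 * r][2 * c] + src[2 * r][2 * c + 1]
              + src[2 * r + 1][2 * c] + src[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)
        ])
    return pyramid


def to_finer_level(row, col):
    """Map a coarse pixel to the top-left of its 2 x 2 block one level finer."""
    return 2 * row, 2 * col


image = [[r * 4 + c for c in range(4)] for r in range(4)]
pyramid = build_image_pyramid(image, num_levels=3)
print([(len(level), len(level[0])) for level in pyramid])  # [(4, 4), (2, 2), (1, 1)]
```

A search such as seamline detection can run on the smallest level first and then use `to_finer_level` to restrict the search window at each finer level, which is where the efficiency gain comes from.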

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

  • Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.6
    • /
    • pp.3182-3198
    • /
    • 2019
  • Vision-based human pose estimation is considered a challenging research subject because of problems such as confounding background clutter, diversity of human appearance, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap prediction of body joints, the proposed machine repeatedly produces predictions across stages, with a receptive field large enough to learn long-range spatial relationships. The stages are composed of various modules according to their purposes. A pyramid stacking module and a dilation module are used to handle human pose at multiple scales. Their multi-scale information from different receptive fields is fused by concatenation, which captures more contextual information from different features. In addition, the spatial and channel information of a given input is converted to gating factors by squeezing each feature map to a single numeric value based on its importance, so that each network channel receives a different weight. In experiments on the standard LSP and MPII pose benchmarks, the proposed architecture achieved higher accuracy than other ConvNet-based architectures.
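
The gating step described above (squeeze each feature map to one number, then reweight the channel) can be illustrated with a deliberately simplified sketch. It assumes global average pooling as the squeeze and a plain sigmoid as the gate; a trained network would normally place learned layers between the two, which are omitted here, so this is not the authors' module.

```python
import math


def channel_gate(feature_maps):
    """Reweight each channel by a gate derived from its global average.

    feature_maps: list of channels, each a 2D list of floats.
    """
    gated = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        squeezed = sum(values) / len(values)       # one number per channel
        gate = 1.0 / (1.0 + math.exp(-squeezed))   # sigmoid, in (0, 1)
        gated.append([[v * gate for v in row] for row in channel])
    return gated


maps = [
    [[1.0, -1.0], [1.0, -1.0]],   # mean 0, so the gate is 0.5
    [[2.0, 2.0], [2.0, 2.0]],     # mean 2, so the gate is sigmoid(2)
]
print(channel_gate(maps)[0])  # [[0.5, -0.5], [0.5, -0.5]]
```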

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1778-1797
    • /
    • 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.

Geospatial Information Extraction by using Digital Photogrammetry (수치사진측량을 이용한 지형공간정보 추출)

  • Yeu, Bock-Mo;Kim, Won-Dae
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.7 no.2 s.14
    • /
    • pp.81-91
    • /
    • 1999
  • Research on updating the database of a Geo-Spatial Information System (GSIS), which is a necessary condition for GSIS operation, has been actively carried out. In this study, digital orthophotos were produced from the products of the digital photogrammetry process, and multiscale digital orthophoto maps were then generated using the concept of image pyramids. The produced digital orthophoto has an accuracy of 2.092 meters compared with the 1:5,000 topographic map. This digital orthophoto can be used in various Geo-Spatial Information Systems, providing a valuable spatial reference for other objects in the database.


Förstner Interest Operator in Scale Space (다축척 수치영상에서 Förstner 연산자의 거동)

  • Cho, Woo-Sug
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.4 no.1 s.6
    • /
    • pp.67-73
    • /
    • 1996
  • The objective of this research is to investigate the behavior in scale space of the Förstner interest operator, which has been widely used for detecting distinct points in digital photogrammetry and computer vision. Considering the huge volume of digital images used in digital photogrammetry, the scale space (image pyramid) approach, which appears to be a solution for enhancing image processing, has begun to gain attention. The investigation of the Förstner interest operator in scale space generated by the Gaussian kernel shows its behavior and its feasibility for practical use.
