• Title/Summary/Keyword: Keypoint

Multi-resolution Fusion Network for Human Pose Estimation in Low-resolution Images

  • Kim, Boeun;Choo, YeonSeung;Jeong, Hea In;Kim, Chung-Il;Shin, Saim;Kim, Jungho
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.7, pp.2328-2344, 2022
  • 2D human pose estimation still struggles with low-resolution images. Most existing top-down approaches scale the target human bounding box image up to a large size and feed the scaled image into the network. Up-sampling introduces artifacts into the low-resolution target images, and the degraded images impair accurate estimation of the joint positions. To address this issue, we propose a multi-resolution input feature fusion network for human pose estimation. Specifically, the bounding box image of the target human is rescaled to multiple input images of various sizes, and the features extracted from these images are fused in the network. Moreover, we introduce a guiding channel that induces the multi-resolution input features to affect the network selectively according to the resolution of the target image. We conduct experiments on the MS COCO dataset, a representative dataset for 2D human pose estimation, where our method outperforms the strong baseline HRNet and previous state-of-the-art methods.
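
The rescaling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbor sampling, the function names `resize_nearest` and `multi_resolution_inputs`, and the three target sizes are all assumptions chosen for illustration, and the feature fusion and guiding channel are not modeled.

```python
def resize_nearest(image, out_h, out_w):
    """Rescale a 2D list-of-lists image with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def multi_resolution_inputs(crop, sizes):
    """Return one rescaled copy of the bounding-box crop per (height, width)."""
    return [resize_nearest(crop, h, w) for (h, w) in sizes]


# Hypothetical 4x3 low-resolution crop with distinct pixel values.
crop = [[r * 3 + c for c in range(3)] for r in range(4)]

# Hypothetical input sizes; the abstract does not state the sizes used.
pyramid = multi_resolution_inputs(crop, [(4, 3), (8, 6), (16, 12)])
shapes = [(len(img), len(img[0])) for img in pyramid]
print(shapes)
```

In a real pipeline each rescaled image would be passed to its own feature extractor before fusion; the sketch stops at producing the inputs.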

Deep Local Multi-level Feature Aggregation Based High-speed Train Image Matching

  • Li, Jun;Li, Xiang;Wei, Yifei;Wang, Xiaojun
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.5, pp.1597-1610, 2022
  • At present, high-speed train chassis inspection mainly relies on computer vision: keypoints are first extracted from two related chassis images, the keypoints are then matched to find pixel-level correspondences between the two images, and finally defect detection and other steps are performed. The quality and accuracy of image matching are therefore critical for subsequent defect detection. Traditional matching methods struggle to generalize to complex scene variations such as weather, illumination, and seasonal changes, so it is important to study deep-learning-based matching for high-speed train images. This paper establishes a high-speed train chassis image matching dataset that includes random perspective changes and optical distortion, to simulate the actual working environment of the high-speed rail system as closely as possible. It also designs a convolutional neural network that densely extracts keypoints to alleviate the problems of current methods. Using multi-level features, the network restores low-level details, which improves keypoint localization accuracy, and also generates robust keypoint descriptors. Detailed experiments show a large improvement of the proposed network over traditional methods.

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

  • Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
    • Proceedings of the Korea Information Convergence Society Conference (한국정보컨버전스학회:학술대회논문집), 2008.06a, pp.53-56, 2008
  • This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: (1) perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); and (2) design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is RITEL, a real-time speech-based question answering system developed at LIMSI. The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components are as follows. RITEL, a question answering system that searches raw text, produces a text (the answer) and attitudinal information; this attitudinal information is then processed to deliver expressive tags; and the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.

Multimodal Image Fusion with Human Pose for Illumination-Robust Detection of Human Abnormal Behaviors (조명을 위한 인간 자세와 다중 모드 이미지 융합 - 인간의 이상 행동에 대한 강력한 탐지)

  • Cuong H. Tran;Seong G. Kong
    • Proceedings of the Korea Information Processing Society Conference, 2023.11a, pp.637-640, 2023
  • This paper presents multimodal image fusion with human pose for detecting abnormal human behaviors in low illumination conditions. Detecting human behaviors in low illumination is challenging because the objects of interest in the scene have limited visibility. Multimodal image fusion combines visual information in the visible spectrum with thermal radiation information in the long-wave infrared spectrum. We propose an abnormal event detection scheme based on the multimodal fused image and on human poses, using keypoints to characterize the action of the human body. Our method assumes that human behaviors are well correlated with body keypoints such as shoulders, elbows, wrists, and hips. In detail, we extract the keypoint coordinates of human targets in multimodal fused videos. The coordinate values are used as inputs to train a multilayer perceptron network that classifies human behaviors as normal or abnormal. Our experiment demonstrates significant results on a multimodal imaging dataset. The proposed model can capture the complex distribution patterns of both normal and abnormal behaviors.
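
The classification step in this abstract (keypoint coordinates fed to a multilayer perceptron) can be sketched as a single forward pass. This is only an illustration under stated assumptions: the layer sizes, the ReLU and sigmoid activations, the random untrained weights, and the eight sample keypoints are hypothetical and are not taken from the paper.

```python
import math
import random


def flatten_keypoints(keypoints):
    """Turn [(x, y), ...] into the flat feature vector [x0, y0, x1, y1, ...]."""
    return [v for (x, y) in keypoints for v in (x, y)]


def mlp_forward(features, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical normalized coordinates for shoulders, elbows, wrists, hips.
keypoints = [
    (0.42, 0.20), (0.58, 0.20),  # shoulders
    (0.38, 0.35), (0.62, 0.35),  # elbows
    (0.36, 0.50), (0.64, 0.50),  # wrists
    (0.45, 0.55), (0.55, 0.55),  # hips
]
features = flatten_keypoints(keypoints)

# Untrained weights with a fixed seed; a real model would learn these.
rng = random.Random(0)
n_hidden = 4
w1 = [[rng.uniform(-0.5, 0.5) for _ in features] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b2 = 0.0

p_abnormal = mlp_forward(features, w1, b1, w2, b2)
label = "abnormal" if p_abnormal >= 0.5 else "normal"
print(label, round(p_abnormal, 3))
```

Because the weights are random, the printed label carries no meaning; the sketch only shows how coordinates become the network input and how the output is thresholded.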

Fall and Direction Detection Using Multiple Cameras and Sensors (다중 카메라와 센서를 활용한 낙상 및 방향 감지)

  • Insu Jeon;Dayeong So;Chomyong Kim;Jung-Yeon Kim;Yunyoung Nam;Jihoon Moon
    • Proceedings of the Korean Society of Computer Information Conference, 2024.01a, pp.191-192, 2024
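  • With the continued growth of the elderly population, the safety of older adults has emerged as a major concern. In particular, falls, which occur frequently among older adults, can cause serious health problems, and preventing and responding to them plays an important role in improving the quality of life of the elderly population. This study proposes a fall detection technique that integrates video captured by eight cameras with sensor data. The proposed technique combines an image recognition method that extracts Skeleton Keypoints using MediaPipe with a sensor-based method that uses features obtained from the sensor data, and can effectively detect both the occurrence and the direction of a fall. Based on these results, this study is expected to contribute to improving the everyday safety of older adults and the efficiency of the healthcare system.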

Image-based Image Retrieval System Using Duplicated Point of PCA-SIFT (PCA-SIFT의 차원 중복점을 이용한 이미지 기반 이미지 검색 시스템)

  • Choi, GiRyong;Jung, Hye-Wuk;Lee, Jee-Hyoung
    • Journal of the Korean Institute of Intelligent Systems, v.23 no.3, pp.275-279, 2013
  • Recently, as multimedia information has become widespread, many studies have addressed image-based image retrieval on the web. However, it is hard to find the matching images users want because images contain widely varying patterns. In this paper, we propose an efficient image-based retrieval system for finding products in Internet shopping malls. We extract features for image retrieval using the SIFT (Scale Invariant Feature Transform) algorithm, repeat keypoint matching in various dimensions using PCA-SIFT, and find the image the user is searching for by combining the results. To verify the efficiency of the proposed method, we compare its performance with that of SIFT and PCA-SIFT on images with various patterns. The proposed method shows the best discrimination when product labels are not included in the images.

A Comparison of 3D Reconstruction through the Passive and Pseudo-Active Acquisition of Images (수동 및 반자동 영상획득을 통한 3차원 공간복원의 비교)

  • Jeona, MiJeong;Kim, DuBeom;Chai, YoungHo
    • Journal of Broadcast Engineering, v.21 no.1, pp.3-10, 2016
  • In this paper, two reconstructed point cloud sets carrying 3D feature information are analyzed. For a 3D reconstruction of the interior of a building, the first image set is taken by sequential passive camera movement along a regular grid path, and the second set is obtained by applying a laser scanning process. Keypoints matched across all images are obtained with the SIFT (Scale Invariant Feature Transform) algorithm and used to register the point cloud data. The reported results are the number of points in the point cloud, the average point cloud density, and the point cloud generation time. Experimental results show that images from additional sensors, as well as images from the camera, are needed for more accurate 3D reconstruction of the interior of a building.
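
Matching keypoints between two images, as mentioned in this abstract, is commonly done by comparing descriptor vectors. The sketch below is a generic illustration, not the paper's pipeline: it assumes brute-force nearest-neighbor search with a distance-ratio test (a standard companion to SIFT that the abstract does not mention), and the 3-dimensional toy descriptors and the `ratio=0.8` threshold are hypothetical.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (index_a, index_b) pairs that pass the distance-ratio test.

    A match is kept only when the nearest descriptor in desc_b is clearly
    closer than the second nearest, which rejects ambiguous keypoints.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


# Toy descriptors; real SIFT descriptors have 128 dimensions.
desc_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
desc_b = [(1.0, 0.1, 0.0), (0.0, 0.1, 1.0), (0.0, 1.0, 0.0)]

matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The third descriptor in `desc_a` is about equally far from every candidate, so the ratio test discards it, while the first two find unambiguous partners.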

Adaptive Keyframe-Based Tracking for Augmented Books (증강 책을 위한 적응형 키프레임 기반 트래킹)

  • Yoo, Jae-Sang;Cho, Kyu-Sung;Yang, Hyun-S.
    • Journal of KIISE:Computing Practices and Letters, v.16 no.4, pp.502-506, 2010
  • An augmented book is an application that uses AR technologies to augment a real book with multimedia elements such as virtual 3D objects generated by computer graphics, movie clips, or sound clips. It is intended to bring additional educational and entertainment value to users. For augmented books, this paper proposes an adaptive keyframe-based page tracking method that estimates the camera's 6 DOF pose in real time after recognizing a page and performing wide-baseline keypoint matching. For page tracking, the proposed method chooses a suitable keyframe and tracks in two coarse-to-fine steps. As a result, the proposed method provides real-time tracking that is robust to viewpoint and illumination variations.

A Hardware Design of Feature Detector for Realtime Processing of SIFT(Scale Invariant Feature Transform) Algorithm in Embedded Systems (임베디드 환경에서 SIFT 알고리즘의 실시간 처리를 위한 특징점 검출기의 하드웨어 구현)

  • Park, Chan-Il;Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD, v.46 no.3, pp.86-95, 2009
  • SIFT is an algorithm that extracts vectors from the pixels around keypoints, which are points whose pixel values differ sharply from those of their neighbors, such as the vertices and edges of an object. The SIFT algorithm is being actively researched for various image processing applications, including 3D image reconstruction and intelligent vision systems for robots. In this paper, we implement hardware for the SIFT feature detection algorithm for real-time processing in embedded systems. We estimate that the hardware implementation processes a $1{,}280 \times 960$ image in 25 ms and a $640 \times 480$ image in 5 ms at 100 MHz. The implemented hardware consumes 45,792 LUTs (85%) with the Synplify 8.li synthesis tool.
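
The timing figures quoted in this abstract can be turned into frame rates and clock cycles per pixel with simple arithmetic. The sketch below only restates the reported numbers (25 ms, 5 ms, 100 MHz); the function name `throughput` and the assumption that each latency covers one full frame are illustrative.

```python
CLOCK_HZ = 100_000_000  # 100 MHz, as reported in the abstract


def throughput(width, height, latency_ms, clock_hz=CLOCK_HZ):
    """Derive frames per second and clock cycles per pixel from a latency."""
    pixels = width * height
    cycles = latency_ms * clock_hz // 1000  # clock cycles spent on one frame
    return {
        "fps": 1000 / latency_ms,
        "cycles_per_pixel": cycles / pixels,
    }


large = throughput(1280, 960, 25)  # 1,280 x 960 image in 25 ms
small = throughput(640, 480, 5)    # 640 x 480 image in 5 ms

print(large["fps"], round(large["cycles_per_pixel"], 2))
print(small["fps"], round(small["cycles_per_pixel"], 2))
```

Under these assumptions the larger image corresponds to 40 frames per second at roughly 2.03 cycles per pixel, and the smaller one to 200 frames per second at roughly 1.63 cycles per pixel.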

SIFT based Image Similarity Search using an Edge Image Pyramid and an Interesting Region Detection (윤곽선 이미지 피라미드와 관심영역 검출을 이용한 SIFT 기반 이미지 유사성 검색)

  • Yu, Seung-Hoon;Kim, Deok-Hwan;Lee, Seok-Lyong;Chung, Chin-Wan;Kim, Sang-Hee
    • Journal of KIISE:Databases, v.35 no.4, pp.345-355, 2008
  • SIFT is widely used among shape descriptors in computer vision applications such as object recognition, motion tracking, and 3D reconstruction. However, it is not easy to apply SIFT directly to image similarity search because it uses many high-dimensional keypoint vectors. In this paper, we present a SIFT-based image similarity search method using an edge image pyramid and interesting region detection. The proposed method uses the edge image pyramid to extract keypoints that are invariant to image contrast, scale, and rotation, and uses the Hough transform to remove many unnecessary keypoints from the image. The proposed Hough transform can detect elliptical objects, so it can be used to find interesting regions. Experimental results demonstrate that the retrieval performance of the proposed method is about 20% better than that of traditional SIFT in average recall.