• Title/Summary/Keyword: Pose Prediction

Augmented Reality Service Based on Object Pose Prediction Using PnP Algorithm

  • Kim, In-Seon;Jung, Tae-Won;Jung, Kye-Dong
    • International Journal of Advanced Culture Technology / v.9 no.4 / pp.295-301 / 2021
  • Digital media technology is developing steadily alongside the convergence technologies of the Fourth Industrial Revolution and mobile devices. The combination of deep learning and augmented reality can provide more convenient and lively services through the interaction of 3D virtual images with the real world. We combine deep learning-based pose prediction with augmented reality technology. We predict the eight vertices of the object's bounding box in the image. Using the eight predicted 2D vertices (x, y), the eight 3D vertices (x, y, z) of the 3D mesh, and the intrinsic parameters of the smartphone camera, we compute the extrinsic parameters of the camera through the PnP algorithm. From the extrinsic parameters we calculate the distance to the object and the degree of rotation of the object, and apply them to the AR content. Our method provides services in a web environment, making it highly accessible to users and making the system easy to maintain. Because we provide augmented reality services using consumers' smartphone cameras, the method can be applied to various business fields.
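
The last step of the abstract above (turning the extrinsic parameters into a distance and a degree of rotation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a PnP solver has already returned a rotation matrix `R` and translation vector `t`, and the function names and sample values are hypothetical.

```python
import math

def pose_to_distance_and_angle(R, t):
    """Return (distance, rotation angle in degrees) from extrinsics R (3x3) and t (3,)."""
    # t is the object origin expressed in camera coordinates,
    # so its Euclidean norm is the camera-to-object distance.
    distance = math.sqrt(sum(c * c for c in t))
    # Axis-angle form of a rotation matrix: trace(R) = 1 + 2*cos(angle).
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return distance, math.degrees(math.acos(cos_angle))

def rotation_about_y(deg):
    """Hypothetical helper: rotation matrix for a turn of `deg` degrees about the y axis."""
    a = math.radians(deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]

# Assumed example pose: object turned 30 degrees, offset (0.3, -0.4, 1.2) from the camera.
R = rotation_about_y(30.0)
t = [0.3, -0.4, 1.2]
distance, angle = pose_to_distance_and_angle(R, t)
print(f"distance={distance:.3f}, rotation={angle:.1f} deg")
```

The distance comes out in the same unit as the 3D mesh vertices given to the PnP solver, and the single angle is the total rotation about the pose's rotation axis. An AR application would more likely need per-axis angles or the full matrix.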

Predicting Unseen Object Pose with an Adaptive Depth Estimator (적응형 깊이 추정기를 이용한 미지 물체의 자세 예측)

  • Song, Sungho;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.509-516 / 2022
  • Accurate pose prediction of objects in 3D space is an important visual recognition technique widely used in many applications such as scene understanding in both indoor and outdoor environments, robotic object manipulation, autonomous driving, and augmented reality. Most previous work on object pose estimation has the limitation of requiring an exact 3D CAD model for each object. Unlike such previous work, this paper proposes a novel neural network model that can predict the poses of unknown objects based only on their RGB color images, without the corresponding 3D CAD models. The proposed model obtains the depth maps required for unknown object pose prediction by using an adaptive depth estimator, AdaBins. In this paper, we evaluate the usefulness and the performance of the proposed model through experiments using benchmark datasets.

Robust 2D human upper-body pose estimation with fully convolutional network

  • Lee, Seunghee;Koo, Jungmo;Kim, Jinki;Myung, Hyun
    • Advances in robotics research / v.2 no.2 / pp.129-140 / 2018
  • With the increasing demand for human pose estimation in applications such as human-computer interaction and human activity recognition, numerous approaches have been proposed to detect the 2D poses of people in images more efficiently. Despite many years of research, estimating human poses from images with satisfactory results remains difficult. In this study, we propose a robust 2D human body pose estimation method using an RGB camera sensor. Our method is efficient and cost-effective, since an RGB camera sensor is inexpensive compared to the more commonly used high-priced sensors. For the estimation of upper-body joint positions, semantic segmentation with a fully convolutional network was exploited. Joint heatmaps computed from the acquired RGB images are used to accurately estimate the coordinates of each joint. The network architecture was designed to learn and detect the locations of joints via a sequential prediction process. Our proposed method was tested and validated for efficient estimation of the human upper-body pose. The obtained results reveal the potential of a simple RGB camera sensor for human pose estimation applications.
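
Reading joint coordinates off per-joint heatmaps, as the abstract above describes, is commonly done by taking the highest-scoring cell of each map. The sketch below shows only that decoding step under this assumption; the abstract does not state the exact decoding rule, and the joint names and heatmap values are made up for illustration.

```python
def heatmap_to_joint(heatmap):
    """Return (x, y, score) of the highest-scoring cell in a 2D heatmap (list of rows)."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best

# Hypothetical 3x3 heatmaps, one per joint, as a network might output them.
heatmaps = {
    "left_shoulder": [[0.0, 0.1, 0.0],
                      [0.2, 0.9, 0.1],
                      [0.0, 0.1, 0.0]],
    "right_wrist":   [[0.0, 0.0, 0.7],
                      [0.0, 0.1, 0.2],
                      [0.0, 0.0, 0.0]],
}

# Keep only the (x, y) position of each joint.
joints = {name: heatmap_to_joint(h)[:2] for name, h in heatmaps.items()}
print(joints)
```

The coordinates are in heatmap cells. Mapping them back to image pixels requires scaling by the ratio between image and heatmap resolution.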

Key Pose-based Proposal Distribution for Upper Body Pose Tracking (상반신 포즈 추적을 위한 키포즈 기반 예측분포)

  • Oh, Chi-Min;Lee, Chil-Woo
    • The KIPS Transactions: Part B / v.18B no.1 / pp.11-20 / 2011
  • Pictorial Structures (PS) is known as an effective method for recognizing and tracking human poses. In this paper, the upper body pose is tracked by PS and a particle filter (PF). PF is one of the dynamic programming methods. However, the Markov chain-based dynamic motion model used in dynamic programming methods such as PF cannot effectively predict highly articulated upper body motions, so PF often fails to track the upper body pose. In this paper we propose a key pose-based proposal distribution for proper particle prediction, based on the similarities between key poses and an upper body silhouette. In the experimental results we confirmed a 70.51% performance improvement over a conventional method.

Search Space Reduction Techniques in Small Molecular Docking (소분자 도킹에서 탐색공간의 축소 방법)

  • Cho, Seung Joo
    • Journal of Integrative Natural Science / v.3 no.3 / pp.143-147 / 2010
  • Since it is of great importance to know how a ligand binds to a receptor, much effort has gone into improving the quality of docking pose prediction. Earlier efforts focused on improving the search algorithm and scoring function of a docking program, resulting in partial improvements with considerable variation. Although these are important and essential, more tangible improvements came from reducing the search space. In a normal docking study, the approximate active site is assumed to be known. After the active site is defined, scoring functions and search algorithms are used to locate the expected binding pose within this search space. A good search algorithm will sample wisely toward the correct binding pose. By careful study of the receptor structure, it was possible to prioritize sub-spaces in the active site using "receptor-based pharmacophores" or "hot spots". In a sense, these techniques reduce the search space from the beginning. Further improvements were made when the bound ligand structure was available, i.e., the search could be directed by molecular similarity using ligand information. This can be very helpful in increasing the accuracy of the binding pose. In addition, if biological activity data are available, docking programs can be improved to the level of being useful in affinity prediction for a series of congeneric ligands. Since the number of co-crystal structures in the Protein Data Bank is increasing, "Ligand-Guided Docking" to reduce the search space will become more important for improving the accuracy of docking pose prediction and the efficiency of virtual screening. Further improvements in this area would be useful for producing more reliable docking programs.

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

  • Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.6 / pp.3182-3198 / 2019
  • Vision-based Human Pose Estimation has been considered one of the challenging research subjects due to problems including confounding background clutter, diversity of human appearances, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap prediction of body joints, the proposed machine repeatedly produces predictions stage by stage, with a receptive field large enough to learn long-range spatial relationships. The stages are composed of various modules according to their strategic purposes. A pyramid stacking module and a dilation module are used to handle human poses at multiple scales. Their multi-scale information from different receptive fields is fused by concatenation, which captures more contextual information from different features. The spatial and channel information of a given input is converted to gating factors by squeezing each feature map to a single numeric value based on its importance, in order to give each of the network channels a different weight. Compared with other ConvNet-based architectures, our proposed architecture achieved higher accuracy in experiments on the standard LSP and MPII pose benchmark datasets.
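
The channel gating described in the abstract above (squeezing each feature map to one value and using it to weight that channel) can be illustrated with a simplified sketch. It assumes global average pooling as the squeeze and a plain sigmoid as the gate, and it omits any learned layers the actual architecture may place in between. The function name and input values are hypothetical.

```python
import math

def channel_gate(feature_maps):
    """Weight each channel by a gate derived from its own average activation.

    feature_maps: list of channels, each a 2D list of floats.
    Returns (gates, gated_feature_maps).
    """
    gates, gated = [], []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        squeezed = sum(values) / len(values)        # squeeze: one number per channel
        gate = 1.0 / (1.0 + math.exp(-squeezed))    # sigmoid maps it into (0, 1)
        gates.append(gate)
        gated.append([[v * gate for v in row] for row in fmap])  # rescale the channel
    return gates, gated

# Two hypothetical 2x2 channels: one inactive, one strongly active.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
gates, gated = channel_gate(features)
print(gates)
```

Channels with higher average activation receive a gate closer to 1 and are attenuated less.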

Fast Convergence GRU Model for Sign Language Recognition

  • Subramanian, Barathi;Olimov, Bekhzod;Kim, Jeonghong
    • Journal of Korea Multimedia Society / v.25 no.9 / pp.1257-1265 / 2022
  • Recognition of sign language is challenging due to the occlusion of hands, the accuracy of hand gestures, and high computational costs. In recent years, deep learning techniques have made significant advances in this field. Although these methods are larger and more complex, they cannot manage long-term sequential data and lack the ability to capture useful information through efficient information processing with faster convergence. To overcome these challenges, we propose a word-level sign language recognition (SLR) system that combines a real-time human pose detection library with a minimized version of the gated recurrent unit (GRU) model. Each gate unit is optimized by discarding the depth-weighted reset gate in GRU cells and considering only the current input. Furthermore, we use sigmoid rather than the hyperbolic tangent activation of standard GRUs, due to the performance loss associated with the latter in deeper networks. Experimental results demonstrate that our pose-based optimized GRU (Pose-OGRU) outperforms the standard GRU model in terms of prediction accuracy, convergence, and information processing capability.

Behavior Pattern Prediction Algorithm Based on 2D Pose Estimation and LSTM from Videos (비디오 영상에서 2차원 자세 추정과 LSTM 기반의 행동 패턴 예측 알고리즘)

  • Choi, Jiho;Hwang, Gyutae;Lee, Sang Jun
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.191-197 / 2022
  • This study proposes an image-based Pose Intention Network (PIN) algorithm for rehabilitation driven by patients' intentions. The purpose of the PIN algorithm is to enable active rehabilitation exercise, which is implemented by estimating the patient's motion and classifying the intention. Existing rehabilitation involves the inconvenience of attaching a sensor directly to the patient's skin. In addition, the rehabilitation device moves the patient, which is a passive rehabilitation method. Our algorithm consists of two steps. First, we estimate the user's joint positions through the OpenPose algorithm, which is efficient in estimating 2D human pose in an image. Second, an intention classifier is constructed to classify the motions into three categories, taking a sequence of images including joint information as input. The intention network also learns correlations between joints and changes in joints over a short period of time, which can readily be used to determine the intention of the motion. To implement the proposed algorithm and conduct real-world experiments, we collected our own dataset, which is composed of videos of three classes. The network is trained using short segment clips of the videos. Experimental results demonstrate that the proposed algorithm is effective for classifying intentions based on a short video clip.

2D Human Pose Estimation based on Object Detection using RGB-D information

  • Park, Seohee;Ji, Myunggeun;Chun, Junchul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.2 / pp.800-816 / 2018
  • In recent years, video surveillance research has been able to recognize various behaviors of pedestrians and analyze the overall situation of objects by combining image analysis technology and deep learning methods. Human Activity Recognition (HAR), which is an important issue in video surveillance research, is a field that detects abnormal behavior of pedestrians in CCTV environments. In order to recognize human behavior, it is necessary to detect the human in the image and to estimate the pose of the detected human. In this paper, we propose a novel approach for 2D Human Pose Estimation based on object detection using RGB-D information. By adding depth information to the RGB information, which has some limitations in detecting objects due to its lack of topological information, we can improve the detection accuracy. Subsequently, the rescaled region of the detected object is fed to Convolutional Pose Machines (CPM), a sequential prediction structure based on Convolutional Neural Networks. We utilize CPM to generate belief maps that predict the positions of keypoints representing human body parts, and estimate the human pose by detecting 14 key body points. The experimental results show that the proposed method detects target objects robustly under occlusion. It is also possible to perform 2D human pose estimation by providing an accurately detected region as the input of the CPM. In future work, we will estimate the 3D human pose by mapping the 2D coordinates of the body parts onto 3D space. Consequently, we can provide useful human behavior information for HAR research.

Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation

  • Kim, Jung-Ho;Kim, Najoung;Park, Hancheol;Park, Jong C.
    • Journal of Computing Science and Engineering / v.10 no.3 / pp.95-101 / 2016
  • In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally under-resourced in comparison to spoken languages. In order to achieve scalability and accessibility regarding data collection and corpus construction, our system utilizes deep learning-based techniques and predicts depth information to perform pose estimation on hand information obtainable from video recordings by a single RGB camera. These estimated poses are then transcribed into expressions in SignWriting. We evaluate the accuracy of hand tracking and hand pose estimation modules of our system quantitatively, using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has a high potential to be successfully employed in constructing a sizable sign language corpus using various types of video resources.