• Title/Summary/Keyword: Object-based Video Recognition

2D Human Pose Estimation based on Object Detection using RGB-D information

  • Park, Seohee;Ji, Myunggeun;Chun, Junchul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.2 / pp.800-816 / 2018
  • In recent years, video surveillance research has become able to recognize various pedestrian behaviors and analyze the overall situation of objects by combining image analysis technology with deep learning methods. Human Activity Recognition (HAR), an important issue in video surveillance research, is a field that aims to detect abnormal behavior of pedestrians in CCTV environments. To recognize human behavior, it is necessary to detect the human in the image and to estimate the pose of the detected human. In this paper, we propose a novel approach for 2D human pose estimation based on object detection using RGB-D information. By adding depth information to the RGB information, which is limited in detecting objects because it lacks topological information, we can improve detection accuracy. Subsequently, the rescaled region of the detected object is fed to Convolutional Pose Machines (CPM), a sequential prediction structure based on Convolutional Neural Networks. We use CPM to generate belief maps that predict the positions of keypoints representing human body parts, and estimate the human pose by detecting 14 key body points. The experimental results show that the proposed method detects target objects robustly under occlusion. Providing an accurately detected region as the input of the CPM also makes 2D human pose estimation possible. In future work, we will estimate the 3D human pose by mapping the 2D coordinates of the body parts into 3D space. Consequently, we can provide useful human behavior information for HAR research.
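As a rough illustration of the belief-map step described in this abstract, the sketch below reads one keypoint per body part by taking the highest-scoring cell of each map. This is a minimal sketch, not the authors' implementation: the function name `keypoints_from_belief_maps`, the nested-list map format, and the toy values are all assumptions made for the example.

```python
from typing import List, Tuple

BeliefMap = List[List[float]]  # one 2D score grid per body part (rows x cols)


def keypoints_from_belief_maps(maps: List[BeliefMap]) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) of the highest-scoring cell in each belief map."""
    keypoints = []
    for belief in maps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(belief):
            for x, score in enumerate(row):
                if score > best[2]:
                    best = (x, y, score)
        keypoints.append(best)
    return keypoints


# Two toy 3x3 maps standing in for the 14 body-part maps mentioned in the abstract.
toy_maps = [
    [[0.0, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.0]],
    [[0.0, 0.0, 0.7], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0]],
]
print(keypoints_from_belief_maps(toy_maps))
```

In a real pipeline the coordinates would still have to be mapped from the rescaled detection region back to the original image, a step this sketch leaves out.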

Implementation of the Broadcasting System for Digital Media Contents (디지털 미디어 콘텐츠 방송 시스템 구현)

  • Shin, Jae-Heung;Kim, Hong-Ryul;Lee, Sang-Cheal
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.10 / pp.1883-1887 / 2008
  • Most digital media content is composed of video, audio, pictures, and animation. The quality with which video and audio information is recognized can vary with the receiver's characteristics or understanding, whereas visual information presented as text gives people the clearest and most accurate way to recognize information. In this paper, we propose a new broadcasting system (BSDMC) that conveys the meaning of digital media content clearly and accurately. We implement general-purpose components that display video, pictures, text, and symbols simultaneously. By simply plugging these components into an application development tool and calling them with the proper parameters, a multimedia content broadcasting system can be developed easily. The components are built on an object-oriented framework with a modular structure, which increases reusability and allows other applications to be developed quickly and reliably.

Abnormal Situation Detection on Surveillance Video Using Object Detection and Action Recognition (객체 탐지와 행동인식을 이용한 영상내의 비정상적인 상황 탐지 네트워크)

  • Kim, Jeong-Hun;Choi, Jong-Hyeok;Park, Young-Ho;Nasridinov, Aziz
    • Journal of Korea Multimedia Society / v.24 no.2 / pp.186-198 / 2021
  • Security monitoring with surveillance cameras currently depends on people watching all surveillance videos directly. However, this task is labor-intensive, and it is difficult to detect every abnormal situation. In this paper, we propose a deep neural network model, called AT-Net, that automatically detects abnormal situations in surveillance video, and we introduce an automatic video surveillance system built on this model. In particular, AT-Net alleviates the ambiguity of existing abnormal situation detection methods by mapping features that represent relationships between people and objects in surveillance video to a new tensor structure based on sparse coding. In experiments on actual surveillance videos, AT-Net achieved an F1-score of about 89% and improved abnormal situation detection performance by more than 25% compared to existing methods.

Development of a Cooking Assistance System Based on Voice and Video Object Recognition (음성 및 동영상 객체 인식 기반 요리 보조 시스템 개발)

  • Lee, Jong-Hwan;Kwak, Hee-Woong;Park, Gi-Su;Song, Mi-Hwa
    • Annual Conference of KIPS / 2022.05a / pp.727-729 / 2022
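  • Building on the convenience that voice-recognition applications bring to mobile services, we apply voice recognition to a recipe application. The resulting application provides database-driven recipe recommendation, object-level video segmentation using the Google Video Intelligence API, and voice recognition using Google Assistant.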

Distance Measurement Using the Kinect Sensor with Neuro-image Processing

  • Sharma, Kajal
    • IEIE Transactions on Smart Processing and Computing / v.4 no.6 / pp.379-383 / 2015
  • This paper presents an approach to detecting object distance with the recently developed low-cost Kinect sensor. The technique is based on Kinect color depth-image processing and can be used to design various computer-vision applications, such as object recognition, video surveillance, and autonomous path finding. The proposed technique applies keypoint feature detection to the Kinect depth image and takes advantage of the depth pixels to obtain the feature distance directly from the depth images. This greatly reduces the computational overhead while obtaining the pixel distance in Kinect-captured images.
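The idea of reading a feature's distance straight from the depth pixels can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the depth image is a nested list of millimetre readings, a value of 0 is assumed to mark a missing reading, and the hypothetical helper `keypoint_distance_mm` takes the median over a small window to tolerate such gaps.

```python
from statistics import median
from typing import List, Optional


def keypoint_distance_mm(depth: List[List[int]], x: int, y: int,
                         radius: int = 1) -> Optional[float]:
    """Median of the valid depth readings in a window centred on keypoint (x, y)."""
    height, width = len(depth), len(depth[0])
    samples = [
        depth[r][c]
        for r in range(max(0, y - radius), min(height, y + radius + 1))
        for c in range(max(0, x - radius), min(width, x + radius + 1))
        if depth[r][c] > 0  # assumption: 0 marks a missing reading
    ]
    return float(median(samples)) if samples else None


# Toy 3x3 depth patch in millimetres; the centre pixel itself has no reading.
toy_depth = [
    [0, 1200, 1210],
    [1190, 0, 1205],
    [1195, 1200, 1198],
]
print(keypoint_distance_mm(toy_depth, x=1, y=1))
```

No stereo matching or triangulation is needed here because the sensor already supplies per-pixel depth, which is the source of the reduced overhead the abstract mentions.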

A Study on Swarm Robot-Based Invader-Enclosing Technique on Multiple Distributed Object Environments

  • Ko, Kwang-Eun;Park, Seung-Min;Park, Jun-Heong;Sim, Kwee-Bo
    • Journal of Electrical Engineering and Technology / v.6 no.6 / pp.806-816 / 2011
  • Interest in social security has recently increased for the sake of infrastructure safety. In addition, advances in computer vision and pattern recognition research are leading to video-based surveillance systems with improved scene analysis capabilities. However, such video surveillance systems, which are controlled by human operators, cannot actively cope with dynamic and anomalous events, such as an invader in the corporate, commercial, or public sectors. For this reason, intelligent surveillance systems are increasingly needed to provide active social security services. In this study, we propose a core technique for an intelligent surveillance system based on swarm robot technology. We present techniques for enclosing an invader with swarm robots in a multiple distributed object environment. The proposed methods are composed of three main stages: location estimation of the object, tracking of the specified object, and decision of the cooperative behavior of the swarm robots. Object tracking and location estimation are performed with a particle filter, and a specified enclosing point for the swarm robots is located at the interactive positions in their coordinate system. Furthermore, the cooperative behaviors of the swarm robots are determined from the result of path navigation based on a combination of potential field and wall-following methods. The results of each stage are combined into the swarm robot-based invader-enclosing technique for multiple distributed object environments. Finally, several simulation results are provided to discuss and verify the accuracy and effectiveness of the proposed techniques.
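The potential-field part of the path navigation mentioned in this abstract can be illustrated with a textbook attractive/repulsive force step. This is a generic sketch, not the authors' controller: the gains `k_att` and `k_rep`, the `influence` radius, and the fixed `step` length are assumed values, and the wall-following component and the particle filter are omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def potential_field_step(robot: Point, goal: Point, obstacles: List[Point],
                         k_att: float = 1.0, k_rep: float = 1.0,
                         influence: float = 2.0, step: float = 0.1) -> Point:
    """Move one fixed-length step along the combined attractive/repulsive force."""
    # Attractive force pulls the robot straight toward the goal (enclosing point).
    fx = k_att * (goal[0] - robot[0])
    fy = k_att * (goal[1] - robot[1])
    for ox, oy in obstacles:
        dx, dy = robot[0] - ox, robot[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence:
            # Repulsion grows as the robot approaches the obstacle.
            gain = k_rep * (1.0 / dist - 1.0 / influence) / dist ** 2
            fx += gain * dx / dist
            fy += gain * dy / dist
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return robot  # forces cancel (local minimum) or the robot is at the goal
    return (robot[0] + step * fx / norm, robot[1] + step * fy / norm)


print(potential_field_step((0.0, 0.0), (1.0, 0.0), obstacles=[]))
```

A pure potential field can stall where the forces cancel, which is one common reason for pairing it with a second behavior such as wall following.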

An Implementation of Embedded Linux System for Embossed Digit Recognition using CNN based Deep Learning (CNN 기반 딥러닝을 이용한 임베디드 리눅스 양각 문자 인식 시스템 구현)

  • Yu, Yeon-Seung;Kim, Cheong Ghil;Hong, Chung-Pyo
    • Journal of the Semiconductor & Display Technology / v.19 no.2 / pp.100-104 / 2020
  • Over the past several years, deep learning has been widely used for feature extraction from images and video in applications such as object classification and facial recognition. This paper introduces an implementation of an embedded Linux system for embossed digit recognition using CNN-based deep learning methods. For this purpose, we implemented a coin recognition system based on deep learning with the Keras open source library on a Raspberry Pi. Performance was evaluated as the success rate of coin classification on images captured with an ultra-wide-angle camera on the Raspberry Pi. The simulation results show an average success rate of 98%.

Determining Method of Factors for Effective Real Time Background Modeling (효과적인 실시간 배경 모델링을 위한 환경 변수 결정 방법)

  • Lee, Jun-Cheol;Ryu, Sang-Ryul;Kang, Sung-Hwan;Kim, Sung-Ho
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.59-69 / 2007
  • In videos captured in varied environments, background modeling is important for extracting and recognizing moving objects. Many background modeling methods have been proposed as a preprocessing step for object recognition. Among them is the Kumar method, a representative queue-based background modeling approach. Because it checks whether to update at a fixed frame period, it is of limited use across varied systems. This paper uses queue-based background modeling and proposes a method in which the major parameters are determined adaptively from the background model. These parameters are the queue size of the sliding window, the size of the brightness-based grouping, and the frame period at which updates are checked. To determine these factors, RCO (Ratio of Correct Object), REO (Ratio of Error Object), and UR (Update Ratio) are used as evaluation criteria in every process. The proposed method improves on existing background modeling techniques that are unfit for real-time processing and recognizes objects more efficiently.
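To make the queue-based idea concrete, the sketch below keeps a sliding window of recent frames and uses the per-pixel median as the background. It is a simplified illustration, not the Kumar method or the adaptive scheme proposed in the paper: the class name `QueueBackgroundModel`, the fixed `queue_size`, and the fixed `threshold` are assumptions, whereas the paper's contribution is choosing such parameters adaptively.

```python
from collections import deque
from statistics import median
from typing import Deque, List

Frame = List[List[int]]  # grayscale intensities, rows x cols


class QueueBackgroundModel:
    """Sliding-window background model: per-pixel median over the queued frames."""

    def __init__(self, queue_size: int = 5, threshold: int = 30) -> None:
        self.frames: Deque[Frame] = deque(maxlen=queue_size)  # oldest frame drops out
        self.threshold = threshold

    def update(self, frame: Frame) -> None:
        self.frames.append(frame)

    def background(self) -> Frame:
        rows, cols = len(self.frames[0]), len(self.frames[0][0])
        return [[int(median(f[r][c] for f in self.frames)) for c in range(cols)]
                for r in range(rows)]

    def foreground_mask(self, frame: Frame) -> List[List[bool]]:
        """Mark pixels that differ from the background by more than the threshold."""
        bg = self.background()
        return [[abs(p - b) > self.threshold for p, b in zip(row, bg_row)]
                for row, bg_row in zip(frame, bg)]


model = QueueBackgroundModel(queue_size=3, threshold=30)
for _ in range(4):
    model.update([[10, 10], [10, 10]])
print(model.foreground_mask([[10, 200], [10, 10]]))
```

A larger queue makes the background more stable but slower to absorb scene changes, which is the kind of trade-off that motivates tuning the queue size and update period.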

Detection and Recognition of Illegally Parked Vehicles Based on an Adaptive Gaussian Mixture Model and a Seed Fill Algorithm

  • Sarker, Md. Mostafa Kamal;Weihua, Cai;Song, Moon Kyou
    • Journal of information and communication convergence engineering / v.13 no.3 / pp.197-204 / 2015
  • In this paper, we present an algorithm for the detection of illegally parked vehicles based on a combination of some image processing algorithms. A digital camera is fixed in the illegal parking region to capture the video frames. An adaptive Gaussian mixture model (GMM) is used for background subtraction in a complex environment to identify the regions of moving objects in our test video. Stationary objects are detected by using the pixel-level features in time sequences. A stationary vehicle is detected by using the local features of the object, and thus, information about illegally parked vehicles is successfully obtained. An automatic alarm system can be utilized according to the different regulations of different illegal parking regions. The results of this study obtained using a test video sequence of a real-time traffic scene show that the proposed method is effective.
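The title names a seed fill algorithm, and one plausible use in such a pipeline is grouping foreground pixels from background subtraction into a connected object region. The sketch below is a generic breadth-first seed fill under that assumption, not the authors' code: the binary mask format, the 4-connectivity, and the function name `seed_fill` are choices made for the example.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)


def seed_fill(mask: List[List[int]], seed: Cell) -> Set[Cell]:
    """Return the 4-connected set of foreground cells (value 1) reachable from seed."""
    rows, cols = len(mask), len(mask[0])
    if mask[seed[0]][seed[1]] != 1:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] == 1 and (nr, nc) not in region):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy foreground mask: one 3-pixel blob and one isolated pixel.
toy_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
print(sorted(seed_fill(toy_mask, (0, 1))))
```

The size and bounding box of each filled region could then feed the stationary-vehicle check, though how the paper does this is not stated in the abstract.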

A Collaborative Video Annotation and Browsing System using Linked Data (링크드 데이터를 이용한 협업적 비디오 어노테이션 및 브라우징 시스템)

  • Lee, Yeon-Ho;Oh, Kyeong-Jin;Sean, Vi-Sal;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.203-219 / 2011
  • In the past, ordinary users simply wanted to watch video content without any specific requirements or purposes. Today, however, users watching a video try to learn and discover more about the things that appear in it. As a result, demand for finding multimedia and browsing information about objects of interest is growing along with the use of multimedia such as video, which is available not only on internet-capable devices such as computers but also on smart TVs and smartphones. Meeting these users' requirements makes labor-intensive annotation of objects in video content inevitable, and many researchers have therefore actively studied methods for annotating the objects that appear in video. In keyword-based annotation, information related to an object that appears in the video content is added directly, and annotation data containing all of the related information about the object must be managed individually. Users have to enter all of the related information themselves. Consequently, when a user browses for information related to the object, only the limited resources that exist in the annotated data can be found. Placing annotations on objects also demands a heavy workload from the user. To reduce this workload and minimize the work involved in annotation, existing object-based annotation attempts automatic annotation using computer vision techniques such as object detection, recognition, and tracking. With such techniques, however, the wide variety of objects that appear in video content must all be detected and recognized, and fully automated annotation still faces difficulties. To overcome these difficulties, we propose a system that consists of two modules. The first module is the annotation module, which enables many annotators to collaboratively annotate the objects in the video content so that semantic data can be accessed through Linked Data. Annotation data managed by the annotation server is represented using an ontology so that the information can easily be shared and extended. Because the annotation data does not include all of the relevant information about an object, objects that appear in the video content are simply connected to existing objects in Linked Data to obtain all of the related information. In other words, only a URI and metadata such as position, time, and size are stored on the annotation server. When a user needs other related information about the object, it is retrieved from Linked Data through the relevant URI. The second module enables viewers, while watching a video, to browse interesting information about an object using the annotation data generated collaboratively by many users. With this system, a query is generated automatically through simple user interaction, all of the related information is retrieved from Linked Data, and the additional information about the object is finally offered to the user. In the future Semantic Web environment, the proposed system is expected to establish a better video content service environment by offering users relevant information about the objects that appear on the screen of any internet-capable device such as a PC, smart TV, or smartphone.
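The abstract says the annotation server stores only a URI plus position, time, and size, and that a query is generated from a simple user interaction. A minimal sketch of such a record and lookup might look like this. Everything named here is hypothetical: the `ObjectAnnotation` fields, the `describe_queries` helper, and the `example.org` URI are illustrative, and the SPARQL `DESCRIBE` form is just one way a Linked Data lookup could be phrased, not necessarily what the system uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectAnnotation:
    uri: str        # Linked Data identifier of the annotated object
    start_s: float  # time span during which the object is on screen
    end_s: float
    x: int          # top-left corner and size of the object's box in pixels
    y: int
    width: int
    height: int

    def contains(self, t: float, px: int, py: int) -> bool:
        """True if a click at pixel (px, py) at time t falls on this object."""
        return (self.start_s <= t <= self.end_s
                and self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def describe_queries(annotations: List[ObjectAnnotation],
                     t: float, px: int, py: int) -> List[str]:
    """Build one SPARQL DESCRIBE query per annotated object under the click."""
    return [f"DESCRIBE <{a.uri}>" for a in annotations if a.contains(t, px, py)]


# Hypothetical annotation; the URI is a placeholder, not a real resource.
example = ObjectAnnotation("http://example.org/resource/Example_Object",
                           12.0, 15.5, 100, 80, 60, 40)
print(describe_queries([example], t=13.0, px=120, py=100))
```

Keeping only the URI in the record means the descriptive data lives in Linked Data and never has to be duplicated or kept in sync on the annotation server.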