• Title/Summary/Keyword: Object-based Video Recognition


Real-Time Moving Object Detection and Shadow Removal in Video Surveillance System (비디오 감시 시스템에서 실시간 움직이는 물체 검출 및 그림자 제거)

  • Lee, Young-Sook;Chung, Wan-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.574-578
    • /
    • 2009
  • Real-time detection of a moving object of interest against the background in a still image or video sequence is an essential step toward correct object tracking and recognition. A moving cast shadow can be misclassified as part of a moving object because the shadow region is included in the object region after segmentation. For this reason, a shadow removal algorithm plays an important role in accurate moving object detection and tracking. To handle these problems, this paper presents an algorithm based on the color-space features of moving objects and shadows. Experimental results show that the proposed algorithm effectively detects moving objects and removes shadows in the test video sequences.
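
The abstract above detects moving objects and then removes cast shadows using color-space features. Below is a minimal sketch of that general idea, not the authors' exact algorithm: after background subtraction, foreground pixels that are darker than the background but keep a similar hue and saturation are treated as shadow. The HSV rule and all thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_object_without_shadow(frame_bgr, background_bgr,
                                 fg_thresh=30, v_ratio=(0.4, 0.9),
                                 h_tol=10, s_tol=40):
    """Illustrative sketch: background subtraction followed by an HSV-based
    shadow test (darker value, similar hue/saturation to the background)."""
    # foreground: any channel differs noticeably from the background model
    fg_mask = cv2.absdiff(frame_bgr, background_bgr).max(axis=2) > fg_thresh

    frame_hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    bg_hsv = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = cv2.split(frame_hsv)
    bh, bs, bv = cv2.split(bg_hsv)

    ratio = v / (bv + 1e-6)
    shadow = (fg_mask &
              (ratio > v_ratio[0]) & (ratio < v_ratio[1]) &   # darker, but not black
              (np.abs(h - bh) < h_tol) &                      # similar hue
              (np.abs(s - bs) < s_tol))                       # similar saturation
    object_mask = fg_mask & ~shadow
    return object_mask.astype(np.uint8) * 255, shadow.astype(np.uint8) * 255
```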


Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.89-106
    • /
    • 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly, and the requirements for analyzing and utilizing it have grown accordingly. Because many industries lack the skilled manpower to analyze video, machine learning and artificial intelligence are actively used to assist. In this situation, the demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and Re-ID has also increased rapidly. However, object detection and tracking faces many conditions that degrade performance, such as occlusion and the re-appearance of an object after it leaves the recording location. Accordingly, action and emotion detection models built on top of detection and tracking also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and lack of optimization. In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and the AWS Rekognition emotion recognition service. The proposed system uses single-linkage hierarchical clustering based Re-ID together with processing methods that maximize hardware throughput. It achieves higher accuracy than re-identification based on simple metrics, offers near real-time processing, and prevents tracking failures caused by object departure and re-appearance, occlusion, and so on. By continuously linking the action and facial emotion detection results of each object to the same identity, videos can be analyzed efficiently. The re-identification model extracts a feature vector from the bounding box of each object detected by the tracking model in every frame and applies single-linkage hierarchical clustering to the feature vectors from past frames to identify an object whose track was lost. Through this process, the same object can be re-tracked after it re-appears or is occluded, so the action and facial emotion detection results of an object newly recognized after a tracking failure can be linked to those of the object that appeared earlier. To improve processing performance, we introduce a per-object Bounding Box Queue and a Feature Queue that reduce RAM requirements while maximizing GPU throughput, and we introduce the IoF (Intersection over Face) algorithm that links facial emotions recognized through AWS Rekognition with object tracking information. The academic significance of this study is that, thanks to these processing techniques, the two-stage re-identification model achieves real-time performance even in a costly environment that also performs action and facial emotion detection, without sacrificing accuracy by falling back on simple metrics. The practical implication is that industrial fields which require action and facial emotion detection but struggle with object tracking failures can analyze videos effectively with the proposed system. With its high re-tracking accuracy and processing performance, the system can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where integrating tracking information with extracted metadata creates great industrial and business value. In future work, we plan to measure tracking performance more precisely using the MOT Challenge dataset, which is used by many international conferences, to investigate the cases that the IoF algorithm cannot handle and develop a complementary algorithm, and to apply the model to datasets from various fields related to intelligent video analysis.
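
As a rough illustration of the single-linkage hierarchical clustering step described above (not the authors' implementation; the cosine distance, threshold, and data layout are assumptions), the sketch below clusters appearance features of unmatched detections together with features stored for lost tracks and reuses an old track ID when they land in the same cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def reassign_lost_ids(new_feats, gallery_feats, gallery_ids, dist_thresh=0.3):
    """Cluster new detection features together with the feature gallery of
    lost tracks using single-linkage hierarchical clustering on cosine
    distance; a new detection inherits the ID of any lost track that falls
    into the same cluster, otherwise it gets a fresh ID (None here)."""
    n_gallery = len(gallery_feats)
    feats = np.vstack([gallery_feats, new_feats])
    labels = fcluster(linkage(pdist(feats, metric="cosine"), method="single"),
                      t=dist_thresh, criterion="distance")
    assigned = []
    for lab in labels[n_gallery:]:
        matches = [gallery_ids[j] for j in range(n_gallery) if labels[j] == lab]
        assigned.append(matches[0] if matches else None)  # None -> issue a new ID
    return assigned
```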

Efficient Representation and Matching of Object Movement using Shape Sequence Descriptor (모양 시퀀스 기술자를 이용한 효과적인 동작 표현 및 검색 방법)

  • Choi, Min-Seok
    • The KIPS Transactions:PartB
    • /
    • v.15B no.5
    • /
    • pp.391-396
    • /
    • 2008
  • The motion of an object in a video clip often plays an important role in characterizing the content of the clip. A number of methods have been developed to analyze and retrieve video content using motion information, but most of them focus on the direction or trajectory of motion rather than on the movement of the object itself. In this paper, we propose the shape sequence descriptor to describe and compare movement based on the shape deformation caused by object motion over time. The movement is first represented as a sequence of 2D object shapes extracted from the input image sequence, and each 2D shape is then converted into a 1D shape feature using a shape descriptor. The shape sequence descriptor is obtained by applying a frequency transform to the sequence of shape descriptors along the time axis. Our experimental results show that the proposed method is simple and effective for describing object movement and is applicable to semantic applications such as content-based video retrieval and human movement recognition.
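
A simplified sketch of the pipeline described above follows. The per-frame 1D shape feature here is a radial contour signature, used only as a stand-in for whichever shape descriptor the paper actually employs; the frequency transform along the time axis is the step the abstract names explicitly.

```python
import numpy as np
import cv2

def shape_feature(mask, n_bins=32):
    """Stand-in 1D shape feature: radial contour signature of a binary
    object mask (uint8, non-empty), resampled to a fixed length."""
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(cnts, key=cv2.contourArea).squeeze(1).astype(np.float32)
    center = cnt.mean(axis=0)
    radii = np.linalg.norm(cnt - center, axis=1)
    idx = np.linspace(0, len(radii) - 1, n_bins).astype(int)   # resample
    sig = radii[idx]
    return sig / (sig.max() + 1e-6)                            # scale-normalize

def shape_sequence_descriptor(masks, n_coeffs=16):
    """Stack the per-frame shape features and apply a frequency transform
    along the time axis to obtain the shape sequence descriptor."""
    seq = np.stack([shape_feature(m) for m in masks])   # shape (T, n_bins)
    spectrum = np.abs(np.fft.rfft(seq, axis=0))         # DFT over time
    return spectrum[:n_coeffs].flatten()                # low-frequency summary
```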

Development of Real-time Video Search System Using the Intelligent Object Recognition Technology (지능형 객체 인식 기술을 이용한 실시간 동영상 검색시스템)

  • Chang, Jae-Young;Kang, Chan-Hyeok;Yoon, Jae-Min;Cho, Jae-Won;Jung, Ji-Sung;Chun, Jonghoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.6
    • /
    • pp.85-91
    • /
    • 2020
  • Recently, video recording equipment such as CCTV has seen increasing use for crime prevention and general safety. Because such equipment operates throughout the day, the need for security personnel, and the cost of managing that manpower, decreases. However, the technology in predominant use cannot search the recorded video for a specific object such as a person on its own; the search must be done manually, so current security video equipment is insufficient where real-time information retrieval is required. In this paper, we propose a technology that uses recent deep learning techniques and the OpenCV library to quickly search a video for a specific person based on clothing information entered by the user, and transmits the result in real time. We implemented our system to automatically recognize human objects in real time using the YOLO library, while deep learning is used to classify clothing into top and bottom garments. Colors are detected through the OpenCV library and combined with the clothing classification to identify the requested person. The system presented in this paper not only recognizes a person wearing specific clothing accurately and quickly, but can also be extended to other types of object recognition in video surveillance systems for various purposes.
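
The sketch below illustrates the color-matching idea in this entry. It assumes a person bounding box has already been produced by a detector such as YOLO, and it splits the box into upper and lower halves as a crude stand-in for the paper's learned top/bottom clothing classifier; the hue ranges and thresholds are illustrative.

```python
import cv2
import numpy as np

# illustrative hue ranges (OpenCV hue is 0-179); purely assumed values
HUE_RANGES = {"red": (0, 10), "yellow": (20, 35), "green": (40, 80), "blue": (100, 130)}

def clothing_matches(frame_bgr, person_box, top_color, bottom_color, min_frac=0.3):
    """Return True if the upper half of the person box is mostly `top_color`
    and the lower half is mostly `bottom_color` (a stand-in for the paper's
    deep-learning clothing classifier plus OpenCV color detection)."""
    x, y, w, h = person_box
    person = frame_bgr[y:y + h, x:x + w]
    halves = {"top": person[: h // 2], "bottom": person[h // 2:]}
    wanted = {"top": top_color, "bottom": bottom_color}
    for part, color in wanted.items():
        hsv = cv2.cvtColor(halves[part], cv2.COLOR_BGR2HSV)
        lo, hi = HUE_RANGES[color]
        mask = cv2.inRange(hsv, (lo, 60, 60), (hi, 255, 255))
        if mask.mean() / 255.0 < min_frac:   # not enough pixels of that color
            return False
    return True
```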

An Efficient Car Management System based on an Object-Oriented Modeling using Car Number Recognition and Smart Phone (자동차 번호판 인식 및 스마트폰을 활용한 객체지향 설계 기반의 효율적인 차량 관리 시스템)

  • Jung, Se-Hoon;Kwon, Young-Wook;Sim, Chun-Bo
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • v.7 no.5
    • /
    • pp.1153-1164
    • /
    • 2012
  • In this paper, we propose an efficient car management system based on object-oriented modeling that uses license plate recognition and smartphones. The proposed system identifies the number of a vehicle brought in for repair by recognizing its license plate with an IP camera in real time, and then displays the existing repair history of the recognized car on a DID. In addition, the maintenance process is recorded on video through the IP camera while the mechanic repairs the car, and key frames extracted from the recorded video are sent automatically, providing customers with vehicle identification and repair history management functions. Web and mobile graphical user interfaces are provided for convenience. The modules of the proposed system are designed with granular object-oriented software modeling, considering reuse and extensibility after implementation. Car repair centers and maintenance companies can improve business efficiency, and customer confidence in the requested vehicle repairs can increase.

A Method for Body Keypoint Localization based on Object Detection using the RGB-D information (RGB-D 정보를 이용한 객체 탐지 기반의 신체 키포인트 검출 방법)

  • Park, Seohee;Chun, Junchul
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.85-92
    • /
    • 2017
  • Recently, in the field of video surveillance, deep-learning-based methods have been applied to detect a moving person in video and analyze the detected person's behavior. Human activity recognition, one field of this intelligent image analysis technology, detects the object and then detects body keypoints in order to recognize the behavior of the detected object. In this paper, we propose a method for body keypoint localization based on object detection using RGB-D information. First, the moving object is segmented and detected from the background using the color and depth information generated by the two cameras. The input image, generated by rescaling the detected object region using the RGB-D information, is fed to Convolutional Pose Machines (CPM) for single-person pose estimation. CPM is used to generate belief maps for 14 body parts per person and to detect body keypoints based on those belief maps. This method provides an accurate region of the object for keypoint detection and can be extended from single-person to multi-person body keypoint localization by integrating the individual results. In the future, a model for human pose estimation can be built from the detected keypoints, contributing to the field of human activity recognition.
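
The following sketch shows how body keypoints can be read out of per-part belief maps for a detected person region, in the spirit of the pipeline described above. The map layout, threshold, and coordinate handling are assumptions, not the authors' code.

```python
import numpy as np

def keypoints_from_belief_maps(belief_maps, box, threshold=0.1):
    """Take per-part belief maps (shape: n_parts x H x W) produced by a pose
    estimator such as Convolutional Pose Machines for one detected person box,
    and return one keypoint per body part in original image coordinates."""
    x0, y0, w, h = box                      # detected object region (RGB-D based)
    n_parts, H, W = belief_maps.shape       # e.g. 14 body parts
    keypoints = []
    for part in range(n_parts):
        bm = belief_maps[part]
        idx = np.unravel_index(np.argmax(bm), bm.shape)   # peak of the belief map
        conf = float(bm[idx])
        if conf < threshold:
            keypoints.append(None)          # part considered not visible
            continue
        # rescale from belief-map coordinates back to the image
        px = x0 + idx[1] * w / W
        py = y0 + idx[0] * h / H
        keypoints.append((px, py, conf))
    return keypoints
```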

Context-aware Video Surveillance System

  • An, Tae-Ki;Kim, Moon-Hyun
    • Journal of Electrical Engineering and Technology
    • /
    • v.7 no.1
    • /
    • pp.115-123
    • /
    • 2012
  • A video analysis system used to detect events in video streams generally involves several processes, including object detection, analysis of object trajectories, and recognition of the trajectories by comparison with an a priori trained model. However, these processes do not work well in a complex environment with many occlusions, mirror effects, and/or shadow effects. We propose a new approach to a context-aware video surveillance system that detects predefined contexts in video streams. The proposed system consists of two modules: a feature extractor and a context recognizer. The feature extractor calculates the moving energy, which represents the amount of moving objects in a video stream, and the stationary energy, which represents the amount of still objects. Situations and events are represented as changes in the moving and stationary energies of the video stream. The context recognizer determines whether predefined contexts are present using the moving and stationary energies extracted by the feature extractor. To train each context model and recognize predefined contexts, we propose and use DAdaBoost, a new ensemble classifier based on AdaBoost, one of the best-known ensemble classification algorithms. Our approach is expected to be robust in more complex environments that exhibit mirror and/or shadow effects.
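
A minimal sketch of the two features named above, under assumed definitions (the paper's exact formulation may differ): moving energy as the fraction of pixels changing between consecutive frames, and stationary energy as the fraction of pixels that differ from the background model but are not currently moving.

```python
import cv2

def frame_energies(prev_gray, gray, background_gray,
                   motion_thresh=25, still_thresh=25):
    """Illustrative per-frame features for grayscale uint8 frames:
    moving energy  = fraction of pixels changing between consecutive frames,
    stationary energy = fraction of foreground pixels that are not moving."""
    motion = cv2.absdiff(gray, prev_gray) > motion_thresh          # frame difference
    foreground = cv2.absdiff(gray, background_gray) > still_thresh # background difference
    stationary = foreground & ~motion                              # still foreground
    return motion.mean(), stationary.mean()
```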

Development of System for Real-Time Object Recognition and Matching using Deep Learning at Simulated Lunar Surface Environment (딥러닝 기반 달 표면 모사 환경 실시간 객체 인식 및 매칭 시스템 개발)

  • Jong-Ho Na;Jun-Ho Gong;Su-Deuk Lee;Hyu-Soung Shin
    • Tunnel and Underground Space
    • /
    • v.33 no.4
    • /
    • pp.281-298
    • /
    • 2023
  • Continuous research efforts are being devoted to unmanned mobile platforms for lunar exploration, and there is an ongoing demand for real-time information processing to accurately determine positioning and mapping of areas of interest on the lunar surface. To apply deep learning processing and analysis techniques to practical rovers, research on software integration and optimization is imperative. In this study, a foundational investigation was conducted on real-time analysis of images of a virtual lunar base construction site, aimed at automatically quantifying the spatial information of key objects. The study involved transitioning from an existing region-based object recognition algorithm to a bounding-box-based algorithm, improving object recognition accuracy and inference speed. To support object matching training on extensive data, the Batch Hard Triplet Mining technique was introduced, and both the training and inference processes were optimized. Furthermore, an improved software system for object recognition and identical-object matching was integrated, and visualization software was developed for automatically matching identical objects within input images. Training and inference for identical-object matching were successfully carried out using simulated satellite-view video data for training and video captured of moving objects for inference. The outcomes suggest the feasibility of building 3D spatial information from continuously captured video of mobile platforms and using it to position objects within regions of interest. These findings are expected to contribute to an integrated automated system for video-based construction monitoring and control of key target objects at future lunar base construction sites.
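
Batch Hard Triplet Mining, mentioned above, is a standard technique; a minimal PyTorch sketch follows. The margin value and Euclidean distance are illustrative choices, and the code assumes each identity appears at least twice per batch (PK-style sampling).

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet mining: for every anchor in the batch, pick its
    hardest positive (farthest same-ID sample) and hardest negative (closest
    different-ID sample), then apply the triplet margin loss."""
    dist = torch.cdist(embeddings, embeddings, p=2)      # pairwise L2 distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same_id & ~eye                            # same identity, not self
    neg_mask = ~same_id                                  # different identity

    hardest_pos = (dist * pos_mask).max(dim=1).values    # farthest positive
    # exclude non-negatives with +inf before taking the closest negative
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values
    return torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()
```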

Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs

  • Ke, Shian-Ru;Thuc, Hoang Le Uyen;Hwang, Jenq-Neng;Yoo, Jang-Hee;Choi, Kyoung-Ho
    • ETRI Journal
    • /
    • v.36 no.4
    • /
    • pp.662-672
    • /
    • 2014
  • Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system to recognize both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of the joints belonging to a human object, observed through actions of multiple cycles, are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) for dimensionality reduction and increased discrimination. For further dimensionality reduction, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train CHMMs separately for different types of actions, based on the Baum-Welch re-estimation algorithm. For recognition of continuous actions concatenated from several distinct types of actions, a designed graphical model is used to systematically concatenate the different separately trained CHMMs. The experimental results show the effective performance of our proposed system on both single and continuous action recognition problems.
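
A rough sketch of the training procedure described above, not the authors' implementation: k-means vector quantization of the GRFs followed by Baum-Welch training of an HMM with a cyclic left-right topology, here via hmmlearn. The state count, cluster count, and library choice are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn import hmm

def train_cyclic_hmm(sequences, n_states=5, n_clusters=16):
    """Quantize per-frame GRF vectors with k-means, then train an HMM whose
    states advance left to right and whose last state loops back to the first
    (a cyclic topology), using Baum-Welch re-estimation."""
    all_frames = np.vstack(sequences)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(all_frames)
    # replace each frame by its cluster centre (vector quantization)
    quantized = [km.cluster_centers_[km.predict(seq)] for seq in sequences]

    # cyclic left-right transition matrix: stay or advance; last state -> first
    trans = np.zeros((n_states, n_states))
    for i in range(n_states):
        trans[i, i] = 0.5
        trans[i, (i + 1) % n_states] = 0.5

    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, init_params="mc", params="stmc")
    model.startprob_ = np.full(n_states, 1.0 / n_states)
    model.transmat_ = trans
    # Baum-Welch keeps zero-probability transitions at zero,
    # so the cyclic topology is preserved during re-estimation
    model.fit(np.vstack(quantized), lengths=[len(q) for q in quantized])
    return km, model
```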

Low-Light Invariant Video Enhancement Scheme Using Zero Reference Deep Curve Estimation (Zero Deep Curve 추정방식을 이용한 저조도에 강인한 비디오 개선 방법)

  • Choi, Hyeong-Seok;Yang, Yoon Gi
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.8
    • /
    • pp.991-998
    • /
    • 2022
  • Recently, object recognition using image/video signals is rapidly spreading on autonomous driving and mobile phones. However, the actual input image/video signals are easily exposed to a poor illuminance environment. A recent researches for improving illumination enable to estimate and compensate the illumination parameters. In this study, we propose VE-DCE (video enhancement zero-reference deep curve estimation) to improve the illumination of low-light images. The proposed VE-DCE uses unsupervised learning-based zero-reference deep curve, which is one of the latest among learning based estimation techniques. Experimental results show that the proposed method can achieve the quality of low-light video as well as images compared to the previous method. In addition, it can reduce the computational complexity with respect to the existing method.