Search | Korea Science

Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
- Journal of Intelligence and Information Systems
- /
- v.28 no.1
- /
- pp.89-106
- /
- 2022
Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly. According to the increasing video data, the requirements for analysis and utilization are increasing. Due to the lack of skilled manpower to analyze videos in many industries, machine learning and artificial intelligence are actively used to assist manpower. In this situation, the demand for various computer vision technologies such as object detection and tracking, action detection, emotion detection, and Re-ID also increased rapidly. However, the object detection and tracking technology has many difficulties that degrade performance, such as re-appearance after the object's departure from the video recording location, and occlusion. Accordingly, action and emotion detection models based on object detection and tracking models also have difficulties in extracting data for each object. In addition, deep learning architectures consist of various models suffer from performance degradation due to bottlenects and lack of optimization. In this study, we propose an video analysis system consists of YOLOv5 based DeepSORT object tracking model, SlowFast based action recognition model, Torchreid based Re-ID model, and AWS Rekognition which is emotion recognition service. Proposed model uses single-linkage hierarchical clustering based Re-ID and some processing method which maximize hardware throughput. It has higher accuracy than the performance of the re-identification model using simple metrics, near real-time processing performance, and prevents tracking failure due to object departure and re-emergence, occlusion, etc. By continuously linking the action and facial emotion detection results of each object to the same object, it is possible to efficiently analyze videos. The re-identification model extracts a feature vector from the bounding box of object image detected by the object tracking model for each frame, and applies the single-linkage hierarchical clustering from the past frame using the extracted feature vectors to identify the same object that failed to track. Through the above process, it is possible to re-track the same object that has failed to tracking in the case of re-appearance or occlusion after leaving the video location. As a result, action and facial emotion detection results of the newly recognized object due to the tracking fails can be linked to those of the object that appeared in the past. On the other hand, as a way to improve processing performance, we introduce Bounding Box Queue by Object and Feature Queue method that can reduce RAM memory requirements while maximizing GPU memory throughput. Also we introduce the IoF(Intersection over Face) algorithm that allows facial emotion recognized through AWS Rekognition to be linked with object tracking information. The academic significance of this study is that the two-stage re-identification model can have real-time performance even in a high-cost environment that performs action and facial emotion detection according to processing techniques without reducing the accuracy by using simple metrics to achieve real-time performance. The practical implication of this study is that in various industrial fields that require action and facial emotion detection but have many difficulties due to the fails in object tracking can analyze videos effectively through proposed model. Proposed model which has high accuracy of retrace and processing performance can be used in various fields such as intelligent monitoring, observation services and behavioral or psychological analysis services where the integration of tracking information and extracted metadata creates greate industrial and business value. In the future, in order to measure the object tracking performance more precisely, there is a need to conduct an experiment using the MOT Challenge dataset, which is data used by many international conferences. We will investigate the problem that the IoF algorithm cannot solve to develop an additional complementary algorithm. In addition, we plan to conduct additional research to apply this model to various fields' dataset related to intelligent video analysis.
https://doi.org/10.13088/jiis.2022.28.1.089 인용 PDF KSCI

Realtime Facial Expression Data Tracking System using Color Information (컬러 정보를 이용한 실시간 표정 데이터 추적 시스템)

Lee, Yun-Jung;Kim, Young-Bong
- The Journal of the Korea Contents Association
- /
- v.9 no.7
- /
- pp.159-170
- /
- 2009
It is very important to extract the expression data and capture a face image from a video for online-based 3D face animation. In recently, there are many researches on vision-based approach that captures the expression of an actor in a video and applies them to 3D face model. In this paper, we propose an automatic data extraction system, which extracts and traces a face and expression data from realtime video inputs. The procedures of our system consist of three steps: face detection, face feature extraction, and face tracing. In face detection, we detect skin pixels using YCbCr skin color model and verifies the face area using Haar-based classifier. We use the brightness and color information for extracting the eyes and lips data related facial expression. We extract 10 feature points from eyes and lips area considering FAP defined in MPEG-4. Then, we trace the displacement of the extracted features from continuous frames using color probabilistic distribution model. The experiments showed that our system could trace the expression data to about 8fps.
https://doi.org/10.5392/JKCA.2009.9.7.159 인용 PDF

Eye Location Algorithm For Natural Video-Conferencing (화상 회의 인터페이스를 위한 눈 위치 검출)

Lee, Jae-Jun;Choi, Jung-Il;Lee, Phill-Kyu
- The Transactions of the Korea Information Processing Society
- /
- v.4 no.12
- /
- pp.3211-3218
- /
- 1997
This paper addresses an eye location algorithm which is essential process of human face tracking system for natural video-conferencing. In current video-conferencing systems, user's facial movements are restricted by fixed camera, therefore it is inconvenient to users. We Propose an eye location algorithm for automatic face tracking. Because, locations of other facial features guessed from locations of eye and scale of face in the image can be calculated using inter-ocular distance. Most previous feature extraction methods for face recognition system are approached under assumption that approximative face region or location of each facial feature is known. The proposed algorithm in this paper uses no prior information on the given image. It is not sensitive to backgrounds and lighting conditions. The proposed algorithm uses the valley representation as major information to locate eyes. The experiments have been performed for 213 frames of 17 people and show very encouraging results.
PDF

Performance Analysis for Accuracy of Personality Recognition Models based on Setting of Margin Values at Face Region Extraction (얼굴 영역 추출 시 여유값의 설정에 따른 개성 인식 모델 정확도 성능 분석)

Qiu Xu;Gyuwon Han;Bongjae Kim
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.24 no.1
- /
- pp.141-147
- /
- 2024
Recently, there has been growing interest in personalized services tailored to an individual's preferences. This has led to ongoing research aimed at recognizing and leveraging an individual's personality traits. Among various methods for personality assessment, the OCEAN model stands out as a prominent approach. In utilizing OCEAN for personality recognition, a multi modal artificial intelligence model that incorporates linguistic, paralinguistic, and non-linguistic information is often employed. This paper examines the impact of the margin value set for extracting facial areas from video data on the accuracy of a personality recognition model that uses facial expressions to determine OCEAN traits. The study employed personality recognition models based on 2D Patch Partition, R2plus1D, 3D Patch Partition, and Video Swin Transformer technologies. It was observed that setting the facial area extraction margin to 60 resulted in the highest 1-MAE performance, scoring at 0.9118. These findings indicate the importance of selecting an optimal margin value to maximize the efficiency of personality recognition models.
https://doi.org/10.7236/JIIBC.2024.24.1.141 인용 PDF HTML

Automatic Poster Generation System Using Protagonist Face Analysis

Yeonhwi You;Sungjung Yong;Hyogyeong Park;Seoyoung Lee;Il-Young Moon
- Journal of information and communication convergence engineering
- /
- v.21 no.4
- /
- pp.287-293
- /
- 2023
With the rapid development of domestic and international over-the-top markets, a large amount of video content is being created. As the volume of video content increases, consumers tend to increasingly check data concerning the videos before watching them. To address this demand, video summaries in the form of plot descriptions, thumbnails, posters, and other formats are provided to consumers. This study proposes an approach that automatically generates posters to effectively convey video content while reducing the cost of video summarization. In the automatic generation of posters, face recognition and clustering are used to gather and classify character data, and keyframes from the video are extracted to learn the overall atmosphere of the video. This study used the facial data of the characters and keyframes as training data and employed technologies such as DreamBooth, a text-to-image generation model, to automatically generate video posters. This process significantly reduces the time and cost of video-poster production.
https://doi.org/10.56977/jicce.2023.21.4.287 인용 PDF

Object Segmentation for Image Transmission Services and Facial Characteristic Detection based on Knowledge (화상전송 서비스를 위한 객체 분할 및 지식 기반 얼굴 특징 검출)

Lim, Chun-Hwan;Yang, Hong-Young
- Journal of the Korean Institute of Telematics and Electronics T
- /
- v.36T no.3
- /
- pp.26-31
- /
- 1999
In this paper, we propose a facial characteristic detection algorithm based on knowledge and object segmentation method for image communication. In this algorithm, under the condition of the same lumination and distance from the fixed video camera to human face, we capture input images of 256 $\times$ 256 of gray scale 256 level and then remove the noise using the Gaussian filter. Two images are captured with a video camera, One contains the human face; the other contains only background region without including a face. And then we get a differential image between two images. After removing noise of the differential image by eroding End dilating, divide background image into a facial image. We separate eyes, ears, a nose and a mouth after searching the edge component in the facial image. From simulation results, we have verified the efficiency of the Proposed algorithm.
PDF

Multiresolution 3D Facial Model Compression (다해상도 3D 얼굴 모델의 압축)

박동희;이종석;이영식;배철수
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2002.05a
- /
- pp.602-607
- /
- 2002
In this paper, we proposed an approach to efficiently compress and transmit multiresoltion 3D lariat models for multimedia and very low bit rate applications. A personal facial model is obtained by a 3D laser digitizer, and further re-quantized at several resolutions according to different scope of applications, such as animation, video game, and video conference. By deforming 2D templates to match and re-quantize a 3D digitized facial model, we obtain its compressed model. In the present study, we create hierarchical 2D lariat wireframe templates are adapted according to facial feature points and the proposed piecewise chainlet affined transformation(PACT) method. The 3D digitized model after requantization are reduced significantly without perceptual loss. Moreover the proposed multiresoulation lariat models possessed of hierarchial data structure are apt to be progressively transmitted and displayed across internet.
PDF

Implementation of Hair Style Recommendation System Based on Big data and Deepfakes (빅데이터와 딥페이크 기반의 헤어스타일 추천 시스템 구현)

Tae-Kook Kim
- Journal of Internet of Things and Convergence
- /
- v.9 no.3
- /
- pp.13-19
- /
- 2023
In this paper, we investigated the implementation of a hairstyle recommendation system based on big data and deepfake technology. The proposed hairstyle recommendation system recognizes the facial shapes based on the user's photo (image). Facial shapes are classified into oval, round, and square shapes, and hairstyles that suit each facial shape are synthesized using deepfake technology and provided as videos. Hairstyles are recommended based on big data by applying the latest trends and styles that suit the facial shape. With the image segmentation map and the Motion Supervised Co-Part Segmentation algorithm, it is possible to synthesize elements between images belonging to the same category (such as hair, face, etc.). Next, the synthesized image with the hairstyle and a pre-defined video are applied to the Motion Representations for Articulated Animation algorithm to generate a video animation. The proposed system is expected to be used in various aspects of the beauty industry, including virtual fitting and other related areas. In future research, we plan to study the development of a smart mirror that recommends hairstyles and incorporates features such as Internet of Things (IoT) functionality.
https://doi.org/10.20465/KIOTS.2023.9.3.013 인용 PDF

A Study on Flow-emotion-state for Analyzing Flow-situation of Video Content Viewers (영상콘텐츠 시청자의 몰입상황 분석을 위한 몰입감정상태 연구)

Kim, Seunghwan;Kim, Cheolki
- Journal of Korea Multimedia Society
- /
- v.21 no.3
- /
- pp.400-414
- /
- 2018
It is required for today's video contents to interact with a viewer in order to provide more personalized experience to viewer(s) than before. In order to do so by providing friendly experience to a viewer from video contents' systemic perspective, understanding and analyzing the situation of the viewer have to be preferentially considered. For this purpose, it is effective to analyze the situation of a viewer by understanding the state of the viewer based on the viewer' s behavior(s) in the process of watching the video contents, and classifying the behavior(s) into the viewer's emotion and state during the flow. The term 'Flow-emotion-state' presented in this study is the state of the viewer to be assumed based on the emotions that occur subsequently in relation to the target video content in a situation which the viewer of the video content is already engaged in the viewing behavior. This Flow-emotion-state of a viewer can be expected to be utilized to identify characteristics of the viewer's Flow-situation by observing and analyzing the gesture and the facial expression that serve as the input modality of the viewer to the video content.
https://doi.org/10.9717/kmms.2018.21.3.400 인용 PDF KSCI

3D Facial Animation with Head Motion Estimation and Facial Expression Cloning (얼굴 모션 추정과 표정 복제에 의한 3차원 얼굴 애니메이션)

Kwon, Oh-Ryun;Chun, Jun-Chul
- The KIPS Transactions:PartB
- /
- v.14B no.4
- /
- pp.311-320
- /
- 2007
This paper presents vision-based 3D facial expression animation technique and system which provide the robust 3D head pose estimation and real-time facial expression control. Many researches of 3D face animation have been done for the facial expression control itself rather than focusing on 3D head motion tracking. However, the head motion tracking is one of critical issues to be solved for developing realistic facial animation. In this research, we developed an integrated animation system that includes 3D head motion tracking and facial expression control at the same time. The proposed system consists of three major phases: face detection, 3D head motion tracking, and facial expression control. For face detection, with the non-parametric HT skin color model and template matching, we can detect the facial region efficiently from video frame. For 3D head motion tracking, we exploit the cylindrical head model that is projected to the initial head motion template. Given an initial reference template of the face image and the corresponding head motion, the cylindrical head model is created and the foil head motion is traced based on the optical flow method. For the facial expression cloning we utilize the feature-based method, The major facial feature points are detected by the geometry of information of the face with template matching and traced by optical flow. Since the locations of varying feature points are composed of head motion and facial expression information, the animation parameters which describe the variation of the facial features are acquired from geometrically transformed frontal head pose image. Finally, the facial expression cloning is done by two fitting process. The control points of the 3D model are varied applying the animation parameters to the face model, and the non-feature points around the control points are changed by use of Radial Basis Function(RBF). From the experiment, we can prove that the developed vision-based animation system can create realistic facial animation with robust head pose estimation and facial variation from input video image.
https://doi.org/10.3745/KIPSTB.2007.14-B.4.311 인용 PDF KSCI

Search Result 126, Processing Time 0.024 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)