• Title/Abstract/Keyword: Video recognition

Search results: 681 items (processing time: 0.023 sec)

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 24, No. 6
    • /
    • pp.21-28
    • /
    • 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, deep learning has been deployed across all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted increasing attention. It is a challenging task owing to the particular complexity of human movement: the same action may be performed differently by different individuals, and a human action appears as a continuous sequence of frames in video, so action recognition requires more computational power than processing static images; simple CNNs alone cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. For video action classification in particular, adding an attention model makes it easier to focus on motion features and improves performance; it also intuitively explains which parts the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose an attention-based 3D dense convolutional network (ADD-Net) for recognizing human motion in video.
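
The abstract does not include code, but the core idea of attention over video frames can be sketched as softmax-weighted temporal pooling. Everything below (function names, the toy features and scores) is a hypothetical illustration, not the ADD-Net implementation:

```python
import math

def temporal_attention_pool(frame_feats, scores):
    """Weight per-frame feature vectors by softmax attention scores
    and sum them into a single clip-level descriptor."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over frames
    dim = len(frame_feats[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frame_feats))
              for d in range(dim)]
    return pooled, weights

# Three frames with 2-D features; the middle frame gets the highest
# attention score, so it dominates the pooled descriptor.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = temporal_attention_pool(feats, [0.1, 2.0, 0.1])
```

In a real network the scores would themselves be learned from the 3D convolutional features rather than given by hand.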

A Logo Transition Detection Method for Opaque and Semi-Transparent TV Logo Recognition in Video

  • 노명철;강승연;이성환
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 35, No. 12
    • /
    • pp.753-763
    • /
    • 2008
  • With the rapid growth of UCC (User Created Contents), copyright issues have become a serious concern. Automatic logo recognition is an efficient way to address these copyright problems. Logos have diverse characteristics, which make logo detection and recognition difficult; in particular, when logo transitions occur frequently within a video, accurate logo recognition and logo-based segmentation are hard to achieve. This paper therefore proposes an accurate transition detection method and methods for recognizing various logo types in digital video. Using the proposed logo detection and logo-based video segmentation, we obtained good experimental results on a variety of videos.

Development of Emotion Recognition Model Using Audio-video Feature Extraction Multimodal Model

  • 김종구;권장우
    • 융합신호처리학회논문지
    • /
    • Vol. 24, No. 4
    • /
    • pp.221-228
    • /
    • 2023
  • Physical and mental changes caused by emotion can affect various behaviors, such as driving and learning. Recognizing emotion is therefore an important task with applications in many industries, such as detecting and controlling dangerous emotional states while driving. In this paper, we study emotion recognition by implementing a multimodal model that uses both audio and video data from different domains. Using the RAVDESS dataset, we extracted audio from the video data and obtained audio features with a 2D-CNN model, while video features were extracted with a SlowFast feature extractor. The proposed multimodal model integrates the feature vectors of the audio and video data to recognize emotion. We also implemented multimodal models using common fusion methods, namely summing the output scores of each model and voting, and compared their performance against the proposed method.
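
As an illustration of the two fusion baselines mentioned in the abstract (score summing/averaging and voting), here is a minimal sketch; the class scores and names are invented for the example and are not from the paper:

```python
def fuse_by_score_sum(audio_probs, video_probs):
    """Average the per-class probabilities of the two modality models."""
    return [(a + v) / 2 for a, v in zip(audio_probs, video_probs)]

def fuse_by_vote(predictions):
    """Majority vote over the hard labels predicted by each model."""
    return max(set(predictions), key=predictions.count)

# Hypothetical 3-class emotion scores from each modality model.
audio = [0.2, 0.7, 0.1]
video = [0.1, 0.3, 0.6]
fused = fuse_by_score_sum(audio, video)
label = fused.index(max(fused))  # class 1 wins after averaging
```

Note that averaging and voting can disagree: here the video model alone would pick class 2, but the averaged scores favor class 1.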

Dual-Stream Fusion and Graph Convolutional Network for Skeleton-Based Action Recognition

  • Hu, Zeyuan;Feng, Yiran;Lee, Eung-Joo
    • 한국멀티미디어학회논문지
    • /
    • Vol. 24, No. 3
    • /
    • pp.423-430
    • /
    • 2021
  • Graph convolutional networks (GCNs) have achieved outstanding performance on skeleton-based action recognition. However, several problems remain in existing GCN-based methods; in particular, the low recognition rate caused by relying on a single input modality has not been effectively solved. In this article, we propose a dual-stream fusion method that combines video data and skeleton data. The two networks recognize skeleton data and video data respectively, and the probabilities of their outputs are fused to achieve information fusion. Experiments on two large datasets, Kinetics and the NTU RGB+D Human Action Dataset, show that the proposed method achieves state-of-the-art results and improves recognition accuracy over traditional methods.
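
The score-level fusion described above can be sketched as a weighted blend of the two streams' class probabilities. The weight `alpha` and all names here are assumptions for illustration; the paper does not publish its fusion code:

```python
def fuse_streams(skeleton_probs, video_probs, alpha=0.5):
    """Blend per-class probabilities of skeleton and video streams.
    alpha weights the skeleton stream; its value is an assumed knob,
    not a setting reported in the paper."""
    fused = [alpha * s + (1 - alpha) * v
             for s, v in zip(skeleton_probs, video_probs)]
    total = sum(fused)
    return [f / total for f in fused]  # renormalise to a distribution

skeleton = [0.6, 0.3, 0.1]  # hypothetical softmax outputs per class
video = [0.2, 0.5, 0.3]
probs = fuse_streams(skeleton, video, alpha=0.6)
```

With these toy scores the skeleton stream's top class prevails after fusion because it carries the larger weight.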

A Study on Sentiment Pattern Analysis of Video Viewers and Predicting Interest in Video using Facial Emotion Recognition

  • 조인구;공연우;전소이;조서영;이도훈
    • 한국멀티미디어학회논문지
    • /
    • Vol. 25, No. 2
    • /
    • pp.215-220
    • /
    • 2022
  • Emotion recognition is one of the most important and challenging areas of computer vision. Many studies on emotion recognition have been conducted and model performance keeps improving, but more research is needed on emotion recognition and sentiment analysis of video viewers. In this paper, we propose an emotion analysis system that includes a sentiment analysis model and an interest prediction model. We analyzed the emotional patterns of people watching popular and unpopular videos and predicted their level of interest using the system. Experimental results showed that certain emotions were strongly related to the popularity of videos and that the interest prediction model predicted the level of interest with high accuracy.

A Study on Efficient Learning Units for Behavior-Recognition of People in Video

  • 권익환;부베나 하제르;이도훈
    • 한국멀티미디어학회논문지
    • /
    • Vol. 20, No. 2
    • /
    • pp.196-204
    • /
    • 2017
  • An intelligent video surveillance system recognizes behavior by analyzing the movement patterns of objects of interest in the frames of video captured by a camera. Detecting certain behaviors of an individual in a crowd has become a critical problem, for example in the event of a terror attack, and recognizing such behaviors is an important but difficult problem in computer vision. With the rise of big data exploiting machine learning and data mining techniques, the amount of video from CCTV, smartphones, and drones has increased dramatically. In this paper, we propose a multiple-sliding-window method that recognizes cumulative change as a single unit in order to improve recognition accuracy. Experimental results demonstrated that the method provides robust and efficient learning units for classifying certain behaviors.
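
A minimal sketch of what multiple-sliding-window segmentation might look like, assuming windows of several fixed sizes slid over the frame index (the window sizes and stride below are hypothetical, not the paper's settings):

```python
def multi_sliding_windows(frames, window_sizes, stride=1):
    """Cut a frame sequence into overlapping windows of several sizes,
    so cumulative changes of different durations each become a single
    learning unit."""
    windows = []
    for size in window_sizes:
        for start in range(0, len(frames) - size + 1, stride):
            windows.append(frames[start:start + size])
    return windows

# Six frames (represented by their indices), two window sizes.
clips = multi_sliding_windows(list(range(6)), window_sizes=[3, 5])
```

Each returned window would then be classified independently, letting short and long behaviors be captured at their natural temporal scale.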

Extraction of User Preference for Video Stimuli Using EEG-Based User Responses

  • Moon, Jinyoung;Kim, Youngrae;Lee, Hyungjik;Bae, Changseok;Yoon, Wan Chul
    • ETRI Journal
    • /
    • Vol. 35, No. 6
    • /
    • pp.1105-1114
    • /
    • 2013
  • Owing to the large number of video programs available, a method for accessing preferred videos efficiently through personalized video summaries and clips is needed. The automatic recognition of user states when viewing a video is essential for extracting meaningful video segments. Although there have been many studies on emotion recognition using various user responses, electroencephalogram (EEG)-based research on preference recognition for videos is at a very early stage. This paper proposes classification models based on linear and nonlinear classifiers, using EEG features of band power (BP) values and asymmetry scores, for four preference classes. The quadratic-discriminant-analysis-based model using BP features achieves a classification accuracy of 97.39% (±0.73%), and the models based on the other nonlinear classifiers using BP features achieve accuracies over 96%, which is superior to previous work that addressed only binary preference classification. These results show that the proposed approach is suitable for personalized video segmentation, offering high accuracy and classification power.
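
The band power (BP) and asymmetry features used above can be illustrated with a naive sketch; a real EEG pipeline would use Welch's method and proper preprocessing, so treat this purely as a toy example with assumed band definitions:

```python
import math

def band_power(signal, fs, lo, hi):
    """Mean squared DFT magnitude of the bins inside [lo, hi] Hz
    (naive O(n^2) DFT, fine for a tiny demo signal)."""
    n = len(signal)
    power, count = 0.0, 0
    for k in range(n // 2 + 1):
        if lo <= k * fs / n <= hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n)
                     for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n)
                      for t in range(n))
            power += (re * re + im * im) / n
            count += 1
    return power / max(count, 1)

def asymmetry_score(left_bp, right_bp):
    """Classic frontal asymmetry: log(right) - log(left) band power."""
    return math.log(right_bp) - math.log(left_bp)

fs = 128
sig = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 10 Hz tone
alpha = band_power(sig, fs, 8, 13)  # alpha band should dominate
theta = band_power(sig, fs, 4, 7)
```

These BP and asymmetry values are the kind of feature vector that would then be fed to a linear or quadratic discriminant classifier.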

Recognition-Based Gesture Spotting for Video Game Interface

  • 한은정;강현;정기철
    • 한국멀티미디어학회논문지
    • /
    • Vol. 8, No. 9
    • /
    • pp.1177-1186
    • /
    • 2005
  • To allow natural motion in a vision-based video game interface that uses the player's gestures captured by a camera instead of a keyboard or joystick, the system must recognize continuous gestures while tolerating meaningless movements by the user. This paper proposes a gesture recognition method for video game interfaces that combines recognition and spotting: it recognizes meaningful gestures in a continuous image sequence while simultaneously distinguishing meaningless movements. Applying the proposed method to Quake II, a first-person action game that uses the player's upper-body gestures as game commands, we achieved an average spotting rate of 93.36% on continuous gestures, showing that the method performs well enough for video game interfaces.


A Local Feature-Based Robust Approach for Facial Expression Recognition from Depth Video

  • Uddin, Md. Zia;Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 10, No. 3
    • /
    • pp.1390-1403
    • /
    • 2016
  • Facial expression recognition (FER) plays a significant role in computer vision, pattern recognition, and image processing applications such as human-computer interaction, as it provides rich information about people's emotions. For video-based facial expression recognition, depth cameras can be better candidates than RGB cameras: a person's face cannot easily be identified from distance-based depth video, so depth cameras also resolve some privacy issues that arise with RGB faces. A good FER system relies heavily on the extraction of robust features as well as on the recognition engine. In this work, an efficient novel approach is proposed to recognize facial expressions from time-sequential depth videos. First, Local Binary Pattern (LBP) features are obtained from the time-sequential depth faces and further transformed by Generalized Discriminant Analysis (GDA) to make them more robust; finally, the LBP-GDA features are fed into Hidden Markov Models (HMMs) to train and recognize different facial expressions. The proposed depth-based facial expression recognition approach is compared with conventional approaches such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and it outperforms them with better recognition rates.
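
The LBP features mentioned above are a standard texture descriptor; a minimal sketch of the basic 8-neighbour LBP code on a 3x3 patch is shown below (the GDA and HMM stages of the paper's pipeline are omitted):

```python
def lbp_code(patch):
    """8-neighbour Local Binary Pattern code of the centre pixel of a
    3x3 patch: each neighbour >= centre contributes one bit."""
    center = patch[1][1]
    # Clockwise from the top-left neighbour.
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= center:
            code |= 1 << bit
    return code

patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
code = lbp_code(patch)  # only the three top neighbours fire: bits 0-2
```

A histogram of these codes over image regions forms the feature vector that downstream classifiers consume.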

Misclassified Samples based Hierarchical Cascaded Classifier for Video Face Recognition

  • Fan, Zheyi;Weng, Shuqin;Zeng, Yajun;Jiang, Jiao;Pang, Fengqian;Liu, Zhiwen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 2
    • /
    • pp.785-804
    • /
    • 2017
  • Due to various factors such as pose, facial expression, and illumination, video-based face recognition often suffers from poor recognition accuracy and generalization ability, since the within-class scatter can be even higher than the between-class scatter. We address this problem by proposing a hierarchical cascaded classifier for video face recognition, a multi-layer algorithm that accounts for misclassified samples and the samples similar to them. It can be decomposed into a single-classifier construction stage and a multi-layer classifier design stage. In the single-classifier construction stage, a classifier is created by clustering, and the number of classes is computed by analyzing a distance tree. In the multi-layer design stage, the next layer is created for the misclassified samples and their similar ones, then cascaded into a hierarchical classifier. Experiments on a database we collected ourselves show that the recognition accuracy of the proposed classifier outperforms compared algorithms such as neural networks and sparse representation.