• Title/Summary/Keyword: Video recognition


Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min; Tang, Jun / Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous-dimension emotion recognition, the parts that highlight emotional expression differ across modalities, and each modality influences the emotional state to a different degree. This paper therefore studies the fusion of the two most important modalities in emotion recognition, voice and facial expression, and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, prior knowledge is first used to extract audio features. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function mitigates the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. Experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
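The abstract above reports results as concordance correlation coefficients (CCC). As a reference for the metric, here is a minimal numpy sketch of Lin's CCC; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def concordance_ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989), the metric
    reported for the arousal and valence dimensions above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Perfect agreement yields CCC = 1.0.
print(concordance_ccc([0.1, 0.4, 0.7], [0.1, 0.4, 0.7]))  # → 1.0
```

Unlike plain Pearson correlation, CCC also penalizes mean and scale offsets between predictions and labels, which is why it is the standard metric for continuous arousal/valence prediction.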

Caption Detection Algorithm Using Temporal Information in Video (동영상에서 시간 영역 정보를 이용한 자막 검출 알고리듬)

  • 권철현; 신청호; 김수연; 박상희 / The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.8 / pp.606-610 / 2004
  • A novel caption text detection and recognition algorithm that exploits the temporal nature of video is proposed in this paper. A text registration technique locates the temporal and spatial positions of captions in video from accumulated frame-difference information. Experimental results show that the proposed method is effective and robust. A high processing speed is also achieved, since no time-consuming operations are involved.
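The accumulated frame-difference cue above rests on a simple observation: overlay captions stay static while the background moves. The following numpy sketch is one plausible reading of that cue; the threshold and function names are my own, not the paper's.

```python
import numpy as np

def accumulate_frame_difference(frames, threshold=30):
    """Accumulate per-pixel absolute differences over consecutive frames.
    Static overlay text keeps a low accumulated difference while a moving
    background accumulates a high one (a rough sketch of the temporal cue;
    parameter values are illustrative)."""
    frames = np.asarray(frames, dtype=np.int32)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) frame-to-frame changes
    accumulated = diffs.sum(axis=0)           # (H, W) total change per pixel
    # Candidate caption pixels: regions that stayed stable over time.
    return accumulated < threshold * (len(frames) - 1)

# Toy example: 4 frames, one static "caption" pixel amid moving noise.
rng = np.random.default_rng(0)
frames = rng.integers(0, 255, size=(4, 8, 8))
frames[:, 2, 3] = 200                  # pixel held constant across frames
mask = accumulate_frame_difference(frames)
print(mask[2, 3])                      # the static pixel is flagged
```

A real detector would then group stable pixels into text-shaped regions and track their in/out points over time, which is what the paper's text registration step does.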

Frame Mix-Up for Long-Term Temporal Context in Video Action Recognition

  • LEE, Dongho; CHOI, Jinwoo / Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1278-1281 / 2022
  • Current action classification models cannot be trained on all frames of a video because of computational resource constraints. Although it varies by model, in most cases a model is trained with at most 32 frames and as few as 8 frames per action. To overcome this limitation, this paper runs many frames of a video through a mix-up process so that a single frame carries the information of multiple frames. To avoid damaging the temporal dynamics of the video in this process, we propose a method called linear mix-up, demonstrate its performance, and discuss the potential of improving model performance by mixing up multiple frames.
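One plausible reading of "linear mix-up" is a weighted average of consecutive frames whose weights grow linearly with time, so the mixed frame retains a trace of the temporal ordering. The weighting scheme below is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def linear_mixup(frames):
    """Collapse a chunk of consecutive frames into one frame using
    linearly increasing weights, so later frames dominate and temporal
    order leaves a trace in the mixed frame (an assumed scheme)."""
    frames = np.asarray(frames, dtype=float)           # (T, H, W, C)
    weights = np.arange(1, len(frames) + 1, dtype=float)
    weights /= weights.sum()                           # normalize to sum 1
    return np.tensordot(weights, frames, axes=1)       # (H, W, C)

# 8 frames of a 4x4 RGB clip mixed into a single frame.
clip = np.ones((8, 4, 4, 3))
mixed = linear_mixup(clip)
print(mixed.shape)  # (4, 4, 3)
```

Feeding such mixed frames to a clip-based model lets an 8- or 32-frame input summarize a much longer temporal window without extra compute per forward pass.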


Effects of Caption-Utilized English Classes on Primary School Students' Character Recognition and Vocabulary Ability (자막을 활용한 영어수업이 초등학생의 문자인지 능력과 어휘력에 미치는 효과)

  • So, Suk; Lee, Je-Young; Hwang, Chee-Bok / The Journal of the Korea Contents Association / v.18 no.7 / pp.423-431 / 2018
  • The purpose of the present study was to investigate the effect of caption-embedded video on the character recognition and vocabulary ability of primary school students. The subjects were students at two elementary schools in G city, Jeonbuk province, divided into a control group that used video materials without captions and an experimental group that used video materials with captions. Each group was tested over the course of two months (10 classes). A statistical analysis was then conducted, using independent-samples and paired-samples t-tests, to determine the effects of captions on character recognition and vocabulary ability. There were no significant differences between the groups, but significant differences were found within the groups. Pedagogical implications based on the findings and suggestions for further research are also discussed.

A Study on Recognition of Dangerous Behaviors using Privacy Protection Video in Single-person Household Environments

  • Lim, ChaeHyun; Kim, Myung Ho / Journal of the Korea Society of Computer and Information / v.27 no.5 / pp.47-54 / 2022
  • Recently, with the development of deep learning technology, research on recognizing human behavior has been in progress. In this paper, a study was conducted to recognize risky behaviors that may occur in a single-person household environment using deep learning technology. Due to the nature of single-person households, personal privacy protection is necessary. We recognize dangerous human behavior in privacy-protected video to which Gaussian blur filters have been applied for the privacy of individuals. The dangerous behavior recognition method uses the YOLOv5 model to detect and preprocess human objects from video, and then uses them as input to a behavior recognition model. The experiments used the ResNet3D, I3D, and SlowFast models, and the results show that the SlowFast model achieved the highest accuracy, 95.7%, on privacy-protected video. This makes it possible to recognize dangerous human behavior in a single-person household environment while protecting individual privacy.
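The privacy-protection step above amounts to blurring only the detected person regions. A minimal numpy sketch of that step is shown below, with a hand-supplied bounding box standing in for the YOLOv5 detection; the kernel size and sigma are illustrative.

```python
import numpy as np

def gaussian_kernel(size=15, sigma=5.0):
    """1-D normalized Gaussian kernel for a separable blur."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_region(image, box, size=15, sigma=5.0):
    """Apply a separable Gaussian blur only inside a bounding box,
    mimicking the privacy-protection step. In the paper the box comes
    from a YOLOv5 person detector; here it is supplied by hand."""
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2].astype(float)
    k = gaussian_kernel(size, sigma)
    # Convolve rows then columns (separable 2-D Gaussian).
    region = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, region)
    region = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, region)
    out = image.copy()
    out[y1:y2, x1:x2] = region
    return out

frame = np.zeros((64, 64))
frame[20:40, 20:40] = 255.0                    # a bright "person" patch
blurred = blur_region(frame, (10, 10, 50, 50))
print(blurred[20, 20])                         # patch edge softened below 255
```

Blurring before the behavior model runs means the recognizer never sees identifiable detail, which is why the paper measures accuracy on the blurred video rather than the original.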

Virtual Contamination Lane Image and Video Generation Method for the Performance Evaluation of the Lane Departure Warning System (차선 이탈 경고 시스템의 성능 검증을 위한 가상의 오염 차선 이미지 및 비디오 생성 방법)

  • Kwak, Jae-Ho; Kim, Whoi-Yul / Transactions of the Korean Society of Automotive Engineers / v.24 no.6 / pp.627-634 / 2016
  • In this paper, an augmented video generation method to evaluate the performance of a lane departure warning system is proposed. The input to our system is a video of a road scene with ordinary clean lanes; the output video has the same content, but the lanes are synthesized with a contamination image. Two approaches were used to synthesize the contaminated lane image: example-based image synthesis, which assumes that contamination has been applied to the lane, and background-based image synthesis, which models a lane erased through aging. A new contamination pattern generation method using a Gaussian function is also proposed to produce contamination of various shapes and sizes. The contaminated lane video is generated by shifting the synthesized image by a lane movement amount obtained empirically. Our experiments showed that the similarity between the generated contaminated lane images and real lane images is over 90%. Furthermore, the reliability of the video generated by the proposed method was verified by analyzing changes in the lane recognition rate: the recognition rate on the generated video is very similar to that on real contaminated lane video.
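A Gaussian-based contamination pattern can be sketched as a 2-D Gaussian alpha mask blended into the lane image, which gives a blob whose position, size, and elongation are controlled by the mean and sigmas. This is an illustrative reconstruction under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def contamination_pattern(h, w, cx, cy, sigma_x, sigma_y, strength=1.0):
    """2-D Gaussian alpha mask for a contamination blob of controllable
    position, size, and shape (a sketch of Gaussian pattern generation)."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-(((xs - cx) ** 2) / (2 * sigma_x ** 2)
                 + ((ys - cy) ** 2) / (2 * sigma_y ** 2)))
    return strength * g                       # values in [0, strength]

def apply_contamination(lane_img, dirt_color, alpha):
    """Alpha-blend a dirt color into the lane image using the mask."""
    return (1 - alpha) * lane_img + alpha * dirt_color

lane = np.full((40, 40), 255.0)               # bright lane paint
alpha = contamination_pattern(40, 40, cx=20, cy=20, sigma_x=6, sigma_y=3)
dirty = apply_contamination(lane, dirt_color=60.0, alpha=alpha)
print(dirty[20, 20])  # → 60.0 (blob centre fully replaced by dirt colour)
```

Varying `sigma_x`/`sigma_y` and `strength` yields the "various shape and size" patterns the abstract mentions, and shifting `cx` per frame produces the contaminated lane video.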

An Efficient Implementation of Key Frame Extraction and Sharing in Android for Wireless Video Sensor Network

  • Kim, Kang-Wook / KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3357-3376 / 2015
  • Wireless sensor networks are an important research topic that has attracted much attention in recent years. Most of that interest, however, has focused on networks that gather scalar data such as temperature, humidity, and vibration. Scalar data are insufficient for applications such as video surveillance, target recognition, and traffic monitoring, whereas camera sensors that collect information-rich video can provide important visual information. Video sensor networks have therefore continued to gain interest in the past few years. How to efficiently store the massive data that reflect the environmental state at different times, and how to quickly search them for information of interest, remain challenging issues, especially when the sensor network environment is complicated. In this paper, we propose a fast algorithm for extracting key frames from video and describe the design and implementation of key frame extraction and sharing in Android for a wireless video sensor network.
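A common fast key-frame heuristic, which the sketch below illustrates, selects a frame whenever its grey-level histogram differs sharply from the last selected key frame. The paper's exact criterion may differ; the threshold and bin count are illustrative.

```python
import numpy as np

def extract_key_frames(frames, bins=16, threshold=0.3):
    """Pick key frames where the grey-level histogram changes sharply
    relative to the last selected key frame (a common fast heuristic)."""
    def hist(f):
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        return h / h.sum()
    keys = [0]                                  # first frame is always a key
    ref = hist(frames[0])
    for i in range(1, len(frames)):
        cur = hist(frames[i])
        if 0.5 * np.abs(cur - ref).sum() > threshold:   # L1 histogram distance
            keys.append(i)
            ref = cur                           # re-anchor on the new key frame
    return keys

# Toy video: a dark scene, then a sudden cut to a bright scene.
dark = np.zeros((5, 8, 8))
bright = np.full((5, 8, 8), 200.0)
video = np.concatenate([dark, bright])
print(extract_key_frames(video))  # → [0, 5]
```

Because only histograms are compared, the cost per frame is low, which matches the abstract's emphasis on fast extraction on resource-limited Android sensor nodes.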

Technical and Managerial Requirements for Privacy Protection Using Face Detection and Recognition in CCTV Systems (영상감시 시스템에서의 얼굴 영상 정보보호를 위한 기술적·관리적 요구사항)

  • Shin, Yong-Nyuo; Chun, Myung Geun / Journal of the Korea Institute of Information Security & Cryptology / v.24 no.1 / pp.97-106 / 2014
  • CCTV (closed-circuit television) is one of the most widely used physical security technologies: a video acquisition device installed at a specific point for various purposes. Recently, as CCTV capabilities have improved, facial recognition from the information collected in CCTV video has been under development. If these technologies are exploited, however, the risk of major privacy infringement is high. In particular, services have emerged that show, in real time over the Internet, images taken by cameras installed in specific spaces. The privacy law prescribes safety measures related to biometric templates. Accordingly, this paper suggests technical and managerial requirements for protecting video information, and in particular facial images, in video surveillance systems.

Real-time Identification of Traffic Light and Road Sign for the Next Generation Video-Based Navigation System (차세대 실감 내비게이션을 위한 실시간 신호등 및 표지판 객체 인식)

  • Kim, Yong-Kwon; Lee, Ki-Sung; Cho, Seong-Ik; Park, Jeong-Ho; Choi, Kyoung-Ho / Journal of Korea Spatial Information System Society / v.10 no.2 / pp.13-24 / 2008
  • Next-generation video-based car navigation is being researched to overcome the drawbacks of existing 2D-based navigation and to provide various services for safe driving. The components of such a navigation system include a road object database, a road lane identification module, and a crossroad identification module. In this paper, we propose a traffic light and road sign recognition method that can be effectively exploited for crossroad recognition in video-based car navigation systems. The method uses object color information and other spatial features in the video image. The results show an average recognition rate of 90% at distances of 30-60 m for traffic lights and 97% at 40-90 m for road signs. The algorithm also achieves a processing time of 46 ms/frame, which indicates its suitability for real-time processing.
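The color cue described above can be illustrated with a dominant-channel test on a candidate patch. This is a simplified stand-in for the paper's method, which adds spatial features; the thresholds below are assumptions.

```python
import numpy as np

def detect_light_color(rgb_patch):
    """Classify a candidate traffic-light patch as 'red', 'green', or
    'unknown' by counting dominant-channel pixels (a simplified sketch;
    real systems add spatial and shape checks)."""
    r = rgb_patch[..., 0].astype(int)
    g = rgb_patch[..., 1].astype(int)
    b = rgb_patch[..., 2].astype(int)
    red_px = np.sum((r > 150) & (r > g + 50) & (r > b + 50))
    green_px = np.sum((g > 150) & (g > r + 50) & (g > b + 50))
    total = rgb_patch.shape[0] * rgb_patch.shape[1]
    if red_px > 0.1 * total:
        return "red"
    if green_px > 0.1 * total:
        return "green"
    return "unknown"

patch = np.zeros((10, 10, 3), dtype=np.uint8)
patch[3:7, 3:7] = (220, 30, 30)              # a bright red lamp region
print(detect_light_color(patch))  # → red
```

Keeping the per-patch test this cheap is what makes a frame budget like the reported 46 ms/frame plausible on 2008-era hardware.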


Implementation of the Broadcasting System for Digital Media Contents (디지털 미디어 콘텐츠 방송 시스템 구현)

  • Shin, Jae-Heung; Kim, Hong-Ryul; Lee, Sang-Cheal / The Transactions of The Korean Institute of Electrical Engineers / v.57 no.10 / pp.1883-1887 / 2008
  • Most digital media content is composed of video, audio, picture, and animation information. The quality of information recognition for video and audio can vary with the receiver's characteristics or understanding, but visual information presented as text provides the clearest and most accurate channel for human information recognition. In this paper, we propose a new broadcasting system (BSDMC) to transmit the meaning of digital media content clearly and accurately. We implement general-purpose components that display video, pictures, text, and symbols simultaneously; by simply plugging in and calling these components with the proper parameters in an application development tool, a multimedia content broadcasting system can be developed easily. The components are implemented on an object-oriented framework with a modular structure, which increases reusability and allows other applications to be developed quickly and reliably.