• Title/Summary/Keyword: Video extraction


Performance Analysis for Accuracy of Personality Recognition Models based on Setting of Margin Values at Face Region Extraction (얼굴 영역 추출 시 여유값의 설정에 따른 개성 인식 모델 정확도 성능 분석)

  • Qiu Xu;Gyuwon Han;Bongjae Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.141-147
    • /
    • 2024
  • Recently, there has been growing interest in personalized services tailored to an individual's preferences. This has led to ongoing research aimed at recognizing and leveraging an individual's personality traits. Among various methods for personality assessment, the OCEAN model stands out as a prominent approach. For personality recognition with OCEAN, a multimodal artificial intelligence model that incorporates linguistic, paralinguistic, and non-linguistic information is often employed. This paper examines the impact of the margin value used when extracting facial regions from video data on the accuracy of a personality recognition model that uses facial expressions to determine OCEAN traits. The study employed personality recognition models based on 2D Patch Partition, R2plus1D, 3D Patch Partition, and Video Swin Transformer technologies. Setting the facial region extraction margin to 60 yielded the highest 1-MAE performance, with a score of 0.9118. These findings indicate the importance of selecting an optimal margin value to maximize the efficiency of personality recognition models.
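The margin here is padding added around the detected face box before cropping. A minimal sketch, assuming the margin is given in pixels and applied on all four sides (the function name and clamping behavior are illustrative, not from the paper):

```python
def expand_face_box(x, y, w, h, margin, img_w, img_h):
    """Expand a detected face box by `margin` pixels on every side,
    clamping the result to the image bounds."""
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(img_w, x + w + margin)
    y1 = min(img_h, y + h + margin)
    return x0, y0, x1 - x0, y1 - y0

# With margin=60, a 50x50 detection grows to a 170x170 crop
# when it lies far enough from the image border.
```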

An Automatic Camera Tracking System for Video Surveillance

  • Lee, Sang-Hwa;Sharma, Siddharth;Lin, Sang-Lin;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.42-45
    • /
    • 2010
  • This paper proposes an intelligent video surveillance system for human object tracking. The proposed system integrates object extraction, human object recognition, face detection, and camera control. First, the object in the video signal is extracted using background subtraction. Then, the object region is examined to determine whether it is human. For this recognition, a region-based shape descriptor, the angular radial transform (ART) in MPEG-7, is used to learn and train the shapes of human bodies. When the object is judged to be human, or something else to be investigated, the face region is detected. Finally, the face or object region is tracked in the video, and a pan/tilt/zoom (PTZ) controllable camera follows the moving object using its motion information. Simulations were performed with real CCTV cameras and their communication protocol. According to the experiments, the proposed system is able to track the moving object (human) automatically, not only in the image domain but also in real 3-D space. The proposed system reduces the need for human supervisors and improves surveillance efficiency with computer vision techniques.
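The first stage, background subtraction, can be sketched as frame differencing against a slowly updated background model. A minimal pure-Python illustration on grayscale frames; the threshold and learning rate `alpha` are illustrative defaults, not values from the paper:

```python
def subtract_background(frame, background, threshold=25):
    """Return a binary foreground mask: 1 where the absolute difference
    between the current frame and the background model exceeds threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def update_background(background, frame, alpha=0.05):
    """Running-average update: the model slowly absorbs scene changes."""
    return [[(1 - alpha) * b + alpha * p
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```

Connected foreground pixels would then be grouped into object regions before the ART-based shape test.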


Segmentation and Appearance Features Index for Digital Video Data

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • v.8 no.6
    • /
    • pp.697-701
    • /
    • 2010
  • The number of digital video cameras is increasing rapidly, making digital video data management ever more important. Efficient storage and fast browsing remain significant issues. In this paper, an optimized data storing process that loses no information and an organized appearance-features indexing method are proposed. In addition, a data removal policy can be used to reclaim a large amount of space, and it facilitates fast sequential search. The appearance-features index holds key information about moving objects to answer queries about what people are doing, particularly who they are and when and where they move. The evaluation results showed better performance in transfer time and savings in storage space.

Video Processing of MPEG Compressed Data For 3D Stereoscopic Conversion (3차원 입체 변환을 위한 MPEG 압축 데이터에서의 영상 처리 기법)

  • Kim, Man-Bae
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 1998.06a
    • /
    • pp.3-8
    • /
    • 1998
  • The conversion of monoscopic video to 3D stereoscopic video has been studied by some pioneering researchers. In spite of the commercial potential of the technology, two problems have hindered progress in this research area: vertical motion parallax and high computational complexity. The former degrades 3D perception, while the latter demands complex hardware. Previous research has dealt with NTSC video, thus requiring complex processing steps, one of which is motion estimation. This paper proposes a 3D stereoscopic conversion method for MPEG encoded data. The proposed method has the advantage that motion estimation can be avoided by processing the MPEG compressed data to extract motion information, and that camera and object motion in random directions can be handled.
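The key idea, reusing the motion vectors already present in the MPEG bitstream instead of re-estimating motion, can be illustrated with a delayed-frame stereo heuristic: the horizontal motion magnitude determines how far back in time the second-eye frame is taken. This is a hypothetical sketch of such a mapping, not the paper's exact rule:

```python
def choose_delay(motion_vectors, max_delay=4, target_disparity=8.0):
    """Pick a frame delay for delayed-frame stereo conversion from the
    mean horizontal motion (pixels/frame) carried by MPEG motion vectors.
    motion_vectors: list of (dx, dy) pairs decoded from the bitstream."""
    if not motion_vectors:
        return 1
    mean_dx = sum(abs(mv[0]) for mv in motion_vectors) / len(motion_vectors)
    if mean_dx == 0:
        return max_delay  # nearly static scene: use the largest delay
    delay = round(target_disparity / mean_dx)
    return max(1, min(max_delay, delay))
```

Faster horizontal motion yields a shorter delay, keeping the apparent disparity roughly constant.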


A Study on Real-time Face Detection in Video (동영상에서 실시간 얼굴검출에 관한 연구)

  • Kim, Hyeong-Gyun;Bae, Yong-Guen
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.2
    • /
    • pp.47-53
    • /
    • 2010
  • This paper proposes a face detection technique using residual images and color information. The proposed technique achieves fast processing speed and a high face detection rate on video. In addition, it reduces the detection error rate through calibration of tilted face images. The first step extracts the target image from the transmitted video frames. Next, the extracted image is processed by a window rotation algorithm to detect tilted faces. Feature extraction for face detection uses the AdaBoost algorithm.
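AdaBoost builds a strong detector by reweighting training samples after each weak classifier. A minimal sketch of one boosting round (generic AdaBoost with labels in {-1, +1}, not the paper's specific detector):

```python
import math

def adaboost_round(weights, predictions, labels):
    """One AdaBoost round: compute the weak classifier's weighted error,
    its vote weight alpha, and the re-normalized sample weights."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [w * math.exp(-alpha if p == y else alpha)
             for w, p, y in zip(weights, predictions, labels)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]
```

Misclassified samples gain weight, so the next weak classifier concentrates on the hard cases.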

Extended Temporal Ordinal Measurement Using Spatially Normalized Mean for Video Copy Detection

  • Lee, Heung-Kyu;Kim, June
    • ETRI Journal
    • /
    • v.32 no.3
    • /
    • pp.490-492
    • /
    • 2010
  • This letter proposes a robust feature extraction method using a spatially normalized mean for temporal ordinal measurement. Before a rank matrix is computed from the mean values of non-overlapped blocks, each block mean is normalized so that it is invariant to linear additive and subtractive noise effects and insensitive to multiplicative and divisive noise effects. Then, the temporal ordinal measures of the spatially normalized mean values are computed for feature matching. The proposed method achieved about 95% accuracy in both precision and recall in various distortion environments, a 2.7% average improvement over plain temporal ordinal measurement.
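The spatial normalization and rank computation can be sketched as follows: normalizing the block means to zero mean and unit variance across the frame makes the signature invariant to additive offsets and positive scaling, after which only the ranks are kept. This is a sketch of the general technique, not the letter's exact implementation:

```python
def ordinal_signature(block_means):
    """Spatially normalize block means (zero mean, unit variance across
    the frame), then convert them to a rank vector for ordinal matching."""
    n = len(block_means)
    mu = sum(block_means) / n
    var = sum((m - mu) ** 2 for m in block_means) / n
    sd = var ** 0.5
    if sd == 0:
        sd = 1.0  # flat frame: avoid division by zero
    normalized = [(m - mu) / sd for m in block_means]
    order = sorted(range(n), key=lambda i: normalized[i])
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank
    return normalized, ranks
```

Replacing every mean m with k*m + c (k > 0) leaves both the normalized values and the ranks unchanged, which is the claimed noise invariance.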

Fast key-frame extraction for 3D reconstruction from a handheld video

  • Choi, Jongho;Kwon, Soonchul;Son, Kwangchul;Yoo, Jisang
    • International journal of advanced smart convergence
    • /
    • v.5 no.4
    • /
    • pp.1-9
    • /
    • 2016
  • To reconstruct a 3D model from video sequences, selecting key frames from which a geometric model can be reliably estimated is essential. This paper proposes a method for easily extracting informative frames from a handheld video. The method combines selection criteria based on determining an appropriate baseline between frames, frame jumping for fast searching within the video, geometric robust information criterion (GRIC) scores for the frame-to-frame homography and fundamental matrix, and blurry-frame removal. In experiments with videos taken in indoor spaces, the proposed method creates a more robust 3D point cloud than existing methods, even in the presence of motion blur and degenerate motions.
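GRIC scoring selects between a homography and a fundamental matrix for each frame pair by penalizing residuals plus model complexity. A sketch following Torr's common formulation (the paper's exact constants may differ): for n correspondences with squared residuals e_i², GRIC = sum_i min(e_i²/σ², λ3·(r−d)) + λ1·d·n + λ2·k, where d is the model dimension, k its parameter count, and r the data dimension:

```python
import math

def gric(residuals_sq, sigma2, d, k, r=4, lam3=2.0):
    """Geometric Robust Information Criterion (Torr-style sketch) for
    comparing a homography (d=2, k=8) with a fundamental matrix
    (d=3, k=7) fitted to point correspondences (r=4)."""
    n = len(residuals_sq)
    rho = sum(min(e / sigma2, lam3 * (r - d)) for e in residuals_sq)
    lam1 = math.log(r)      # cost per structure parameter
    lam2 = math.log(r * n)  # cost per model parameter
    return rho + lam1 * d * n + lam2 * k
```

A frame pair is kept as a key-frame candidate when the fundamental-matrix GRIC falls below the homography GRIC, indicating enough baseline for 3D estimation.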

Extraction and Recognition of Character from MPEG-2 news Video Images (MPEG-2 뉴스영상에서 문자영역 추출 및 문자 인식)

  • Park, Yeong-Gyu;Kim, Seong-Guk;Yu, Won-Yeong;Kim, Jun-Cheol;Lee, Jun-Hwan
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.5
    • /
    • pp.1410-1417
    • /
    • 1999
  • In this paper, we propose a method for extracting caption regions from news video and a method for recognizing the captions, intended mainly for content-based indexing and retrieval of MPEG-2 compressed news for NOD (News On Demand). The proposed method reduces the search time for detecting caption frames with minimal MPEG-2 decoding, and effectively eliminates noise in caption regions through deliberately devised preprocessing. Because the fonts used for captions in news video are not varied, an enhanced template matching method is used for character recognition. Good recognition results were obtained in experiments on sports news video.
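With a small, fixed font set, template matching can be sketched as a per-pixel agreement score between a binarized caption region and binarized character templates. The helper names here are hypothetical, and the paper's "enhanced" matching is not specified:

```python
def match_score(region, template):
    """Fraction of pixels that agree between a binarized caption region
    and a binarized character template of the same size."""
    total = len(region) * len(region[0])
    hits = sum(1 for rrow, trow in zip(region, template)
               for r, t in zip(rrow, trow) if r == t)
    return hits / total

def recognize(region, templates):
    """Return the character whose template best matches the region."""
    return max(templates, key=lambda ch: match_score(region, templates[ch]))
```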


Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.7
    • /
    • pp.2407-2424
    • /
    • 2022
  • Recently, Transformer has made great progress in video retrieval tasks due to its high representation capability. In a Transformer structure, the cascaded self-attention modules are capable of capturing long-distance feature dependencies, but local feature details are likely to deteriorate. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcut module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate learning bias. The proposed model is tested on the MSRVTT, LSMDC, and ActivityNet benchmarks, and it surpasses previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, gains of about 2.3% in MdR and 6.1% in MnR are obtained over recently proposed multimodal methods.

Optimization of Action Recognition based on Slowfast Deep Learning Model using RGB Video Data (RGB 비디오 데이터를 이용한 Slowfast 모델 기반 이상 행동 인식 최적화)

  • Jeong, Jae-Hyeok;Kim, Min-Suk
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.8
    • /
    • pp.1049-1058
    • /
    • 2022
  • Human Action Recognition (HAR), including anomaly and object detection, has become a trend in research fields that use Artificial Intelligence (AI) methods to analyze patterns of human action in crime-ridden areas, media services, and industrial facilities. In particular, for real-time systems using video streaming data, HAR has become an important AI-based research field in application development, and many research directions using HAR are currently being developed and improved. In this paper, we propose and analyze a deep-learning-based HAR scheme that is more efficient and can be applied to media services using RGB video streaming data without feature-extraction pre-processing. For the method, we adopt Slowfast, a Deep Neural Network (DNN) model, on an open dataset (HMDB-51 or UCF101) to improve prediction accuracy.