• Title/Summary/Keyword: Video representation

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that extracts multiple biometric modalities from a single data source, namely facial video clips, and then trains a deep learning network to learn features from them automatically. Using the different modalities present in the facial video clips (left ear, left profile face, frontal face, right profile face, and right ear), we train supervised denoising auto-encoders to automatically extract robust, non-redundant features. The learned features are then used to train modality-specific sparse classifiers that perform the multimodal recognition. Moreover, the proposed technique proved robust when some of these modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) yielded Rank-1 recognition rates of 99.17% and 97.14%, respectively. These multimodal recognition accuracies demonstrate the superiority and robustness of the proposed approach to illumination, non-planar movement, and pose variations in the video clips, even when modalities are missing.
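
The abstract names score-level fusion over whichever modalities are available but does not state the fusion rule. The sketch below assumes min-max normalization plus the sum rule, and assumes higher scores mean better matches (sparse-representation residuals, where lower is better, would have to be negated first). The function names `fuse_scores` and `rank1_identity` are hypothetical.

```python
import numpy as np

def fuse_scores(modality_scores):
    """Sum-rule score-level fusion over the modalities that are present.

    modality_scores maps a modality name (e.g. "frontal_face") to a 1-D array of
    match scores, one per enrolled subject, or to None if that modality was not
    detected in the probe video clip. Higher scores are assumed to be better.
    """
    fused = None
    for name, scores in modality_scores.items():
        if scores is None:
            continue  # missing modality: simply left out of the fusion
        s = np.asarray(scores, dtype=float)
        span = s.max() - s.min()
        # Min-max normalize so modalities with different score ranges are comparable.
        norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused = norm if fused is None else fused + norm
    if fused is None:
        raise ValueError("no modality available for this probe")
    return fused

def rank1_identity(modality_scores):
    """Index of the enrolled subject with the highest fused score (Rank-1 decision)."""
    return int(np.argmax(fuse_scores(modality_scores)))

example = {
    "left_ear": None,                 # not detected in this clip
    "left_profile": [0.2, 0.9, 0.1],
    "frontal_face": [0.3, 0.7, 0.6],
    "right_profile": None,            # not detected in this clip
    "right_ear": [0.5, 0.4, 0.1],
}
print(rank1_identity(example))  # → 1
```

Dropping a missing modality from the sum, rather than imputing a score for it, is one simple way to keep the decision well defined when only some modalities are detected.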

A Scheme for News Videos based on MPEG-7 and Its Summarization Mechanism by using the Key-Frames of Selected Shot Types (MPEG-7을 기반으로 한 뉴스 동영상 스키마 및 샷 종류별 키프레임을 이용한 요약 생성 방법)

  • Jeong, Jin-Guk;Sim, Jin-Sun;Nang, Jong-Ho;Kim, Gyung-Su;Ha, Myung-Hwan;Jung, Byung-Heei
    • Journal of KIISE:Computing Practices and Letters / v.8 no.5 / pp.530-539 / 2002
  • Recently, there has been considerable research on archive systems for news videos, which usually have a fixed structure. However, because previously proposed archive systems use different metadata representation and storage schemes, exchanging metadata between them has been very difficult. This paper proposes a scheme for news video based on MPEG-7 MDS, an international standard for describing multimedia content, together with a summarization mechanism that reflects the characteristics of the shots in news videos. The proposed scheme uses MPEG-7 MDS description schemes such as VideoSegment and TextAnnotation to preserve the original structure of the news video, and the summarization mechanism presents key frames as a slide show with the associated audio to reduce the data size of the summary video.
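
One way to picture the slide-show summary is sketched below, under stated assumptions: each shot already has a shot type and a chosen key frame, and the summary keeps one still frame per shot of a selected type, displayed for the shot's duration while that shot's original audio plays. The `Shot` class, the shot-type labels, and `slide_show_summary` are hypothetical. Writing the result out as MPEG-7 VideoSegment/TextAnnotation descriptions is not shown.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    start: float     # seconds from the start of the news video
    end: float
    shot_type: str   # hypothetical labels, e.g. "anchor", "report", "interview"
    key_frame: int   # index of the frame chosen to represent the shot

def slide_show_summary(shots, selected_types):
    """One slide per shot of a selected type: the key frame is shown for the
    whole shot while the shot's original audio (from audio_start) plays."""
    slides = []
    for shot in shots:
        if shot.shot_type in selected_types:
            slides.append({
                "key_frame": shot.key_frame,
                "audio_start": shot.start,
                "duration": shot.end - shot.start,
            })
    return slides

shots = [
    Shot(0.0, 8.0, "anchor", 12),
    Shot(8.0, 30.0, "report", 250),
    Shot(30.0, 32.0, "graphic", 900),
]
for slide in slide_show_summary(shots, {"anchor", "report"}):
    print(slide)
```

The size saving comes from storing two still frames plus audio instead of 30 seconds of video frames for the selected shots.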

Post-Processing for JPEG-Coded Image Deblocking via Sparse Representation and Adaptive Residual Threshold

  • Wang, Liping;Zhou, Xiao;Wang, Chengyou;Jiang, Baochen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.3 / pp.1700-1721 / 2017
  • Blocking artifacts are very common in block-based image and video compression, especially at very low bit rates. In this paper, we propose a post-processing method for deblocking JPEG-coded images via sparse representation and an adaptive residual threshold. The method has three steps. First, a dictionary is learned from the compressed images by online dictionary learning and then refined using the histogram of oriented gradients (HOG) feature descriptor and K-means clustering. Second, an adaptive residual threshold for orthogonal matching pursuit (OMP), which incorporates blind image blocking assessment, is proposed and used for sparse coding. Finally, to take advantage of the human visual system (HVS), the edge regions of the deblocked image can be further modified using the edge regions of the compressed image. Experimental results show that the proposed method preserves more texture and edge information while reducing blocking artifacts.
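
The sparse-coding step relies on OMP stopped by a residual threshold. Below is a minimal sketch of standard OMP with that stopping rule, assuming a dictionary with unit-norm columns. In the paper the threshold is adapted per image using a blind blockiness assessment; that rule is not given in the abstract, so here it is simply a parameter.

```python
import numpy as np

def omp(D, y, residual_threshold, max_atoms=None):
    """Orthogonal matching pursuit that stops once ||residual||_2 <= residual_threshold.

    D: (n, K) dictionary with unit-norm columns.
    y: (n,) signal, e.g. a vectorized 8x8 image patch.
    Returns the sparse coefficient vector x such that D @ x approximates y.
    """
    n, K = D.shape
    max_atoms = n if max_atoms is None else max_atoms
    support = []
    x = np.zeros(K)
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    while np.linalg.norm(residual) > residual_threshold and len(support) < max_atoms:
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom is selected, so the residual cannot shrink further
        support.append(k)
        # Re-fit all selected coefficients jointly (the "orthogonal" projection step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

D = np.eye(4)
y = np.array([3.0, 0.0, 1.0, 0.0])
print(omp(D, y, residual_threshold=0.5))  # both nonzero entries recovered
print(omp(D, y, residual_threshold=1.5))  # stops after one atom
```

A larger threshold stops the pursuit earlier and keeps fewer atoms, which is what lets a threshold tuned to the blockiness level trade detail for artifact suppression.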

Person Re-identification using Sparse Representation with a Saliency-weighted Dictionary

  • Kim, Miri;Jang, Jinbeum;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing / v.6 no.4 / pp.262-268 / 2017
  • Intelligent video surveillance systems have been developed to monitor wide areas and to find specific target objects in large-scale databases. However, person re-identification poses challenges such as pose changes and occlusions. To address these problems, this paper presents an improved person re-identification method using sparse representation and saliency-based dictionary construction. The proposed method consists of three parts: i) feature description based on salient colors and textures for the dictionary elements, ii) orthogonal atom selection using cosine similarity to cope with pose and viewpoint changes, and iii) measurement of reconstruction error to rank the gallery images corresponding to a probe object. The method performs well because the robust descriptors used as dictionary atoms are generated by weighting salient features, and the atoms are selected so as to reduce the excessive redundancy that lowers accuracy. The proposed method can therefore be applied in large-scale database surveillance systems to search for a specific object.
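
Parts ii) and iii) can be sketched as follows, under loudly stated assumptions. Atom selection is approximated by greedy pruning: a feature vector is kept only if its cosine similarity to every atom already kept stays below a threshold. Ranking reconstructs the probe from each gallery identity's dictionary by ordinary least squares instead of sparse coding. The saliency weighting of part i) is omitted, and all names and thresholds are hypothetical.

```python
import numpy as np

def select_atoms(features, max_cosine=0.9):
    """Greedy redundancy pruning: keep a feature vector as an atom only if its
    |cosine similarity| to every atom already kept is at most max_cosine."""
    atoms = []
    for f in features:
        f = np.asarray(f, dtype=float)
        f = f / np.linalg.norm(f)
        if all(abs(float(f @ a)) <= max_cosine for a in atoms):
            atoms.append(f)
    return np.stack(atoms, axis=1)  # shape (dim, n_atoms)

def rank_gallery(probe, gallery_dicts):
    """Rank gallery identities by the error of reconstructing the probe from each
    identity's dictionary (smaller error = better match)."""
    errors = {}
    for identity, D in gallery_dicts.items():
        coef, *_ = np.linalg.lstsq(D, probe, rcond=None)
        errors[identity] = float(np.linalg.norm(probe - D @ coef))
    return sorted(errors, key=errors.get)

gallery = {
    "A": select_atoms([[1, 0, 0], [1, 0.01, 0]]),  # the near-duplicate second vector is pruned
    "B": select_atoms([[0, 1, 0]]),
}
print(rank_gallery(np.array([0.1, 1.0, 0.0]), gallery))  # → ['B', 'A']
```

Least-squares reconstruction only discriminates when each identity's dictionary spans a low-dimensional subspace of the feature space, which is one reason pruning redundant atoms matters.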

A robust Correlation Filter based tracker with rich representation and a relocation component

  • Jin, Menglei;Liu, Weibin;Xing, Weiwei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.10 / pp.5161-5178 / 2019
  • Correlation filters have recently been shown to have good characteristics for video object tracking. Correlation-filter-based trackers offer high accuracy and robustness while maintaining high speed. However, several improvements are still needed. First, most trackers cannot handle changes in object scale. To solve this problem, our algorithm combines position estimation with scale estimation; unlike traditional scale estimation methods, the proposed method tracks the object's scale more quickly and effectively. Second, the feature representation used in the feature extraction module of traditional algorithms is relatively simple, so tracking performance is easily degraded in complex scenarios. In this paper, we design a novel and powerful feature that significantly improves tracking performance. Finally, traditional trackers often suffer from model drift caused by occlusion and other complex conditions. We introduce a relocation component that searches for the object at other locations, such as the secondary peak of the response map, which partly alleviates the model drift problem.
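
To make "response map" and "secondary peak" concrete, here is a minimal single-channel correlation filter in the MOSSE style, trained in closed form in the Fourier domain. This is a swapped-in baseline, not the authors' tracker: their rich features and scale estimation are not reproduced, and the function names, `sigma`, `lam`, and `exclude_radius` are assumptions.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the center of the patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, sigma=2.0, lam=1e-2):
    """Closed-form MOSSE-style filter (conjugate form) from a single training patch."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def response_map(H_conj, patch):
    """Correlate a new search patch with the learned filter."""
    return np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))

def top_two_peaks(resp, exclude_radius=5):
    """Main peak plus the strongest peak outside a window around it; a relocation
    step could re-check the target at the secondary peak."""
    main = np.unravel_index(np.argmax(resp), resp.shape)
    y, x = main
    masked = resp.copy()
    masked[max(0, y - exclude_radius): y + exclude_radius + 1,
           max(0, x - exclude_radius): x + exclude_radius + 1] = -np.inf
    second = np.unravel_index(np.argmax(masked), resp.shape)
    return tuple(int(v) for v in main), tuple(int(v) for v in second)

rng = np.random.default_rng(0)
target = rng.random((64, 64))
H = train_filter(target)
# Simulate the target moving 3 px down and 5 px right (circular shift).
resp = response_map(H, np.roll(target, (3, 5), axis=(0, 1)))
main, second = top_two_peaks(resp)
print(main)  # → (35, 37): the Gaussian center (32, 32) shifted by (3, 5)
```

The displacement of the main peak from the patch center gives the position update; a weak main peak with a competitive secondary peak is the kind of signal a relocation component can act on.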

Analysis on Signification of Actant for Representation in Dumb Ways to Die - as the Centre Semiotic Analysis of A. J. Greimas (Dumb Ways to Die에서 재현된 행위소의 의미해석작용 분석 - A.J . 그레마스의 기호학을 중심으로)

  • Kwon, Sangwoo
    • Journal of Korea Multimedia Society / v.19 no.6 / pp.1095-1105 / 2016
  • This study is a semiotic analysis of 'Dumb Ways to Die', produced by the Melbourne railway in Australia in 2012. By analyzing its symbolic representations through A. J. Greimas's actantial model, the study extracts the relationships among the significations of the objects represented in 'Dumb Ways to Die'. Greimas's model helps analyze the semiotic interpretation that takes place at the level of discourse. In addition, the study compares the properties that distinguish the semiotic actants which, represented within the same ideological situation, prompt behavior in the recipient. This makes it possible to determine the properties of the signification and to gauge the level of the symbolic images reproduced in the discourse process. Accordingly, by extracting the principle of reproduction through which the video prompts the user's action, the study offers guidance to developers.

A Study on block histogram's comparison for cut detection (컷 검출을 위한 블록별 히스토그램 비교에 관한 연구)

  • 고석만;김형균;오무송
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.7 / pp.1301-1307 / 2001
  • A video retrieval system must provide a list of representative frames so that playback can start from the point the user wants. Such a list can be obtained only if cut points are detected accurately. To detect cut points, this paper divides each frame into fixed-size blocks and compares the histogram of each block with that of the same block in the next frame. If the comparison result does not exceed a threshold, the next frame is detected as a cut.
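
A minimal sketch of block-wise histogram comparison for cut detection follows, assuming grayscale frames stored as 2-D uint8 arrays. The abstract gives no block size, bin count, distance measure, or thresholds, so every value here is hypothetical. "The comparison result does not exceed a threshold" is read as "the number of blocks whose histograms stay similar is at or below a threshold"; that reading is also an assumption.

```python
import numpy as np

def block_histograms(frame, block=16, bins=16):
    """Split a grayscale frame into block x block tiles and return one normalized
    intensity histogram per tile (row-major tile order)."""
    h, w = frame.shape
    hists = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile = frame[y:y + block, x:x + block]
            hist, _ = np.histogram(tile, bins=bins, range=(0, 256))
            hists.append(hist / tile.size)
    return np.array(hists)

def is_cut(prev_frame, next_frame, block_diff=0.5, min_similar_ratio=0.5, block=16, bins=16):
    """Declare a cut when too few co-located blocks keep similar histograms."""
    a = block_histograms(prev_frame, block, bins)
    b = block_histograms(next_frame, block, bins)
    # L1 distance between histograms of the same block in consecutive frames (0..2).
    diffs = np.abs(a - b).sum(axis=1)
    similar_ratio = np.mean(diffs < block_diff)
    # The comparison result (share of similar blocks) does not exceed the threshold -> cut.
    return bool(similar_ratio <= min_similar_ratio)

def detect_cuts(frames, **kwargs):
    """Indices i such that frames[i] starts a new shot."""
    return [i for i in range(1, len(frames)) if is_cut(frames[i - 1], frames[i], **kwargs)]

dark = np.full((64, 64), 20, dtype=np.uint8)
bright = np.full((64, 64), 200, dtype=np.uint8)
print(detect_cuts([dark, dark, dark, bright, bright]))  # → [3]
```

Comparing blocks rather than whole-frame histograms makes the test sensitive to where intensities change, so two different shots with similar global histograms are less likely to be merged.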

Analysis on the Backgrounds Expression for 3D Animation (3D 애니메이션의 배경 표현에 관한 분석)

  • Park, Sung-Dae;Jung, Yee-Ji;Kim, Cheeyong
    • Journal of Korea Multimedia Society / v.18 no.2 / pp.268-276 / 2015
  • This article analyzes background representation in 3D animation and examines what constitutes appropriate background expression. With advances in computer graphics technology, the backgrounds of 3D animations can now be rendered to look like real backgrounds. By contrast, the recently released "The Smurfs" was created using actual real-world backgrounds. However, 3D animation with real backgrounds is not appropriate with respect to the space for creative expression, which is a main role of animation. In this study, we analyze the characters and backgrounds of animations made with 3D graphics and, based on this analysis, propose an appropriate approach to background representation in 3D animation.

Exploiting Chaotic Feature Vector for Dynamic Textures Recognition

  • Wang, Yong;Hu, Shiqiang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.11 / pp.4137-4152 / 2014
  • This paper investigates the ability of a chaotic feature vector to describe dynamic textures. First, a chaotic feature and other features are computed from the intensity series of each pixel. These features are then combined into a chaotic feature vector, so that a video is modeled as a feature vector matrix. Next, using a bag-of-words framework, we explore the representational ability of the proposed chaotic feature vector. Finally, we compare recognition rates across different combinations of chaotic features. Experimental results show the merit of the chaotic feature vector for representing pixel intensity series.
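
The pipeline shape (pixel intensity series, then one feature vector per pixel, then a feature vector matrix per video, then a bag-of-words histogram) can be sketched as below. The per-pixel features used here (mean, standard deviation, lag-1 autocorrelation) are placeholders: the paper's chaotic features are not reproduced. The codebook comes from a plain k-means. All names are hypothetical, and in practice the features would likely be standardized before clustering.

```python
import numpy as np

def pixel_feature_matrix(video):
    """video: (T, H, W) array. Returns an (H*W, 3) feature vector matrix, one row per
    pixel intensity series. The three features are stand-ins for chaotic features."""
    T = video.shape[0]
    series = video.reshape(T, -1).T.astype(float)  # (H*W, T): one series per pixel
    mean = series.mean(axis=1)
    std = series.std(axis=1)
    centred = series - mean[:, None]
    denom = (centred ** 2).sum(axis=1)
    safe = np.where(denom > 0, denom, 1.0)  # constant series: numerator is 0 as well
    lag1 = (centred[:, 1:] * centred[:, :-1]).sum(axis=1) / safe
    return np.stack([mean, std, lag1], axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means used to build the visual-word codebook."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def bow_histogram(features, codebook):
    """Assign each pixel's feature vector to its nearest codeword and return the
    normalized word histogram that represents the whole video."""
    labels = np.argmin(((features[:, None, :] - codebook[None]) ** 2).sum(axis=2), axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
video = rng.random((30, 8, 8))      # 30 frames of 8x8 synthetic "texture"
F = pixel_feature_matrix(video)     # feature vector matrix, shape (64, 3)
codebook = kmeans(F, 4)
print(bow_histogram(F, codebook))   # 4-bin descriptor for this video
```

Videos described this way can then be compared or classified through their histograms, which is where different feature combinations would be evaluated.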

Motion Flow Analysis using Bi-directional Prediction-Independent Framework in MPEG Compressed Domain (압축 영역에서의 양방향 예측 구조를 이용한 움직임 흐름 분석)

  • 김낙우;김태용;최종수
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.5 / pp.13-22 / 2004
  • Because video sequences inherently consist of dynamic objects, object motion is an effective feature for describing video content, and motion features play an important role in video retrieval. In this paper, we propose a method that converts motion vectors (MVs) in the MPEG compressed domain into a uniform set, independent of the frame type and the direction of prediction, and uses these normalized MVs (N-MVs) as a motion descriptor for understanding video content. We describe a frame-type-independent representation of the various frame types in an MPEG video, in which all frames can be treated equivalently without full decoding. Experiments show that the proposed method outperforms the conventional method.
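
A possible normalization is sketched below; it is an assumption, not the paper's exact conversion. Every P- or B-frame MV is rescaled to a single frame interval and expressed in the forward-prediction convention. Under roughly constant motion v, a forward MV to a reference d frames earlier is about -v·d, and a backward MV to a reference d frames later is about +v·d. Dividing by d and negating backward MVs therefore puts both on the same footing. The `MotionVector` class is hypothetical; sub-pel units and field prediction are ignored, and I-frames carry no MVs.

```python
from dataclasses import dataclass

@dataclass
class MotionVector:
    dx: float        # horizontal displacement in pixels, as decoded from the bitstream
    dy: float        # vertical displacement in pixels
    direction: str   # "forward" (reference is earlier) or "backward" (reference is later)
    distance: int    # frame intervals between the current frame and its reference

def normalise(mv):
    """Express any MV as a forward-convention MV over one frame interval, assuming
    roughly constant motion between the frame and its reference."""
    if mv.distance <= 0:
        raise ValueError("reference distance must be positive")
    if mv.direction == "forward":
        sign = 1.0
    elif mv.direction == "backward":
        sign = -1.0  # a backward MV points the opposite way in time
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    return (sign * mv.dx / mv.distance, sign * mv.dy / mv.distance)

# A P-frame MV over 3 intervals and a B-frame backward MV over 2 intervals
# describing the same per-frame motion map to comparable N-MVs.
print(normalise(MotionVector(6, -3, "forward", 3)))   # → (2.0, -1.0)
print(normalise(MotionVector(-4, 2, "backward", 2)))  # → (2.0, -1.0)
```

Once all MVs share one convention and one time scale, P- and B-frames can feed the same motion descriptor without decoding the frames.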