• Title/Abstract/Keyword: facial video

Search results: 126 (processing time: 0.148 seconds)

Fake News Detection on Social Media using Video Information: Focused on YouTube

  • Chang, Yoon Ho; Choi, Byoung Gu
    • The Journal of Information Systems / Vol. 32, No. 2 / pp. 87-108 / 2023
  • Purpose The main purpose of this study is to improve fake news detection performance by using video information to overcome the limitations of extant text- and image-oriented studies that do not reflect the latest news consumption trends. Design/methodology/approach This study collected video clips and related information, including news scripts, speakers' facial expressions, and video metadata, from YouTube to develop a fake news detection model. Based on the collected data, seven combinations of related information (i.e., scripts; video metadata; facial expression; scripts and video metadata; scripts and facial expression; video metadata and facial expression; and scripts, video metadata, and facial expression) were used as input for training and evaluation. The input data were analyzed using six models, such as support vector machine and deep neural network. The area under the curve (AUC) was used to evaluate the performance of the classification models. Findings The results showed that the AUC and accuracy values of the three-feature combination (scripts, video metadata, and facial expression) were the highest in the logistic regression, naïve Bayes, and deep neural network models. This result implies that fake news detection can be improved by using video information (video metadata and facial expression). The sample size of this study was relatively small; the generalizability of the results would be enhanced with a larger sample.
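
A minimal sketch of the evaluation protocol described in this abstract (training several classifiers on each feature combination and comparing them by AUC) could look like the following; the feature blocks, dimensions, and labels are random placeholders rather than the paper's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200  # placeholder sample size

# Hypothetical feature blocks standing in for the paper's three sources.
features = {
    "scripts": rng.normal(size=(n, 16)),
    "metadata": rng.normal(size=(n, 4)),
    "face": rng.normal(size=(n, 8)),
}
y = rng.integers(0, 2, size=n)  # 1 = fake, 0 = real (placeholder labels)

# All seven non-empty combinations of the three feature blocks.
combos = [("scripts",), ("metadata",), ("face",),
          ("scripts", "metadata"), ("scripts", "face"),
          ("metadata", "face"), ("scripts", "metadata", "face")]

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "nb": GaussianNB(),
    "dnn": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
}

for combo in combos:
    X = np.hstack([features[name] for name in combo])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{'+'.join(combo):<25} {name:<7} AUC={auc:.3f}")
```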

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems / Vol. 17, No. 2 / pp. 337-351 / 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. Firstly, the video is preprocessed. Then, a double-layer cascade structure is used to detect the face in each video image, and two deep convolutional neural networks are used to extract the time-domain and spatial-domain facial features in the video. The spatial convolutional neural network extracts the spatial information features from each frame of the static expression images in the video, while the temporal convolutional neural network extracts the dynamic information features from the optical flow information across multiple frames of expression images. A multiplicative fusion is performed on the spatiotemporal features learned by the two deep convolutional neural networks. Finally, the fused features are input to a support vector machine to perform the facial expression classification task. The experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method are as high as 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.
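
The fusion step itself is simple: the two streams' feature vectors are multiplied element-wise before classification. A minimal sketch, with random arrays standing in for the two CNNs' outputs, might be:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 300, 128  # placeholder: n clips, d-dimensional CNN features

# Stand-ins for the two streams' outputs; in the paper these come from a
# spatial CNN (per-frame images) and a temporal CNN (optical flow).
spatial_feats = rng.normal(size=(n, d))
temporal_feats = rng.normal(size=(n, d))
labels = rng.integers(0, 6, size=n)  # e.g., six basic expressions

# Multiplicative (element-wise) fusion of the two feature streams.
fused = spatial_feats * temporal_feats

clf = SVC(kernel="rbf")
clf.fit(fused[:250], labels[:250])
print("held-out accuracy:", clf.score(fused[250:], labels[250:]))
```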

CREATING JOYFUL DIGESTS BY EXPLOITING SMILE/LAUGHTER FACIAL EXPRESSIONS PRESENT IN VIDEO

  • Kowalik, Uwe; Hidaka, Kota; Irie, Go; Kojima, Akira
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2009 IWAIT / pp. 267-272 / 2009
  • Video digests provide an effective way to check video content rapidly due to their very compact form. By watching a digest, users can easily decide whether specific content is worth seeing in full. The impression created by the digest greatly influences the user's choice when selecting video content. We propose a novel method of automatic digest creation that evokes a joyful impression by exploiting smile/laughter facial expressions as emotional cues of joy in the video. We assume that a digest presenting smiling/laughing faces appeals to the user, since he/she is assured that the smile/laughter expression is caused by joyful events inside the video. For detecting smile/laughter faces we developed a neural-network-based method for classifying facial expressions. Video segmentation is performed by automatic shot detection. To create joyful digests, appropriate shots are automatically selected by shot ranking based on the smile/laughter detection results. We report the results of user trials conducted to assess the visual impression of 'joyful' digests automatically created by our system. The results show that users tend to prefer emotional digests containing laughing faces. This suggests that the attractiveness of automatically created video digests can be improved by extracting emotional cues from the contents through automatic facial expression analysis, as proposed in this paper.
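
The shot-ranking idea can be sketched with off-the-shelf OpenCV cascades standing in for the paper's neural-network smile/laughter classifier; shot boundaries are assumed to come from a separate shot-detection step:

```python
import cv2

# Pretrained OpenCV cascades as stand-ins for the paper's own classifier.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def rank_shots(video_path, shots):
    """Rank shots by the fraction of frames containing a smiling face.
    `shots` is a list of (start_frame, end_frame) pairs assumed to come
    from a separate automatic shot-detection step."""
    cap = cv2.VideoCapture(video_path)
    scores = []
    for start, end in shots:
        hits = total = 0
        cap.set(cv2.CAP_PROP_POS_FRAMES, start)
        for _ in range(start, end):
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            total += 1
            for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
                roi = gray[y:y + h, x:x + w]
                if len(smile_cascade.detectMultiScale(roi, 1.7, 20)) > 0:
                    hits += 1
                    break
        scores.append(hits / max(total, 1))
    cap.release()
    # Highest-scoring shots first; the top-ranked shots form the digest.
    return sorted(zip(scores, shots), reverse=True)
```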


Detection of video editing points using facial keypoints

  • Joshep Na; Jinho Kim; Jonghyuk Park
    • Journal of Intelligence and Information Systems / Vol. 29, No. 4 / pp. 15-30 / 2023
  • Recently, various services using artificial intelligence (AI) have been emerging in the media field as well. However, most video editing, which involves finding editing points and splicing clips together, is still carried out manually, requiring a lot of time and human resources. Therefore, this study proposes a methodology that can detect the editing points of a video, according to whether the person in the video is speaking, by using a Video Swin Transformer. The proposed structure first detects facial keypoints through face alignment; through this process, the temporal and spatial changes of the face are reflected from the input video data. Then, the behavior of the person in the video is classified by the Video Swin Transformer-based model proposed in this study. Specifically, the feature map generated by the Video Swin Transformer from the video data is combined with the facial keypoints detected through face alignment, and utterance is classified through convolution layers. In conclusion, the performance of the video editing point detection model using facial keypoints proposed in this paper improved from 87.46% to 89.17% compared to the model without facial keypoints.
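
A toy PyTorch sketch of the fusion described here, with a keypoint heatmap channel concatenated onto a backbone feature map before convolutional classification; the channel count, the 68-keypoint layout, and the heatmap encoding are assumptions, and the real backbone (a Video Swin Transformer) is not included:

```python
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    """Hypothetical fusion head: a clip-level feature map from a video
    backbone is concatenated with a keypoint heatmap channel and then
    classified through convolution layers."""

    def __init__(self, feat_channels=768):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2),  # speaking / not speaking
        )

    def forward(self, feat_map, keypoints):
        # feat_map: (B, C, H, W); keypoints: (B, K, 2) in [0, 1] coords.
        B, _, H, W = feat_map.shape
        heat = torch.zeros(B, 1, H, W, device=feat_map.device)
        xs = (keypoints[..., 0] * (W - 1)).long().clamp(0, W - 1)
        ys = (keypoints[..., 1] * (H - 1)).long().clamp(0, H - 1)
        for b in range(B):
            heat[b, 0, ys[b], xs[b]] = 1.0  # mark keypoint locations
        return self.head(torch.cat([feat_map, heat], dim=1))

# Usage with random stand-ins for backbone features and detected keypoints:
logits = UtteranceClassifier()(torch.randn(2, 768, 7, 7), torch.rand(2, 68, 2))
print(logits.shape)  # torch.Size([2, 2])
```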

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan; Abdel-Mottaleb, Mohamed; Asfour, Shihab S.
    • Journal of Information Processing Systems / Vol. 16, No. 1 / pp. 6-29 / 2020
  • Biometrics identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities are missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in the facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% Rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
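
The score-level fusion over whichever modalities happen to be present can be sketched as follows; the scores, gallery size, and mean-fusion rule are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

# Hypothetical per-modality match scores for one probe against a gallery
# of 3 identities; None marks a modality not detected in the video clip.
modality_scores = {
    "frontal_face": np.array([0.91, 0.40, 0.35]),
    "left_ear": np.array([0.85, 0.52, 0.30]),
    "right_ear": None,          # missing in this clip
    "left_profile": np.array([0.78, 0.45, 0.41]),
    "right_profile": None,      # missing in this clip
}

# Score-level fusion over the available modalities: here a simple mean.
available = [s for s in modality_scores.values() if s is not None]
fused = np.mean(available, axis=0)
print("fused scores:", fused, "-> identity", int(np.argmax(fused)))
```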

Recognition of Human Facial Expression in a Video Image using the Active Appearance Model

  • Jo, Gyeong-Sic; Kim, Yong-Guk
    • Journal of Information Processing Systems / Vol. 6, No. 2 / pp. 261-268 / 2010
  • Tracking human facial expression within a video image has many useful applications, such as surveillance and teleconferencing. The Active Appearance Model (AAM) was initially proposed for face recognition; however, it turns out that the AAM has many advantages for continuous facial expression recognition. We have implemented a continuous facial expression recognition system using the AAM. In this study, we adopt an independent AAM using the Inverse Compositional Image Alignment method. The system was evaluated using the standard Cohn-Kanade facial expression database, and the results show that it could have numerous potential applications.
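
Once an AAM has been fitted to each frame, its shape and appearance coefficients form a compact per-frame feature vector that a simple classifier can label with an expression. A minimal sketch under that assumption (random vectors stand in for real AAM fits, which are outside the scope of this snippet):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Placeholder per-frame AAM parameter vectors (shape + appearance
# coefficients); a real system would obtain these from an
# inverse-compositional AAM fit on each video frame.
aam_params = rng.normal(size=(500, 30))
expressions = rng.integers(0, 7, size=500)  # e.g., 7 expression classes

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(aam_params[:400], expressions[:400])
print("held-out accuracy:", clf.score(aam_params[400:], expressions[400:]))
```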

A Local Feature-Based Robust Approach for Facial Expression Recognition from Depth Video

  • Uddin, Md. Zia; Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 3 / pp. 1390-1403 / 2016
  • Facial expression recognition (FER) plays a very significant role in computer vision, pattern recognition, and image processing applications such as human-computer interaction, as it provides sufficient information about people's emotions. For video-based facial expression recognition, depth cameras can be better candidates than RGB cameras: a person's face cannot easily be recognized from distance-based depth videos, so depth cameras also resolve some of the privacy issues that can arise with RGB faces. A good FER system relies heavily on the extraction of robust features as well as on the recognition engine. In this work, an efficient novel approach is proposed to recognize facial expressions from time-sequential depth videos. First, efficient Local Binary Pattern (LBP) features are obtained from the time-sequential depth faces and further transformed by Generalized Discriminant Analysis (GDA) to make the features more robust; finally, the LBP-GDA features are fed into Hidden Markov Models (HMMs) to train on and recognize the different facial expressions. The proposed depth-based facial expression recognition approach is compared to conventional approaches such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and it outperforms them by obtaining better recognition rates.
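
A rough sketch of the LBP-to-HMM pipeline using scikit-image and hmmlearn; random arrays stand in for depth-face sequences, and the paper's GDA projection is omitted for brevity:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from hmmlearn import hmm

def lbp_histogram(depth_face, P=8, R=1):
    """Uniform-LBP histogram for one depth face image (2-D array)."""
    codes = local_binary_pattern(depth_face, P, R, method="uniform")
    histogram, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2),
                                density=True)
    return histogram

# Placeholder data: 10 depth-face sequences of 20 frames per expression.
rng = np.random.default_rng(0)
sequences = {
    label: [rng.integers(0, 256, size=(20, 64, 64)).astype(np.uint8)
            for _ in range(10)]
    for label in ["happy", "sad", "surprise"]}

# One HMM per expression, trained on concatenated LBP feature sequences.
models = {}
for label, seqs in sequences.items():
    feats = [np.array([lbp_histogram(f) for f in seq]) for seq in seqs]
    X = np.vstack(feats)
    lengths = [len(f) for f in feats]
    models[label] = hmm.GaussianHMM(n_components=3, n_iter=20).fit(X, lengths)

# Classification: pick the expression whose HMM gives the highest likelihood.
test = np.array([lbp_histogram(f) for f in sequences["happy"][0]])
print(max(models, key=lambda lab: models[lab].score(test)))
```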

Face Detection Using Multi-level Features for Privacy Protection in Large-scale Surveillance Video

  • Lee, Seung Ho; Moon, Jung Ik; Kim, Hyung-Il; Ro, Yong Man
    • Journal of Korea Multimedia Society / Vol. 18, No. 11 / pp. 1268-1280 / 2015
  • In video surveillance systems, the exposure of a person's face is a serious threat to personal privacy. To protect personal privacy in large amounts of video, an automatic face detection method is required to locate and mask each person's face. However, in real-world surveillance videos, the effectiveness of existing face detection methods can deteriorate due to large variations in facial appearance (e.g., facial pose, illumination) or degraded faces (e.g., occluded or low-resolution faces). This paper proposes a new face detection method based on multi-level facial features. In each video frame, different kinds of spatial features are independently extracted and analyzed, which complement each other under the aforementioned challenges. Temporal-domain analysis is also exploited to consolidate the proposed method. Experimental results show that, compared to competing methods, the proposed method achieves very high recall rates while maintaining acceptable precision rates.
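
The complementary-detector idea plus a temporal consistency check can be sketched with stock OpenCV cascades; these stand in for the paper's multi-level features, and the IoU threshold is an illustrative guess:

```python
import cv2

# Two complementary detectors: a frontal cascade and a profile cascade.
frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / float(aw * ah + bw * bh - inter)

def detect_faces(video_path):
    """Union the two detectors per frame, then keep only detections that
    reappeared (IoU > 0.3) in the previous frame: a crude temporal filter."""
    cap = cv2.VideoCapture(video_path)
    prev, kept = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = (list(frontal.detectMultiScale(gray, 1.2, 5))
                 + list(profile.detectMultiScale(gray, 1.2, 5)))
        kept.append([b for b in boxes
                     if any(iou(b, p) > 0.3 for p in prev)])
        prev = boxes
    cap.release()
    return kept
```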

A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion

  • Ren, Qun
    • Journal of Information Processing Systems / Vol. 17, No. 3 / pp. 556-570 / 2021
  • Existing video expression recognition methods mainly focus on spatial feature extraction from video expression images but tend to ignore the dynamic features of video sequences. To solve this problem, a multi-mode convolution neural network method is proposed to effectively improve the performance of facial expression recognition in video. Firstly, OpenFace 2.0 is used to detect face images in the video, and two deep convolution neural networks are used to extract spatiotemporal expression features: a spatial convolution neural network extracts the spatial information features of each static expression image, and a temporal convolution neural network extracts the dynamic information features from the optical flow of multiple expression images. Then, the spatiotemporal features learned by the two deep convolution neural networks are fused by multiplication. Finally, the fused features are input into a support vector machine to perform the facial expression classification. Experimental results show that the recognition accuracy of the proposed method reaches 64.57% and 60.89% on the RML and BAUM-1s datasets, respectively, which is better than that of other compared methods.
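
The temporal stream consumes dense optical flow computed over consecutive face frames. A minimal sketch with OpenCV's Farneback flow (random frames stand in for OpenFace-detected face crops):

```python
import cv2
import numpy as np

def flow_stack(frames):
    """Dense optical flow between consecutive grayscale frames, the kind of
    input the temporal stream described above would consume. `frames` is a
    list of 2-D uint8 arrays from a face-cropped expression clip."""
    flows = []
    for prev, curr in zip(frames, frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)  # (H, W, 2): per-pixel x/y displacement
    return np.stack(flows)

# Usage with random stand-in frames (a real pipeline would use the
# OpenFace-detected face crops from the video):
frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8)
          for _ in range(5)]
print(flow_stack(frames).shape)  # (4, 64, 64, 2)
```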

Security Verification of Video Telephony System Implemented on the DM6446 DaVinci Processor

  • Ghimire, Deepak; Kim, Joon-Cheol; Lee, Joon-Whoan
    • International Journal of Contents / Vol. 8, No. 1 / pp. 16-22 / 2012
  • In this paper we propose a method for verifying video in a video telephony system implemented on the DM6446 DaVinci processor. Each frame is categorized as either an error-free frame or an error frame depending on predefined criteria. The human face is chosen as the basic means of authenticating a video frame, and a skin-color-based algorithm is implemented to detect the face in the video frame. A video frame is classified as error-free if it contains a single face object with a clear view of the facial features (eyes, nose, mouth, etc.) and the background of the image frame does not differ from the predefined background; otherwise, it is classified as an error frame. We also implemented an image-histogram-based NCC (Normalized Cross Correlation) comparison for video verification to speed up the system. Experimental results show that the system is able to classify frames with 90.83% accuracy.
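
The histogram-based NCC check can be sketched with OpenCV's built-in histogram correlation; the grayscale histogram, bin count, and decision threshold here are illustrative assumptions:

```python
import cv2

def histogram_ncc(frame_a, frame_b, bins=64):
    """Normalized cross-correlation between the grayscale histograms of two
    frames; values near 1.0 suggest the incoming frame matches the expected
    scene. The threshold below is an illustrative guess, not the paper's."""
    hists = []
    for frame in (frame_a, frame_b):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256])
        hists.append(cv2.normalize(hist, None))
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

# Usage: compare an incoming frame against a stored reference frame.
# is_error_free = histogram_ncc(reference_frame, incoming_frame) > 0.9
```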