• 제목/요약/키워드: Multiple Modalities

검색결과 118건 처리시간 0.027초

얼굴 스푸핑 방지를 위한 다중 양식에 관한 연구 (A Study on Multiple Modalities for Face Anti-Spoofing)

  • 오신모;이효종
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2021년도 추계학술발표대회
    • /
    • pp.651-654
    • /
    • 2021
  • Face anti-spoofing (FAS) techniques play a significant role in the defense of facial recognition systems against spoofing attacks. Existing FAS methods achieve the great performance depending on annotated additional modalities. However, labeling these high-cost modalities need a lot of manpower, device resources and time. In this work, we proposed to use self-transforming modalities instead the annotated modalities. Three different modalities based on frequency domain and temporal domain are applied and analyzed. Intuitive visualization analysis shows the advantages of each modality. Comprehensive experiments in both the CNN-based and transformer-based architecture with various modalities combination demonstrate that self-transforming modalities improve the vanilla network a lot. The codes are available at https://github.com/chenmou0410/FAS-Challenge2021.

MD, PD법을 이용한 VDT 직무의 단기기억 다중자원처리에의 영향평가 (An evaluation of the effects of VDT tasks on multiple resources processing in working menory using MD, PD method)

  • 윤철호;노병옥
    • 대한인간공학회지
    • /
    • 제16권1호
    • /
    • pp.85-96
    • /
    • 1997
  • This article reviews the effects of VDT tasks on multiple resources for processing and storage in short-term working memory. MD and PD method were introduced toevaluate the modalities (auditory-visual) in the multiple resources model. The subjects conducted 2 sessions of 50 minites VDT tasks. Before, between and after VDT tasks, MD, PD task performance scores and CFF(critical flicker frequency0 values were measured. The review suggested that the modalities of human information processing in working memory were affected by VDT tasks with different task contents.

  • PDF

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems
    • /
    • 제16권1호
    • /
    • pp.6-29
    • /
    • 2020
  • Biometrics identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing different modalities, i.e., left ear, left profile face, frontal face, right profile face, and right ear, present in the facial video clips, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities were missing during the testing. The proposed system has three main components that are responsible for detection, which consists of modality specific detectors to automatically detect images of different modalities present in facial video clips; feature selection, which uses supervised denoising sparse auto-encoders network to capture discriminative representations that are robust to the illumination and pose variations; and classification, which consists of a set of modality specific sparse representation classifiers for unimodal recognition, followed by score level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD), resulted in a 99.17% and 97.14% Rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips even in the situation of missing modalities.

Hybrid feature extraction of multimodal images for face recognition

  • Cheema, Usman;Moon, Seungbin
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2018년도 추계학술발표대회
    • /
    • pp.880-881
    • /
    • 2018
  • Recently technological advancements have allowed visible, infrared and thermal imaging systems to be readily available for security and access control. Increasing applications of facial recognition for security and access control leads to emerging spoofing methodologies. To overcome these challenges of occlusion, replay attack and disguise, researches have proposed using multiple imaging modalities. Using infrared and thermal modalities alongside visible imaging helps to overcome the shortcomings of visible imaging. In this paper we review and propose hybrid feature extraction methods to combine data from multiple imaging systems simultaneously.

Emotion Recognition based on Multiple Modalities

  • Kim, Dong-Ju;Lee, Hyeon-Gu;Hong, Kwang-Seok
    • 융합신호처리학회논문지
    • /
    • 제12권4호
    • /
    • pp.228-236
    • /
    • 2011
  • Emotion recognition plays an important role in the research area of human-computer interaction, and it allows a more natural and more human-like communication between humans and computer. Most of previous work on emotion recognition focused on extracting emotions from face, speech or EEG information separately. Therefore, a novel approach is presented in this paper, including face, speech and EEG, to recognize the human emotion. The individual matching scores obtained from face, speech, and EEG are combined using a weighted-summation operation, and the fused-score is utilized to classify the human emotion. In the experiment results, the proposed approach gives an improvement of more than 18.64% when compared to the most successful unimodal approach, and also provides better performance compared to approaches integrating two modalities each other. From these results, we confirmed that the proposed approach achieved a significant performance improvement and the proposed method was very effective.

DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법 (A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning)

  • 이유진;낭종호
    • 정보과학회 논문지
    • /
    • 제43권11호
    • /
    • pp.1281-1297
    • /
    • 2016
  • 최근 스마트 기기의 보급으로 자유롭게 비디오 컨텐츠를 생성하고 이를 빠르고 편리하게 공유할 수 있는 네트워크 환경이 갖추어지면서, 퍼스널 비디오가 급증하고 있다. 그러나, 퍼스널 비디오는 비디오라는 특성 상 멀티 모달리티로 구성되어 있으면서 데이터가 시간의 흐름에 따라 변화하기 때문에 이벤트 분류를 할 때 이에 대한 고려가 필요하다. 본 논문에서는 비디오 내의 멀티 모달리티들로부터 고수준의 특징을 추출하여 시간 순으로 재배열한 것을 바탕으로 모달리티 사이의 연관관계를 Deep Neural Network(DNN)으로 학습하여 퍼스널 비디오 이벤트를 분류하는 방법을 제안한다. 제안하는 방법은 비디오에 내포된 이미지와 오디오를 시간적으로 동기화하여 추출한 후 GoogLeNet과 Multi-Layer Perceptron(MLP)을 이용하여 각각 고수준 정보를 추출한다. 그리고 이들을 비디오에 표현된 시간순으로 재 배열하여 비디오 한 편당 하나의 특징으로 재 생성하고 이를 바탕으로 학습한 DNN을 이용하여 퍼스널 비디오 이벤트를 분류한다.

Emotion Recognition Implementation with Multimodalities of Face, Voice and EEG

  • Udurume, Miracle;Caliwag, Angela;Lim, Wansu;Kim, Gwigon
    • Journal of information and communication convergence engineering
    • /
    • 제20권3호
    • /
    • pp.174-180
    • /
    • 2022
  • Emotion recognition is an essential component of complete interaction between human and machine. The issues related to emotion recognition are a result of the different types of emotions expressed in several forms such as visual, sound, and physiological signal. Recent advancements in the field show that combined modalities, such as visual, voice and electroencephalography signals, lead to better result compared to the use of single modalities separately. Previous studies have explored the use of multiple modalities for accurate predictions of emotion; however the number of studies regarding real-time implementation is limited because of the difficulty in simultaneously implementing multiple modalities of emotion recognition. In this study, we proposed an emotion recognition system for real-time emotion recognition implementation. Our model was built with a multithreading block that enables the implementation of each modality using separate threads for continuous synchronization. First, we separately achieved emotion recognition for each modality before enabling the use of the multithreaded system. To verify the correctness of the results, we compared the performance accuracy of unimodal and multimodal emotion recognitions in real-time. The experimental results showed real-time user emotion recognition of the proposed model. In addition, the effectiveness of the multimodalities for emotion recognition was observed. Our multimodal model was able to obtain an accuracy of 80.1% as compared to the unimodality, which obtained accuracies of 70.9, 54.3, and 63.1%.

Enhancing Recommender Systems by Fusing Diverse Information Sources through Data Transformation and Feature Selection

  • Thi-Linh Ho;Anh-Cuong Le;Dinh-Hong Vu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제17권5호
    • /
    • pp.1413-1432
    • /
    • 2023
  • Recommender systems aim to recommend items to users by taking into account their probable interests. This study focuses on creating a model that utilizes multiple sources of information about users and items by employing a multimodality approach. The study addresses the task of how to gather information from different sources (modalities) and transform them into a uniform format, resulting in a multi-modal feature description for users and items. This work also aims to transform and represent the features extracted from different modalities so that the information is in a compatible format for integration and contains important, useful information for the prediction model. To achieve this goal, we propose a novel multi-modal recommendation model, which involves extracting latent features of users and items from a utility matrix using matrix factorization techniques. Various transformation techniques are utilized to extract features from other sources of information such as user reviews, item descriptions, and item categories. We also proposed the use of Principal Component Analysis (PCA) and Feature Selection techniques to reduce the data dimension and extract important features as well as remove noisy features to increase the accuracy of the model. We conducted several different experimental models based on different subsets of modalities on the MovieLens and Amazon sub-category datasets. According to the experimental results, the proposed model significantly enhances the accuracy of recommendations when compared to SVD, which is acknowledged as one of the most effective models for recommender systems. Specifically, the proposed model reduces the RMSE by a range of 4.8% to 21.43% and increases the Precision by a range of 2.07% to 26.49% for the Amazon datasets. Similarly, for the MovieLens dataset, the proposed model reduces the RMSE by 45.61% and increases the Precision by 14.06%. Additionally, the experimental results on both datasets demonstrate that combining information from multiple modalities in the proposed model leads to superior outcomes compared to relying on a single type of information.

W3C 기반 상호연동 가능한 멀티모달 커뮤니케이터 (W3C based Interoperable Multimodal Communicator)

  • 박대민;권대혁;최진혁;이인재;최해철
    • 방송공학회논문지
    • /
    • 제20권1호
    • /
    • pp.140-152
    • /
    • 2015
  • 최근 사용자와 컴퓨터간의 양방향 상호작용을 가능하게 하는 HCI(Human Computer Interaction) 연구를 위해 인간의 의사소통 체계와 유사한 인터페이스 기술들이 개발되고 있다. 이러한 인간과의 의사소통 과정에서 사용되는 커뮤니케이션 채널을 모달리티라고 부르며, 다양한 단말기 및 서비스 환경에 따라 최적의 사용자 인터페이스를 제공하기 위해서 두 개 이상의 모달리티를 활용하는 멀티모달 인터페이스가 활발히 연구되고 있다. 하지만, 멀티모달 인터페이스를 사용하기에는 각각의 모달리티가 갖는 정보 형식이 서로 상이하기 때문에 상호 연동이 어려우며 상호 보완적인 성능을 발휘하는데 한계가 있다. 이에 따라 본 논문은 W3C(World Wide Web Consortium)의 EMMA(Extensible Multimodal Annotation Markup language)와 MMI(Multimodal Interaction Framework)표준에 기반하여 복수의 모달리티를 상호연동할 수 있는 멀티모달 커뮤니케이터를 제안한다. 멀티모달 커뮤니케이터는 W3C 표준에 포함된 MC(Modality Component), IM(Interaction Manager), PC(Presentation Component)로 구성되며 국제 표준에 기반하여 설계하였기 때문에 다양한 모달리티의 수용 및 확장이 용이하다. 실험에서는 시선 추적과 동작 인식 모달리티를 이용하여 지도 탐색 시나리오에 멀티모달 커뮤니케이터를 적용한 사례를 제시한다.

Uncooperative Person Recognition Based on Stochastic Information Updates and Environment Estimators

  • Kim, Hye-Jin;Kim, Dohyung;Lee, Jaeyeon;Jeong, Il-Kwon
    • ETRI Journal
    • /
    • 제37권2호
    • /
    • pp.395-405
    • /
    • 2015
  • We address the problem of uncooperative person recognition through continuous monitoring. Multiple modalities, such as face, height, clothes color, and voice, can be used when attempting to recognize a person. In general, not all modalities are available for a given frame; furthermore, only some modalities will be useful as some frames in a video sequence are of a quality that is too low to be able to recognize a person. We propose a method that makes use of stochastic information updates of temporal modalities and environment estimators to improve person recognition performance. The environment estimators provide information on whether a given modality is reliable enough to be used in a particular instance; such indicators mean that we can easily identify and eliminate meaningless data, thus increasing the overall efficiency of the method. Our proposed method was tested using movie clips acquired under an unconstrained environment that included a wide variation of scale and rotation; illumination changes; uncontrolled distances from a camera to users (varying from 0.5 m to 5 m); and natural views of the human body with various types of noise. In this real and challenging scenario, our proposed method resulted in an outstanding performance.