• Title/Summary/Keyword: multimodal information fusion


A study of using quality for Radial Basis Function based score-level fusion in multimodal biometrics (RBF 기반 유사도 단계 융합 다중 생체 인식에서의 품질 활용 방안 연구)

  • Choi, Hyun-Soek; Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.192-200 / 2008
  • Multimodal biometrics is a method for personal authentication and verification that uses two or more types of biometric data. RBF-based score-level fusion applies a pattern recognition algorithm to multimodal biometrics, seeking the optimal decision boundary for classifying score feature vectors, each of which consists of the matching scores obtained from several unimodal biometric systems for a given sample. In this setting, all matching scores are assumed to be equally reliable. However, recent research reports that the quality of the input sample affects biometric performance, and matching scores with low reliability caused by low-quality samples are currently not considered when building the pattern recognition model for multimodal biometrics. To solve this problem, this paper proposes an RBF-based score-level fusion approach that employs quality information of the input biometric data to adjust the decision boundary. As a result, the proposed method using quality information showed better recognition performance than both the unimodal biometrics and the usual RBF-based score-level fusion without quality information.
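
A minimal sketch of the idea, assuming scikit-learn and synthetic data: score vectors from two unimodal matchers are classified with an RBF-kernel SVM, and a hypothetical per-sample quality value is supplied as a sample weight so that low-quality samples influence the decision boundary less. The data, the quality measure, and the weighting scheme are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Each row: matching scores from two unimodal matchers (e.g., face, fingerprint).
genuine  = rng.normal(loc=0.7, scale=0.1, size=(200, 2))
impostor = rng.normal(loc=0.4, scale=0.1, size=(200, 2))
scores = np.vstack([genuine, impostor])
labels = np.array([1] * 200 + [0] * 200)

# Hypothetical per-sample quality in [0, 1]; lower quality -> less reliable score.
quality = rng.uniform(0.3, 1.0, size=len(scores))

# RBF-kernel classifier over score vectors; quality enters as a sample weight,
# so low-quality samples pull the decision boundary less strongly.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(scores, labels, sample_weight=quality)

print("training accuracy:", clf.score(scores, labels))
```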

Deep Multimodal MRI Fusion Model for Brain Tumor Grading (뇌 종양 등급 분류를 위한 심층 멀티모달 MRI 통합 모델)

  • Na, In-ye; Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.416-418 / 2022
  • Glioma is a type of brain tumor that arises in glial cells and is classified into two types: high-grade glioma, with a poor prognosis, and low-grade glioma. Magnetic resonance imaging (MRI), a non-invasive method, is widely used in glioma diagnosis research. Studies are being conducted to obtain complementary information by combining multiple modalities, overcoming the incomplete information provided by any single modality. In this study, we developed a 3D CNN-based model that applies input-level fusion to MRI of four modalities (T1, T1Gd, T2, T2-FLAIR). The trained model achieved 0.8926 accuracy, 0.9688 sensitivity, 0.6400 specificity, and 0.9467 AUC on the validation data, confirming that glioma grade can be classified effectively by learning the relationships among the modalities.
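
The input-level fusion described above can be sketched as follows, assuming PyTorch: the four co-registered MRI modalities are stacked as channels of a single volume before a small 3D CNN. The layer sizes, volume shape, and two-class output are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class InputFusion3DCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(4, 16, kernel_size=3, padding=1),  # 4 channels = 4 modalities
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, 4, D, H, W)
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Four co-registered modalities stacked along the channel axis (input-level fusion).
t1, t1gd, t2, flair = (torch.randn(1, 1, 32, 32, 32) for _ in range(4))
volume = torch.cat([t1, t1gd, t2, flair], dim=1)   # (1, 4, 32, 32, 32)
logits = InputFusion3DCNN()(volume)                # high-grade vs. low-grade logits
print(logits.shape)                                # torch.Size([1, 2])
```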


Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi; In Cheol Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.9 / pp.407-418 / 2023
  • Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task suffer from the limitation that they cannot exploit an integrated view of multimodal context because they use only a unimodal context map. To overcome this limitation, this paper proposes a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, the proposed model effectively fuses these three heterogeneous feature maps into a global multimodal context map using a point-wise convolutional neural network module. Lastly, the proposed model adopts an auxiliary task learning module that predicts the observation status, the goal direction, and the goal distance, which guides the agent to learn the navigation policy efficiently. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
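
A minimal sketch of the point-wise fusion step, assuming PyTorch: three heterogeneous context maps are concatenated along the channel axis and mixed by a 1x1 convolution into a single multimodal context map. The channel counts and map resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

appearance = torch.randn(1, 16, 64, 64)   # visual appearance features
semantics  = torch.randn(1, 8, 64, 64)    # semantic features of environment objects
goal       = torch.randn(1, 4, 64, 64)    # goal object features

stacked = torch.cat([appearance, semantics, goal], dim=1)  # (1, 28, 64, 64)

# A 1x1 (point-wise) convolution mixes the channels at every map cell,
# producing a single multimodal context map without changing spatial resolution.
pointwise_fusion = nn.Conv2d(in_channels=28, out_channels=32, kernel_size=1)
context_map = pointwise_fusion(stacked)
print(context_map.shape)  # torch.Size([1, 32, 64, 64])
```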

The Effect of AI Agent's Multi Modal Interaction on the Driver Experience in the Semi-autonomous Driving Context : With a Focus on the Existence of Visual Character (반자율주행 맥락에서 AI 에이전트의 멀티모달 인터랙션이 운전자 경험에 미치는 효과 : 시각적 캐릭터 유무를 중심으로)

  • Suh, Min-soo; Hong, Seung-Hye; Lee, Jeong-Myeong
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.92-101 / 2018
  • As interactive AI speakers become popular, voice recognition is regarded as an important vehicle-driver interaction method for autonomous driving situations. The purpose of this study is to confirm whether multimodal interaction, in which feedback is delivered through both the auditory mode and a visual on-screen AI character, optimizes the user experience more effectively than the auditory mode alone. Participants performed music selection and adjustment tasks through the AI speaker while driving, and we measured information and system quality, presence, perceived usefulness and ease of use, and continuance intention. The analysis showed no multimodal effect of the visual character on most user experience factors, nor on continuance intention; rather, the auditory-only mode was more effective than the multimodal mode for the information quality factor. In the semi-autonomous driving stage, which requires the driver's cognitive effort, multimodal interaction is therefore not more effective than single-mode interaction in optimizing user experience.

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min; Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modes, and the influence of each mode on the emotional state also differs. This paper therefore studies the fusion of the two most important modes in emotion recognition (voice and visual expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, the first step uses prior knowledge to extract audio features. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression features and audio features, and an improved loss function addresses the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. Experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
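
A minimal sketch of attention-weighted fusion of audio and facial-expression features for predicting arousal and valence, assuming PyTorch. The feature dimensions and the simple softmax attention are illustrative assumptions and do not reproduce the paper's improved-AlexNet pipeline or loss function.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scores each modality's contribution
        self.head = nn.Linear(dim, 2)    # predicts (arousal, valence)

    def forward(self, audio_feat, face_feat):
        feats = torch.stack([audio_feat, face_feat], dim=1)  # (batch, 2, dim)
        attn = torch.softmax(self.score(feats), dim=1)       # (batch, 2, 1)
        fused = (attn * feats).sum(dim=1)                    # attention-weighted sum
        return self.head(fused)

audio_feat = torch.randn(4, 128)   # stand-in for prior-knowledge audio features
face_feat  = torch.randn(4, 128)   # stand-in for CNN facial-expression features
print(AttentionFusion()(audio_feat, face_feat).shape)  # torch.Size([4, 2])
```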

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan; Abdel-Mottaleb, Mohamed; Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers that perform the multimodal recognition. Moreover, the proposed technique proved robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in the facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) yielded Rank-1 recognition rates of 99.17% and 97.14%, respectively. This multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
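
A minimal sketch of the final score-level fusion step under missing modalities: each detected modality contributes a vector of match scores over the enrolled identities, and the fused score averages only the available modalities. The identities, score values, and averaging rule are illustrative assumptions.

```python
import numpy as np

identities = ["id_01", "id_02", "id_03"]

# Per-modality scores for each enrolled identity (higher = better match);
# None marks a modality that was not detected in this facial video clip.
modality_scores = {
    "frontal_face":  np.array([0.91, 0.40, 0.35]),
    "left_profile":  np.array([0.85, 0.45, 0.30]),
    "left_ear":      None,           # missing in this probe
    "right_profile": np.array([0.80, 0.50, 0.33]),
    "right_ear":     None,           # missing in this probe
}

available = [s for s in modality_scores.values() if s is not None]
fused = np.mean(available, axis=0)   # average only over the available modalities
print("fused scores:", fused)
print("Rank-1 decision:", identities[int(np.argmax(fused))])
```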

Speech and Textual Data Fusion for Emotion Detection: A Multimodal Deep Learning Approach (감정 인지를 위한 음성 및 텍스트 데이터 퓨전: 다중 모달 딥 러닝 접근법)

  • Edward Dwijayanto Cahyadi; Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.526-527 / 2023
  • Speech emotion recognition (SER) is one of the interesting topics in the machine learning field, and developing a multimodal SER system offers numerous benefits. This paper explains how to fuse BERT as the text recognizer and a CNN as the speech recognizer to build a multimodal SER system.
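
A minimal sketch of this kind of text-speech fusion, assuming PyTorch: a stand-in for the 768-dimensional BERT [CLS] embedding is concatenated with features from a small CNN over a speech spectrogram, and a shared head predicts the emotion class. The dimensions, the four-class label set, and the concatenation-based fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextSpeechFusionSER(nn.Module):
    def __init__(self, num_emotions: int = 4):
        super().__init__()
        self.speech_cnn = nn.Sequential(   # CNN branch over (batch, 1, mel, time)
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Sequential(          # fusion by concatenation
            nn.Linear(768 + 8, 64),
            nn.ReLU(),
            nn.Linear(64, num_emotions),
        )

    def forward(self, bert_cls, spectrogram):
        speech_feat = self.speech_cnn(spectrogram)
        return self.head(torch.cat([bert_cls, speech_feat], dim=1))

bert_cls = torch.randn(2, 768)              # stand-in for BERT [CLS] embeddings
spectrogram = torch.randn(2, 1, 64, 128)    # stand-in mel spectrograms
print(TextSpeechFusionSER()(bert_cls, spectrogram).shape)  # torch.Size([2, 4])
```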

Fusion algorithm for Integrated Face and Gait Identification (얼굴과 발걸음을 결합한 인식)

  • Nizami, Imran Fareed; Hong, Sug-Jun; Lee, Hee-Sung; Ann, Toh-Kar; Kim, Eun-Tai; Park, Mig-Non
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.15-18 / 2007
  • Identification of humans from multiple viewpoints is an important task for surveillance and security purposes. For optimal performance the system should use the maximum information available from its sensors. Multimodal biometric systems can utilize more than one physiological or behavioral characteristic for enrollment, verification, or identification. Since gait alone is not yet established as a highly distinctive feature, this paper presents an approach that fuses face and gait for identification. We consider the single-camera case, i.e., both face and gait recognition are performed on the same set of images captured by a single camera. The aim is to improve the performance of the system by utilizing the maximum amount of information available in the images. Fusion is performed at the decision level. The proposed algorithm is tested on the NLPR database.
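
A minimal sketch of decision-level fusion, assuming Python: the face and gait classifiers each output an identity decision, and a simple rule combines them. The AND-style rule below (accept only when both modalities agree) is one common choice, not necessarily the rule used in the paper.

```python
from typing import Optional

def fuse_decisions(face_id: str, gait_id: str) -> Optional[str]:
    """Decision-level fusion: accept an identity only when both modalities agree."""
    return face_id if face_id == gait_id else None

print(fuse_decisions("subject_07", "subject_07"))  # subject_07 (accepted)
print(fuse_decisions("subject_07", "subject_12"))  # None (rejected / deferred)
```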


Intelligent Hybrid Fusion Algorithm with Vision Patterns for Generation of Precise Digital Road Maps in Self-driving Vehicles

  • Jung, Juho; Park, Manbok; Cho, Kuk; Mun, Cheol; Ahn, Junho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3955-3971 / 2020
  • Due to the significant increase in the use of autonomous car technology, it is essential to integrate this technology with high-precision digital map data that contains more precise and accurate roadway information than existing conventional map resources, in order to ensure the safety of self-driving operations. While existing map technologies may assist vehicles in identifying their locations via the Global Positioning System, it is difficult to keep these maps updated with environmental changes in the roadways. Roadway vision algorithms can be useful for building autonomous vehicles that avoid accidents and detect real-time location changes. We incorporate a hybrid architectural design that combines unsupervised classification of vision data with supervised joint-fusion classification to achieve a more noise-resistant algorithm. Via a deep learning approach, we identify an intelligent hybrid fusion algorithm that fuses multimodal vision feature data for roadway classification and characterize its improvement in accuracy over unsupervised identification using image processing and supervised vision classifiers. We analyzed over 93,000 vision frames collected from a test vehicle on real roadways. The performance of the proposed hybrid fusion algorithm for generating digital road maps for autonomous vehicles was successfully evaluated, with a recall of 0.94, precision of 0.96, and accuracy of 0.92.
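
One simple way to combine an unsupervised stage with a supervised fusion stage can be sketched as follows, assuming scikit-learn and synthetic data: cluster assignments from KMeans are appended to the raw vision features before a supervised classifier is trained. This illustrates the hybrid idea only and is not the authors' pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))        # stand-in vision feature vectors
labels = (features[:, 0] > 0).astype(int)   # stand-in roadway class labels

# Unsupervised stage: cluster the vision features.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# Supervised joint-fusion stage: concatenate cluster ids with the raw features.
fused_inputs = np.column_stack([features, clusters])
clf = RandomForestClassifier(random_state=0).fit(fused_inputs, labels)
print("training accuracy:", clf.score(fused_inputs, labels))
```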

Multimodal Biometric Recognition System using Real Fuzzy Vault (실수형 퍼지볼트를 이용한 다중 바이오인식 시스템)

  • Lee, Dae-Jong; Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.4 / pp.310-316 / 2013
  • Biometric techniques have been widely used in various areas, including criminal identification, due to their reliability. However, they have drawbacks when the biometric information is divulged to illegitimate users. This paper proposes a multimodal biometric system that uses a real-valued fuzzy vault based on RN-ECC to protect the fingerprint and face templates. Compared with face- or fingerprint-based verification systems, whose templates cannot be regenerated, the proposed method can regenerate a key value, and it implements an advanced biometric verification system by fusing fingerprint and face recognition. Various experiments show that the proposed method achieves high recognition rates compared with conventional methods.