• Title/Summary/Keyword: Learning Modalities


Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems
    • /
    • v.16 no.1
    • /
    • pp.6-29
    • /
    • 2020
  • Biometrics identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities are missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors to automatically detect images of different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
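
The abstract above describes score-level fusion over whichever modalities are detected in a probe video. As a rough illustration (not the authors' implementation), a minimal weighted-sum fusion that simply drops missing modalities and re-normalizes the weights might look like the sketch below; the modality names, weights, and toy scores are assumptions for illustration only.

```python
import numpy as np

def fuse_scores(modality_scores, weights=None):
    """Score-level fusion over whichever modalities are present.

    modality_scores: dict mapping a modality name (e.g. 'frontal_face',
    'left_ear') to a 1-D array of per-class match scores, or None when
    that modality was not detected in the probe video.
    weights: optional per-modality weights; missing modalities are
    dropped and the remaining weights are re-normalized.
    """
    available = {m: s for m, s in modality_scores.items() if s is not None}
    if not available:
        raise ValueError("no modality produced a score vector")
    if weights is None:
        weights = {m: 1.0 for m in available}
    total = sum(weights[m] for m in available)
    fused = sum((weights[m] / total) * np.asarray(s, dtype=float)
                for m, s in available.items())
    return int(np.argmax(fused)), fused  # predicted class and fused scores

# toy example: the right-ear detector found nothing in the probe clip
scores = {
    "frontal_face": np.array([0.1, 0.7, 0.2]),
    "left_ear":     np.array([0.2, 0.5, 0.3]),
    "right_ear":    None,
}
pred, fused = fuse_scores(scores)
print(pred, fused)
```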

An Analysis of Collaborative Visualization Processing of Text Information for Developing e-Learning Contents

  • SUNG, Eunmo
    • Educational Technology International
    • /
    • v.10 no.1
    • /
    • pp.25-40
    • /
    • 2009
  • The purpose of this study was to explore the procedures and modalities of collaborative visualization processing of text information for developing e-Learning contents. Two research questions were explored: 1) what are the procedures of collaborative visualization processing of text information, and 2) what kinds of patterns and modalities can be found in each procedure. The study employed a qualitative research approach based on grounded theory. As a result, collaborative visualization processing of text information emerged as six steps: identifying text, analyzing text, exploring visual clues, creating visuals, discussing visuals, and elaborating visuals. Collaborative visualization processing of text information showed the characteristics of a systemic and systematic process with a spiral sequence. In addition, the modalities in collaborative visualization processing of text information were divided into two dimensions: individual processing by internal representation and social processing by external representation. This case study suggests that a collaborative visualization strategy can provide ideal methods for sharing cognitive or thinking systems by using human visual intelligence.

Human Action Recognition Using Pyramid Histograms of Oriented Gradients and Collaborative Multi-task Learning

  • Gao, Zan;Zhang, Hua;Liu, An-An;Xue, Yan-Bing;Xu, Guang-Ping
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.2
    • /
    • pp.483-503
    • /
    • 2014
  • In this paper, human action recognition using pyramid histograms of oriented gradients and collaborative multi-task learning is proposed. First, we accumulate global activities and construct motion history images (MHI) for the RGB and depth channels respectively to encode the dynamics of an action in different modalities, and then different action descriptors are extracted from the depth and RGB MHIs to represent the global textual and structural characteristics of these actions. Specifically, average value in hierarchical block, GIST, and pyramid histograms of oriented gradients descriptors are employed to represent human motion. To demonstrate the superiority of the proposed method, we evaluate the descriptors with KNN, SVM with linear and RBF kernels, SRC, and CRC models on the DHA dataset, a well-known dataset for human action recognition. Large-scale experimental results show that our descriptors are robust, stable, and efficient, and outperform the state-of-the-art methods. In addition, we further investigate the performance of our descriptors by combining them on the DHA dataset, and observe that the performance of the combined descriptors is much better than that of any single descriptor. With multimodal features, we also propose a collaborative multi-task learning method for model learning and inference based on transfer learning theory. The main contributions lie in four aspects: 1) the proposed encoding scheme can filter the stationary part of the human body and reduce noise interference; 2) different kinds of features and models are assessed, and the neighbor gradient information and pyramid layers are very helpful for representing these actions; 3) the proposed model can fuse features from different modalities regardless of the sensor types, the ranges of values, and the dimensions of different features; 4) the latent common knowledge among different modalities can be discovered by transfer learning to boost performance.
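
The motion history image (MHI) encoding mentioned above compresses a clip's motion into a single grayscale image in which recent motion is bright, older motion has decayed, and the static background is zero. The sketch below is a generic MHI computation, not the paper's exact procedure; the decay length `tau`, the frame-difference threshold `diff_thresh`, and the file names are illustrative assumptions.

```python
import cv2
import numpy as np

def motion_history_image(frames, tau=30, diff_thresh=25):
    """Build a Motion History Image from a list of grayscale frames.

    Recent motion appears bright, older motion decays toward zero, and
    the stationary background stays at zero, which is how the descriptor
    suppresses the static part of the body.
    """
    h, w = frames[0].shape
    mhi = np.zeros((h, w), dtype=np.float32)
    prev = frames[0]
    for frame in frames[1:]:
        silhouette = cv2.absdiff(frame, prev) > diff_thresh  # moving pixels
        mhi = np.where(silhouette, tau, np.maximum(mhi - 1.0, 0.0))
        prev = frame
    return (mhi / tau * 255).astype(np.uint8)  # normalized for HOG/GIST extraction

# usage sketch: read an RGB (or depth) clip, convert to grayscale, build the MHI
cap = cv2.VideoCapture("action_clip.avi")
frames = []
while True:
    ok, bgr = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY))
cap.release()
if len(frames) > 1:
    cv2.imwrite("mhi.png", motion_history_image(frames))
```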

Tumor Segmentation in Multimodal Brain MRI Using Deep Learning Approaches

  • Al Shehri, Waleed;Jannah, Najlaa
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.343-351
    • /
    • 2022
  • A brain tumor forms when tissue that has become old or damaged does not die as it should, preventing new tissue from forming. Manually finding such masses in the brain by analyzing MRI images is challenging and time-consuming for experts. In this study, our main objective is to detect the tumorous part of the brain, allowing rapid diagnosis so that the primary disease can be treated promptly. Using image processing techniques and deep learning prediction algorithms, our research builds a system capable of finding tumors in brain MRI images automatically and accurately. Our tumor segmentation adopts the U-Net deep learning architecture on the standard MICCAI BRATS 2018 dataset, which contains MRI images of different modalities. The proposed approach was evaluated and achieved Dice coefficients of 0.9795, 0.9855, 0.9793, and 0.9950 across several test datasets. These results show that the proposed system achieves excellent segmentation of tumors in MRIs using deep learning techniques such as the U-Net algorithm.
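
The paper adopts the standard U-Net encoder-decoder with skip connections. A minimal 2-D PyTorch sketch of that architecture is given below; the layer widths, depth, four-channel input (stacking the BraTS MRI modalities as channels), and number of output classes are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net: encoder, bottleneck, decoder with skip connections."""
    def __init__(self, in_channels=4, num_classes=4):
        super().__init__()
        self.enc1 = conv_block(in_channels, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # per-pixel class logits

# one BraTS-style slice: 4 MRI modalities stacked as input channels
x = torch.randn(1, 4, 240, 240)
print(TinyUNet()(x).shape)  # torch.Size([1, 4, 240, 240])
```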

Educational Paradigm Shift from E-Learning to Mobile Learning Toward Ubiquitous Learning (U-Learning을 위한 E-Learning에서 M-Learning으로의 교육적 패러다임 전환)

  • Kim, Hye-Jin
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.11
    • /
    • pp.4788-4795
    • /
    • 2011
  • The purpose of this study is to present and review the possible effects of the learning paradigm shift from traditional methods to ubiquitous learning, and the societal issues that need to be addressed in order to design a new pedagogical platform as the trend moves from e-learning to m-learning and now to u-learning. Without a proper study of how the learning environment may affect an individual's learning process, the result will be poor-quality education. This new learning environment offers a big opportunity for "anytime, anywhere" learning; thus, lifelong learning is within everyone's reach. Maximizing the benefits of the new trend will be a great help, and addressing its limitations will lead to quality education. The components that comprise ubiquitous learning are also discussed, together with the technologies that make it possible. The research domains currently in progress show that pervasive or lifelong learning has attracted the interest of many research institutions. The types of learning modes and learning modalities are also briefly discussed in this paper.

A Survey of Multimodal Systems and Techniques for Motor Learning

  • Tadayon, Ramin;McDaniel, Troy;Panchanathan, Sethuraman
    • Journal of Information Processing Systems
    • /
    • v.13 no.1
    • /
    • pp.8-25
    • /
    • 2017
  • This survey paper explores the application of multimodal feedback in automated systems for motor learning. We review the findings of recent studies in this field, using rehabilitation and various motor training scenarios as context. We discuss popular feedback delivery and sensing mechanisms for motion capture and processing in terms of requirements, benefits, and limitations. The selection of modalities is presented by reviewing best-practice approaches for each modality relative to motor task complexity, with example implementations from recent work. We summarize the advantages and disadvantages of several approaches for integrating modalities in terms of fusion and frequency of feedback during motor tasks. Finally, we review the limitations of perceptual bandwidth and provide an evaluation of the information transfer for each modality.

A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning (DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE
    • /
    • v.43 no.11
    • /
    • pp.1281-1297
    • /
    • 2016
  • In recent years, personal videos have seen tremendous growth due to the substantial increase in the use of smart devices and networking services, with which users create and share video content easily without many restrictions. Videos generally contain multiple modalities, and the frame data in a video varies at different time points; taking both into account can significantly improve event detection performance. This paper proposes an event detection method in which high-level features are first extracted from multiple modalities in the videos, the features are rearranged according to time sequence, and the association of the modalities is then learned by means of a DNN to produce a personal video event detector. In our proposed method, audio and image data are first synchronized and then extracted. The result is input into GoogLeNet as well as a Multi-Layer Perceptron (MLP) to extract high-level features. The results are then rearranged in time sequence, and every video is processed to extract one feature each for training by means of the DNN.
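
The abstract describes fusing high-level visual and audio features with a DNN. The sketch below shows one generic way such fusion can be wired up in PyTorch, assuming pre-extracted per-clip feature vectors; the feature dimensions, layer sizes, and number of event classes are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class LateFusionEventClassifier(nn.Module):
    """Concatenate per-clip visual and audio feature vectors and classify
    the event with a small fully connected network (a stand-in for the
    DNN fusion stage described in the abstract)."""
    def __init__(self, visual_dim=1024, audio_dim=128, num_events=10):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(visual_dim + audio_dim, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_events),
        )

    def forward(self, visual_feat, audio_feat):
        return self.fusion(torch.cat([visual_feat, audio_feat], dim=1))

# toy batch: visual features (e.g. pooled CNN activations) plus audio features
v = torch.randn(8, 1024)
a = torch.randn(8, 128)
logits = LateFusionEventClassifier()(v, a)
print(logits.shape)  # torch.Size([8, 10])
```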

Deep Multimodal MRI Fusion Model for Brain Tumor Grading (뇌 종양 등급 분류를 위한 심층 멀티모달 MRI 통합 모델)

  • Na, In-ye;Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.416-418
    • /
    • 2022
  • Glioma is a type of brain tumor that occurs in glial cells and is classified into two types: high-grade glioma, with a poor prognosis, and low-grade glioma. Magnetic resonance imaging (MRI), a non-invasive method, is widely used in glioma diagnosis research. Studies are being conducted to obtain complementary information by combining multiple modalities, overcoming the limited information of a single modality. In this study, we developed a 3D CNN-based model that applies input-level fusion to MRI of four modalities (T1, T1Gd, T2, T2-FLAIR). The trained model showed classification performance of 0.8926 accuracy, 0.9688 sensitivity, 0.6400 specificity, and 0.9467 AUC on the validation data. This confirms that the grade of glioma is effectively classified by learning the internal relationships among the various modalities.
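
Input-level fusion, as used above, simply stacks the co-registered MRI modalities as channels of one volume before the first convolution, so the network can learn cross-modality relationships from the start. A minimal 3D-CNN sketch of this idea is shown below; the layer sizes, pooling scheme, and toy volume size are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class InputFusion3DCNN(nn.Module):
    """Input-level fusion: the four MRI modalities (T1, T1Gd, T2, T2-FLAIR)
    are stacked as channels of a single 3D volume before any convolution."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),            # global average pooling
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                       # x: (batch, 4, D, H, W)
        return self.classifier(self.features(x).flatten(1))

# toy volume: 4 modalities, 64^3 voxels, classified as high- vs low-grade glioma
x = torch.randn(2, 4, 64, 64, 64)
print(InputFusion3DCNN()(x).shape)  # torch.Size([2, 2])
```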

Clinical Implementation of Deep Learning in Thoracic Radiology: Potential Applications and Challenges

  • Eui Jin Hwang;Chang Min Park
    • Korean Journal of Radiology
    • /
    • v.21 no.5
    • /
    • pp.511-525
    • /
    • 2020
  • Chest X-ray radiography and computed tomography, the two mainstay modalities in thoracic radiology, are under active investigation with deep learning technology, which has shown promising performance in various tasks, including detection, classification, segmentation, and image synthesis, outperforming conventional methods and suggesting its potential for clinical implementation. However, the implementation of deep learning in daily clinical practice is in its infancy and facing several challenges, such as its limited ability to explain the output results, uncertain benefits regarding patient outcomes, and incomplete integration in daily workflow. In this review article, we will introduce the potential clinical applications of deep learning technology in thoracic radiology and discuss several challenges for its implementation in daily clinical practice.

Predicting Session Conversion on E-commerce: A Deep Learning-based Multimodal Fusion Approach

  • Minsu Kim;Woosik Shin;SeongBeom Kim;Hee-Woong Kim
    • Asia pacific journal of information systems
    • /
    • v.33 no.3
    • /
    • pp.737-767
    • /
    • 2023
  • With the availability of big customer data and advances in machine learning techniques, the prediction of customer behavior at the session level has attracted considerable attention from marketing practitioners and scholars. This study aims to predict customer purchase conversion at the session level by employing customer profile, transaction, and clickstream data. For this purpose, we develop a multimodal deep learning fusion model with dynamic and static features (i.e., DS-fusion). Specifically, we use page views within the focal visit as dynamic features and recency, frequency, monetary value, and clumpiness (RFMC) as static features to comprehensively capture customer characteristics for buying behaviors. Our model with deep learning architectures combines these features for conversion prediction. We validate the proposed model using real-world e-commerce data. The experimental results reveal that our model outperforms unimodal classifiers with each feature and classical machine learning models with dynamic and static features, including random forest and logistic regression. In this regard, this study sheds light on the promise of machine learning approaches that combine complementary modalities for predicting customer behaviors.
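
The DS-fusion idea of combining a dynamic (clickstream) branch with a static (RFMC) branch can be sketched as a two-branch network whose representations are concatenated before the prediction head. The sketch below only illustrates that pattern, assuming a GRU for the dynamic branch; the paper's actual architecture, feature dimensions, and layer sizes are not given in the abstract.

```python
import torch
import torch.nn as nn

class DSFusionSketch(nn.Module):
    """Illustrative two-branch fusion: a GRU summarizes the dynamic
    clickstream (page views within the session) while an MLP embeds the
    static RFMC profile; the two representations are concatenated for
    conversion prediction. Layer sizes are assumptions, not the paper's."""
    def __init__(self, page_feat_dim=8, static_dim=4):
        super().__init__()
        self.dynamic = nn.GRU(page_feat_dim, 32, batch_first=True)
        self.static = nn.Sequential(nn.Linear(static_dim, 16), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32 + 16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, clicks, rfmc):        # clicks: (B, T, page_feat_dim)
        _, h = self.dynamic(clicks)         # h: (1, B, 32) last hidden state
        z = torch.cat([h.squeeze(0), self.static(rfmc)], dim=1)
        return torch.sigmoid(self.head(z))  # probability of purchase conversion

clicks = torch.randn(16, 20, 8)   # 16 sessions, 20 page views each
rfmc = torch.randn(16, 4)         # recency, frequency, monetary value, clumpiness
print(DSFusionSketch()(clicks, rfmc).shape)  # torch.Size([16, 1])
```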