• Title/Summary/Keyword: multimodal input

An Implementation of Multimodal Speaker Verification System using Teeth Image and Voice on Mobile Environment

  • Kim, Dong-Ju;Ha, Kil-Ram;Hong, Kwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.162-172 / 2008
  • In this paper, we propose a multimodal speaker verification method that uses teeth images and voice as biometric traits for personal verification on mobile terminal equipment. The proposed method acquires the biometric traits with the image and sound input devices of a smartphone, one type of mobile terminal, and performs verification with them. In addition, the proposed method adopts a multimodal scheme that combines the scores of the two biometric authentication systems to improve overall performance. Considering the system's limited resources, fusion is performed with a weighted-summation method, which has a comparatively simple structure and strong performance. The proposed multimodal speaker authentication system is evaluated on a database of 40 subjects acquired with a smartphone. The experimental results show an equal error rate (EER) of 8.59% for teeth verification and 11.73% for voice verification, while multimodal speaker authentication achieved an EER of 4.05%. Thus, even with the simple weighted-summation method, the multimodal speaker verification system outperforms either teeth or voice verification alone.
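  The weighted-summation fusion and the EER metric above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how scores were normalized or how the weight was chosen, so min-max normalization, the helper names, and the sweep over w are all hypothetical, and the scores are made up.

```python
from typing import List, Tuple


def min_max_normalize(scores: List[float]) -> List[float]:
    """Map raw matcher scores onto [0, 1] so the two modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum(teeth: List[float], voice: List[float], w: float) -> List[float]:
    """Fused score = w * teeth + (1 - w) * voice, computed trial by trial."""
    return [w * t + (1.0 - w) * v for t, v in zip(teeth, voice)]


def fuse(teeth_gen, teeth_imp, voice_gen, voice_imp, w) -> Tuple[List[float], List[float]]:
    """Normalize each modality over all of its trials, then fuse.
    Assumes trial i of teeth and trial i of voice come from the same attempt."""
    t = min_max_normalize(teeth_gen + teeth_imp)
    v = min_max_normalize(voice_gen + voice_imp)
    fused = weighted_sum(t, v, w)
    n = len(teeth_gen)
    return fused[:n], fused[n:]


def equal_error_rate(genuine: List[float], impostor: List[float]) -> float:
    """Approximate EER: sweep thresholds and return (FAR + FRR) / 2 where the
    two rates are closest. A higher score means 'more likely genuine'."""
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(genuine + impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < thr for s in genuine) / len(genuine)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the paper.
    teeth_gen, teeth_imp = [0.82, 0.75, 0.40, 0.91], [0.30, 0.55, 0.20, 0.35]
    voice_gen, voice_imp = [0.60, 0.45, 0.88, 0.70], [0.50, 0.25, 0.40, 0.65]
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        g, i = fuse(teeth_gen, teeth_imp, voice_gen, voice_imp, w)
        print(f"w={w:.2f}  EER={equal_error_rate(g, i):.3f}")
```

  Setting w = 1.0 or w = 0.0 reproduces the teeth-only or voice-only system, so a sweep like this shows directly whether fusion lowers the EER; in practice w would be chosen on data separate from the evaluation set.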

Using Spatial Ontology in the Semantic Integration of Multimodal Object Manipulation in Virtual Reality

  • Irawati, Sylvia;Calderon, Daniela;Ko, Hee-Dong
    • Journal of the HCI Society of Korea / v.1 no.1 / pp.9-20 / 2006
  • This paper describes a framework for multimodal object manipulation in virtual environments. The core of the proposed framework is the semantic integration of multimodal input, using a spatial ontology and the user context to merge the interpretations of the individual inputs into a single interpretation. The spatial ontology, which describes the spatial relationships between objects, is used together with the current user context to resolve ambiguities in the user's commands. These commands are used to reposition objects in the virtual environment. We discuss how the spatial ontology is defined and how it helps the user place objects in the virtual environment as they would be placed in the real world.

An Experimental Multimodal Command Control Interface for Car Navigation Systems

  • Kim, Kyungnam;Ko, Jong-Gook;Choi, SeungHo;Kim, Jin-Young;Kim, Ki-Jung
    • Proceedings of the IEEK Conference / 2000.07a / pp.249-252 / 2000
  • An experimental multimodal system combining natural input modes such as speech, lip movement, and gaze is proposed in this paper. It benefits from novel human-computer interaction (HCI) modalities and from multimodal integration to tackle the HCI bottleneck. The system allows the user to select menu items on the screen by running speech recognition, lip reading, and gaze tracking components in parallel. Face tracking is a supplementary component to gaze tracking and lip movement analysis. These key components are reviewed, and preliminary results are shown for multimodal integration and user testing on the prototype system. Notably, the system equipped with gaze tracking and lip reading is very effective in noisy environments, where the speech recognition rate is low and unstable. Our long-term goal is to build a user interface embedded in a commercial car navigation system (CNS).

Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal / v.45 no.1 / pp.105-118 / 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
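  Nested cross-validation, as used above, keeps hyperparameter tuning inside each outer training split so that the outer test fold never influences model selection. The stdlib-only sketch below illustrates the idea with a k-nearest-neighbor classifier whose k is picked in the inner loop. The fold counts, the k grid, and all helper names are assumptions for illustration; the abstract does not give the authors' settings.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = PD, 0 = HC)


def k_folds(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def knn_predict(train: Sequence[Sample], x: List[float], k: int) -> int:
    """Majority label among the k training samples nearest to x (squared Euclidean)."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: Sequence[Sample], test: Sequence[Sample], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def split(data: Sequence[Sample], fold: List[int]) -> Tuple[List[Sample], List[Sample]]:
    """Return (rest, held_out) for one fold of indices."""
    held = set(fold)
    rest = [s for i, s in enumerate(data) if i not in held]
    return rest, [data[i] for i in fold]


def nested_cv(data: Sequence[Sample], k_grid: Sequence[int],
              outer: int = 5, inner: int = 3, seed: int = 0) -> float:
    """Outer loop estimates accuracy; inner loop picks k from outer-training data only."""
    outer_acc = []
    for fold in k_folds(len(data), outer, seed):
        train, test = split(data, fold)

        def inner_score(k: int) -> float:
            scores = []
            for f in k_folds(len(train), inner, seed + 1):
                fit, val = split(train, f)
                scores.append(accuracy(fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)  # chosen without seeing `test`
        outer_acc.append(accuracy(train, test, best_k))
    return sum(outer_acc) / len(outer_acc)
```

  The same loop structure applies to the other classifiers listed (LoR, SVM, RF, MLP); only the model and its hyperparameter grid change.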

Dialog-based multi-item recommendation using automatic evaluation

  • Euisok Chung;Hyun Woo Kim;Byunghyun Yoo;Ran Han;Jeongmin Yang;Hwa Jeon Song
    • ETRI Journal / v.46 no.2 / pp.277-289 / 2024
  • In this paper, we describe a neural network-based application that recommends multiple items from dialog context input and simultaneously outputs a response sentence. We make multi-item recommendation concrete by specifying it as the recommendation of a set of clothing items, which requires a multimodal fusion approach that can process both clothing-related text and images. We also examine how a pretrained language model can meet the requirements of the downstream models. Moreover, we propose a gate-based multimodal fusion and multiprompt learning based on a pretrained language model. In addition, we propose an automatic evaluation technique to address the one-to-many mapping problem of multi-item recommendation. A Korean fashion-domain multimodal dataset is constructed and tested, and various experimental settings are verified with the automatic evaluation method. The results show that the proposed method can produce confidence scores for multi-item recommendation results, which differs from traditional accuracy evaluation.
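  The abstract does not specify the form of its gate-based fusion. As a hedged sketch, the code below assumes one common formulation: a sigmoid gate z, computed from the concatenated text and image embeddings, mixes the two elementwise as z * text + (1 - z) * image. The function names and the requirement that both embeddings share a dimension are assumptions, and in a real model W and b would be learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gated_fusion(text: Vector, image: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """z = sigmoid(W [text; image] + b); fused = z * text + (1 - z) * image, elementwise.
    W has one row per output dimension and 2 * d columns."""
    if len(text) != len(image) or len(w_gate) != len(text):
        raise ValueError("embeddings must share a dimension d, and W must have d rows")
    concat = text + image
    z = [sigmoid(a + b) for a, b in zip(matvec(w_gate, concat), b_gate)]
    return [zi * t + (1.0 - zi) * v for zi, t, v in zip(z, text, image)]
```

  With zero weights and bias the gate is 0.5 and the output is the plain average; a large positive bias on a dimension makes that dimension follow the text embedding, and a large negative bias makes it follow the image. That per-dimension choice is what distinguishes a gate from a fixed weighted sum.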

Multimodal Interaction Framework for Collaborative Augmented Reality in Education

  • Asiri, Dalia Mohammed Eissa;Allehaibi, Khalid Hamed;Basori, Ahmad Hoirul
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.268-282 / 2022
  • One of the most important technologies today is augmented reality (AR), which lets users experience the real world combined with virtual objects. This technology has been applied in many sectors, such as shopping and medicine, and has also entered education, where it has become widely used because of its effectiveness. Among its benefits, it arouses students' interest in learning imaginative concepts that are difficult to understand. In addition, studies have shown that collaboration between students increases learning opportunities through the exchange of information, an approach known as collaborative learning. Multimodal interaction creates a distinctive and engaging experience, especially for students, because it increases users' interaction with the technology. This research aims to improve the achievement of 6th graders by designing a collaborative framework that integrates multimodal input (hand gesture and touch), with the goal of an effective, fun, and easy-to-use multimodal interaction framework in AR. The framework was applied to reformulate, in an interactive manner, the genetics and traits lesson (the second lesson of the first semester in the 6th-grade science textbook) by creating a video based on consultations with science teachers and a puzzle game into which the game images were inserted. The framework also relied on cooperation among students to solve the questions. The findings showed a significant difference between the post-test and pre-test mean scores of the experimental group in the science course at the levels of remembering, understanding, and applying, which indicates the success of the framework; in addition, 43 students preferred using the framework over traditional education.

A study of using quality for Radial Basis Function based score-level fusion in multimodal biometrics

  • Choi, Hyun-Soek;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.192-200 / 2008
  • Multimodal biometrics is a method of personal authentication and verification that uses two or more types of biometric data. RBF-based score-level fusion applies a pattern recognition algorithm to multimodal biometrics, seeking the optimal decision boundary for classifying score feature vectors, each of which consists of the matching scores that several unimodal biometric systems produce for one sample. In this approach, all matching scores are assumed to be equally reliable. However, recent research reports that the quality of the input sample affects biometric results, and low-reliability matching scores caused by low-quality samples are currently not taken into account in pattern recognition modelling for multimodal biometrics. To solve this problem, in this paper we propose an RBF-based score-level fusion approach that uses quality information about the input biometric data to adjust the decision boundary. The proposed method with quality information showed better recognition performance than both the unimodal biometrics and conventional RBF-based score-level fusion without quality information.
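  One way to let sample quality move an RBF decision boundary is to feed the quality values into the RBF network alongside the matching scores. The sketch below does this with fixed centers and a linear output layer trained by the LMS (delta) rule, using targets +1 for genuine and -1 for impostor. It illustrates the general idea under those assumptions only; the abstract does not describe how the authors inject quality, pick centers, or train weights, and the class name, sigma, learning rate, and toy data are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rbf(x: Vector, c: Vector, sigma: float) -> float:
    """Gaussian radial basis function exp(-||x - c||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))


class QualityRBF:
    """Score-level fusion whose input is [matching scores..., quality values...]."""

    def __init__(self, centers: List[Vector], sigma: float):
        self.centers, self.sigma = centers, sigma
        self.weights = [0.0] * len(centers)
        self.bias = 0.0

    @staticmethod
    def features(scores: Vector, qualities: Vector) -> Vector:
        # Quality enters as extra input dimensions, so the learned boundary can depend on it.
        return list(scores) + list(qualities)

    def output(self, x: Vector) -> float:
        return self.bias + sum(w * rbf(x, c, self.sigma)
                               for w, c in zip(self.weights, self.centers))

    def fit(self, data: List[Tuple[Vector, int]], lr: float = 0.1, epochs: int = 200) -> None:
        """LMS updates of the linear output layer; the RBF centers stay fixed."""
        for _ in range(epochs):
            for x, y in data:
                err = y - self.output(x)
                self.bias += lr * err
                self.weights = [w + lr * err * rbf(x, c, self.sigma)
                                for w, c in zip(self.weights, self.centers)]

    def accept(self, x: Vector) -> bool:
        return self.output(x) > 0.0


if __name__ == "__main__":
    # Toy, made-up samples: two matchers' scores followed by their quality values.
    g1 = QualityRBF.features([0.9, 0.85], [0.9, 0.8])
    i1 = QualityRBF.features([0.2, 0.1], [0.9, 0.8])
    model = QualityRBF(centers=[g1, i1], sigma=0.5)
    model.fit([(g1, 1), (i1, -1)])
    print(model.accept(g1), model.accept(i1))
```

  Using the training samples themselves as centers keeps the sketch short; a practical system would more likely select centers by clustering.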

Design of Lightweight Artificial Intelligence System for Multimodal Signal Processing

  • Kim, Byung-Soo;Lee, Jea-Hack;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.5 / pp.1037-1042 / 2018
  • Neuromorphic technology, which learns and processes information by imitating the human brain, has been researched for decades. Hardware implementations of neuromorphic systems are built from highly parallel processing structures and many simple computational units, so they can achieve high processing speed, low power consumption, and low hardware complexity. Recently, interest in neuromorphic technology for low-power, small embedded systems has been increasing rapidly. To implement low-complexity hardware, the input data dimension must be reduced without loss of accuracy. This paper proposes a low-complexity artificial intelligence engine that consists of parallel neuron engines and a feature extractor. The engine contains a number of neuron engines and a controller to process multimodal sensor data. We verified the performance of the proposed neuron engine together with the designed artificial intelligence engine, the feature extractor, and a microcontroller unit (MCU).

Ventral Striatal Connections of Unimodal and Multimodal Cortex of the Superior Temporal Sulcus in Macaque Monkeys (Macaca nemestrina)

  • Jung, Yong-Wook;Hong, Sung-Won
    • Animal cells and systems / v.8 no.4 / pp.319-328 / 2004
  • Extrinsic connections between the cortex of the superior temporal sulcus (STS) and the ventral striatum in pigtail macaque monkeys (Macaca nemestrina) were studied by injecting retrograde tracers into the ventromedial caudate nucleus, the ventral and central shells of the nucleus accumbens (NA), the dorsal core of the NA, and the ventrolateral putamen. In the present study, we demonstrate that the projections from the unimodal (areas TAa, IPa, TEa, and TEm) and multimodal (areas TPO and PGa) sensory association areas in the STS terminate mainly in the ventromedial caudate nucleus and in the ventral and central shells of the NA. By contrast, the dorsal core of the NA and the ventrolateral putamen receive only a few projections from the sensory association cortex in the STS. Based on these differential connections between the subterritories of the ventral striatum and the sensory association areas, the ventromedial caudate nucleus and the shells of the NA appear to be major integration sites for sensory input from the STS and to be functionally different from the dorsal core of the NA and the ventrolateral putamen.

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Moreover, previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which raise privacy concerns and require much time to collect enough data. In contrast, physical sensors such as accelerometer, magnetic field, and gyroscope sensors raise fewer privacy concerns and can collect a large amount of data within a short time. In this paper, we propose a deep learning method for detecting accompanying status using only multimodal physical sensor data from the accelerometer, magnetic field, and gyroscope sensors. Accompanying status was defined by redefining part of user interaction behavior: whether the user is accompanying an acquaintance at a close distance and whether the user is actively communicating with that acquaintance. A framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks was proposed for classifying accompanying and conversation. First, a data preprocessing method was introduced that consists of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. Nearest interpolation was applied to synchronize the timestamps of data collected from different sensors.
Normalization was performed for each x, y, and z axis value of the sensor data, and sequence data was generated with the sliding window method. The sequence data then became the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consisted of 3 convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function of the model was cross-entropy, and the weights of the model were randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model was trained with the adaptive moment estimation (Adam) optimization algorithm with a mini-batch size of 128. Dropout was applied to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001 and decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data, and smartphone data were collected from a total of 18 subjects. Using these data, the model classified accompanying and conversation with accuracies of 98.74% and 98.83%, respectively. Both the F1 score and the accuracy of the model were higher than those of a majority vote classifier, a support vector machine, and a deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. In addition, we will study transfer learning methods that allow a model trained on the training data to be transferred to evaluation data that follows a different distribution.
This is expected to yield a model whose recognition performance is robust to changes in data that were not considered during training.
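  The three preprocessing steps described above (nearest-interpolation time synchronization, per-axis normalization, and sliding-window sequence generation) can be sketched as below. This is a minimal illustration under stated assumptions: the abstract does not give the normalization type, the common time grid, or the window width and stride, so z-score normalization, the `grid`, `width`, and `stride` parameters, and all function names are hypothetical.

```python
import bisect
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Axes = Tuple[float, float, float]
Reading = Tuple[float, Axes]  # (timestamp in seconds, (x, y, z))


def nearest_resample(readings: Sequence[Reading], grid: Sequence[float]) -> List[Axes]:
    """For each grid time, take the reading with the closest timestamp (nearest interpolation).
    Assumes readings are sorted by timestamp."""
    times = [t for t, _ in readings]
    out = []
    for g in grid:
        i = bisect.bisect_left(times, g)
        cands = [j for j in (i - 1, i) if 0 <= j < len(times)]  # neighbors around g
        j = min(cands, key=lambda k: abs(times[k] - g))
        out.append(readings[j][1])
    return out


def normalize_axes(samples: Sequence[Axes]) -> List[Axes]:
    """Z-score the x, y, and z axes independently (constant axes are only mean-centered)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*samples)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in samples]


def synchronize(sensors: Sequence[Sequence[Reading]], grid: Sequence[float]) -> List[tuple]:
    """Resample and normalize each sensor, then concatenate per time step
    (e.g. accelerometer + magnetic field + gyroscope -> 9 channels)."""
    per_sensor = [normalize_axes(nearest_resample(r, grid)) for r in sensors]
    return [sum(rows, ()) for rows in zip(*per_sensor)]


def sliding_windows(seq: Sequence, width: int, stride: int) -> List[Sequence]:
    """Cut the synchronized sequence into fixed-length, possibly overlapping windows."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, stride)]
```

  Each window from `sliding_windows(synchronize(...), width, stride)` would then form one input sequence for the CNN-LSTM model. Normalizing before windowing, as here, uses statistics from the whole recording; computing them on training data only would avoid leaking evaluation statistics.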