• Title/Summary/Keyword: Multimodal recognition

A Methodology of Multimodal Public Transportation Network Building and Path Searching Using Transportation Card Data (교통카드 기반자료를 활용한 복합대중교통망 구축 및 경로탐색 방안 연구)

  • Cheon, Seung-Hoon;Shin, Seong-Il;Lee, Young-Ihn;Lee, Chang-Ju
    • Journal of Korean Society of Transportation / v.26 no.3 / pp.233-243 / 2008
  • Recognition of the importance and roles of public transportation is increasing because of traffic problems in many cities. In spite of this paradigm change, previous research on public transportation trip assignment has limits in some respects. Especially in multimodal public transportation networks, many characteristics should be considered, such as transfers, operational time schedules, waiting time, and travel cost. After the metropolitan integrated transfer discount system was introduced, transfer trips among traffic modes increased, which changed users' route choices. Moreover, with the advent of the high-technology public transportation card called the smart card, users' travel information can be recorded automatically, giving researchers a new analytical methodology for multimodal public transportation networks. In this paper, we suggest a methodology for building new multimodal public transportation networks from transportation card data using computer programming methods. First, we propose a method for building integrated transportation networks based on bus and urban railroad stations, in order to make full use of the travel information in transportation card data. Second, we show how to connect broken transfer links with computer-based programming techniques, which is very helpful for solving the transfer problems of existing transportation networks. Lastly, we give a methodology for network establishment and users' path finding among multiple modes in multimodal public transportation networks. Using the proposed methodology, it becomes easy to build multimodal public transportation networks from existing bus and urban railroad station coordinates, and large-scale multimodal public transportation networks can be built without extra work such as manual transfer link connection. In the end, this study can contribute to solving the problem of finding users' paths among multiple modes, which is regarded as an unsolved issue in existing transportation networks.
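
As a rough illustration of the two computational steps this abstract describes, the sketch below connects transfer links between nearby bus and rail stations by coordinate proximity and then runs a shortest-path search over the integrated graph. It is not the paper's actual algorithm; the station IDs, coordinates, travel times, 50 m transfer radius, and 5-minute transfer penalty are all hypothetical.

```python
import heapq
from math import hypot

# Hypothetical station records: (id, mode, x, y); coordinates in meters.
stations = [
    ("B101", "bus",  312.0, 118.0),
    ("R201", "rail", 330.0, 125.0),
    ("B102", "bus",  980.0, 410.0),
]
# In-mode links with travel times in minutes (illustrative values).
links = {("B101", "B102"): 20.0, ("R201", "B102"): 9.0}

TRANSFER_RADIUS = 50.0   # assumed walking radius between modes, in meters
TRANSFER_PENALTY = 5.0   # assumed transfer time (walking + waiting), minutes

# Connect the "broken" transfer links: any bus/rail station pair within
# the walking radius gets a transfer edge with a fixed penalty.
graph = {sid: [] for sid, _, _, _ in stations}
for (a, b), t in links.items():
    graph[a].append((b, t))
    graph[b].append((a, t))
for i, (sa, ma, xa, ya) in enumerate(stations):
    for sb, mb, xb, yb in stations[i + 1:]:
        if ma != mb and hypot(xa - xb, ya - yb) <= TRANSFER_RADIUS:
            graph[sa].append((sb, TRANSFER_PENALTY))
            graph[sb].append((sa, TRANSFER_PENALTY))

def shortest_time(graph, src, dst):
    """Plain Dijkstra over the integrated multimodal graph."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        for nxt, w in graph[node]:
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return float("inf")

print(shortest_time(graph, "B101", "B102"))  # 14.0 via the rail transfer, not 20.0 direct
```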

Improvement of Lipreading Performance Using Gabor Filter for Ship Environment (선박 환경에서 Gabor 여파기를 적용한 입술 읽기 성능향상)

  • Shin, Do-Sung;Lee, Seong-Ro;Kwon, Jang-Woo
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.7C / pp.598-603 / 2010
  • In this paper, we study lipreading using visual information in a ship environment. Lipreading uses image information that includes a speaker's lips, and it is studied as a compensation method for existing speech recognition systems, whose recognition rate decreases remarkably in noisy circumstances. The proposed method improves the recognition rate by improving the preprocessing with a Gabor filter for the ship environment. The experiments were carried out on lip images under lighting that changes over time in the ship environment. For comparison, we compared recognition using the lip region of interest (ROI) before and after Gabor filtering; using the lip ROI before Gabor filtering, the experiments resulted in a recognition rate of 44%.
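
A minimal sketch of the kind of Gabor preprocessing the abstract describes, using OpenCV; the kernel parameters and the ROI-cropping helper are illustrative assumptions, not the paper's tuned values.

```python
import cv2

# getGaborKernel args: ksize, sigma, theta, lambd, gamma, psi.
# These values are illustrative, not the paper's settings.
kernel = cv2.getGaborKernel((21, 21), 4.0, 0.0, 10.0, 0.5, 0.0)

def preprocess_lip_roi(frame, roi):
    """Crop the lip ROI from a BGR frame and apply Gabor filtering,
    emphasizing oriented edge structure before feature extraction."""
    x, y, w, h = roi  # ROI assumed to come from an upstream lip detector
    lips = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    return cv2.filter2D(lips, -1, kernel)
```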

Wavelet-based Statistical Noise Detection and Emotion Classification Method for Improving Multimodal Emotion Recognition (멀티모달 감정인식률 향상을 위한 웨이블릿 기반의 통계적 잡음 검출 및 감정분류 방법 연구)

  • Yoon, Jun-Han;Kim, Jin-Heon
    • Journal of IKEEE / v.22 no.4 / pp.1140-1146 / 2018
  • Recently, methodologies that analyze complex bio-signals with deep learning models have emerged among studies that recognize human emotions. Here, the accuracy of emotion classification may change depending on the evaluation method, and its reliability depends on the kind of data being learned. For biological signals, the reliability of the data is determined by the noise ratio, so a noise detection method is just as important. Also, an emotion evaluation method appropriate to the methodology used to define emotions is needed. In this paper, we propose a wavelet-based noise threshold setting algorithm that verifies the reliability of multimodal bio-signal data labeled with Valence and Arousal, together with a method that improves the emotion recognition rate by weighting the evaluation data. After extracting the wavelet components of the signal using the wavelet transform, the skewness and kurtosis of each component are obtained, noise is detected with a threshold calculated by the Hampel identifier, and the training data are selected considering the noise ratio of the original signal. In addition, when classifying emotional data, a weight based on the Euclidean distance from the median value of the Valence-Arousal plane is applied to the overall evaluation of the emotion recognition rate. To verify the proposed algorithm, we use the ASCERTAIN data set and observe the degree of improvement in the emotion recognition rate.
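
The noise detection step can be sketched as below: wavelet decomposition, per-band shape statistics, and a Hampel identifier over the coefficients. The wavelet choice (db4), decomposition level, k = 3 threshold, and 10% noise-ratio cutoff are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis, skew

def hampel_mask(x, k=3.0):
    """Hampel identifier: flag values more than k scaled MADs from the median."""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # MAD scaled to sigma for Gaussian data
    return np.abs(x - med) > k * mad + 1e-12

def band_noise_report(signal, wavelet="db4", level=4):
    """Wavelet-decompose the signal and report, per detail band, its shape
    statistics (skewness, kurtosis) and the fraction of flagged coefficients."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [((skew(band), kurtosis(band)), hampel_mask(band).mean())
            for band in coeffs[1:]]  # coeffs[0] is the approximation band

def is_reliable(signal, max_ratio=0.10):
    """Keep a training example only if every band's flagged fraction is small."""
    return all(ratio <= max_ratio for _, ratio in band_noise_report(signal))

sig = np.sin(np.linspace(0, 20, 512))
for stats, ratio in band_noise_report(sig):
    print(stats, ratio)  # inspect per-band statistics and flagged fractions
```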

Fusion algorithm for Integrated Face and Gait Identification (얼굴과 발걸음을 결합한 인식)

  • Nizami, Imran Fareed;Hong, Sug-Jun;Lee, Hee-Sung;Ann, Toh-Kar;Kim, Eun-Tai;Park, Mig-Non
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.15-18 / 2007
  • Identification of humans from multiple viewpoints is an important task for surveillance and security purposes. For optimal performance, the system should use the maximum information available from its sensors. Multimodal biometric systems are capable of utilizing more than one physiological or behavioral characteristic for enrollment, verification, or identification. Since gait alone is not yet established as a very distinctive feature, this paper presents an approach that fuses face and gait for identification. We consider the single-camera case, i.e., both face and gait recognition are done using the same set of images captured by a single camera. The aim of this paper is to improve the performance of the system by utilizing the maximum amount of information available in the images. Fusion is considered at the decision level. The proposed algorithm is tested on the NLPR database.
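
A minimal sketch of one possible decision-level fusion rule; the abstract does not spell out its exact rule, so the agree-or-trust-the-more-confident-modality logic below is a hypothetical stand-in.

```python
def fuse_decisions(face_id, face_conf, gait_id, gait_conf):
    """Hypothetical decision-level rule: accept when both modalities agree,
    otherwise fall back to the more confident classifier's decision."""
    if face_id == gait_id:
        return face_id
    return face_id if face_conf >= gait_conf else gait_id

# Top-1 decisions and confidences come from separate face/gait classifiers:
print(fuse_decisions("subject_07", 0.81, "subject_07", 0.64))  # agree -> subject_07
print(fuse_decisions("subject_07", 0.55, "subject_12", 0.90))  # disagree -> subject_12
```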

Performance Evaluation of Multimodal Biometric System for Normalization Methods and Classifiers (균등화 및 분류기에 따른 다중 생체 인식 시스템의 성능 평가)

  • Go, Hyoun-Ju;Woo, Na-Young;Shin, Yong-Nyuo;Kim, Jae-Sung;Kim, Hak-Il;Chun, Myung-Geun
    • Journal of KIISE: Software and Applications / v.34 no.4 / pp.377-388 / 2007
  • In this paper, we propose a multimodal biometric system based on face, iris, and fingerprint recognition. To effectively aggregate these systems, we use statistical distribution models based on matching values for genuine and impostor attempts, respectively. We then evaluated several fusion algorithms, including weighted summation, Support Vector Machine (SVM), Fisher discriminant analysis, and a Bayesian classifier. From various experiments, we found that the performance of the multimodal biometric system was influenced by the normalization methods and classifiers.
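
The sketch below illustrates two common score normalization methods and a weighted-sum fusion of the normalized scores; the example score ranges and the equal weights are made up for illustration.

```python
import numpy as np

def min_max_norm(scores):
    """Map scores into [0, 1] using the observed min and max."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def z_score_norm(scores):
    """Center and scale scores by their mean and standard deviation."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-12)

face = [412.0, 388.0, 455.0]  # matcher-specific ranges differ widely...
iris = [0.31, 0.44, 0.35]

# ...so normalize per modality, then fuse with a weighted sum (weights assumed).
fused = 0.5 * min_max_norm(face) + 0.5 * min_max_norm(iris)
print(fused.argmax())  # -> 2, the candidate with the best combined score
```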

Interaction Intent Analysis of Multiple Persons using Nonverbal Behavior Features (인간의 비언어적 행동 특징을 이용한 다중 사용자의 상호작용 의도 분석)

  • Yun, Sang-Seok;Kim, Munsang;Choi, Mun-Taek;Song, Jae-Bok
    • Journal of Institute of Control, Robotics and Systems / v.19 no.8 / pp.738-744 / 2013
  • According to cognitive science research, the interaction intent of humans can be estimated through an analysis of their expressed behaviors. This paper proposes a novel methodology for reliable intention analysis of humans by applying this approach. To identify the intention, 8 behavioral features are extracted from 4 characteristics of human-human interaction, and we outline a set of core components of nonverbal human behavior. These nonverbal behaviors are handled by recognition modules over multimodal sensors, each with its own modality: localizing the speaker's sound source in the audition part; recognizing the frontal face and facial expression in the vision part; and estimating human trajectories, body pose and leaning, and hand gestures in the spatial part. As a post-processing step, temporal confidence reasoning is utilized to improve recognition performance, and an integrated human model quantitatively classifies the intention from the multi-dimensional cues by applying weight factors. Interactive robots can thus make informed engagement decisions to interact effectively with multiple persons. Experimental results show that the proposed scheme works successfully between human users and a robot in human-robot interaction.
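
One way to picture the weighted integration step is the hypothetical sketch below, which combines per-module cue confidences into a single engagement-intention score; the cue names, weights, and threshold are all assumptions, not the paper's values.

```python
# Hypothetical nonverbal cues and weight factors (they sum to 1.0).
CUE_WEIGHTS = {
    "sound_source_toward_robot": 0.20,
    "frontal_face": 0.25,
    "approach_trajectory": 0.25,
    "body_lean_forward": 0.15,
    "beckoning_gesture": 0.15,
}

def intention_score(cues):
    """Weighted sum of per-module confidences, each in [0, 1]."""
    return sum(CUE_WEIGHTS[name] * cues.get(name, 0.0) for name in CUE_WEIGHTS)

def wants_interaction(cues, threshold=0.5):
    """Binary engagement decision from the fused score."""
    return intention_score(cues) >= threshold

cues = {"frontal_face": 0.9, "approach_trajectory": 0.8,
        "sound_source_toward_robot": 0.7}
print(intention_score(cues), wants_interaction(cues))  # 0.565 True
```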

Human Action Recognition Using Pyramid Histograms of Oriented Gradients and Collaborative Multi-task Learning

  • Gao, Zan;Zhang, Hua;Liu, An-An;Xue, Yan-Bing;Xu, Guang-Ping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.2 / pp.483-503 / 2014
  • In this paper, human action recognition using pyramid histograms of oriented gradients and collaborative multi-task learning is proposed. First, we accumulate global activities and construct a motion history image (MHI) for the RGB and depth channels respectively to encode the dynamics of an action in different modalities, and then different action descriptors are extracted from the depth and RGB MHIs to represent global textural and structural characteristics of these actions. Specifically, average value in hierarchical blocks, GIST, and pyramid histograms of oriented gradients descriptors are employed to represent human motion. To demonstrate the superiority of the proposed method, we evaluate the descriptors with KNN, SVM with linear and RBF kernels, SRC, and CRC models on the DHA dataset, a well-known dataset for human action recognition. Large-scale experimental results show our descriptors are robust, stable, and efficient, and outperform the state-of-the-art methods. In addition, we investigate the performance of our descriptors further by combining them on the DHA dataset, and observe that the performance of the combined descriptors is much better than using any single descriptor alone. With multimodal features, we also propose a collaborative multi-task learning method for model learning and inference based on transfer learning theory. The main contributions lie in four aspects: 1) the proposed encoding scheme can filter the stationary parts of the human body and reduce noise interference; 2) different kinds of features and models are assessed, and the neighboring gradient information and pyramid layers are very helpful for representing these actions; 3) the proposed model can fuse the features from different modalities regardless of the sensor types, the ranges of the values, and the dimensions of different features; 4) the latent common knowledge among different modalities can be discovered by transfer learning to boost the performance.
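
A minimal sketch of the MHI construction the abstract starts from, in plain NumPy; the duration tau and difference threshold are illustrative, and the same update would be applied to the RGB and depth frame streams separately.

```python
import numpy as np

def update_mhi(mhi, frame, prev_frame, tau=15, diff_thresh=30):
    """One update step of a motion history image (MHI): pixels that changed
    are set to tau; everything else decays by 1 toward zero."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > diff_thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# Usage on a stack of grayscale (or depth) frames, shape (T, H, W):
frames = np.random.randint(0, 256, size=(10, 48, 64), dtype=np.uint8)
mhi = np.zeros(frames.shape[1:], dtype=np.int16)
for t in range(1, len(frames)):
    mhi = update_mhi(mhi, frames[t], frames[t - 1])
# 'mhi' now encodes the recency of motion per pixel; descriptors such as
# PHOG or GIST would be computed on this image.
```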

Improvement of Environment Recognition using Multimodal Signal (멀티 신호를 이용한 환경 인식 성능 개선)

  • Park, Jun-Qyu;Baek, Seong-Joon
    • The Journal of the Korea Contents Association / v.10 no.12 / pp.27-33 / 2010
  • In this study, we conducted classification experiments with a GMM (Gaussian Mixture Model) on features extracted from a microphone, a gyro sensor, and an acceleration sensor in 9 different environment types. Existing context-awareness studies tried to recognize the environment mainly from environmental sound data captured with a microphone, but recognition was limited owing to the structural characteristics of environmental sound, which is composed of various combinations of noises. Hence, we propose an extended method that adds gyro sensor and acceleration sensor data in order to reflect the movement features of the recognizing agent. According to the experimental results, combining acceleration sensor data with the existing environmental sound features improves recognition performance by more than 5% compared with existing methods that use only environmental sound features from the microphone.
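
A sketch of the standard per-class GMM classification scheme the study builds on, assuming feature vectors that concatenate microphone features (e.g., MFCCs) with gyro/accelerometer statistics per window; the component count, covariance type, and synthetic data are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_env, n_components=4):
    """Fit one GMM per environment class on its training feature vectors."""
    return {
        env: GaussianMixture(n_components=n_components,
                             covariance_type="diag", random_state=0).fit(X)
        for env, X in features_by_env.items()
    }

def classify(gmms, x):
    """Pick the environment whose GMM gives the highest log-likelihood."""
    x = np.atleast_2d(x)
    return max(gmms, key=lambda env: gmms[env].score_samples(x)[0])

rng = np.random.default_rng(0)
train = {"bus": rng.normal(0, 1, (200, 16)), "street": rng.normal(2, 1, (200, 16))}
gmms = train_gmms(train)
print(classify(gmms, rng.normal(2, 1, 16)))  # likely "street"
```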

Active Vision from Image-Text Multimodal System Learning (능동 시각을 이용한 이미지-텍스트 다중 모달 체계 학습)

  • Kim, Jin-Hwa;Zhang, Byoung-Tak
    • Journal of KIISE / v.43 no.7 / pp.795-800 / 2016
  • In image classification, recent CNNs compete with human performance. However, there are limitations in more general recognition. Herein we deal with indoor images that contain too much information to be processed directly and require information reduction before recognition. To reduce the amount of data processing, variational inference or variational Bayesian methods are typically suggested for object detection. However, these methods suffer from the difficulty of marginalizing over the given space. In this study, we propose an image-text integrated recognition system using active vision based on Spatial Transformer Networks. The system attempts to efficiently sample a partial region of a given image for given language information. Our experimental results demonstrate a significant improvement over traditional approaches. We also discuss a qualitative analysis of the sampled images, the model's characteristics, and its limitations.
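
The core sampling operation of a Spatial Transformer can be sketched in a few lines of PyTorch; here the affine parameters theta are fixed for illustration, whereas in the system described they would be predicted from the image and the language information.

```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 3, 224, 224)          # (N, C, H, W) input image
theta = torch.tensor([[[0.5, 0.0, 0.25],    # scale 0.5, shifted right/down;
                       [0.0, 0.5, 0.25]]])  # (N, 2, 3) affine parameters

# Build a sampling grid for the attended sub-region and sample the image.
grid = F.affine_grid(theta, size=(1, 3, 112, 112), align_corners=False)
patch = F.grid_sample(image, grid, align_corners=False)
print(patch.shape)  # torch.Size([1, 3, 112, 112]) -- the sampled partial region
```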

Multidimensional Affective model-based Multimodal Complex Emotion Recognition System using Image, Voice and Brainwave (다차원 정서모델 기반 영상, 음성, 뇌파를 이용한 멀티모달 복합 감정인식 시스템)

  • Oh, Byung-Hun;Hong, Kwang-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.821-823 / 2016
  • This paper proposes a multimodal complex emotion recognition system using image, voice, and brainwave signals based on a multidimensional affective model. Features extracted from the user's face image, voice, and EEG are each scored as explicit response-level data against the multidimensional affective model (Arousal, Valence, Dominance), known in psychology and cognitive science as the affective components that constitute human emotion. The resulting scores are then mapped onto the three-dimensional emotion model to recognize not only human emotions (single and complex) but also the intensity of the emotion.
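
A hypothetical sketch of the scoring-and-mapping step: per-modality (Arousal, Valence, Dominance) scores are fused and matched to the nearest emotion prototype by Euclidean distance, with the distance doubling as a crude intensity cue. The prototype coordinates and modality weights are invented for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical emotion prototypes in (Arousal, Valence, Dominance) space,
# each axis scored in [0, 1]; coordinates are illustrative.
PROTOTYPES = {
    "joy":     np.array([0.8, 0.9, 0.7]),
    "anger":   np.array([0.9, 0.1, 0.8]),
    "sadness": np.array([0.2, 0.1, 0.2]),
    "calm":    np.array([0.2, 0.7, 0.5]),
}

def recognize(avd_scores, weights=(0.4, 0.3, 0.3)):
    """Fuse per-modality AVD score triples (face, voice, EEG) by a weighted
    average, then return the nearest prototype and its distance."""
    point = np.average(np.asarray(avd_scores, dtype=float), axis=0, weights=weights)
    dists = {emo: np.linalg.norm(point - p) for emo, p in PROTOTYPES.items()}
    best = min(dists, key=dists.get)
    return best, dists[best]

# Face, voice, and EEG each produce an (A, V, D) score triple:
print(recognize([(0.7, 0.8, 0.6), (0.9, 0.9, 0.7), (0.6, 0.7, 0.5)]))
```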