• Title/Summary/Keyword: 멀티모달정보 (multimodal information)

Multimodal Emotion Recognition using Face Image and Speech (얼굴영상과 음성을 이용한 멀티모달 감정인식)

  • Lee, Hyeon Gu;Kim, Dong Ju
    • Journal of Korea Society of Digital Industry and Information Management, v.8 no.1, pp.29-40, 2012
  • A challenging research issue of growing importance in human-computer interaction is to endow machines with emotional intelligence. Emotion recognition technology therefore plays an important role in human-computer interaction research, as it allows more natural, human-like communication between people and computers. In this paper, we propose a multimodal emotion recognition system that uses face and speech to improve recognition performance. In face-based emotion recognition, the distance measure is computed with 2D-PCA of the MCS-LBP image and a nearest-neighbor classifier; in speech-based emotion recognition, the likelihood measure is obtained with a Gaussian mixture model based on pitch and mel-frequency cepstral coefficient (MFCC) features. The individual matching scores obtained from face and speech are combined by weighted summation, and the fused score is used to classify the emotion. Experimental results show that the proposed method improves recognition accuracy by about 11.25% to 19.75% compared with uni-modal approaches. These results confirm that the proposed approach achieves a significant performance improvement and that the proposed method is effective.
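The weighted-summation fusion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each modality's per-class scores are min-max normalized to [0, 1], the face distance is turned into a similarity as 1 minus the normalized distance, and `w_face` is a hypothetical weight. The abstract gives neither the normalization nor the weight values.

```python
def min_max(scores):
    """Rescale a dict of per-class scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / span for k, v in scores.items()}


def fuse(face_distances, speech_loglik, w_face=0.5):
    """Weighted-sum fusion of face and speech matching scores.

    face_distances: class -> nearest-neighbor distance (smaller is better)
    speech_loglik:  class -> GMM log-likelihood (larger is better)
    w_face:         hypothetical weight of the face modality in [0, 1]
    Returns (predicted_class, fused_scores).
    """
    # Convert distances to similarities so that larger is better for both.
    face_sim = {k: 1.0 - v for k, v in min_max(face_distances).items()}
    speech_sim = min_max(speech_loglik)
    fused = {k: w_face * face_sim[k] + (1.0 - w_face) * speech_sim[k]
             for k in face_sim}
    return max(fused, key=fused.get), fused


face = {"happy": 2.0, "sad": 5.0, "angry": 8.0}          # illustrative values
speech = {"happy": -120.0, "sad": -100.0, "angry": -150.0}
label, scores = fuse(face, speech, w_face=0.5)
```

With equal weights the face evidence tips the decision to "happy"; lowering `w_face` to 0.2 lets the speech score dominate and the decision becomes "sad", which is why the fusion weight matters in practice.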

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science, v.21 no.1, pp.157-175, 2010
  • Understanding a dialogue participant's emotions is as important as decoding the explicit message in human communication. It is well known that non-verbal elements are better suited than verbal elements to conveying a speaker's emotions. Written texts, however, contain a variety of linguistic units that express emotions. This study analyzes the components needed to construct an emotion ontology, which has numerous applications in Human Language Technology. Most previous work on text-based emotion processing focused on classifying emotions, building dictionaries that describe emotions, and retrieving those lexical items from texts through keyword spotting and/or syntactic parsing. The emotions retrieved or computed in this way did not show good accuracy. This study therefore proposes a more sophisticated component analysis and introduces linguistic factors. (1) Five linguistic types of emotion expression are distinguished by target (verbal/non-verbal) and method (expressive/descriptive/iconic). The correlations among them, as well as their correlation with the non-verbal expressive type, are also determined. This characteristic is expected to make the ontology more adaptable to multimodal environments. (2) As emotion-related components, this study proposes 24 emotion types, a 5-point intensity scale (-2 to +2), and a 3-way polarity (positive/negative/neutral), which together can describe a variety of emotions in more detail and in a standardized way. (3) We introduce verbal-expression components, such as 'experiencer', 'description target', 'description method' and 'linguistic features', which can be used to classify and tag verbal expressions of emotion appropriately.
(4) By adopting the linguistic tag sets proposed by ISO and TEI and providing a mapping table between our emotion classification and Plutchik's, our ontology can easily be employed for multilingual processing.
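The emotion-related and verbal-expression components listed above map naturally onto a small record type. The sketch below is a hypothetical illustration: the class `EmotionAnnotation` and its field names are assumptions, and since the abstract does not list the 24 emotion type labels or the allowed description methods, those fields are plain strings here. Only the intensity range and the three polarity values come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """The 3-way polarity described in the abstract."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class EmotionAnnotation:
    """Hypothetical tag for one emotion expression in a text."""
    emotion_type: str        # one of the 24 emotion types (labels not given here)
    intensity: int           # 5-point scale: -2, -1, 0, +1, +2
    polarity: Polarity       # positive / negative / neutral
    experiencer: str         # who experiences the emotion
    description_target: str  # what the emotion is about
    description_method: str  # values not enumerated in the abstract

    def __post_init__(self):
        # Reject values outside the 5-point intensity scale.
        if not isinstance(self.intensity, int) or not -2 <= self.intensity <= 2:
            raise ValueError("intensity must be an integer in [-2, 2]")
```

Making the record frozen keeps annotations immutable once created, which suits tags that are later mapped to other schemes such as Plutchik's classification.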

Facial Features and Motion Recovery using multi-modal information and Paraperspective Camera Model (다양한 형식의 얼굴정보와 준원근 카메라 모델해석을 이용한 얼굴 특징점 및 움직임 복원)

  • Kim, Sang-Hoon
    • The KIPS Transactions:PartB, v.9B no.5, pp.563-570, 2002
  • This paper describes robust extraction of 3D facial features and global motion information from a 2D image sequence for MPEG-4 SNHC face model encoding. Facial regions are detected in the image sequence using a multimodal fusion technique that combines range, color and motion information. Twenty-three of the MPEG-4 FDP (Face Definition Parameters) facial features are extracted automatically inside the facial region using color transforms (GSCD, BWCD) and morphological processing. The extracted facial features are used to recover the 3D shape and global motion of the object with a paraperspective camera model and the SVD (Singular Value Decomposition) factorization method. A synthetic 3D object is designed and tested to show the performance of the proposed algorithm. The recovered 3D motion information is transformed into the global motion parameters of the MPEG-4 FAP (Face Animation Parameters) to synchronize a generic face model with a real face.
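The SVD factorization step can be illustrated with the rank-3 core shared by Tomasi-Kanade-style factorization methods. This is a simplified sketch, not the paper's implementation: it registers the tracked feature coordinates, takes the SVD of the 2F x P measurement matrix, and keeps the three largest singular values, giving motion and shape factors that are defined only up to an invertible 3 x 3 matrix. The paraperspective metric constraints that resolve this ambiguity are not implemented here. It uses NumPy.

```python
import numpy as np


def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix.

    Rows 2f and 2f+1 hold the x and y image coordinates of P tracked
    features in frame f. Returns (M, S, W_centered) with
    M @ S ~= W_centered, defined up to an invertible 3 x 3 matrix.
    """
    # Registration: subtract each row's mean (the image of the centroid).
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Keep the rank-3 part and split the singular values between factors.
    root = np.sqrt(s[:3])
    M = U[:, :3] * root            # 2F x 3 motion factor
    S = root[:, None] * Vt[:3, :]  # 3 x P shape factor
    return M, S, W_centered


# Synthetic check: 23 points (matching the 23 FDP features) seen by
# 5 random affine cameras, each row shifted by a random translation.
rng = np.random.default_rng(0)
shape = rng.normal(size=(3, 23))
cams = [rng.normal(size=(2, 3)) for _ in range(5)]
W = np.vstack([P @ shape for P in cams]) + rng.normal(size=(10, 1))
M, S, Wc = factorize(W)
```

For noise-free affine projections the centered matrix has rank at most 3, so `M @ S` reproduces it up to floating-point error; with real tracked features the truncation acts as a least-squares fit.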

Human body learning system using multimodal and user-centric interfaces (멀티모달 사용자 중심 인터페이스를 적용한 인체 학습 시스템)

  • Kim, Ki-Min;Kim, Jae-Il;Park, Jin-Ah
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집), 2008.02a, pp.85-90, 2008
  • This paper describes a human body learning system that uses a multimodal user interface. With this system, students can study human anatomy interactively. Existing learning methods rely on one-way materials such as images, text and movies. In contrast, we propose a learning system that combines 3D organ surface models, a haptic interface and a hierarchical data structure of human organs to provide enhanced learning that draws on sensorimotor skills.

Vision-based Walking Guidance System Using Top-view Transform and Beam-ray Model (탑-뷰 변환과 빔-레이 모델을 이용한 영상기반 보행 안내 시스템)

  • Lin, Qing;Han, Young-Joon;Hahn, Hern-Soo
    • Journal of the Korea Society of Computer and Information, v.16 no.12, pp.93-102, 2011
  • This paper presents a walking guidance system for blind pedestrians in outdoor environments that uses a single camera. Unlike many existing travel-aid systems that rely on stereo vision, the proposed system obtains the necessary information about the road environment from a single camera fixed at the user's belly. To achieve this, a top-view image of the road is used, on which obstacles are detected by first extracting local extreme points and then verifying them with a polar edge histogram. Meanwhile, user motion is estimated using optical flow in an area close to the user. Based on the information extracted from the image domain, an audio message generation scheme is proposed to deliver guidance instructions to the blind user via synthetic voice. Experiments with several sidewalk video clips show that the proposed system can provide useful guidance instructions in certain sidewalk environments.
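A top-view (bird's-eye) image of a planar road surface is commonly produced by mapping image coordinates through a 3 x 3 homography. The sketch below assumes such a homography `H` is already known (in practice it would be estimated from the camera's height and tilt or from four ground-plane correspondences; the abstract does not say how the paper builds its top view), and the matrix in the example is an arbitrary illustrative value.

```python
def apply_homography(H, points):
    """Map (x, y) image points through a 3 x 3 homography H (nested lists).

    Each point is lifted to homogeneous coordinates (x, y, 1), multiplied
    by H, and divided by the resulting third component.
    """
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if w == 0:
            raise ValueError("point maps to infinity")
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical homography with a perspective term in the last row.
H = [[1, 0, 0], [0, 1, 0], [0, 0.01, 1]]
top_view = apply_homography(H, [(10, 100)])
```

A full top-view image is usually generated the other way round: every output pixel is mapped through the inverse homography and the source image is sampled there, which avoids holes in the warped result.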

A Study on the Design of Digital Twin System and Required Function for Underground Lifelines (지하공동구 디지털 트윈 체계 및 요구기능 설계에 관한 연구)

  • Jeong, Min-Woo;Lee, Hee-Seok;Shin, Dong-Bin
    • The Journal of the Korea Contents Association, v.21 no.7, pp.248-258, 2021
  • Underground facilities for public utilities (utility tunnels) require 24-hour monitoring to keep the city's lifelines functioning, and technology is needed to compensate for the shortage of human resources. General management methods have difficulty reflecting the particular characteristics of underground space management. This study proposes the requirements of a digital twin system for underground facilities for public utilities. The concept of space is divided into physical space and virtual space: the physical space defines the types and layout of the sensors on which the multimodal image sensor system is built, and the virtual space defines the system architecture. The study also proposes system functions for each task. The digital twin is expected to be effective in preventing disasters and maintaining the city's lifeline functions.

Design of Big Semantic System for Factory Energy Management in IoE environments (IoE 환경에서 공장에너지 관리를 위한 빅시맨틱 시스템 설계)

  • Kwon, Soon-Hyun;Lee, Joa-Hyoung;Kim, Seon-Hyeog;Lee, Sang-Keum;Shin, Young-Mee;Doh, Yoon-Mee;Heo, Tae-Wook
    • Proceedings of the Korea Information Processing Society Conference, 2022.05a, pp.37-39, 2022
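  • In existing IoE environments, collected data are linked with domain knowledge for a specific service in order to provide that service. However, in IoE environments, where the collected data are of diverse types and the static knowledge base changes dynamically with the situation, existing knowledge base systems could not provide services smoothly. Therefore, this paper presents a big semantic system that semantically processes the large-volume, real-time data generated in IoE environments, links it with a common domain knowledge base, and performs knowledge augmentation in an integrated manner through conventional knowledge base reasoning and machine learning-based knowledge embedding. The proposed system collects multimodal (structured and unstructured) data from the IoE environment, semi-automatically converts it into semantic form and stores it in the domain knowledge base, augments the knowledge base through semantic reasoning, and provides users with knowledge from the entire knowledge base, including the augmented portion, through structured and semi-structured user queries. It also augments the existing knowledge base by performing learning and prediction with machine learning-based knowledge embedding. The proposed system is to be implemented as a base technology for a factory energy management system, an energy-saving system that collects energy information within a factory and performs real-time control based on process and equipment status and operating information.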

Fast Block Matching Motion Estimation based on Multiple Local Search Using Spatial Temporal Correlation (시공간적 상관성을 이용한 국소 다중 탐색기반 고속 블록정합 움직임 추정)

  • 조영창;남혜영;이태홍
    • Journal of Korea Multimedia Society, v.3 no.4, pp.356-364, 2000
  • Block-based fast motion estimation algorithms use fixed search patterns to reduce the number of search points, and they assume that the error in the mean absolute error (MAE) space decreases monotonically toward the global minimum. Therefore, when the search region contains many local minima, these algorithms are likely to find a local minimum instead of the global minimum, and their results depend heavily on the initial search points. This problem is most evident at motion boundaries. In this paper, we define candidate regions within the search region using the motion information of neighboring blocks, and we propose the multiple local search method (MLSM), which searches throughout the candidate regions to reduce the possibility of being trapped in local minima. In MLSM, candidate regions are marked in a search point map, and regions already visited are not searched again, which reduces computation. In simulations, the proposed method outperforms other gradient-based methods, especially at motion boundaries. In terms of PSNR, it achieves estimation accuracy similar to that of full search with a significant reduction in search points.
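The multiple local search idea can be sketched as follows: search centers are the zero vector plus the motion vectors of neighboring blocks, a greedy local descent runs from each center, and a shared cache of evaluated displacements plays the role of the search point map so that no point is evaluated twice. This is a simplified reading of the abstract, assuming MAE as the matching cost, an 8-neighbor descent step and a hypothetical search radius of 7 pixels; the paper's exact candidate-region definition and search pattern may differ.

```python
def mae(cur, ref, bx, by, dx, dy, n):
    """Mean absolute error between the n x n block of `cur` at (bx, by) and
    the block of `ref` displaced by (dx, dy); None if it leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
        return None
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def multiple_local_search(cur, ref, bx, by, n, neighbor_mvs, radius=7):
    """Greedy local descent from several candidate centers.

    `visited` caches every evaluated displacement, standing in for a
    search point map. Returns (best_vector, best_cost, points_evaluated).
    """
    visited = {}

    def cost(d):
        if d not in visited:
            visited[d] = mae(cur, ref, bx, by, d[0], d[1], n)
        return visited[d]

    best, best_cost = None, float("inf")
    for start in [(0, 0)] + [tuple(v) for v in neighbor_mvs]:
        d, c = start, cost(start)
        if c is None:
            continue
        while True:
            # Evaluate the 8-neighborhood of the current point inside the window.
            scored = []
            for ox in (-1, 0, 1):
                for oy in (-1, 0, 1):
                    p = (d[0] + ox, d[1] + oy)
                    if p != d and max(abs(p[0]), abs(p[1])) <= radius:
                        pc = cost(p)
                        if pc is not None:
                            scored.append((pc, p))
            if not scored or min(scored)[0] >= c:
                break  # local minimum reached from this center
            c, d = min(scored)
        if c < best_cost:
            best, best_cost = d, c
    return best, best_cost, len(visited)
```

Because the descents share the cache, extra centers cost little when their paths overlap, while full search over a radius of 7 would evaluate 15 x 15 = 225 displacements per block.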

Content-based Music Information Retrieval using Pitch Histogram (Pitch 히스토그램을 이용한 내용기반 음악 정보 검색)

  • 박만수;박철의;김회린;강경옥
    • Journal of Broadcast Engineering, v.9 no.1, pp.2-7, 2004
  • In this paper, we propose a content-based music information retrieval technique that uses several MPEG-7 low-level descriptors. Pitch information and timbral features in particular can be applied to music genre classification, music retrieval, or QBH (Query By Humming), because they can model the stochastic patterns or timbral information of a music signal. In this work, we restrict the music domain to OSTs (original soundtracks) of movies and soap operas so that the technique can be applied to broadcasting systems. That is, when background music catches a user's ear, the user can retrieve information about the unknown music using only an audio clip a few seconds long extracted from the video content. We propose an audio feature set built from MPEG-7 descriptors and distance functions based on vector distance or ratio computation. We observed that the feature set built from pitch information is superior to the timbral spectral feature set, and that IFCR (Intra-Feature Component Ratio) is better than ED (Euclidean Distance) as a vector distance function. A k-NN classifier is used to evaluate music recognition.
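A pitch-histogram feature combined with a k-NN classifier can be sketched as follows. This is an illustrative assumption, not the paper's MPEG-7 descriptor set: it takes frame-level pitch estimates already expressed as MIDI note numbers (pitch extraction itself is out of scope), folds them into a normalized 12-bin pitch-class histogram, and compares histograms with Euclidean distance (ED). The IFCR distance the paper found superior is not reproduced because the abstract does not define it; the track labels are hypothetical.

```python
import math
from collections import Counter


def pitch_class_histogram(midi_pitches):
    """Normalized 12-bin histogram of pitch classes (C=0 ... B=11)."""
    counts = [0] * 12
    for p in midi_pitches:
        counts[round(p) % 12] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def euclidean(a, b):
    """Euclidean distance (ED) between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, database, k=3):
    """database: list of (histogram, label); majority vote of the k nearest."""
    nearest = sorted(database, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


db = [(pitch_class_histogram([60, 62, 64, 65, 67]), "ost_a"),
      (pitch_class_histogram([61, 63, 66, 68, 70]), "ost_b")]
query = pitch_class_histogram([60, 62, 64, 64, 67])
```

For identifying a specific soundtrack, `k=1` returns the single closest entry; larger `k` is more useful when entries are labeled by genre and several neighbors share a label.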

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology, v.29 no.2, pp.347-363, 2019
  • Like DNA in living things, malware has its own unique behavioral characteristics. To respond to APT (Advanced Persistent Threat) attacks in advance, behavioral characteristics must be extracted from malware, and each malware sample must be classified based on its behavioral similarity. In this paper, various similarities between Windows malware samples are estimated, and the malware family is predicted from these similarity values. The similarity measures used are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity' and 'Jaccard similarity'. The prediction rate differs widely from one similarity measure to another. Although no single similarity measure classifies malware with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
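Two of the listed measures are easy to state precisely. The sketch below computes Jaccard similarity over the distinct tokens of two samples and cosine similarity over TF-IDF vectors, assuming each malware sample has already been reduced to a token list (for example, API-call names from a sandbox report). The token lists, the plain `log(N / df)` IDF weighting and the function names are illustrative assumptions; Nilsimsa hashing and the paper's function-level similarity are not shown.

```python
import math
from collections import Counter


def jaccard(a, b):
    """|A intersect B| / |A union B| over the distinct tokens of two samples."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (token -> weight) with idf = log(N / df)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical API-call traces for three samples.
samples = [["CreateFileW", "WriteFile", "RegSetValueExW"],
           ["CreateFileW", "WriteFile", "InternetOpenA"],
           ["InternetOpenA", "HttpSendRequestA"]]
vecs = tfidf_vectors(samples)
```

Jaccard treats every shared token equally, while TF-IDF down-weights tokens that appear in many samples, which is one reason the two measures can rank the same pair of samples differently.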