• Title/Summary/Keyword: 감정 음성 (emotional speech)

Design and Implementation of an Emotion Recognition System using Physiological Signal (생체신호를 이용한 감정인지시스템의 설계 및 구현)

  • O, Ji-Soo;Kang, Jeong-Jin;Lim, Myung-Jae;Lee, Ki-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.1 / pp.57-62 / 2010
  • In the mobile market, communication technology has so far been built on the senses of sight, hearing, and touch. Human beings, however, use all five senses (vision, hearing, taste, smell, and touch) to communicate. This paper therefore presents a technology that enables individuals to perceive other people's emotions through a machine: the device reads the tone of voice, body temperature, pulse, and other biometric signals to recognize the emotion the sending individual is experiencing, and once the emotion is recognized, a scent is emitted to the receiving individual. A system that coordinates scent emission according to emotional changes is proposed.

VRmeeting : Distributed Virtual Environment Supporting Real Time Video Chatting on WWW (VRmeeting: 웹상에서 실시간 화상 대화 지원 분산 가상 환경)

  • Jung, Heon-Man;Tak, Jin-Hyun;Lee, Sei-Hoon;Wang, Chang-Jong
    • Annual Conference of KIPS / 2000.10a / pp.715-718 / 2000
  • Multi-user distributed virtual environment systems support text-based chatting and TTS for communication among participants, and add animation features so that avatars, the participants' proxies, can express gestures, facial expressions, and emotions for non-verbal communication. Avatar animation, however, is limited in how much of a participant's intent and emotion it can convey, so an environment supporting free meetings and conversation is needed. Including participants' faces and voices in the virtual space would enable clearer and more realistic communication and emotional expression. This paper designs a distributed virtual environment system that supports real-time video chatting in a multi-user virtual environment formed over a computer network, maximizing participants' communication and emotional expression and providing free meetings and conversation. The designed system has a structure that optimizes system load by dynamically controlling the amount of events according to the participants' distance and gaze direction.


Analysis of Voice Quality Features and Their Contribution to Emotion Recognition (음성감정인식에서 음색 특성 및 영향 분석)

  • Lee, Jung-In;Choi, Jeung-Yoon;Kang, Hong-Goo
    • Journal of Broadcast Engineering / v.18 no.5 / pp.771-774 / 2013
  • This study investigates the relationship between voice quality measurements and emotional states, in addition to conventional prosodic and cepstral features. Open quotient, harmonics-to-noise ratio, spectral tilt, spectral sharpness, and band energy were analyzed as voice quality features, and prosodic features related to fundamental frequency and energy were also examined. ANOVA tests and Sequential Forward Selection were used to evaluate significance and verify performance. Classification experiments show that using the proposed features increases overall accuracy and, in particular, reduces errors between happy and angry. The results also show that adding voice quality features to conventional cepstral features leads to an increase in performance.
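
The Sequential Forward Selection step can be sketched as a greedy wrapper search; the SVM classifier, the three-fold cross-validation, and the synthetic data below are illustrative assumptions, not necessarily the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, n_select, cv=3):
    """Greedy wrapper SFS: repeatedly add the feature whose inclusion
    most improves cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        # Score each candidate feature together with those already chosen.
        scores = [(cross_val_score(SVC(), X[:, selected + [f]], y, cv=cv).mean(), f)
                  for f in remaining]
        best_acc, best_f = max(scores)
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Synthetic demo: feature 0 is made strongly class-dependent, so the
# greedy search should pick it first.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 60)
X = rng.normal(size=(60, 4))
X[:, 0] += 3 * y
chosen = sequential_forward_selection(X, y, n_select=2)
```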

Hi, KIA! Classifying Emotional States from Wake-up Words Using Machine Learning (Hi, KIA! 기계 학습을 이용한 기동어 기반 감성 분류)

  • Kim, Taesu;Kim, Yeongwoo;Kim, Keunhyeong;Kim, Chul Min;Jun, Hyung Seok;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility / v.24 no.1 / pp.91-104 / 2021
  • This study explored users' emotional states identified from the wake-up words "Hi, KIA!" using a machine learning algorithm, in the context of a passenger car's voice user interface. We targeted four emotional states, namely excited, angry, desperate, and neutral, and created a total of 12 emotional scenarios in the context of car driving. Nine college students participated and recorded sentences as guided by the visualized scenarios. The wake-up words were extracted from the whole sentences, resulting in two data sets. We used the soundgen package and the svmRadial method of the caret package in open-source R to collect acoustic features of the recorded voices and performed machine learning-based analysis to determine the predictability of the modeled algorithm. We compared the accuracy of wake-up words (60.19%; 22%~81%) with that of whole sentences (41.51%) for all nine participants across the four emotional categories. Individual differences in accuracy and sensitivity were noticeable, while the selected features were relatively constant. This study provides empirical evidence for the potential application of wake-up words in emotion-driven user experience design for communication between users and artificial intelligence systems.
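
The R pipeline named in the abstract (soundgen features fed to caret's svmRadial) amounts to an RBF-kernel SVM over acoustic features; a rough Python analogue is sketched below. The feature values and the class separation are fabricated purely for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for acoustic features (e.g. mean pitch, pitch
# range, loudness, duration) of 120 recorded wake-up words.
rng = np.random.default_rng(1)
labels = np.repeat(["excited", "angry", "desperate", "neutral"], 30)
X = rng.normal(size=(120, 6))
X[:, 0] += np.repeat([2.0, 1.0, -1.0, 0.0], 30)  # fabricated class separation

# Standardize, fit an RBF-kernel SVM, and report cross-validated accuracy.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, labels, cv=5).mean()
```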

Multi-Modal Emotion Recognition in Videos Based on Pre-Trained Models (사전학습 모델 기반 발화 동영상 멀티 모달 감정 인식)

  • Kim, Eun Hee;Shin, Ju Hyun
    • Smart Media Journal / v.13 no.10 / pp.19-27 / 2024
  • Recently, as the demand for non-face-to-face counseling has rapidly increased, the need for emotion recognition technology that combines various aspects such as text, voice, and facial expressions has been emphasized. In this paper, we address issues such as the dominance of non-Korean data and the imbalance of emotion labels in existing datasets like FER-2013, CK+, and AFEW by using Korean video data. We propose methods to enhance multimodal emotion recognition performance in videos by integrating the strengths of the image modality with the text modality. Pre-trained models are used to overcome the limitations caused by small training data: a GPT-4-based LLM is applied to the text, and a VGG-19-based model is fine-tuned on facial expression images. Representative emotions are then extracted by combining the per-modality results as follows. Emotion information extracted from the text is combined with facial expression changes in the video; when the text and image sentiments disagree, a threshold is applied that prioritizes the text-based sentiment if it is deemed trustworthy. Additionally, by adjusting the representative emotions using per-frame emotion distribution information, performance improved by 19% in F1-Score over the existing method that used average emotion values for each frame.
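
The text-priority fusion rule described above can be sketched as follows; the function names, input shapes, and the 0.7 threshold are illustrative assumptions, not the paper's tuned values.

```python
from collections import Counter

def fuse_emotions(text_probs, frame_emotions, text_threshold=0.7):
    """Pick a representative emotion from a text emotion probability
    distribution and a list of per-frame facial emotion labels."""
    text_label = max(text_probs, key=text_probs.get)
    face_label = Counter(frame_emotions).most_common(1)[0][0]
    if text_label == face_label:
        return text_label
    # Modalities disagree: trust the text only if it is confident
    # enough, otherwise fall back to the dominant facial emotion.
    if text_probs[text_label] >= text_threshold:
        return text_label
    return face_label

# Confident text overrides the face; uncertain text defers to it.
confident = fuse_emotions({"happy": 0.9, "sad": 0.1}, ["sad", "sad", "happy"])
uncertain = fuse_emotions({"happy": 0.6, "sad": 0.4}, ["sad", "sad", "happy"])
```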

The effect of musical application to develop the emotional expression of mentally retarded adults (성인정신지체인의 감정 표현 향상을 위한 음악 활용의 효과)

  • Jin, Sun Ju
    • Journal of Music and Human Behavior / v.2 no.1 / pp.17-33 / 2005
  • Music has a vital meaning in people's lives, mostly as a communication medium for thoughts and feelings. Because music is nonthreatening and nonjudgmental, it is accessible to everyone. The purpose of this research was, first, to compare the effectiveness of an existing social rehabilitation program and a music-integrated social rehabilitation program for people with mental retardation, and second, if the music-integrated program proved effective, to find out how various musical activities can assist communication and expression, and further social interaction, among these participants. Data were collected using an Emotions Assessment Tool, a Social Skills Assessment, and a measure of skills development in music; verbal content, voices, gestures, and nonverbal expressions were also observed and analyzed. The results show that the music-integrated social rehabilitation program enhanced the communicative and expressive skills of adults with mental retardation and further improved their social interaction skills, implying that the program with music had more positive effects on social relationship activities than the program without it. These results support previous findings that music can be an effective tool for communication and expression.


Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Kim, Dong-Kyu;Lee, So Hwa;Bong, Jae Hwan
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.6 / pp.1137-1144 / 2022
  • In this study, an artificial intelligence (AI) was developed to help users practice facial expressions for conveying emotions. The developed AI takes multimodal inputs consisting of sentences and facial images for deep neural networks (DNNs), which compute the similarity between the emotion predicted from the sentence and the emotion predicted from the facial image. The user practices facial expressions based on the situation given by a sentence, and the AI provides numerical feedback based on that similarity. A ResNet34 model was trained on the public FER2013 data to predict emotions from facial images. To predict emotions in sentences, a KoBERT model was trained via transfer learning using the conversational speech dataset for emotion classification released by AIHub. The DNN that predicts emotions from facial images achieved 65% accuracy, which is comparable to human emotion classification ability, and the DNN that predicts emotions from sentences achieved 90% accuracy. The performance of the developed AI was evaluated through facial expression experiments in which an ordinary person participated.
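
The numerical feedback step can be sketched as a similarity between the two predicted emotion distributions; the cosine measure and the 0-100 scaling here are assumptions for illustration, since the abstract states only that a similarity between the two predictions is computed.

```python
import math

def similarity_feedback(sentence_probs, face_probs):
    """Cosine similarity between the emotion distribution predicted
    from the sentence and the one predicted from the facial image,
    scaled to a 0-100 practice score."""
    keys = sorted(set(sentence_probs) | set(face_probs))
    a = [sentence_probs.get(k, 0.0) for k in keys]
    b = [face_probs.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 100.0 * dot / norm if norm else 0.0

# A facial expression matching the sentence's emotion scores 100;
# completely disjoint predictions score 0.
perfect = similarity_feedback({"happy": 0.7, "sad": 0.3},
                              {"happy": 0.7, "sad": 0.3})
mismatch = similarity_feedback({"happy": 1.0}, {"sad": 1.0})
```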

Speech Emotion Recognition on a Simulated Intelligent Robot (모의 지능로봇에서의 음성 감정인식)

  • Jang, Kwang-Dong;Kim, Nam;Kwon, Oh-Wook
    • MALSORI / no.56 / pp.173-183 / 2005
  • We propose a speech emotion recognition method for an affective human-robot interface. In the proposed method, emotion is classified into six classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information: phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on Gaussian-kernel support vector machines decides the emotion class of the utterance. We recorded speech commands and dialogs uttered 2 m away from microphones in five different directions. Experimental results show that the proposed method yields 48% classification accuracy while human classifiers achieve 71%.
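
The step of turning frame-level phonetic and prosodic measurements into one per-utterance vector of statistics can be sketched as below; the particular statistics chosen (mean, standard deviation, min, max) are a common convention, not necessarily the paper's exact set.

```python
import numpy as np

def utterance_features(frame_feats):
    """Collapse frame-level features (rows = frames, columns = e.g. log
    energy, pitch, formant frequencies) into one fixed-length vector of
    per-utterance statistics, ready for an SVM classifier."""
    f = np.asarray(frame_feats, dtype=float)
    return np.concatenate([f.mean(axis=0), f.std(axis=0),
                           f.min(axis=0), f.max(axis=0)])

# 10 frames x 3 features -> a single 12-dimensional utterance vector.
vec = utterance_features(np.arange(30).reshape(10, 3))
```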


Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences / v.2 no.1 / pp.77-85 / 2010
  • This paper proposes an efficient feature vector processing technique to guard the Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering, and these features are then used to construct robust models based on feature vector classification. We modify conventional comb filtering using speech presence probability to minimize the drawbacks of incorrect pitch estimation under background noise. The modified comb filtering correctly enhances the harmonics, an important factor in SER. The feature vector classification technique categorizes feature vectors into discriminative and non-discriminative vectors based on a log-likelihood criterion, successfully selecting the discriminative vectors while preserving correct emotional characteristics; robust emotion models can then be constructed from the discriminative vectors alone. In SER experiments using an emotional speech corpus contaminated by various noises, our approach exhibited performance superior to the baseline system.
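
The harmonic-enhancement idea can be illustrated with a plain feed-forward comb filter; note this is the generic textbook filter, not the paper's modified, speech-presence-weighted version, and the gain value is an arbitrary choice.

```python
import numpy as np

def comb_filter(x, period, gain=0.8):
    """Feed-forward comb filter: y[n] = x[n] + gain * x[n - period].
    With `period` set to the pitch period in samples, components at the
    harmonics of the pitch are reinforced relative to the energy
    between harmonics."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[period:] += gain * x[:-period]
    return y

# A sinusoid whose period matches the filter delay is amplified by
# (1 + gain) once the filter memory is filled.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
y = comb_filter(x, period=20)
```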


Extraction of Speech Features for Emotion Recognition (감정 인식을 위한 음성 특징 도출)

  • Kwon, Chul-Hong;Song, Seung-Kyu;Kim, Jong-Yeol;Kim, Keun-Ho;Jang, Jun-Su
    • Phonetics and Speech Sciences / v.4 no.2 / pp.73-78 / 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features, including features related to both the speech source and the vocal tract filter. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source, such as jitter, shimmer, F0 statistics (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
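
Jitter and shimmer, the source-related parameters the abstract highlights, are sketched below using common "local" definitions (mean absolute cycle-to-cycle difference relative to the mean); the abstract does not specify which exact variants the paper uses.

```python
import numpy as np

def jitter(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.abs(np.diff(p)).mean() / p.mean()

def shimmer(amplitudes):
    """Local shimmer: the same relative cycle-to-cycle variability,
    applied to peak amplitudes instead of periods."""
    a = np.asarray(amplitudes, dtype=float)
    return np.abs(np.diff(a)).mean() / a.mean()

# A perfectly periodic voice has zero jitter; periods alternating
# between 10 ms and 12 ms give |diff| = 2 against a mean of 11.
steady = jitter([10.0, 10.0, 10.0, 10.0])
perturbed = jitter([10.0, 12.0, 10.0, 12.0])
```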