• Title/Summary/Keyword: human audio voice

Search Result 15

Crossmodal Perception of Mismatched Emotional Expressions by Embodied Agents (에이전트의 표정과 목소리 정서의 교차양상지각)

  • Cho, Yu-Suk;Suk, Ji-He;Han, Kwang-Hee
    • Science of Emotion and Sensibility / v.12 no.3 / pp.267-278 / 2009
  • Today, embodied agents generate considerable interest because of their vital role in human-human and human-computer interactions in virtual worlds. A number of researchers have found that we can recognize and distinguish between various emotions expressed by an embodied agent, and many studies have found that we respond to simulated emotions in a similar way to human emotion. This study investigates the interpretation of mismatched emotions expressed by an embodied agent (e.g. a happy face with a sad voice): whether audio-visual channel integration occurs or one channel dominates when participants judge the emotion. The study employed a 4 (visual: happy, sad, warm, cold) × 4 (audio: happy, sad, warm, cold) within-subjects repeated-measures design. The results suggest that people perceive emotions depending not on just one channel but on both channels. Additionally, facial expression (happy face vs. sad face) changes the relative influence of the two channels: the audio channel has more influence on the interpretation of emotions when the facial expression is happy. Participants could also feel emotions that were expressed by neither face nor voice in the mismatched conditions, so it may be possible to express varied and delicate emotions with an embodied agent using only a few basic emotions.


Proposal of Hostile Command Attack Method Using Audible Frequency Band for Smart Speaker (스마트 스피커 대상 가청 주파수 대역을 활용한 적대적 명령어 공격 방법 제안)

  • Park, Tae-jun;Moon, Jongsub
    • Journal of Internet Computing and Services / v.23 no.4 / pp.1-9 / 2022
  • Recently, the functions of smart speakers have diversified and their penetration rate is increasing. As they have become more widespread, various techniques have been proposed to induce anomalous behavior in smart speakers. A representative method is the Dolphin Attack, in which a third party controls a Voice Controllable System (VCS) through the ultrasonic band (f > 20 kHz) without the user's awareness. However, since that method uses the ultrasonic band, it requires installing an ultrasonic speaker or a dedicated device capable of outputting an ultrasonic signal. In this paper, a smart speaker is controlled without any additional ultrasonic device by generating an audio signal modulated at a frequency (18 to 20 kHz) that lies within the human audible band but is difficult for a person to hear. With the proposed method, humans could not recognize the voice commands even though they were in the audible band, while the smart speaker could be controlled with a probability of 82 to 96%.
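The core signal-processing idea in this abstract, shifting a voice command onto a carrier near the edge of human hearing, can be sketched as simple amplitude modulation. This is a hypothetical illustration of the general technique, not the authors' implementation; the function name, carrier frequency, and modulation depth are assumptions:

```python
import math

def am_modulate(samples, fs, carrier_hz=19000.0, depth=1.0):
    """Amplitude-modulate a baseband voice signal onto a carrier near the
    top of the audible band (e.g. 19 kHz): the command becomes hard for
    humans to notice but can still reach the speaker's microphone."""
    return [(1.0 + depth * x) * math.cos(2 * math.pi * carrier_hz * n / fs)
            for n, x in enumerate(samples)]
```

With silent input the output is a pure 19 kHz tone; with speech input the envelope of that tone carries the command, which a microphone's nonlinearity or demodulation stage can recover.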

Voice Driven Sound Sketch for Animation Authoring Tools (애니메이션 저작도구를 위한 음성 기반 음향 스케치)

  • Kwon, Soon-Il
    • The Journal of the Korea Contents Association / v.10 no.4 / pp.1-9 / 2010
  • Authoring tools for sketching the motion of characters to be animated have been studied, but natural interfaces for sound editing have not been sufficiently explored. In this paper, I present a novel method in which a sound sample is selected by speaking sound-imitation words (onomatopoeia). An experiment with the method, based on the statistical models generally used for pattern recognition, showed up to 97% recognition accuracy. In addition, to address the difficulty of collecting data for newly enrolled sound samples, a GLR test based on only one sample of each sound-imitation word achieved almost the same accuracy as the previous method.
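The statistical pattern-recognition step described here can be sketched as scoring an utterance's feature vector against a per-word Gaussian model and picking the best-scoring word. This is a generic likelihood-based classifier for illustration only, not the paper's actual models; the feature values and word names are invented:

```python
import math

def gaussian_loglik(x, mean, var):
    # Log-likelihood of feature vector x under a diagonal-covariance Gaussian.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify(x, models):
    # Pick the sound-imitation word whose model scores the utterance highest.
    return max(models, key=lambda word: gaussian_loglik(x, *models[word]))
```

A likelihood-ratio test in the same spirit as GLR compares such scores between a claimed word model and an alternative, which is why a single enrollment sample per word can still support a decision.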

Development of a Mobile Application for Disease Prediction Using Speech Data of Korean Patients with Dysarthria (한국인 구음장애 환자의 발화 데이터 기반 질병 예측을 위한 모바일 애플리케이션 개발)

  • Changjin Ha;Taesik Go
    • Journal of Biomedical Engineering Research / v.45 no.1 / pp.1-9 / 2024
  • Communication with others plays an important role in human social interaction and information exchange in modern society. However, some individuals have difficulty communicating due to dysarthria, so effective diagnostic techniques are needed for its early treatment. In the present study, we propose a mobile device-based methodology that automatically classifies the dysarthria type. A light-weight CNN model was trained on an open audio dataset of Korean patients with dysarthria, and the trained model successfully classifies dysarthria into related disease subtypes with 78.8%~96.6% accuracy. In addition, a user-friendly mobile application was developed based on the trained CNN model. Users can easily record their voices according to the selected inspection type (e.g. word, sentence, paragraph, and semi-free speech) and evaluate the recorded voice data through their mobile device and the developed application. The proposed technique could be helpful for personal management of dysarthria and for clinical decision-making.
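The building block of a light-weight CNN over audio features can be sketched as a strided 1-D convolution followed by a nonlinearity. This is a minimal pure-Python illustration of the operation such a model stacks, not the paper's architecture; kernel values and sizes are arbitrary:

```python
def conv1d(signal, kernel, stride=1):
    """Slide a kernel over a 1-D feature sequence (valid padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def relu(xs):
    # Standard nonlinearity applied after each convolution layer.
    return [max(0.0, x) for x in xs]
```

A "light-weight" model keeps the number of such layers and kernels small so inference fits comfortably on a mobile device, which is what lets the application run the classifier on the user's recording locally.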

Implementation of User-friendly Intelligent Space for Ubiquitous Computing (유비쿼터스 컴퓨팅을 위한 사용자 친화적 지능형 공간 구현)

  • Choi, Jong-Moo;Baek, Chang-Woo;Koo, Ja-Kyoung;Choi, Yong-Suk;Cho, Seong-Je
    • The KIPS Transactions:PartD / v.11D no.2 / pp.443-452 / 2004
  • The paper presents an intelligent space management system for ubiquitous computing. The system is basically a home/office automation system that can control lights, electronic locks, and home appliances such as TVs and audio equipment. On top of these basic capabilities, the system has four notable features. First, we can access the system using either a cellular phone or a browser on a PC connected to the Internet, so we can control it at any time and from any place. Second, to provide a more human-oriented interface, we integrate voice recognition functionality into the system. Third, the system supports not only reactive services but also proactive services based on the regularities of user behavior. Finally, by exploiting embedded technologies, the system can run on hardware with limited processing power and storage. We have implemented the system on an embedded board consisting of a 205 MHz StrongARM CPU, 32 MB SDRAM, 16 MB NOR-type flash memory, and a relay box. On this hardware platform, software components such as embedded Linux, the HTK voice recognition tools, the GoAhead Web Server, and a GPIO driver cooperate to support a user-friendly intelligent space.
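The proactive-service idea, acting on regularities in user behavior rather than waiting for commands, can be sketched as a simple frequency model over observed (time, action) pairs. This is a toy illustration of the concept under assumed names and thresholds, not the system's actual implementation:

```python
from collections import Counter

class ProactiveService:
    """Suggest actions the user has habitually taken at a given hour."""

    def __init__(self, threshold=3):
        self.history = Counter()   # counts of (hour, action) pairs
        self.threshold = threshold # repetitions before a habit is assumed

    def observe(self, hour, action):
        # Record one observed user action, e.g. (19, "tv_on").
        self.history[(hour, action)] += 1

    def suggest(self, hour):
        # Actions seen at this hour at least `threshold` times.
        return [a for (h, a), n in self.history.items()
                if h == hour and n >= self.threshold]
```

Once an action crosses the threshold, the automation layer could offer or perform it proactively (e.g. turning on the TV at 7 p.m.), while reactive services still handle explicit phone, web, and voice commands.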