• Title/Abstract/Keyword: Voice Interaction

Search results: 157 items (processing time 0.024 s)

디지털 소외계층을 위한 지능형 IoT 애플리케이션의 공개 API 기반 대화형 음성 상호작용 기법 (Open API-based Conversational Voice Interaction Scheme for Intelligent IoT Applications for the Digital Underprivileged)

  • 장준혁
    • 스마트미디어저널 / Vol. 11 No. 10 / pp. 22-29 / 2022
  • Voice interaction is particularly effective in applications aimed at the digitally underprivileged, who are not proficient with smart devices. However, applications built on open APIs use voice signals only for short, fragmentary input and output, owing to the limitations of conventional touchscreen-centered UIs and of the APIs provided. This paper designs a conversational voice interaction model between users and intelligent mobile/IoT applications, and proposes a keyword detection scheme based on edit distance (Levenshtein distance). The proposed model and scheme were implemented in an Android environment, and the edit-distance-based keyword detection scheme achieved a higher recognition rate than existing schemes for keywords that speech recognition had transcribed inaccurately.

Design for Proximity Voice Chat System in Multimedia Environments

  • Jae-Woo Chang;Jin-Woong Kim;Soo Kyun Kim
    • 한국컴퓨터정보학회논문지 / Vol. 29 No. 3 / pp. 83-90 / 2024
  • This study proposes a solution that applies a proximity voice chat system to voice communication, one of the interaction systems in multimedia environments. Voice communication among multiple users in a multimedia space is designed by adjusting voice volume according to the distance between user avatars and muting users who fall outside the audible range. The key feature of this study is an active server system based on reliable UDP that, for economical development, delivers low-quality audio to distant users and stops transmitting voice data to users who enter the inaudible region. The performance of the proposed system was measured in a previously completed Unity game engine project, and the system can be expected to see active use in environments that provide interaction among many users, such as metaverse content and real-time competitive action games.
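The distance-based attenuation and mute logic described above can be sketched as follows. This is an illustrative sketch, not the paper's Unity/reliable-UDP server; the range constants and the linear falloff curve are made-up assumptions for the example.

```python
import math
from dataclasses import dataclass

AUDIBLE_RANGE = 20.0      # beyond this, no voice data is transmitted (hypothetical value)
LOW_QUALITY_RANGE = 12.0  # beyond this, send low-bitrate audio (hypothetical value)

@dataclass
class Avatar:
    x: float
    y: float

def distance(a: Avatar, b: Avatar) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)

def voice_params(speaker: Avatar, listener: Avatar):
    """Return (volume, quality) for a listener, or None when no data should be sent."""
    d = distance(speaker, listener)
    if d > AUDIBLE_RANGE:
        return None                                  # mute: skip transmission entirely
    volume = max(0.0, 1.0 - d / AUDIBLE_RANGE)       # linear falloff with distance
    quality = "low" if d > LOW_QUALITY_RANGE else "high"
    return volume, quality

print(voice_params(Avatar(0, 0), Avatar(3, 4)))      # nearby listener: full quality
print(voice_params(Avatar(0, 0), Avatar(30, 0)))     # out of range: None
```

Deciding on the server side whether to send at all, and at what bitrate, is what makes the scheme economical: bandwidth is spent only on listeners who can actually hear the speaker.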

음성, 성문 및 호흡 통합 검사 시스템의 개발 (Development of an Integrated Analysis System of Voice, Electroglottography, and Respiration)

  • 이승훈;정원혁;최홍식;김수찬;임재중;김덕원
    • 음성과학 / Vol. 12 No. 4 / pp. 77-92 / 2005
  • Voice is produced by the systemic interaction of respiration, vocalization, articulation, and resonance. No existing multi-channel voice analysis system can assess voice and respiration simultaneously: most existing systems consist of a vocal fold vibration measurement, such as stroboscopy, EGG (electroglottography), or laryngeal electromyography, plus a voice analysis system. Since respiration is closely related to voice, simultaneous analysis of both vocal cord vibration and respiration is essential. In this study, a four-channel integrated system was developed for acoustic analysis through a microphone, vocal fold vibratory analysis using EGG, and respiratory analysis using two-channel RIP (respiratory inductive plethysmography).


제품 트리거로서 행동인식의 사용자 경험 디자인 연구 - 제품디자인을 중심으로 - (Study on User Experience design in Gesture Interaction as a Product Trigger - Focusing on Product Design -)

  • 민새얀;이캐시연주
    • 디지털융복합연구 / Vol. 17 No. 5 / pp. 379-384 / 2019
  • This study aims to identify the experiences and problems that users weigh first in the functional aspects of the recently surging voice interaction, and to explore its development potential and its fit with the grammar of the emerging gesture interaction. As the research method, a theoretical review of the product interfaces used in products to date was conducted through a literature study, followed by in-depth interviews with users in their 20s and 30s who had used voice recognition as a product trigger, summarizing their usage experiences and improvement directions from a user experience perspective. As a result, it was concluded that preference for voice interaction is declining due to falling confidence in its accuracy, and that an interface suited to its functional strength of overcoming physical distance is needed. This study of product trigger interfaces is meaningful in that it identified problems and proposed improvements, but it is limited in that it did not design concrete solutions. We hope this study will serve as a starting point for supplementing improvements to voice interaction and for follow-up research on gesture interaction that helps improve product design interfaces.

Human-Robot Interaction in Real Environments by Audio-Visual Integration

  • Kim, Hyun-Don;Choi, Jong-Suk;Kim, Mun-Sang
    • International Journal of Control, Automation, and Systems / Vol. 5 No. 1 / pp. 61-69 / 2007
  • In this paper, we developed a reliable sound localization system, including a VAD (Voice Activity Detection) component, using three microphones, as well as a face tracking system using a vision camera. Moreover, we proposed a way to integrate the three systems in human-robot interaction to compensate for errors in the localization of a speaker and to effectively reject unnecessary speech or noise signals entering from undesired directions. To verify our system's performance, we installed the proposed audio-visual system in a prototype robot called IROBAA (Intelligent ROBot for Active Audition) and demonstrated how to integrate the audio-visual system.

Acoustic Analyses of Vocal Vibrato of Korean Singers

  • Yoo, Jae-Yeon;Jeong, Ok-Ran;Kwon, Do-Ha
    • 음성과학 / Vol. 12 No. 1 / pp. 37-43 / 2005
  • The phenomenon of vocal vibrato may be regarded as an acoustic representation of one of the most rapid and continuous changes in pitch and intensity that the human vocal mechanism is capable of producing, and singers are likely to use vibrato effectively to enrich their voice. The purpose of this study was to obtain acoustic measurements (vF0 and vAm) from 45 subjects (15 trot singers, 15 ballad singers, and 15 non-singers) and to compare the acoustic measurements of the vowel /a/ produced by the 3 groups under 2 voice sampling conditions (prolongation and singing of /a/). The 30 trot and ballad singers were selected by a producer and a concert director working for KBS (Korean Broadcasting System). The MDVP was used to measure the acoustic parameters, and a two-way MANOVA was used for statistical analyses. The results were as follows. First, there was no significant difference among the 3 groups in vF0 and vAm in prolongation of /a/, but in singing voice there was a significant difference among the 3 groups in vF0 and vAm. Second, there was an interaction between music genre and voice sampling condition in vF0 and vAm. Finally, trot singers sang with more vibrato than ballad singers. It was concluded that it is very important to analyze singers' voices under various voice conditions (prolongation, reading, conversation, and singing) and to identify differences in singing voice characteristics among music genres.


갑상선 수술 범위와 공기역학적 음성 지표 변화 (Aerodynamic Evaluation of Voice Changes in Thyroid Surgery Extent)

  • 정희석;김중선;이창윤;손희영
    • 대한후두음성언어의학회지 / Vol. 29 No. 1 / pp. 24-29 / 2018
  • Background and Objectives: The purpose of this study was to evaluate the impact of surgical extent on voice, using acoustic and aerodynamic measurements in serially followed thyroidectomy patients. Materials and Method: From October 2015 to January 2017, 108 patients who underwent thyroid surgery and voice testing preoperatively and at 2, 3, and 6 months postoperatively were classified into five operative types. The preoperative radiological stage and the postoperative histopathological stage were classified according to invasion of the thyroid capsule and surrounding tissue. For each classification, the voice analysis results by period were compared and analyzed. Results: Differences in voice according to surgical extent, radiological stage, and histopathological stage were significant over time only for maximal phonation time (MPT). However, in the analysis of the interaction between each classification and period, only phonation threshold pressure (PTP) showed significant results. Conclusion: Differences in imaging and histopathological stage have no significant effect on the recovery of voice symptoms after thyroid surgery. As the extent of the operation increases, the pressure required to initiate vocalization is relatively higher, and this also varies with time after surgery.

모바일-매니퓰레이터 구조 로봇시스템의 안정한 모션제어에 관한연구 (A Study on Stable Motion Control of Mobile-Manipulators Robot System)

  • 박문열;황원준;박인만;강언욱
    • 한국산업융합학회 논문집 / Vol. 17 No. 4 / pp. 217-226 / 2014
  • As the world has shifted to a society of 21st-century high-tech industries, modern people have become reluctant to work in difficult and dirty environments, so unmanned technologies based on robots are in demand. Techniques such as voice control and obstacle avoidance have been proposed, and voice recognition, which enables convenient interaction between humans and machines, is especially important. In this study, to investigate stable motion control of a voice-command-based robot system with a mobile-manipulator structure, kinematic interpretation and dynamic modeling of a two-armed manipulator and a three-wheeled mobile robot were conducted. In addition, autonomous driving of the three-wheeled mobile robot and a motion control system for the two-armed manipulator were designed, and combined robot control through voice commands was performed. For the performance evaluation, driving control and simulated experiments of the two-armed manipulator were conducted, and stable motion control of the voice-command-based mobile-manipulator robot system was verified through driving control, motion control of the two-armed manipulator, and combined control based on voice commands.

효과적인 인간-로봇 상호작용을 위한 딥러닝 기반 로봇 비전 자연어 설명문 생성 및 발화 기술 (Robot Vision to Audio Description Based on Deep Learning for Effective Human-Robot Interaction)

  • 박동건;강경민;배진우;한지형
    • 로봇학회논문지 / Vol. 14 No. 1 / pp. 22-30 / 2019
  • For effective human-robot interaction, robots not only need to understand the current situational context well but also need to convey that understanding to the human participant efficiently. The most convenient way for a robot to deliver its understanding is to express it using voice and natural language. Recently, artificial intelligence for video understanding and natural language processing has developed very rapidly, especially based on deep learning. Thus, this paper proposes a deep-learning-based method for turning robot vision into an audio description. The applied model is a pipeline of two deep learning models: one generates a natural-language sentence from robot vision, and the other generates voice from the generated sentence. We also conducted a real-robot experiment to show the effectiveness of our method in human-robot interaction.

차내 경험의 디지털 트랜스포메이션과 오디오 기반 인터페이스의 동향 및 시사점 (Trends and Implications of Digital Transformation in Vehicle Experience and Audio User Interface)

  • 김기현;권성근
    • 한국멀티미디어학회논문지 / Vol. 25 No. 2 / pp. 166-175 / 2022
  • Digital transformation is driving many changes in daily life and industry, and the automobile industry is no exception. In some cases, element technologies from areas associated with the metaverse are being adopted, such as 3D-animated digital cockpits, around view, and voice AI. Through the growth of the mobile market, the norm of human-computer interaction (HCI) has evolved from keyboard-and-mouse interaction to the touch screen. The core area has been the graphical user interface (GUI), and recently the audio user interface (AUI) has partially replaced the GUI. Because it is easy to access and intuitive for the user, the AUI is quickly becoming a common part of the in-vehicle experience (IVE) in particular. The benefits of an AUI include freeing the driver's eyes and hands, using fewer screens, lowering interaction costs, feeling more emotional and personal, and being effective for people with low vision. Nevertheless, when and where to apply a GUI versus an AUI are genuinely different questions, because some information is easier to process visually, while in other cases an AUI has the greater potential. This study proposes actively applying an AUI in the near future, based on the contexts of the various scenes that occur in the vehicle, to improve the IVE.