• Title/Summary/Keyword: Voice-Based Interface

Search Results: 130

A Study on the Voice Interface for Mobile Environment (모바일기반 음성인터페이스에 관한 연구)

  • Kim, Soo-Hoon;Ahn, Jong-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.199-204 / 2013
  • Google's Android-based voice interface is limited to web applications and has few users. In this paper, we suggest a method for developing voice applications using the existing Android-based voice engine. We also study the environment of the Android-based voice interface and present an appropriate voice interface for the mobile environment.

GMM based Nonlinear Transformation Methods for Voice Conversion

  • Vu, Hoang-Gia;Bae, Jae-Hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.67-70 / 2005
  • Voice conversion (VC) is a technique for modifying the speech signal of a source speaker so that it sounds as if it were spoken by a target speaker. Most previous VC approaches used a linear transformation function based on a GMM to convert the source spectral envelope to the target spectral envelope. In this paper, we propose several nonlinear GMM-based transformation functions in an attempt to deal with the over-smoothing effect of linear transformation. In order to obtain high-quality modifications of speech signals, our VC system is implemented within the Harmonic plus Noise Model (HNM) analysis/synthesis framework. Experimental results are reported on the English corpus MOCHA-TIMIT.
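For orientation, the baseline *linear* GMM mapping that the paper's nonlinear variants extend can be sketched as below. The toy parameters and diagonal covariances are illustrative only and are not taken from the paper.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, var_x, cross):
    """Classic GMM-based linear spectral mapping:
    y_hat = sum_i p(i|x) * (mu_y[i] + cross[i]/var_x[i] * (x - mu_x[i]))
    Diagonal covariances (one variance per dimension) keep the sketch short.
    """
    # Component likelihoods under diagonal Gaussians
    lik = weights * np.exp(-0.5 * np.sum((x - mu_x) ** 2 / var_x, axis=1)) \
          / np.sqrt(np.prod(2 * np.pi * var_x, axis=1))
    post = lik / lik.sum()                      # posteriors p(i | x)
    # Per-component linear regression toward the target space
    comp = mu_y + cross / var_x * (x - mu_x)
    return post @ comp                          # posterior-weighted average

# Toy 2-component, 1-dimensional model
weights = np.array([0.5, 0.5])
mu_x = np.array([[0.0], [4.0]])
mu_y = np.array([[1.0], [5.0]])
var_x = np.array([[1.0], [1.0]])
cross = np.array([[1.0], [1.0]])

y = gmm_convert(np.array([0.0]), weights, mu_x, mu_y, var_x, cross)
```

The over-smoothing the paper targets arises from this posterior-weighted averaging, which pulls converted spectra toward the component means.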


The Effects of Interface Modality on Cognitive Load and Task Performance in Media Multitasking Environment (미디어 멀티태스킹 환경에서 인터페이스의 감각양식 차이가 인지부하와 과업수행에 미치는 영향에 관한 연구 다중 자원 이론과 스레드 인지 모델을 기반으로)

  • Lee, Dana;Han, Kwang-Hee
    • Journal of the HCI Society of Korea / v.14 no.2 / pp.31-39 / 2019
  • This research examined the changes that fast-growing voice-based devices would bring to the media multitasking environment. Based on the theoretical background that information-processing efficiency improves when multiple tasks requiring different resource structures are performed at the same time, we conducted an experiment in which participants searched for information with voice-based or screen-based devices while performing an additional visual task. Results showed that both the task performance environment and the interface modality had significant main effects on cognitive load. The overall cognitive load level was higher in the voice interface group, but the difference in cognitive load between the two groups decreased in the multitasking environment where additional visual resources were required. Visual task performance was significantly higher when using the voice interface than the screen interface. Our findings suggest that voice interfaces offer advantages in cognitive load and task performance by distributing the two tasks across the auditory and visual channels. The results imply that voice-based devices have the potential to facilitate efficient information processing in a screen-centric environment where visual resources collide. We provided theoretical evidence of resource distribution using multiple resource theory and identified the advantages of the voice interface more specifically based on the threaded cognition model.

Research on Emotional Factors and Voice Trend by Country to be considered in Designing AI's Voice - An analysis of interview with experts in Finland and Norway (AI의 음성 디자인에서 고려해야 할 감성적 요소 및 국가별 음성 트랜드에 관한 연구 - 핀란드와 노르웨이의 전문가 인뎁스 인터뷰를 중심으로)

  • Namkung, Kiechan
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.91-97 / 2020
  • Use of voice-based interfaces that can interact with users is increasing as AI technology develops. To date, however, most research on voice-based interfaces has been technical in nature, focused on areas such as improving the accuracy of speech recognition. As a result, the voice of most voice-based interfaces is uniform and does not provide users with differentiated sensibilities. The purpose of this study is to add an emotional factor suitable for the AI interface. To this end, we derived emotional factors that should be considered in designing voice interfaces. In addition, we examined voice trends that differ from country to country. For this study, we conducted interviews with voice-industry experts from Finland and Norway, countries that use their own independent languages.

Design and Implementation of the English Education Testing System Interface Based on VoiceXML (VoiceXML 기반 영어 교육 평가 시스템 설계 및 구현)

  • Jang, Seung Ju
    • The Journal of Korean Association of Computer Education / v.8 no.6 / pp.75-83 / 2005
  • In this paper we studied the English listening and speaking portions of foreign-language testing using a web- and VoiceXML-based education testing system, which is independent of time and space. The VoiceXML-based testing system interface consists of a user registration module, a testing module, and a testing result module. The user registration module stores the user's name, ID, and password in the user database; when a tester calls in, the tester hears the telephone prompts driven by the VXML scenario. After logging in, the tester is verified. In the VoiceXML-based education testing system, the manager can reduce the time and effort needed to collect testing results. The tester listens to the voice scenario, supported by the VoiceXML markup language, over a wired or wireless telephone at any time and anywhere, and can improve the effectiveness of foreign-language study through direct voice-based evaluation.


Communication Support System for ALS Patient Based on Text Input Interface Using Eye Tracking and Deep Learning Based Sound Synthesis (눈동자 추적 기반 입력 및 딥러닝 기반 음성 합성을 적용한 루게릭 환자 의사소통 지원 시스템)

  • Park Hyunjoo;Jeong Seungdo
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.27-36 / 2024
  • Accidents or disease can lead to acquired voice disorders. For such patients, we propose a new input interface based on eye movements to facilitate communication. Unlike existing methods that present the English alphabet as it is, we reorganized the layout of the alphabet to support the Korean alphabet and designed it so that patients can enter words by themselves using only eye movement, gaze, and blinking. The proposed interface not only reduces fatigue by minimizing eye movement but also allows easy and quick input through an intuitive arrangement. For natural communication, we also implemented a system that allows patients who are unable to speak to communicate in their own voice. The system tracks eye movements to record what the patient is trying to say, then uses Glow-TTS and Multi-band MelGAN to synthesize the output in the patient's own voice from the trained voice model.
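A Korean-alphabet input layout like the one described ultimately has to turn a sequence of selected jamo into complete syllables, and the Unicode Hangul composition rule makes this a small arithmetic step. The sketch below is a generic illustration using the standard Unicode jamo index ordering, not the paper's implementation.

```python
# Compose a Hangul syllable from jamo indices via the Unicode formula:
#   syllable = 0xAC00 + (initial * 21 + medial) * 28 + final
# initial: 0..18 (choseong), medial: 0..20 (jungseong),
# final: 0..27 (jongseong, 0 = no final consonant)
def compose_syllable(initial: int, medial: int, final: int = 0) -> str:
    if not (0 <= initial < 19 and 0 <= medial < 21 and 0 <= final < 28):
        raise ValueError("jamo index out of range")
    return chr(0xAC00 + (initial * 21 + medial) * 28 + final)

# 'ㅎ'(18) + 'ㅏ'(0) + 'ㄴ'(4) -> '한';  'ㄱ'(0) + 'ㅡ'(18) + 'ㄹ'(8) -> '글'
word = compose_syllable(18, 0, 4) + compose_syllable(0, 18, 8)
```

Because composition is pure arithmetic, an eye-tracking front end only needs to deliver jamo indices; no lookup tables beyond the fixed orderings are required.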

The Interactive Voice Services based on VoiceXML (VoiceXML 기반 음성인식시스템을 이용한 서비스 개발)

  • Kim Hak-Gyoon;Kim Eun-Hyang;Kim Jae-In;Koo Myoung-Wan
    • MALSORI / no.43 / pp.113-125 / 2002
  • As there is a need to access Web information via wired or wireless telephones, the VoiceXML Forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web and allows voice and phone access to information on Web sites and call-center databases. It can also utilize Web-based technologies such as CGI (Common Gateway Interface) scripts. In this paper, we have developed a VoiceXML-based voice portal service platform called TeleGateway. It enables the integration of voice services with data services using Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. We also demonstrate various services running on the voice portal.
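As a point of reference, a minimal VoiceXML 2.0 document for such a portal menu might look like the following. The grammar file and CGI URL are placeholders for illustration, not part of the TeleGateway platform described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="menu">
    <field name="service">
      <prompt>Say news, weather, or stocks.</prompt>
      <!-- Speech grammar resolved by the platform's ASR engine -->
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Hand the recognized choice to a server-side CGI script -->
        <submit next="http://example.com/portal.cgi" namelist="service"/>
      </filled>
    </field>
  </form>
</vxml>
```

The platform renders the prompt with TTS, matches the caller's utterance against the grammar with ASR, and submits the result to the Web back end, which is the voice-to-data integration the paper describes.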


Expected Matching Score Based Document Expansion for Fast Spoken Document Retrieval (고속 음성 문서 검색을 위한 Expected Matching Score 기반의 문서 확장 기법)

  • Seo, Min-Koo;Jung, Gue-Jun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2006.11a / pp.71-74 / 2006
  • Much work has been done in the field of retrieving audio segments that contain human speech without captions. To retrieve newly coined words and proper nouns, subwords have commonly been used as indexing units in conjunction with query or document expansion. Among these, document expansion with subwords has the serious drawback of a large computational overhead. In this paper, we therefore propose Expected Matching Score based document expansion, which effectively reduces the computational overhead without much loss in retrieval precision. Experiments showed a 13.9-fold speed-up at a loss of only 0.2% in retrieval precision.
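The paper's Expected Matching Score formulation is not reproduced here; the sketch below only illustrates the general idea of subword-level document expansion, where a recognized subword sequence is augmented with acoustically confusable alternatives carrying confidence weights so that out-of-vocabulary query terms can still match. The confusion pairs and weights are invented for illustration.

```python
# Hypothetical acoustic confusion table: subword -> likely misrecognitions
CONFUSIONS = {
    "ka": [("ga", 0.4)],
    "to": [("do", 0.3)],
}

def expand_document(subwords):
    """Build a weighted subword index: each recognized subword counts 1.0,
    and each confusable alternative is added with its confusion weight."""
    index = {}
    for s in subwords:
        index[s] = index.get(s, 0.0) + 1.0
        for alt, weight in CONFUSIONS.get(s, []):
            index[alt] = index.get(alt, 0.0) + weight
    return index

def score(query_subwords, index):
    # Sum the weights of query subwords found in the expanded index.
    return sum(index.get(q, 0.0) for q in query_subwords)

doc = expand_document(["ka", "to"])
# The query "ga do" matches only through the expanded alternatives.
s = score(["ga", "do"], doc)
```

The cost the paper attacks is visible even here: expansion multiplies index size per subword, so pruning expansions by an expected matching score keeps retrieval fast.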


Development of a Work Management System Based on Speech and Speaker Recognition

  • Gaybulayev, Abdulaziz;Yunusov, Jahongir;Kim, Tae-Hyong
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.3 / pp.89-97 / 2021
  • Voice interfaces can not only make daily life more convenient through artificial-intelligence speakers but also improve the working environment of the factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system provides machine control and authorized-worker authentication by voice at the same time. We applied two speech recognition methods: Google's Speech application programming interface (API) service and the DeepSpeech speech-to-text engine. For worker identification, the SincNet architecture for speaker recognition was adopted. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
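The paper does not detail its post-processing step; one common approach for a fixed command vocabulary, sketched below, snaps a noisy transcription to the nearest known command by edit distance. The command list is invented for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

COMMANDS = ["start machine", "stop machine", "pause line"]  # hypothetical

def snap_to_command(transcript: str) -> str:
    """Map a raw ASR transcript to the closest command in the vocabulary."""
    return min(COMMANDS, key=lambda c: edit_distance(transcript.lower(), c))

cmd = snap_to_command("stark machine")
```

With only 26 commands, this exhaustive matching is cheap and can absorb small recognition errors, which is one way a raw engine's accuracy can be lifted after the fact.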

Implementation of a Gateway Protocol between LAN and PABX for Voice Communication (근거리 통신망과 사설교환기의 음성통신을 위한 게이트웨이의 구현)

  • 안용철;신병철
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.7 / pp.1346-1363 / 1994
  • Packet voice protocols have been realized in many research works, but few studies have addressed interconnecting a LAN and a PABX to facilitate voice communication. In this paper, a gateway to interconnect an Ethernet LAN with the existing PABX telephone network for voice communication has been designed and implemented. The implemented gateway protocol is a modified protocol based on CCITT's G.764 packetized voice protocol. To accomplish this, the hardware system was realized in five parts: the telephone-line interface, the voice-processing part, the PC interface, the controller, and the DTMF part. The gateway software is divided into three parts: the interface to the packet driver that drives the network card, the driver for the PABX gateway, and the protocol-handling part.
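As a rough illustration of the kind of framing a packetized-voice protocol uses (not the exact G.764 layout), a header carrying a channel id, sequence number, and timestamp can be packed in network byte order with Python's struct module:

```python
import struct

# Hypothetical voice-packet header: 1-byte channel, 1-byte flags,
# 2-byte sequence number, 4-byte timestamp; "!" = big-endian network order.
HEADER = struct.Struct("!BBHI")

def pack_voice_packet(channel: int, seq: int, timestamp: int,
                      payload: bytes) -> bytes:
    """Prepend the fixed header to a block of encoded voice samples."""
    return HEADER.pack(channel, 0, seq, timestamp) + payload

def unpack_voice_packet(packet: bytes):
    """Split a received packet back into header fields and payload."""
    channel, _flags, seq, timestamp = HEADER.unpack_from(packet)
    return channel, seq, timestamp, packet[HEADER.size:]

pkt = pack_voice_packet(3, 42, 123456, b"\x00" * 160)  # 20 ms of 8 kHz PCM
ch, seq, ts, payload = unpack_voice_packet(pkt)
```

Sequence numbers let the receiver detect lost or reordered packets, and timestamps allow play-out buffering, which is the core job such a LAN-to-PABX gateway performs on the packet side.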
