• Title/Summary/Keyword: Speech Interface

Search Result 249, Processing Time 0.03 seconds

Implementation of speech interface for windows 95 (Windows95 환경에서의 음성 인터페이스 구현)

  • 한영원;배건성
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.34S no.5
    • /
    • pp.86-93
    • /
    • 1997
  • With recent development of speech recognition technology and multimedia computer systems, more potential applications of voice will become a reality. In this paper, we implement speech interface on the windows95 environment for practical use fo multimedia computers with voice. Speech interface is made up of three modules, that is, speech input and detection module, speech recognition module, and application module. The speech input and etection module handles th elow-level audio service of win32 API to input speech data on real time. The recognition module processes the incoming speech data, and then recognizes the spoken command. DTW pattern matching method is used for speech recognition. The application module executes the voice command properly on PC. Each module of the speech interface is designed and examined on windows95 environments. Implemented speech interface and experimental results are explained and discussed.

  • PDF

Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output (자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식)

  • Park, Chul-Ho;Bae, Jae-Chul;Bae, Keun-Sung
    • MALSORI
    • /
    • no.62
    • /
    • pp.85-96
    • /
    • 2007
  • In this paper, we carried out recognition experiments for noisy speech having various levels of car noise and output of an audio system using the speech interface. The speech interface consists of three parts: pre-processing, acoustic echo canceller, post-processing. First, a high pass filter is employed as a pre-processing part to remove some engine noises. Then, an echo canceller implemented by using an FIR-type filter with an NLMS adaptive algorithm is used to remove the music or speech coming from the audio system in a car. As a last part, the MMSE-STSA based speech enhancement method is applied to the out of the echo canceller to remove the residual noise further. For recognition experiments, we generated test signals by adding music to the car noisy speech from Aurora 2 database. The HTK-based continuous HMM system is constructed for a recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.

  • PDF

Usability Test Guidelines for Speech-Oriented Multimodal User Interface (음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론)

  • Hong, Ki-Hyung
    • MALSORI
    • /
    • no.67
    • /
    • pp.103-120
    • /
    • 2008
  • Basic components for multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, have their own technological limitations. For example, the accuracy of speech recognition decreases for large vocabulary and in noisy environments. In spite of those technological limitations, there are lots of applications in which speech-oriented multimodal user interfaces are very helpful to users. However, in order to expand application areas for speech-oriented multimodal interfaces, we have to develop the interfaces focused on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been much work for evaluating spoken dialogue systems. We give a summary for PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation) that are the generalized evaluation frameworks for voice and multimodal user interfaces. Then, we present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.

  • PDF

Speech Based Multimodal Interface Technologies and Standards (음성기반 멀티모달 인터페이스 및 표준)

  • Hong Ki-Hyung
    • MALSORI
    • /
    • no.51
    • /
    • pp.117-135
    • /
    • 2004
  • In this paper, we introduce the multimodal user interface technology, especially based on speech. We classify multimodal interface technologies into four classes: sequential, alternate, supplementary, and semantic multimodal interfaces. After introducing four types of multimodal interfaces, we explain standard activities currently being activated.

  • PDF

Usability Improvement for the Speech Interface of Mobile Phones While Driving (운전 상황에서 휴대폰 음성인터페이스의 사용성 향상에 관한 연구)

  • Kang, Yun-Hwan;Jeong, Seong-Wook;Jung, Ga-Hun;Choi, Jae-Ho;Jung, Eui-S.
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.35 no.1
    • /
    • pp.109-118
    • /
    • 2009
  • While driving, the manual use of a mobile phone is heavily restricted due to the interference with the primary driving task. An alternative would be the use of speech interface. The current study aims to provide a guideline to implementation of a speech interface to the mobile phone. To do so, an expert evaluation was made and it revealed that a speech interface requires less workload, less performance degradation of the driving task than that of the keypad interface. To make speech interfaces more usable, new improvements are suggested. Subjective workload can be reduced and user satisfaction can be improved without degrading the primary task performance, for instance, by letting the user interrupt the speech of the phone, eliminating the repetitive words, letting the user know clearly what makes an error, providing a way to go back to the previous state, reducing the usage of keypad buttons and reducing the amount of the information on the screen.

A Usability Evaluation Method for Speech Recognition Interfaces (음성인식용 인터페이스의 사용편의성 평가 방법론)

  • Han, Seong-Ho;Kim, Beom-Su
    • Journal of the Ergonomics Society of Korea
    • /
    • v.18 no.3
    • /
    • pp.105-125
    • /
    • 1999
  • As speech is the human being's most natural communication medium, using it gives many advantages. Currently, most user interfaces of a computer are using a mouse/keyboard type but the interface using speech recognition is expected to replace them or at least be used as a tool for supporting it. Despite the advantages, the speech recognition interface is not that popular because of technical difficulties such as recognition accuracy and slow response time to name a few. Nevertheless, it is important to optimize the human-computer system performance by improving the usability. This paper presents a set of guidelines for designing speech recognition interfaces and provides a method for evaluating the usability. A total of 113 guidelines are suggested to improve the usability of speech-recognition interfaces. The evaluation method consists of four major procedures: user interface evaluation; function evaluation; vocabulary estimation; and recognition speed/accuracy evaluation. Each procedure is described along with proper techniques for efficient evaluation.

  • PDF

Speech Recognition Interface in the Communication Environment (통신환경에서 음성인식 인터페이스)

  • Han, Tai-Kun;Kim, Jong-Keun;Lee, Dong-Wook
    • Proceedings of the KIEE Conference
    • /
    • 2001.07d
    • /
    • pp.2610-2612
    • /
    • 2001
  • This study examines the recognition of the user's sound command based on speech recognition and natural language processing, and develops the natural language interface agent which can analyze the recognized command. The natural language interface agent consists of speech recognizer and semantic interpreter. Speech recognizer understands speech command and transforms the command into character strings. Semantic interpreter analyzes the character strings and creates the commands and questions to be transferred into the application program. We also consider the problems, related to the speech recognizer and the semantic interpreter, such as the ambiguity of natural language and the ambiguity and the errors from speech recognizer. This kind of natural language interface agent can be applied to the telephony environment involving all kind of communication media such as telephone, fax, e-mail, and so on.

  • PDF

The Status and Research Themes of Speech based Multimodal Interface Technology (음성기반 멀티모달 인터페이스 기술 현황 및 과제)

  • Lee ChiGeun;Lee EunSuk;Lee HaeJung;Kim BongWan;Joung SukTae;Jung SungTae;Lee YongJoo;Han MoonSung
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.111-114
    • /
    • 2002
  • Complementary use of several modalities in human-to-human communication ensures high accuracy, and only few communication problem occur. Therefore, multimodal interface is considered as the next generation interface between human and computer. This paper presents the current status and research themes of speech-based multimodal interface technology, It first introduces about the concept of multimodal interface. It surveys the recognition technologies of input modalities and synthesis technologies of output modalities. After that it surveys integration technology of modality. Finally, it presents research themes of speech-based multimodal interface technology.

  • PDF

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.39-40
    • /
    • 2007
  • For efficient interaction between human and robots, speech interface is a core problem especially in noisy and reverberant conditions. This paper analyzes main issues of spoken language interface for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.

  • PDF

Japanese Speech Based Fuzzy Man-Machine Interface of Manipulators

  • Izumi, Kiyotaka;Watanabe, Keigo;Tamano, Yuya;Kiguchi, Kazuo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.603-608
    • /
    • 2003
  • Recently, personal robots and home robots are developing by many companies and research groups. It is considered that a general effective interface for user of those robots is speech or voice. In this paper, Japanese speech based man-machine interface system is discussed for reflecting the fuzziness of natural language on robots, by using fuzzy reasoning. The present system consists of the derivation part of action command and the modification part of the derived command. In particular, a unique problem of Japanese is solved by applying the morphological analyzer ChaSen. The proposed system is applied for the motion control of a robot manipulator. It is proved from the experimental results that the proposed system can easily modify the same voice command to the actual different levels of the command, according to the current state of the robot.

  • PDF