• Title/Summary/Keyword: Speech recognition model

Improvement of Gesture Recognition using 2-stage HMM (2단계 히든마코프 모델을 이용한 제스쳐의 성능향상 연구)

  • Jung, Hwon-Jae;Park, Hyeonjun;Kim, Donghan
    • Journal of Institute of Control, Robotics and Systems, v.21 no.11, pp.1034-1037, 2015
  • In recent years, various methods have been developed in robotics to create a closer relationship between people and robots. These methods include speech, vision, and biometric recognition as well as gesture-based interaction. These recognition technologies are used in various wearable devices, smartphones, and other electronic devices for convenience. Among them, gesture recognition is the most commonly used and the most appropriate for wearable devices. Gesture recognition can be classified as contact or noncontact. This paper proposes contact gesture recognition with IMU and EMG sensors that applies the hidden Markov model (HMM) twice. In the first stage, several simple behaviors are combined into main gestures; this stage is the same as the standard HMM process that is well known in pattern recognition. In the second stage, the sequence of main gestures produced by the first stage forms higher-order gestures. In this way, more natural and intelligent gestures can be built from simple ones. This process can play a larger role in gesture recognition-based UX for many wearable and smart devices.
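The two-stage idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the gesture names (`raise`, `lower`, `wave`, `hold_up`), the binary sensor symbols, and all model parameters are invented toy values, and real IMU/EMG data would need feature extraction and trained models.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a non-empty list of discrete symbols."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like

def classify(obs, models):
    """Return the name of the HMM that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

def two_stage_recognize(segments, stage1, stage2, label_index):
    """Stage 1 labels each segment as a main gesture; stage 2 labels the label sequence."""
    main_gestures = [classify(seg, stage1) for seg in segments]
    symbols = [label_index[g] for g in main_gestures]
    return main_gestures, classify(symbols, stage2)

# Toy models (assumed values): each entry is (start, transition, emission).
STICKY = [[0.9, 0.1], [0.1, 0.9]]
stage1 = {
    "raise": ([0.5, 0.5], STICKY, [[0.9, 0.1], [0.8, 0.2]]),  # mostly emits symbol 0
    "lower": ([0.5, 0.5], STICKY, [[0.1, 0.9], [0.2, 0.8]]),  # mostly emits symbol 1
}
label_index = {"raise": 0, "lower": 1}
stage2 = {
    # "wave" alternates between raise and lower; "hold_up" repeats raise.
    "wave": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [[0.9, 0.1], [0.1, 0.9]]),
    "hold_up": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.9, 0.1]]),
}

segments = [[0, 0, 0, 1], [1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
main_gestures, higher_order = two_stage_recognize(segments, stage1, stage2, label_index)
print(main_gestures, higher_order)
```

With these toy parameters the four segments are labeled raise, lower, raise, lower, and the second stage maps that sequence to `wave`.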

Adaptation of Classification Model for Improving Speech Intelligibility in Noise (음성 명료도 향상을 위한 분류 모델의 잡음 환경 적응)

  • Jung, Junyoung;Kim, Gibak
    • Journal of Broadcast Engineering, v.23 no.4, pp.511-518, 2018
  • This paper deals with improving speech intelligibility by applying a binary mask to the time-frequency units of speech in noise. Each mask value is set to "0" or "1" depending on whether speech or noise is dominant, which is decided by comparing the signal-to-noise ratio with a predefined threshold. A Bayesian classifier trained with Gaussian mixture models is used to estimate the binary mask of each time-frequency unit. The binary-mask-based noise suppressor improves speech intelligibility only in noise conditions that are included in the training data. In this paper, speaker adaptation techniques from speech recognition are applied to adapt the Gaussian mixture model to a new noise environment. Experiments with noise-corrupted speech demonstrate the improvement in speech intelligibility obtained by employing adaptation techniques in a new noise environment.
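The thresholding rule behind the binary mask can be sketched as below. Two assumptions are made here that the abstract does not state: "1" is taken to mark speech-dominant units (the usual convention), and the speech and noise powers are assumed known, whereas the paper estimates the mask with a classifier. The function names and the 0 dB default threshold are illustrative only.

```python
import math

def binary_mask(speech_power, noise_power, threshold_db=0.0, eps=1e-12):
    """Return a 0/1 mask per time-frequency unit: 1 where local SNR exceeds the threshold."""
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > threshold_db else 0)  # assumption: 1 = speech dominant
        mask.append(row)
    return mask

def apply_mask(mixture, mask):
    """Keep speech-dominant units of the noisy mixture and zero out the rest."""
    return [[x * m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(mixture, mask)]

# Toy 2x2 power grids (rows = frames, columns = frequency bins).
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]
mask = binary_mask(speech, noise)
print(mask)  # [[1, 0], [0, 1]]
```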

Speaker Identification using Phonetic GMM (음소별 GMM을 이용한 화자식별)

  • Kwon Sukbong;Kim Hoi-Rin
    • Proceedings of the KSPS conference, 2003.10a, pp.185-188, 2003
  • In this paper, we construct a phonetic GMM for a text-independent speaker identification system. The basic idea is to combine the advantages of the baseline GMM and the HMM. The GMM is better suited to text-independent speaker identification, whereas the HMM works better in text-dependent systems. The phonetic GMM represents a more sophisticated text-dependent speaker model built on a text-independent speaker model. In a speaker identification system, the phonetic GMM using HMM-based speaker-independent phoneme recognition performs better than the baseline GMM. In addition, an N-best recognition algorithm is used to reduce computational complexity and to make the method applicable to new speakers.
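One way to read the phonetic GMM idea is that each speaker has one GMM per phoneme, and each frame is scored against the GMM of the phoneme that a speaker-independent recognizer assigned to it. The sketch below illustrates only that scoring step under strong simplifications: scalar features, single-component mixtures, phoneme labels given as input, and invented speaker names and parameters. It is not the authors' system, and the N-best step is omitted.

```python
import math

def gmm_log_pdf(x, components):
    """Log density of scalar x under a 1-D GMM given as (weight, mean, variance) triples."""
    total = sum(w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in components)
    return math.log(total)

def identify_speaker(frames, phonemes, speaker_models):
    """Pick the speaker whose per-phoneme GMMs give the frames the highest total log-likelihood.

    frames: scalar features; phonemes: one label per frame (assumed to come from a
    speaker-independent phoneme recognizer); speaker_models: {speaker: {phoneme: components}}.
    """
    def score(models):
        return sum(gmm_log_pdf(x, models[p]) for x, p in zip(frames, phonemes))
    return max(speaker_models, key=lambda spk: score(speaker_models[spk]))

# Hypothetical models for two speakers and two phonemes.
speaker_models = {
    "spk_a": {"a": [(1.0, 0.0, 1.0)], "i": [(1.0, 2.0, 1.0)]},
    "spk_b": {"a": [(1.0, 1.0, 1.0)], "i": [(1.0, 3.0, 1.0)]},
}
best = identify_speaker([0.1, 1.9, -0.2], ["a", "i", "a"], speaker_models)
print(best)  # spk_a
```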

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI, no.52, pp.111-120, 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that is speaker independent and can handle a vocabulary of several hundred words. First, we develop an HMM-based speech recognition system on the PC that operates on a frame basis, with feature extraction and Viterbi decoding processed in parallel to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and the phonetic tied mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. A maximum required time of 29.2 ms for processing a 32 ms frame of speech validates real-time operation of the implemented system.

Implementation of Real-time Vowel Recognition Mouse based on Smartphone (스마트폰 기반의 실시간 모음 인식 마우스 구현)

  • Jang, Taeung;Kim, Hyeonyong;Kim, Byeongman;Chung, Hae
    • KIISE Transactions on Computing Practices, v.21 no.8, pp.531-536, 2015
  • Speech recognition is an active research area in human-computer interface (HCI), and the objective of this study is to control digital devices by voice. The mouse is a widely used computer peripheral in graphical user interface (GUI) computing environments. In this paper, we propose a method of controlling the mouse with the real-time speech recognition function of a smartphone. The processing steps are to receive a voice input of suitable length in real time and extract the core voice signal, to extract features with mel-frequency cepstral coefficients (MFCC) and quantize them with a learned codebook, and finally to recognize the corresponding vowel with a hidden Markov model (HMM). A virtual mouse is then operated by mapping each vowel to a mouse command. Finally, we show various mouse operations on a desktop PC display with the implemented smartphone application.
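Two of the steps described above, codebook quantization and the vowel-to-command mapping, can be sketched as follows. The codebook values, the two-dimensional features, and the specific vowel-to-command table are hypothetical placeholders; the abstract does not give them. The MFCC extraction and HMM decoding steps are left out.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in features]

# Hypothetical mapping; the paper's actual vowel-to-command assignment is not given here.
VOWEL_TO_COMMAND = {
    "a": "move_left",
    "e": "move_right",
    "i": "move_up",
    "o": "move_down",
    "u": "click",
}

def command_for(vowel):
    """Look up the mouse command for a recognized vowel; None if the vowel is unmapped."""
    return VOWEL_TO_COMMAND.get(vowel)

# Toy codebook and feature vectors standing in for learned codewords and MFCC frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
symbols = quantize([[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]], codebook)
print(symbols)  # [0, 1, 2]
```

The resulting symbol sequence is what a discrete HMM would take as its observation sequence in the recognition step.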

Implementation of Intelligent Speech Recognition System according to CCTV Emergency Information (CCTV 응급상황에 따른 지능형 음성인식 시스템 구현)

  • Cho, Young-Im;Jang, Sung-Soon
    • Journal of the Korean Institute of Intelligent Systems, v.19 no.3, pp.415-420, 2009
  • For emergency detection in the general CCTV environments of daily life, monitoring by images alone causes problems, especially in cost and manpower. Therefore, to detect emergency states dynamically through CCTV and to resolve these problems, we propose an advanced speech recognition system. To this end, we adopt an HMM (Hidden Markov Model) in our system for feature extraction. We also adopt the Wiener filter technique to remove noise from the information coming from the CCTV environment. Our system sends only the emergency speech information to a manager so that emergency states can be handled effectively.

Connected Korean Digit Speech Recognition Using Vowel String and Number of Syllables (음절수와 모음 열을 이용한 한국어 연결 숫자 음성인식)

  • Youn, Jeh-Seon;Hong, Kwang-Seok
    • The KIPS Transactions:PartA, v.10A no.1, pp.1-6, 2003
  • In this paper, we present a new Korean connected digit recognition method based on the vowel string and the number of syllables. Digit candidates are reduced in two steps. The first step determines the number and intervals of the digits. Once these are determined, the second step recognizes the vowel string in the digit string. The digit candidates that match the vowel string are recognized with CV (consonant-vowel), VCCV, and VC unit HMMs. The proposed method copes effectively with coarticulation effects and recognizes connected digit speech well.

A study on Voice Recognition using Model Adaptation HMM for Mobile Environment (모델적응 HMM을 이용한 모바일환경에서의 음성인식에 관한 연구)

  • Ahn, Jong-Young;Kim, Sang-Bum;Kim, Su-Hoon;Hur, Kang-In
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.11 no.3, pp.175-179, 2011
  • In this paper, we propose the MA (Model Adaptation) HMM, which uses speech enhancement and feature compensation. Normally, voice reference data does not take real noise data into account. Instead of estimated noise, this method uses noise data from real-life environments. We applied this contaminated data to build recognition reference models suited to noisy environments. MAHMM incorporates surrounding noise when generating the reference pattern. Using MAHMM, we improved the voice recognition rate in a mobile environment.

Emotion Recognition using Prosodic Feature Vector and Gaussian Mixture Model (운율 특성 벡터와 가우시안 혼합 모델을 이용한 감정인식)

  • Kwak, Hyun-Suk;Kim, Soo-Hyun;Kwak, Yoon-Keun
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference, 2002.11b, pp.762-766, 2002
  • This paper describes an emotion recognition algorithm using the HMM (Hidden Markov Model) method. The relation between mechanical systems and humans has so far been one-sided. This is why people do not want to become familiar with today's multi-service robots. If an emotion recognition function is given to the robot system, the concept of the mechanical part will change considerably. Pitch and energy extracted from human speech, called prosodic features, are good and important factors for classifying each emotion (neutral, happy, sad, angry, etc.). HMM is a powerful and effective theory among the several methods for constructing a statistical model from a characteristic vector made up of a mixture of prosodic features.

The Neighborhood Effects in Korean Word Recognition Using Computation Model (계산주의적 모델을 이용한 한국어 시각단어 재인에서 나타나는 이웃효과)

  • Park, Ki-Nam;Kwon, You-An;Lim, Heui-Seok;Nam, Ki-Chun
    • Proceedings of the KSPS conference, 2007.05a, pp.295-297, 2007
  • This study proposes a computational model to investigate the roles of phonological and orthographic information in visual word recognition, one of the processes of language information processing, and the representation types of the mental lexicon. The computational model showed the phonological and orthographic neighborhood effects that are observed in Korean word recognition, and provided evidence implying that the mental lexicon is represented as phonological information in the process of Korean word recognition.
