• 제목/요약/키워드: Embedded speech recognition

검색결과 55건 처리시간 0.026초

A Real-Time Embedded Speech Recognition System

  • Nam, Sang-Yep;Lee, Chun-Woo;Lee, Sang-Won;Park, In-Jung
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2002년도 ITC-CSCC -1
    • /
    • pp.690-693
    • /
    • 2002
  • According to the growth of communication biz, embedded market rapidly developing in domestic and overseas. Embedded system can be used in various way such as wire and wireless communication equipment or information products. There are lots of developing performance applying speech recognition to embedded system, for instance, PDA, PCS, CDMA-2000 or IMT-2000. This study implement minimum memory of speech recognition engine and DB for apply real time embedded system. The implement measure of speech recognition equipment to fit on embedded system is like following. At first, DC element is removed from Input voice and then a compensation of high frequency was achieved by pre-emphasis with coefficients value, 0.97 and constitute division data as same size as 256 sample by lapped shift method. Through by Levinson - Durbin Algorithm, these data can get linear predictive coefficient and again, using Cepstrum - Transformer attain feature vectors. During HMM training, We used Baum-Welch reestimation Algorithm for each words training and can get the recognition result from executed likelihood method on each words. The used speech data is using 40 speech command data and 10 digits extracted form each 15 of male and female speaker spoken menu control command of Embedded system. Since, in many times, ARM CPU is adopted in embedded system, it's peformed porting the speech recognition engine on ARM core evaluation board. And do the recognition test with select set 1 and set 3 parameter that has good recognition rate on commander and no digit after the several tests using by 5 proposal recognition parameter sets. The recognition engine of recognition rate shows 95%, speech commander recognizer shows 96% and digits recognizer shows 94%.

  • PDF

Maximum Likelihood Training and Adaptation of Embedded Speech Recognizers for Mobile Environments

  • Cho, Young-Kyu;Yook, Dong-Suk
    • ETRI Journal
    • /
    • 제32권1호
    • /
    • pp.160-162
    • /
    • 2010
  • For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.

Speech Interactive Agent on Car Navigation System Using Embedded ASR/DSR/TTS

  • Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • 음성과학
    • /
    • 제11권2호
    • /
    • pp.181-192
    • /
    • 2004
  • This paper presents an efficient speech interactive agent rendering smooth car navigation and Telematics services, by employing embedded automatic speech recognition (ASR), distributed speech recognition (DSR) and text-to-speech (ITS) modules, all while enabling safe driving. A speech interactive agent is essentially a conversational tool providing command and control functions to drivers such' as enabling navigation task, audio/video manipulation, and E-commerce services through natural voice/response interactions between user and interface. While the benefits of automatic speech recognition and speech synthesizer have become well known, involved hardware resources are often limited and internal communication protocols are complex to achieve real time responses. As a result, performance degradation always exists in the embedded H/W system. To implement the speech interactive agent to accommodate the demands of user commands in real time, we propose to optimize the hardware dependent architectural codes for speed-up. In particular, we propose to provide a composite solution through memory reconfiguration and efficient arithmetic operation conversion, as well as invoking an effective out-of-vocabulary rejection algorithm, all made suitable for system operation under limited resources.

  • PDF

Development of a Work Management System Based on Speech and Speaker Recognition

  • Gaybulayev, Abdulaziz;Yunusov, Jahongir;Kim, Tae-Hyong
    • 대한임베디드공학회논문지
    • /
    • 제16권3호
    • /
    • pp.89-97
    • /
    • 2021
  • Voice interface can not only make daily life more convenient through artificial intelligence speakers but also improve the working environment of the factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. This system is able to provide machine control and authorized worker authentication by voice at the same time. We applied two speech recognition methods, Google's Speech application programming interface (API) service, and DeepSpeech speech-to-text engine. For worker identification, the SincNet architecture for speaker recognition was adopted. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% in Google API after post- processing and 92.0% in our DeepSpeech model.

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal
    • /
    • 제35권1호
    • /
    • pp.100-108
    • /
    • 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

텔레메틱스 단말기 내의 오디오/비디오 명령처리를 위한 임베디드용 음성인식 시스템의 구현 (Implementation of Embedded Speech Recognition System for Supporting Voice Commander to Control an Audio and a Video on Telematics Terminals)

  • 권오일;이흥규
    • 대한전자공학회논문지TC
    • /
    • 제42권11호
    • /
    • pp.93-100
    • /
    • 2005
  • 본 논문에서는 차량 내에서 음성인식 인터페이스를 이용한 오비오, 비디오와 같은 응용서비스 처리를 위해 임베디드형 음성인식 시스템을 구현한다. 임베디드형 음성인식 시스템은 DSP 보드로 제작 포팅된다. 이는 음성 인식률이 마이크, 음성 코덱 등의 H/W의 영향을 받기 때문이다. 또한 차량 내 잡음을 효율적으로 제거하기 위한 최적의 환경을 구축하고, 이에 따른 테스트 환경을 최적화한다. 본 논문에서 제안된 시스템은 차량 내에서의 신뢰적인 음성인식을 위해 잡음제거 및 특징보상 기술을 적용하고 임베디드 환경에서의 속도 및 성능 향상을 위한 문맥 종속 믹스쳐 공유 음향 모델링을 적용한다. 성능평가는 일반 실험실 환경에서의 인식률과 실제 차량 내에서의 실차 테스트를 통해 검증되었다.

A Study on Design and Implementation of Embedded System for speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • 한국지능시스템학회논문지
    • /
    • 제14권2호
    • /
    • pp.201-206
    • /
    • 2004
  • This study attempted to develop a speech recognition module applied to a wheelchair for the physically handicapped. In the proposed speech recognition module, TMS320C32 was used as a main processor and Mel-Cepstrum 12 Order was applied to the pro-processor step to increase the recognition rate in a noisy environment. DTW (Dynamic Time Warping) was used and proven to be excellent output for the speaker-dependent recognition part. In order to utilize this algorithm more effectively, the reference data was compressed to 1/12 using vector quantization so as to decrease memory. In this paper, the necessary diverse technology (End-point detection, DMA processing, etc.) was managed so as to utilize the speech recognition system in real time

자바를 이용한 음성인식 시스템에 관한 연구 (Study of Speech Recognition System Using the Java)

  • 최광국;김철;최승호;김진영
    • 한국음향학회지
    • /
    • 제19권6호
    • /
    • pp.41-46
    • /
    • 2000
  • 본 논문에서는 자바를 사용하여 연속분포 HMM 알고리즘과 Browser-embedded 모델로 음성인식시스템을 구현하였다. 이 시스템은 웹상에서 음성분석, 처리, 인식과정을 실행할 수 있도록 설계되었으며, 클라이언트에서는 자바애플릿을 이용하여 음성의 끝점검출과 MFCC와 에너지 그리고 델타계수들을 추출하여 소켓을 통해 서버로 전송하고, 서버는 HMM 인식기와 학습DB를 이용하여 인식을 수행하고 인식된 결과는 클라이언트에 전송되어 문자로 출력되어진다. 또한 이 시스템은 플랫폼에 독립적인 시스템으로 네트웍상에서 구축되었기 때문에 높은 에러율을 갖고 있지만 멀티미디어 분야에 접목시켰다는 의의와 향후에 새로운 정보통신 서비스가 될 가능성이 있음을 알 수 있었다.

  • PDF

품사 부착 말뭉치를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선 (Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus)

  • 임민규;김광호;김지환
    • 대한음성학회지:말소리
    • /
    • 제67호
    • /
    • pp.181-193
    • /
    • 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate 152 POS tags defined in Lancaster-Oslo-Bergen (LOB) corpus and word-POS tag pairs. We derive a new vocabulary through word addition. Words paired with some POS tags have to be included in vocabularies with any size, but the vocabulary inclusion of words paired with other POS tags varies based on the target size of vocabulary. The 152 POS tags are categorized according to whether the word addition is dependent of the size of the vocabulary. Using expert knowledge, we classify POS tags first, and then apply different ways of word addition based on the POS tags paired with the words. The performance of the proposed method is measured in terms of coverage and is compared with those of vocabularies with the same size (5,000 words) derived from frequency lists. The coverage of the proposed method is measured as 95.18% for the test short message service (SMS) text corpus, while those of the conventional vocabularies cover only 93.19% and 91.82% of words appeared in the same SMS text corpus.

  • PDF

다층회귀신경예측 모델 및 HMM 를 이용한 임베디드 음성인식 시스템 개발에 관한 연구 (A Study on Development of Embedded System for Speech Recognition using Multi-layer Recurrent Neural Prediction Models & HMM)

  • 김정훈;장원일;김영탁;이상배
    • 한국지능시스템학회논문지
    • /
    • 제14권3호
    • /
    • pp.273-278
    • /
    • 2004
  • 본 논문은 주인식기로 흔히 사용되는 HMM 인식 알고리즘을 보완하기 위한 방법으로 회귀신경회로망(Recurrent neural networks : RNN)을 적용하였다. 이 회귀신경회로망 중에서 실 시간적으로 동작이 가능하게 한 방법인 다층회귀신경예측 모델 (Multi-layer Recurrent Neural Prediction Model : MRNPM)을 사용하여 학습 및 인식기로 구현하였으며, HMM과 MRNPM 을 이용하여 Hybrid형태의 주 인식기로 설계하였다. 설계된 음성 인식 알고리즘을 잘 구별되지 않는 한국어 숫자음(13개 단어)에 대해 화자 독립형으로 인식률 테스트 한 결과 기존의 HMM인식기 보다 5%정도의 인식률 향상이 나타났다. 이 결과를 이용하여 실제 DSP(TMS320C6711) 환경 내에서 최적(인식) 코드만을 추출하여 임베디드 음성 인식 시스템을 구현하였다. 마찬가지로 임베디드 시스템의 구현 결과도 기존 단독 HMM 인식시스템보다 향상된 인식시스템을 구현할 수 있게 되었다.