• Title/Summary/Keyword: 음성인식률 (speech recognition rate)

Vocabulary Recognition Performance Improvement using a convergence of Bayesian Method for Parameter Estimation and Bhattacharyya Algorithm Model (모수 추정을 위한 베이시안 기법과 바타차랴 알고리즘을 융합한 어휘 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.13 no.10 / pp.353-358 / 2015
  • A vocabulary recognition system built on a standard vocabulary shows degraded recognition for words that fall outside the standard vocabulary or that are similar to one another. In this case, rebuilding the system to add or extend the vocabulary is one way to solve the problem. This paper proposes a speech recognition learning model configured with the Bhattacharyya algorithm, using Bayesian methods for parameter estimation to support scalability of the model configuration. The standard model is corrected based on phoneme characteristics, using Bayesian parameter estimation on the phoneme data and the Bhattacharyya algorithm for similar models. Recognition performance is evaluated on the recognition model configured with the Bhattacharyya algorithm. The proposed method achieved a recognition rate of 97.3% and a learning curve of 1.2 seconds.
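The abstract above does not say how the Bhattacharyya measure is computed between models. As a minimal sketch only, assuming each phoneme model is summarized by a single univariate Gaussian (an assumption made here, not stated in the paper), the closed-form Bhattacharyya distance could be computed as follows. The function name `bhattacharyya_gaussian` is hypothetical.

```python
import math


def bhattacharyya_gaussian(mean_p, var_p, mean_q, var_q):
    """Bhattacharyya distance between two univariate Gaussians.

    Assumption: each model is a single Gaussian with the given mean
    and variance. This is an illustration, not the paper's system.
    """
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    var_sum = var_p + var_q
    # Term driven by the separation of the means.
    mean_term = 0.25 * (mean_p - mean_q) ** 2 / var_sum
    # Term driven by the mismatch of the variances.
    var_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var_p * var_q)))
    return mean_term + var_term
```

A distance of zero means the two models coincide; larger values mean the models are easier to tell apart, which is why such a measure can be used to flag similar (confusable) models.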

A Study on Hybrid Structure of Semi-Continuous HMM and RBF for Speaker Independent Speech Recognition (화자 독립 음성 인식을 위한 반연속 HMM과 RBF의 혼합 구조에 관한 연구)

  • 문연주;전선도;강철호
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.94-99 / 1999
  • Among speech recognition algorithms, the hybrid structure of HMM and neural network (NN) shows a high recognition rate, because it combines the advantages of the statistical model and the neural network model. In this study, we propose a new hybrid structure of semi-continuous HMM (SCHMM) and radial basis function (RBF) network, which re-estimates the weighting coefficients that affect the observation probability after Baum-Welch estimation. The proposed method exploits the similarity between the basis functions of the RBF's hidden layer and the SCHMM's probability density functions, so that speech signals are discriminated more sensitively through the learned and estimated weighting coefficients of the RBF. Simulation results show that the recognition rates of the hybrid SCHMM/RBF structure are higher than those of SCHMM in recognition experiments on unlearned speakers, which indicates that the proposed method discriminates more sensitively than SCHMM.

Cepstral Normalization Combined with CSFN for Noisy Speech Recognition (켑스트럼 정규화와 켑스트럼 거리기반 묵음특징정규화 방법을 이용한 잡음음성 인식)

  • Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1221-1228 / 2011
  • Speech recognition systems generally work well in indoor environments. However, recognition performance drops sharply when a system is used in real environments, because of the various kinds of noise. In this paper we propose CSFN-CMVN to improve the recognition performance of the existing CSFN (cepstral distance based SFN). CSFN-CMVN combines cepstral normalization with CSFN, which normalizes silence features using the cepstral Euclidean distance to classify speech and silence. In tests on the Aurora 2.0 DB, the proposed CSFN-CMVN improved average word accuracy by about 7% over all test sets compared with the typical silence feature normalization SFN-I. It also improved accuracy by 6% and 5% compared with the conventional SFN-II and CSFN, respectively, showing the effectiveness of the proposed method.
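For readers unfamiliar with the cepstral normalization half of CSFN-CMVN, the following is a minimal sketch of per-utterance cepstral mean and variance normalization (CMVN). It covers only generic CMVN and does not attempt the CSFN silence-feature step, whose details the abstract does not give. The function name `cmvn` and the `eps` guard are assumptions for illustration.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of feature vectors (one list of floats per frame).
    Returns new frames where each coefficient has zero mean and
    (approximately) unit variance over the utterance.
    """
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    # Mean of each cepstral coefficient over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps avoids division by zero for a constant coefficient.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]
```

Normalizing each utterance this way removes constant channel offsets and rescales the features, which is the usual motivation for applying it under mismatched noise conditions.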

Automatic Generation of Pronunciation Variants for Korean Continuous Speech Recognition (한국어 연속음성 인식을 위한 발음열 자동 생성)

  • 이경님;전재훈;정민화
    • The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.35-43 / 2001
  • Many speech recognition systems use a pronunciation lexicon with possibly multiple phonetic transcriptions for each word. The pronunciation lexicon is often created manually. This process requires a lot of time and effort, and it is very difficult to keep the lexicon consistent. To handle these problems, we present a model based on morphophonological analysis for automatically generating Korean pronunciation variants. By analyzing phonological variations frequently found in spoken Korean, we derived about 700 phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. When generating pronunciation variants, morphological analysis is performed first to handle variations of phonological words. According to the morphological category, a set of tables reflecting phonemic context is looked up to generate the pronunciation variants. Our experiments show that the proposed model produces mostly correct pronunciation variants of phonological words. We then evaluated the usefulness of the pronunciation lexicon and the training phonetic transcriptions produced with the proposed system.

A embodiment of mouse pointing system using 3-axis accelerometer and sound-recognition module (3축 가속도센서 및 음성인식 모듈을 이용한 마우스 포인팅 시스템의 구현)

  • Lee, Seung-Joon;Shin, Dong-Hwan;Kasno, Mohamad Afif B.;Kim, Joo-Woong;Park, Jin-Woo;Eom, Ki-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.934-937 / 2010
  • In this paper, we implement a mouse pointing system that helps people with disabilities and people unfamiliar with electronics to use electronic devices easily. The new mouse pointing system is built from speech recognition and a 3-axis acceleration sensor combined with a headset. We used a speaker-dependent module, which generates BCD codes by recognizing human voices, because it has a higher recognition rate than a speaker-independent system. The headset mouse system consists of a 3-axis accelerometer, a sound recognition module, and a TMS320F2812 processor. The main controller, the TMS320F2812 DSP, communicates with the host computer over SCI. On the PC, the system is operated through a Visual Basic program.

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.3 / pp.21-26 / 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, or surprise, using discrete duration continuous hidden Markov models (DDCHMM). Emotional feature parameters are first defined from the input speech signals. In this study, we used prosodic parameters such as pitch, energy, and their derivatives, which were used to train HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered. Simulation results showed that the recognition rates of vocal emotion gradually increased as the number of adaptation samples increased.
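The link between adaptation sample count and recognition rate can be illustrated with the standard MAP update of a Gaussian mean. The sketch below is a simplification, assuming a single scalar Gaussian and hard assignment of every adaptation sample to it; the prior weight `tau` and the function name `map_adapt_mean` are hypothetical and are not taken from the paper.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean from adaptation samples.

    prior_mean: mean of the speaker-independent model.
    samples:    adaptation observations from the new speaker.
    tau:        prior weight (assumed value); larger means the
                estimate trusts the prior model for longer.
    """
    n = len(samples)
    # Weighted combination of the prior mean and the sample sum.
    return (tau * prior_mean + sum(samples)) / (tau + n)
```

With no adaptation data the estimate stays at the prior mean, and as more samples arrive it moves toward the speaker's own sample mean, which is consistent with recognition improving gradually as adaptation data accumulates.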

A Computation Study of Prosodic Structures of Korean for Speech Recognition and Synthesis:Predicting Phonological Boundaries (음성인식.합성을 위한 한국어 운율단위 음운론의 계산적 연구:음운단위에 따른 경계의 발견)

  • Lee, Chan-Do
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.280-287 / 1997
  • Introducing phonological knowledge and prosodic information into speech recognition and synthesis systems is very important for building successful spoken language systems. First, related work in computational phonology is reviewed, and theoretical and experimental studies of prosodic structures and boundaries in Korean are summarized. The main focus of this study is to decide which prosodic phrasing trained on a simple recurrent network. The results show information other than phonetic features. This method can be combined with other useful information to predict the boundaries more accurately and to help segmentation, both of which are vital for successful speech recognition and synthesis systems.

Efficient Acoustic Echo Cancellation System for Distant-Talking Automatic Speech Recognition (원거리 음성 인식을 위한 효율적인 에코제거 시스템)

  • Kim, Ki-Beom;Kim, Sang-Yoon;Lee, Woo-Jung;Kwon, Min-Seok;Ko, Byeong-Seob
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2014.10a / pp.150-155 / 2014
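  • In this paper, we propose a fast and efficient echo cancellation system based on subband filtering for distant-talking speech recognition. The proposed system first applies spatial decorrelation to prevent the adaptive filter from malfunctioning when the inter-channel correlation is high. It then adopts a subband structure based on a tree-structured IIR filterbank, so that effective analysis and synthesis filtering can be performed with a low filter order. The spectral aliasing between subbands that inevitably arises in this process can be resolved by applying a notch filter. For the adaptive filter, the improved proportionate normalized least-mean-square (IP-NLMS) algorithm is used, and it was confirmed to be superior in convergence speed and echo cancellation performance. Finally, a residual echo suppressor based on decision-directed estimation is applied to remove the residual echo. This paper introduces the theoretical background behind each stage and demonstrates the effectiveness of the proposed echo cancellation system in an environment with real echo, in terms of ERLE, distant-talking speech recognition rate, and computational complexity.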
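As a rough illustration of the adaptive filtering stage and the ERLE metric mentioned above, the sketch below implements plain fullband NLMS, not the IP-NLMS variant or the subband structure used in the paper. The function names, tap count, and step size are assumptions chosen for illustration.

```python
import math


def nlms_echo_cancel(far_end, mic, num_taps=4, step=0.5, eps=1e-8):
    """Plain NLMS echo canceller (simplified stand-in for IP-NLMS).

    far_end: loudspeaker reference samples.
    mic:     microphone samples containing the echo.
    Returns (errors, weights): the echo-reduced signal and the
    final estimate of the echo path.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))
        e = d - echo_est
        # Normalize the update by the reference signal energy.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights


def erle_db(mic, errors, eps=1e-12):
    """Echo return loss enhancement in dB: mic power over residual power."""
    p_mic = sum(d * d for d in mic) / len(mic)
    p_err = sum(e * e for e in errors) / len(errors)
    return 10.0 * math.log10((p_mic + eps) / (p_err + eps))
```

Proportionate variants such as IP-NLMS differ from this sketch by giving each tap its own step size, which is what typically speeds up convergence on sparse echo paths; a higher ERLE means more of the echo was removed.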

A study on the Recognition of Continuous Digits using Syntactic Analysis and One-Stage DP (구문 분석과 One-Stage DP를 이용한 연속 숫자음 인식에 관한 연구)

  • Ann, Tae-Ock
    • The Journal of the Acoustical Society of Korea / v.14 no.3 / pp.97-104 / 1995
  • This paper studies the recognition of continuous digits for the implementation of a voice dialing system, and proposes a speech recognition method using syntactic analysis and One-Stage DP. To perform recognition, we first build a DMS model with a section division algorithm, and then recognize continuous digit data with the proposed One-Stage DP method using syntactic analysis. The data consist of 21 kinds of 7-digit continuous strings, each pronounced two or three times by 8 male speakers. Speaker-dependent and speaker-independent recognition experiments were carried out on these data in a laboratory environment, using both the conventional One-Stage DP and the proposed One-Stage DP with syntactic analysis. The experiments show that the proposed method performs better than the conventional method. The recognition accuracy of the proposed One-Stage DP using syntactic analysis was about 91.7% for speaker-dependent and 89.7% for speaker-independent recognition.

Isolated Digit and Command Recognition in Car Environment (자동차 환경에서의 단독 숫자음 및 명령어 인식)

  • 양태영;신원호;김지성;안동순;이충용;윤대희;차일환
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.11-17 / 1999
  • This paper proposes an observation probability smoothing technique to improve the robustness of a speech recognizer based on the discrete hidden Markov model (DHMM). It also suggests, from experimental results, appropriate noise-robust processing for the car environment. Noisy speech is often mislabeled during vector quantization. To reduce the effect of such mislabeling, the proposed technique increases the observation probability of similar codewords. For noise-robust processing in the car environment, liftering on the distance measure of feature vectors, high-pass filtering, and spectral subtraction are examined. Recognition experiments were performed on 14 isolated words consisting of Korean digits and command words. The database was recorded in stopped-car and running-car environments. The baseline recognizer achieved recognition rates of 97.4% in the stopped condition and 59.1% in the running condition. With the proposed observation probability smoothing, liftering, high-pass filtering, and spectral subtraction, the recognition rates improved to 98.3% in the stopped condition and 88.6% in the running condition.
