• Title/Summary/Keyword: Distributed Speech Recognition

Bayesian Fusion of Confidence Measures for Confidence Scoring (베이시안 신뢰도 융합을 이용한 신뢰도 측정)

  • 김태윤;고한석
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.410-419 / 2004
  • In this paper, we propose a method of confidence measure fusion under a Bayesian framework for speech recognition. Centralized and distributed schemes are considered for confidence measure fusion. Centralized fusion is feature-level fusion, which combines the values of the individual confidence scores and makes a final decision. In contrast, distributed fusion is decision-level fusion, which combines the individual decisions made by each confidence measure. Optimal Bayesian fusion rules for the centralized and distributed cases are presented. In isolated-word Out-of-Vocabulary (OOV) rejection experiments, centralized Bayesian fusion shows over 13% relative equal error rate (EER) reduction compared with the individual confidence measures. In contrast, distributed Bayesian fusion shows no significant performance increase.
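
The decision-level (distributed) scheme described above can be illustrated with a minimal sketch. It assumes the local accept/reject decisions are conditionally independent given whether the hypothesis is correct, in which case the Bayes-optimal fusion is a sum of per-detector log-likelihood ratios. The function name, the operating-point values, and the independence assumption are illustrative and are not taken from the paper.

```python
import math

def fuse_decisions(decisions, p_detect, p_false_alarm, prior_correct=0.5):
    """Fuse binary accept/reject decisions from several confidence measures.

    decisions[i]     -- True if measure i accepted the recognition result
    p_detect[i]      -- assumed P(measure i accepts | result is correct)
    p_false_alarm[i] -- assumed P(measure i accepts | result is wrong)
    Returns True if the fused posterior favours "result is correct".
    """
    # Start from the prior log-odds of a correct recognition result.
    llr = math.log(prior_correct / (1.0 - prior_correct))
    for accepted, pd, pf in zip(decisions, p_detect, p_false_alarm):
        if accepted:
            llr += math.log(pd / pf)                  # evidence from an accept
        else:
            llr += math.log((1.0 - pd) / (1.0 - pf))  # evidence from a reject
    return llr > 0.0

# Hypothetical operating points for three confidence measures.
pd = [0.9, 0.8, 0.6]
pf = [0.2, 0.3, 0.5]
accept_case = fuse_decisions([True, True, False], pd, pf)
reject_case = fuse_decisions([False, False, True], pd, pf)
```

A centralized scheme would instead apply one likelihood-ratio test to the joint vector of raw confidence scores, which requires a model of their joint distribution.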

Optimized Wiener Filter for Noise Reduction in VoIP Environments (VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터)

  • Jeong, Sang-Bae;Lee, Sung-Doke;Hahn, Min-Soo
    • MALSORI / no.64 / pp.105-119 / 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. Its performance is evaluated with PESQ under various noise conditions. The algorithm is applied to G.711, G.723.1, and G.729A, all of which are VoIP speech codecs. The PESQ results show that the proposed noise reduction scheme outperforms the noise suppression in the IS-127 EVRC and in the ETSI standard for the advanced distributed speech recognition front-end.
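
For orientation, the textbook Wiener gain as a function of SNR can be sketched as follows. This is only the standard form G = SNR / (1 + SNR) applied per frequency bin. The abstract does not say how the paper's filter is optimized to the estimated SNR, so the function names and the per-bin SNR input here are assumptions.

```python
def wiener_gain(snr_db):
    """Textbook Wiener gain for a given SNR estimate in dB."""
    snr = 10.0 ** (snr_db / 10.0)  # convert dB to a linear power ratio
    return snr / (1.0 + snr)

def denoise_bins(noisy_magnitudes, snr_db_per_bin):
    """Scale each spectral magnitude by the gain for its estimated SNR."""
    return [wiener_gain(s) * m for m, s in zip(noisy_magnitudes, snr_db_per_bin)]

# Bins with high SNR pass almost unchanged; low-SNR bins are attenuated.
cleaned = denoise_bins([2.0, 2.0, 2.0], [30.0, 0.0, -30.0])
```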

Development of Embedded Fast/Light Phoneme Recognizer for Distributed Speech Recognition (분산음성인식을 위한 내장형 고속/경량 음소인식기 개발)

  • Kim, Seung-Hi;Hwang, Kyu-Woong;Jeon, Hyun-Bae;Jeong, Hoon;Park, Jun
    • Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.395-396 / 2007
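  • The ETRI Speech/Language Information Research Center is developing a phoneme recognizer for distributed speech recognition that uses little memory and runs fast. Fixed information such as the acoustic model, language model, and search network is built in binary form before the recognizer runs and stored in ROM, which greatly reduces the amount of RAM actually required. For tied-state triphone models, using only the unique HMMs greatly reduces both recognition time and memory usage. The monophone recognizer used 179 KB of RAM, and the triphone recognizer used 435 KB of RAM with a real-time factor (RTF) of 0.02.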

Improvement of Speech Reconstructed from MFCC Using GMM (GMM을 이용한 MFCC로부터 복원된 음성의 개선)

  • Choi, Won-Young;Choi, Mu-Yeol;Kim, Hyung-Soon
    • MALSORI / no.53 / pp.129-141 / 2005
  • The goal of this research is to improve the quality of speech reconstructed in a Distributed Speech Recognition (DSR) system. For the extended DSR, we estimate the variable Maximum Voiced Frequency (MVF) from Mel-Frequency Cepstral Coefficients (MFCCs) using a Gaussian Mixture Model (GMM), in order to implement a realistic harmonic-plus-noise model for the excitation signal. For the standard DSR, we also make the voiced/unvoiced decision from MFCCs using a GMM, because pitch information is not available in that case. A perceptual test shows that speech reconstructed by the proposed method is preferred to speech reconstructed by conventional methods.

Parallel Speech Recognition on Distributed Memory Multiprocessors (분산 메모리 다중 프로세서 상에서의 병렬 음성인식)

  • 윤지현;홍성태;정상화;김형순
    • Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.747-749 / 1998
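  • This paper proposes an effective parallel computation model for the integrated processing of speech and natural language. The phoneme models are context-dependent phonemes based on continuous HMMs, and the language model uses a knowledge-based approach. In addition, memory-based parsing is used to process multiple hypotheses over a hierarchically structured knowledge base. The parallel speech recognition algorithm was implemented on a multi-Transputer system with a distributed-memory MIMD architecture. The experiments provide solutions to speech-specific problems that arise during recognition and show that parallelizing the speech recognition system makes real-time speech recognition feasible.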

A Neural Net System Self-organizing the Distributed Concepts for Speech Recognition (음성인식을 위한 분산개념을 자율조직하는 신경회로망시스템)

  • Kim, Sung-Suk;Lee, Tai-Ho
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.5 / pp.85-91 / 1989
  • In this paper, we propose a neural net system for speech recognition that is composed of two neural networks. First, the self-supervised BP (back-propagation) network generates the distributed concept corresponding to the activity pattern in the hidden units. Then the self-organizing neural network forms a concept map that directly displays the similarity relations between concepts. This resolves the difficulty of training a conventional BP network and avoids BP's weakness of degenerating into a pattern matcher, while exploiting its strength of generating varied internal representations. The resulting concept map is more orderly than Kohonen's SOFM. The proposed neural net system needs no special preprocessing and has a self-learning ability.

A Study on the PMC Adaptation for Speech Recognition under Noisy Conditions (잡음 환경에서의 음성인식을 위한 PMC 적응에 관한 연구)

  • 김현기
    • Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.9-14 / 2002
  • In this paper we propose a method for enhancing the performance of a speech recognizer under noisy conditions. The parallel combination model used in the PMC method, which has multiple Gaussian-distributed mixtures, is adapted to the variation of each mixture. CDHMMs (continuous observation density HMMs) with multiple Gaussian-distributed mixtures are combined by the proposed PMC method. In addition, the EM (expectation-maximization) algorithm is used to adapt the model mean parameters in order to reduce the variation of the mixture density. In simulation, the proposed PMC adaptation method performs better than the conventional PMC method.

Recognition of Korean Phonemes in the Spoken Isolated Words Using Distributed Neural Network (분산 신경망을 이용한 고립 단어 음성에 나타난 음소 인식)

  • Kim, Seon-Il;Lee, Haing-Sei
    • The Journal of the Acoustical Society of Korea / v.14 no.6 / pp.54-61 / 1995
  • In this paper, we implement a distributed neural network that recognizes phonemes frame by frame in 30 Korean proverb sentences consisting of 106 isolated words. The speech features are PLP cepstra, energy, and zero crossings; to capture good temporal characteristics, features from a wide window around each frame are used as inputs to the distributed neural networks. A male speaker in his twenties uttered the 30 proverbs five times. Four of the sets were used to train the neural network, and the remaining set was held out for testing. Silence was placed between words for easy discrimination. The frame recognition rate of the large-grouping neural network is 95.3% when four sets are used for training.

Mel-Frequency Cepstral Coefficients Using Formants-Based Gaussian Distribution Filterbank (포만트 기반의 가우시안 분포를 가지는 필터뱅크를 이용한 멜-주파수 켑스트럴 계수)

  • Son, Young-Woo;Hong, Jae-Keun
    • The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.370-374 / 2006
  • Mel-frequency cepstral coefficients (MFCCs) are widely used as features for speech recognition. In the MFCC extraction process, the spectrum obtained by the Fourier transform of the input speech signal is divided into mel-frequency bands, and the energy of each band is extracted. The coefficients are then obtained by the discrete cosine transform of the band energies. In this paper, we calculate the output energy of each bandpass filter by applying a weighting function to the mel-frequency-scaled bandpass filter. The weighting function is a Gaussian function centered at the formant frequency. In the experiments, the proposed method performs comparably to standard MFCC in clean conditions and better in degraded conditions.
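
The pipeline in this abstract (mel-spaced bands, Gaussian-weighted band energies, DCT) can be sketched roughly as below. The abstract does not give the filter shapes, the Gaussian width, or how formants are estimated, so the rectangular bands, the single `formant_hz` value, and `sigma_hz` are all assumptions made for illustration.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_edges(n_bands, f_low, f_high):
    """Band edges in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]

def gaussian_band_energies(power, bin_freqs, edges, formant_hz, sigma_hz):
    """Band energies with each bin weighted by a Gaussian centred on the formant."""
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = 0.0
        for p, f in zip(power, bin_freqs):
            if lo <= f < hi:
                weight = math.exp(-0.5 * ((f - formant_hz) / sigma_hz) ** 2)
                total += weight * p
        energies.append(total)
    return energies

def dct2(values, n_coeffs):
    """Unnormalised DCT-II, as used to turn log band energies into cepstra."""
    n = len(values)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(values))
            for k in range(n_coeffs)]

# Flat toy spectrum, 0-4000 Hz in 50 Hz steps, hypothetical formant at 500 Hz.
freqs = [50.0 * i for i in range(81)]
edges = band_edges(8, 0.0, 4000.0)
energies = gaussian_band_energies([1.0] * 81, freqs, edges, 500.0, 300.0)
# A small floor keeps the logarithm defined for bands far from the formant.
cepstra = dct2([math.log(max(e, 1e-12)) for e in energies], 4)
```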

A Study on the optimal text corpus for company names (한국어최적상호명코퍼스설계에관한연구)

  • Lee, Sun-Jung
    • Journal of the Korea Computer Industry Society / v.5 no.5 / pp.747-754 / 2004
  • In this paper, we derive an optimal corpus that represents the characteristics of a baseline corpus consisting of 1,566,943 unique names drawn from the company names in a directory assistance service (114). Two kinds of optimal solutions are considered. The first is a phonetically balanced corpus (PBC), which is the minimum set that includes all triphones occurring in the baseline corpus. The second is a phonetically distributed corpus (PDC), which is a minimum set that represents the frequency characteristics of triphones in the baseline corpus. We obtain 8,699 words for the PBC and 16,783 words (similarity measure R = 0.92) for the PDC. These corpora can be used for the development of speech recognition and speech synthesis.
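
Selecting a small set of names that covers every triphone is a set-cover problem, and a common way to approximate it is a greedy selection. The sketch below assumes that approach. The abstract does not state which algorithm the paper uses, greedy selection does not guarantee a true minimum, and the toy lexicon and function names are hypothetical.

```python
def triphones(phonemes):
    """All consecutive phoneme triples in one pronunciation."""
    return {tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)}

def greedy_balanced_corpus(lexicon):
    """Pick names until every triphone in the lexicon is covered.

    lexicon maps each name to its phoneme sequence.
    """
    coverage = {name: triphones(p) for name, p in lexicon.items()}
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Choose the name that covers the most still-uncovered triphones.
        best = max(coverage, key=lambda n: len(coverage[n] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected

# Toy lexicon with single-letter "phonemes" for illustration only.
toy = {
    "name1": list("abcd"),
    "name2": list("abc"),
    "name3": list("bcd"),
    "name4": list("xyz"),
}
chosen = greedy_balanced_corpus(toy)
```

A phonetically distributed corpus would need a different objective, one that matches triphone frequencies in the baseline corpus rather than covering each triphone once.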
