• Title/Summary/Keyword: Vector Quantization (VQ)

HMM-based Speech Recognition using FSVQ, Fuzzy Concept and Doubly Spectral Feature (FSVQ, 퍼지 개념 및 이중 스펙트럼 특징을 이용한 HMM에 기초를 둔 음성 인식)

  • 정의봉
    • Journal of the Korea Computer Industry Society / v.5 no.4 / pp.491-502 / 2004
  • In this paper, we propose an HMM-based method that uses FSVQ (First Section VQ), fuzzy theory, and a doubly spectral feature, as a study of speaker-independent isolated word recognition. LPC cepstrum coefficients and regression coefficients of the LPC cepstrum are used as the doubly spectral feature. The training data are divided into several sections, and a VQ codebook is generated from the first section; multi-observation sequences are then obtained from that codebook by ordering codewords from the largest probabilistic value downward according to a fuzzy rule. These first-section observation sequences are used for training, and in recognition the word with the highest probability, obtained by the same procedure, is selected. In addition to the recognition experiments with the proposed method, we ran the other methods under equivalent data and conditions. Across all experiments, the proposed method achieved a higher recognition rate than the others.
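
The multi-observation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a fuzzy c-means style membership function, and the names `fuzzy_memberships`, `top_k_observations`, the fuzziness exponent `m`, and the toy codebook are all hypothetical.

```python
import math

def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means style membership of vector x in each codeword (sums to 1).

    The membership formula is an assumption; the abstract only mentions a fuzzy rule.
    """
    d = [math.dist(x, c) for c in codebook]
    # An exact match gets full membership, which avoids division by zero.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        total = sum(hits)
        return [h / total for h in hits]
    p = 2.0 / (m - 1.0)
    inv = [di ** -p for di in d]
    s = sum(inv)
    return [v / s for v in inv]

def top_k_observations(x, codebook, k=2):
    """Return the k codeword indices with the largest membership, best first."""
    u = fuzzy_memberships(x, codebook)
    ranked = sorted(range(len(codebook)), key=lambda i: u[i], reverse=True)
    return [(i, u[i]) for i in ranked[:k]]

# Toy first-section codebook (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
observations = top_k_observations((0.25, 0.0), codebook, k=2)
```

Applying `top_k_observations` to every frame would give k parallel observation sequences per utterance, one possible reading of "multi-observation sequences" here.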

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

  • Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.149-154 / 2004
  • In this research, we chose speech recognition as the sole means of controlling an electric wheelchair system and used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among speech recognition methods. However, a real-time system needs a small memory footprint and fast processing. We therefore introduced VQ (Vector Quantization), which is widely used as a compression algorithm in speaker-independent recognition, to achieve fast recognition with little memory. However, the recognition rate decreased after introducing VQ. To recover it, we applied the ART2 (Adaptive Resonance Theory 2) algorithm as a post-processing step and obtained an improvement of about 5% in recognition rate. Using ART2 requires an error range: the error range is applied when, among the distances computed by DTW, the second distance minus the first distance is 20 or more. With ART2 applied in this way, we obtained fast processing and a high recognition rate. Moreover, since the system is a moving object, it should be implemented as an embedded system. We therefore selected the TMS320C32 chip, which can perform a large number of calculations relatively quickly. Because the data to be stored are speech, we used 128 KB of RAM and 64 KB of ROM to hold a large amount of data. For speech input, we used a 16-bit stereo audio codec, whose high resolution yields relatively accurate data.
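
The DTW matching and the "second distance minus first distance" quantity mentioned in this abstract might look like the sketch below. It is an assumption-laden illustration: the function names, the toy templates, and the Euclidean local distance are hypothetical, and the abstract does not say exactly how the threshold of 20 feeds into ART2, so the code only computes the margin.

```python
import math

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local distance (assumed Euclidean)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def best_two(query, templates):
    """Return (best_label, margin), where margin = second-best minus best DTW distance."""
    scored = sorted((dtw_distance(query, t), label) for label, t in templates.items())
    (d1, label1), (d2, _) = scored[0], scored[1]
    return label1, d2 - d1

# Hypothetical one-dimensional templates for two command words.
templates = {
    "go": [(0.0,), (1.0,), (2.0,)],
    "stop": [(5.0,), (5.0,), (5.0,)],
}
label, margin = best_two([(0.0,), (1.0,), (1.0,), (2.0,)], templates)
```

A caller would compare `margin` against the threshold (20 in the abstract, on the authors' own distance scale) to decide whether the error range applies.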

Implementation of A Fast Preprocessor for Isolated Word Recognition (고립단어 인식을 위한 빠른 전처리기의 구현)

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.96-99 / 1997
  • This paper proposes a very fast preprocessor for isolated word recognition. The proposed preprocessor extracts candidate words at a small computational cost. To reduce that cost, the preprocessor uses a feature sorting algorithm instead of vector quantization. To show the effectiveness of our preprocessor, we compared it with a speech recognition system based on a semi-continuous hidden Markov model and with a VQ-based preprocessor, measuring their performance on speaker-independent isolated word recognition. For the experiments, we used a speech database of 244 words uttered by 40 male speakers. The speech data from 20 of the speakers were used for training, and the data from the other 20 for testing. As a result, the accuracy of the proposed preprocessor was 99.9% with a 90% reduction rate for the speech database.

The Design of Optimal Filters in Vector-Quantized Subband Codecs (벡터양자화된 부대역 코덱에서 최적필터의 구현)

  • 지인호
    • The Journal of the Acoustical Society of Korea / v.19 no.1 / pp.97-102 / 2000
  • Subband coding divides the signal frequency band into a set of uncorrelated frequency bands by filtering and then encodes each subband using a bit allocation matched to the signal energy in that subband. The subband signals can be coded with waveform encoding techniques such as PCM, DPCM, and vector quantization (VQ) to obtain higher data compression. Most researchers have focused on the error in the quantizer, but not on the overall reconstruction error and its dependence on the filter bank. This paper provides a thorough analysis of subband codecs and further develops optimum filter bank design using a vector quantizer. We compute the mean squared reconstruction error (MSE), which depends on N, the number of entries in each codebook, on k, the length of each codeword, and on the filter bank coefficients. We express this MSE measure in terms of the equivalent quantization model and find the optimum FIR filter coefficients for each channel in the M-band structure for a given bit rate, filter length, and input signal correlation model. Specific design examples are worked out for a 4-tap filter in a 2-band paraunitary filter bank structure. These optimum paraunitary filter coefficients are obtained by Monte Carlo simulation. We expect the results of this work to contribute to the study of optimum design of subband codecs using vector quantizers.
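
To make the roles of N and k concrete, the sketch below quantizes a signal in blocks of k samples against a codebook of N entries and reports the bit rate, log2(N)/k bits per sample, together with the quantizer MSE. It is only an illustration: the function names and toy values are hypothetical, and it measures quantizer error alone, not the overall reconstruction error through the filter bank that the paper analyzes.

```python
import math

def vq_encode_decode(signal, codebook):
    """Replace each length-k block of the signal by its nearest codeword."""
    k = len(codebook[0])
    out = []
    for start in range(0, len(signal) - k + 1, k):
        block = signal[start:start + k]
        nearest = min(codebook, key=lambda c: math.dist(block, c))
        out.extend(nearest)
    return out

def rate_and_mse(signal, codebook):
    """Return (bits per sample, mean squared quantization error)."""
    k = len(codebook[0])
    recon = vq_encode_decode(signal, codebook)
    used = signal[:len(recon)]  # ignore a trailing partial block, if any
    mse = sum((s - r) ** 2 for s, r in zip(used, recon)) / len(recon)
    rate = math.log2(len(codebook)) / k
    return rate, mse

# Toy codebook with N = 4 entries of length k = 2 (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
rate, mse = rate_and_mse([0.1, 0.0, 0.9, 1.0], codebook)
```

Doubling k at fixed N halves the rate, which is why the trade-off among N, k, and the filter coefficients has to be evaluated at a fixed bit rate.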

A Study on Korean Phoneme Classification using Recursive Least-Square Algorithm (Recursive Least-Square 알고리즘을 이용한 한국어 음소분류에 관한 연구)

  • Kim, Hoe-Rin;Lee, Hwang-Su;Un, Jong-Gwan
    • The Journal of the Acoustical Society of Korea / v.6 no.3 / pp.60-67 / 1987
  • In this paper, a phoneme classification method for Korean speech recognition is proposed and its performance is studied. The classification is based on phonemic features extracted by the prewindowed recursive least-square (PRLS) algorithm, a kind of adaptive filter algorithm. Applying the PRLS algorithm to the input speech signal allows precise detection of phoneme boundaries. Reference patterns of Korean phonemes are generated by ordinary vector quantization (VQ) of feature vectors obtained manually from prototype regions of each phoneme. To evaluate the proposed method, it was tested on the spoken names of seven Korean cities, which contain eleven different consonants and eight different vowels. In speaker-dependent phoneme classification, the accuracy is about 85% when simple phonemic rules of the Korean language are taken into account, while the accuracy in the speaker-independent case is far lower than in the speaker-dependent case.
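
As a rough illustration of how reference patterns can be produced by VQ, the sketch below trains a small codebook with the generalized Lloyd (k-means) iteration. The abstract does not name the training procedure, so that choice is an assumption, as are the function name `train_codebook`, the deterministic initialization from the first vectors, and the toy data.

```python
import math

def train_codebook(vectors, size, iterations=20):
    """Generalized Lloyd (k-means) iteration.

    Initial codewords are simply the first `size` training vectors (an assumption).
    """
    codebook = [tuple(v) for v in vectors[:size]]
    for _ in range(iterations):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
            cells[nearest].append(v)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else cw
            for cell, cw in zip(cells, codebook)
        ]
    return codebook

# Hypothetical 2-D feature vectors drawn from two phoneme prototype regions.
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
reference_patterns = train_codebook(features, size=2)
```

Each resulting codeword would serve as one reference pattern; a test frame is then classified by its nearest codeword.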

Image Data Compression Using Biorthogonal Wavelet Transform and Variable Block Size Edges Extraction (쌍직교 웨이브렛 변환과 가변 블럭 윤곽선 추출에 의한 영상 데이타 압축)

  • 김기옥;김재공
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.7 / pp.1203-1212 / 1994
  • This paper proposes variable block size vector quantization based on a biorthogonal wavelet transform for image compression. An image is first decomposed by the biorthogonal wavelet transform into a multiresolution image, and the wavelet coefficients of the middle frequency bands are segmented using a quadtree structure to extract the perceptually important regions in those bands. Because edges in the middle frequency bands also appear at the corresponding positions in the high frequency bands, the quadtree structure obtained for the middle frequency bands is applied unchanged to the high frequency bands. The overhead information of the quadtree codes needed to segment the high frequency bands can therefore be reduced. The segmented subblocks are encoded with codebooks designed for each scale and direction. Simulation results showed that the proposed method reproduces a higher quality image at a bit rate about 20% lower than that of the preceding VQ method and sufficiently reduces the block effect and edge degradation.

Design of Wideband Speech Coder Using the MLT Residual Signal (MLT 여기신호를 이용한 광대역 음성 부호화기 설계)

  • Oh Yeon-Seon;Shin Jae-Hyun;Lee In-Sung
    • The Journal of the Acoustical Society of Korea / v.24 no.5 / pp.248-254 / 2005
  • In this paper, the structure of a split-band wideband speech coder and its highband coder for improved tone quality are proposed. The lowband and highband produced by the band-splitting method are encoded independently, using G.729E and an MLT (Modulated Lapped Transform) residual model, respectively. In the highband structure, which is encoded at a low bit rate of 4 kbps, the MLT residual signals are classified as voiced or unvoiced. For voiced signals, an MLT peak-picking method driven by the lowband pitch period is applied, because the transformed MLT residual signal is periodic with periodic peaks. For unvoiced signals, the MLT coefficients, with the linear prediction spectral response added, are vector quantized. The performance of the proposed 15.8 kbps wideband speech coder was verified through a subjective listening test.

Rejection Performance Analysis in Vocabulary Independent Speech Recognition Based on Normalized Confidence Measure (정규화신뢰도 기반 가변어휘 고립단어 인식기의 거절기능 성능 분석)

  • Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea / v.25 no.2 / pp.96-100 / 2006
  • Kim et al. proposed the Normalized Confidence Measure (NCM) [1-2], and it was successfully used for rejecting misrecognized words in isolated word recognition. However, their experiments were performed on fixed-vocabulary speech recognition. In this paper, we apply NCM to the domain of vocabulary independent speech recognition (VISP) and show the rejection performance of NCM in VISP. Specifically, we propose a vector quantization (VQ) based method for overcoming the problem of unseen triphones, which arises because NCM uses the statistics of triphone confidence in triphone-based normalization. According to speech recognition experiments, the phone-based normalization method yields better results than both RLJC [3] and the triphone-based normalization approach. These results differ from those of Kim et al. [1-2]. In conclusion, phone-based normalization shows robust performance in the VISP domain.

Efficiency Algorithm of Multispectral Image Compression in Wavelet Domain (웨이브릿 영역에서 다분광 화상데이터의 효율적인 압축 알고리듬)

  • Ban, Seong-Won;Seok, Jeong-Yeop;Kim, Byeong-Ju;Park, Gyeong-Nam;Kim, Yeong-Chun;Jang, Jong-Guk;Lee, Geon-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.4 / pp.362-370 / 2001
  • In this paper, we propose a multispectral image compression method using CIP (classified inter-channel prediction) and SVQ (selective vector quantization) in the wavelet domain. First, the multispectral image is wavelet transformed and classified into one of three classes according to the reflection characteristics of the subband with the lowest resolution. Then, for a reference channel that has the highest correlation and the same resolution with the other channels, variable VQ is performed in the classified intra-channel to remove spatial redundancy. For the other channels, CIP is performed to remove spectral redundancy. Finally, the prediction error is reduced by performing SVQ. Experiments are carried out on a multispectral image. The results show that the proposed method reduces the bit rate at higher reconstructed image quality and improves compression efficiency compared to conventional methods. Index Terms: multispectral image compression, wavelet transform, classified inter-channel prediction, selective vector quantization, subband with lowest resolution.
