• Title/Summary/Keyword: VQ

Rejection Performance Analysis in Vocabulary Independent Speech Recognition Based on Normalized Confidence Measure (정규화신뢰도 기반 가변어휘 고립단어 인식기의 거절기능 성능 분석)

  • Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.2
    • /
    • pp.96-100
    • /
    • 2006
  • Kim et al. proposed the Normalized Confidence Measure (NCM) [1-2], which was successfully used for rejecting misrecognized words in isolated word recognition. However, their experiments were performed on fixed-vocabulary speech recognition. In this paper, we apply NCM to vocabulary-independent speech recognition (VISP) and show its rejection performance in that domain. Specifically, we propose a vector quantization (VQ) based method for overcoming the problem of unseen triphones, which arises because NCM uses the statistics of triphone confidence in the case of triphone-based normalization. In speech recognition experiments, the phone-based normalization method shows better results than both RLJC [3] and the triphone-based normalization approach. These results differ from those of Kim et al. [1-2]. In conclusion, phone-based normalization shows robust performance in the VISP domain.

Image Data Compression Using Biorthogonal Wavelet Transform and Variable Block Size Edges Extraction (쌍직교 웨이브렛 변환과 가변 블럭 윤곽선 추출에 의한 영상 데이타 압축)

  • 김기옥;김재공
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.7
    • /
    • pp.1203-1212
    • /
    • 1994
  • This paper proposes a variable block size vector quantization scheme based on a biorthogonal wavelet transform for image compression. An image is first decomposed with the biorthogonal wavelet transform into a multiresolution representation, and the wavelet coefficients of the middle frequency bands are segmented using a quadtree structure to extract the perceptually important regions. Because the edges of the middle frequency bands appear at the corresponding positions in the high frequency bands, the quadtree structure obtained for the middle frequency bands is applied unchanged to the high frequency bands. The overhead information of the quadtree codes needed to segment the high frequency bands is therefore reduced. The segmented subblocks are encoded with codebooks designed for each scale and direction. Simulation results show that the proposed method reproduces higher-quality images at a bit rate about 20% lower than that of the preceding VQ method, and sufficiently reduces the block effect and edge degradation.
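
The quadtree segmentation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the split criterion, so the variance test, the `threshold` value, and the function name `quadtree_blocks` below are assumptions made purely for illustration, not the authors' method.

```python
import numpy as np

def quadtree_blocks(band, threshold, min_size=2):
    """Split a square, power-of-two-sized band into variable-size blocks.

    A block is split into four quadrants while its variance exceeds
    `threshold` (an assumed activity measure) and it is larger than
    `min_size`. Returns a list of (row, col, size) leaf blocks.
    """
    leaves = []

    def split(r, c, size):
        block = band[r:r + size, c:c + size]
        if size > min_size and block.var() > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            leaves.append((r, c, size))

    split(0, 0, band.shape[0])
    return leaves

# Toy 8x8 "band": flat except for a small active patch in the top-left corner.
band = np.zeros((8, 8))
band[0:2, 0:2] = 10.0
leaves = quadtree_blocks(band, threshold=1.0)
```

In this toy case only the quadrant containing the active patch is subdivided, so small blocks are spent where the detail is. Reusing the same `leaves` list for another band of the same size corresponds to the abstract's point that the quadtree codes need not be sent again.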

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.3
    • /
    • pp.150-159
    • /
    • 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's voice. In the proposed method, a person's voice is represented by LPC cepstrum, pitch period, and speaking rate, and appropriate transformation rules are constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. The parameters of each probabilistic model are obtained by Maximum Likelihood (ML) estimation. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of the average values. The proposed method outperforms the previous VQ-based method in objective measures, including average cepstrum distance reduction ratio and likelihood increasing ratio. In subjective tests, we obtained almost the same correct identification ratio as the previous method and also confirmed that high-quality transformed speech is obtained, owing to the smoothly evolving spectral contours over time.
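
For orientation, the MMSE estimate under a Gaussian mixture is the conditional expectation of the target features given the source features. The abstract does not state the paper's exact formulation, so the form below is the standard textbook one. All symbols are introduced here as assumptions: $x$ and $y$ are source and target cepstral vectors, $\alpha_i$, $\mu_i$, $\Sigma_i$ are the mixture weights, means, and covariances, and $p$ is a pitch period.

```latex
\hat{y} = E[\,y \mid x\,]
        = \sum_{i=1}^{M} p(i \mid x)
          \left[ \mu_i^{y} + \Sigma_i^{yx} \left( \Sigma_i^{xx} \right)^{-1}
                 \left( x - \mu_i^{x} \right) \right],
\qquad
p(i \mid x) = \frac{\alpha_i \, \mathcal{N}\!\left(x;\, \mu_i^{x}, \Sigma_i^{xx}\right)}
                   {\sum_{j=1}^{M} \alpha_j \, \mathcal{N}\!\left(x;\, \mu_j^{x}, \Sigma_j^{xx}\right)}

% Prosody scaling by the ratio of average values (pitch period shown as an example):
\hat{p} = p \cdot \frac{\bar{p}_{\mathrm{target}}}{\bar{p}_{\mathrm{source}}}
```

Because the posterior weights $p(i \mid x)$ vary continuously with $x$, the converted features change smoothly from frame to frame. A hard VQ mapping instead jumps between codewords, which is consistent with the smooth spectral contours the abstract reports.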

Image Coding Using DCT Map and Binary Tree-structured Vector Quantizer (DCT 맵과 이진 트리 구조 벡터 양자화기를 이용한 영상 부호화)

  • Jo, Seong-Hwan;Kim, Eung-Seong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.1 no.1
    • /
    • pp.81-91
    • /
    • 1994
  • A DCT map and a new codebook design algorithm based on the two-dimensional discrete cosine transform (2D DCT) are presented for an image vector quantizer. The image is divided into small subblocks, and the 2D DCT is used to separate them into blocks that are hard to code but carry most of the visual information and blocks that are easy to code but carry little visual information; the DCT map records this classification. According to this map, the significant features of the training image are extracted using the 2D DCT. A codebook is generated by partitioning the training set into a binary tree. Each training vector at a nonterminal node of the binary tree is directed to one of the two descendants by comparing a single feature associated with that node to a threshold. Compared with the pairwise nearest neighbor (PNN) and classified VQ (CVQ) algorithms on the 'Lenna' and 'Boat' images, the new algorithm reduces computation time and gives better picture quality, by 0.45 dB and 0.33 dB relative to PNN and by 0.05 dB and 0.1 dB relative to CVQ, respectively.
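
The single-feature threshold test at each nonterminal node can be sketched as follows. The abstract does not say how the feature and threshold are chosen, so this sketch assumes the highest-variance feature, split at its mean. That rule and the names `build_tree` and `encode` are hypothetical.

```python
import numpy as np

def build_tree(vectors, depth):
    """Recursively partition training vectors into a binary tree.

    Each nonterminal node stores one feature index and a threshold;
    each leaf stores the centroid of its training vectors (the codeword).
    Split rule (an assumption): highest-variance feature, split at its mean.
    """
    if depth == 0 or len(vectors) < 2:
        return {"codeword": vectors.mean(axis=0)}
    feature = int(np.argmax(vectors.var(axis=0)))
    threshold = float(vectors[:, feature].mean())
    left = vectors[vectors[:, feature] <= threshold]
    right = vectors[vectors[:, feature] > threshold]
    if len(left) == 0 or len(right) == 0:  # identical vectors: stop splitting
        return {"codeword": vectors.mean(axis=0)}
    return {"feature": feature, "threshold": threshold,
            "left": build_tree(left, depth - 1),
            "right": build_tree(right, depth - 1)}

def encode(tree, x):
    """Walk the tree with one scalar comparison per level; return the codeword."""
    node = tree
    while "codeword" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["codeword"]

training = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
tree = build_tree(training, depth=2)
codeword = encode(tree, np.array([9.0, 0.9]))
```

Encoding costs one scalar comparison per tree level rather than a full distance computation against every codeword. This is the kind of saving behind the reduced computation time reported in the abstract.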

Design of Wideband Speech Coder Using the MLT Residual Signal (MLT 여기신호를 이용한 광대역 음성 부호화기 설계)

  • Oh Yeon-Seon;Shin Jae-Hyun;Lee In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.5
    • /
    • pp.248-254
    • /
    • 2005
  • In this paper, the structure of a split-band wideband speech coder and its highband coder for improved tone quality are proposed. The lowband and highband produced by the band split are encoded independently, using G.729E and an MLT (Modulated Lapped Transform) residual model, respectively. In the highband coder, which operates at a low bit rate of 4 kbps, the MLT residual signals are classified as voiced or unvoiced. Voiced signals are coded with an MLT peak-picking method guided by the lowband pitch period, because the transformed MLT residual signals are periodic with periodic peaks. For unvoiced signals, the linear prediction spectral response is added to the MLT representation, which is then vector quantized. The performance of the proposed 15.8 kbps wideband speech coder was verified through subjective listening tests.

A Study on Development of Embedded System for Speech Recognition using Multi-layer Recurrent Neural Prediction Models & HMM (다층회귀신경예측 모델 및 HMM 를 이용한 임베디드 음성인식 시스템 개발에 관한 연구)

  • Kim, Jung hoon;Jang, Won il;Kim, Young tak;Lee, Sang bae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.273-278
    • /
    • 2004
  • In this paper, a recurrent neural network (RNN) is applied to complement the HMM recognition algorithm, which is commonly used as the main recognizer. Among recurrent neural networks, the multi-layer recurrent neural prediction model (MRNPM), which can operate in real time, is used for learning and recognition, and HMM and MRNPM are combined to design a hybrid main recognizer. When the designed speech recognition algorithm was tested for its speaker-independent recognition rate on Korean number pronunciations (13 words), which are hard to distinguish, an improvement of about 5% over existing HMM recognizers was obtained. Based on this result, only the optimal (recognition) code was extracted for the actual DSP (TMS320C6711) environment, and the embedded speech recognition system was implemented. The embedded implementation likewise showed better recognition performance than existing HMM-only recognition systems.

A Study on the Development of Embedded Serial Multi-modal Biometrics Recognition System (임베디드 직렬 다중 생체 인식 시스템 개발에 관한 연구)

  • Kim, Joeng-Hoon;Kwon, Soon-Ryang
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.16 no.1
    • /
    • pp.49-54
    • /
    • 2006
  • Recent fingerprint recognition systems have unstable factors, such as copying of fingerprint patterns and hacking of fingerprint feature points, which may cause significant system errors. In this research, we therefore used the fingerprint as the main recognition means and implemented a serial multi-biometric recognition system by adding speech recognition, which has been widely used recently. In this system, the fingerprint recognition process runs only once the speech has been successfully recognized. A speaker-dependent DTW (Dynamic Time Warping) algorithm is chosen from among existing speech recognition algorithms (VQ, DTW, HMM, NN) for effective real-time processing, while the KSOM (Kohonen Self-Organizing feature Map) algorithm, an artificial intelligence method, is applied to fingerprint recognition in view of its computational cost. In experiments, the implemented multi-biometric recognition system showed an FRR (False Rejection Ratio) 2 to 7% lower than single recognition systems using either fingerprint or voice alone, with zero FAR (False Acceptance Ratio), which is the most important factor in a recognition system. Moreover, there is almost no difference in recognition time (1.5 seconds on average) compared with existing single biometric recognition systems. The experiments therefore indicate that the implemented multi-biometric recognition system is a more efficient security system than single recognition systems.
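
As background on the DTW matcher named in this abstract, here is a minimal sketch of the classic dynamic-programming recurrence. It uses scalar sequences and an absolute-difference local cost for brevity; a real recognizer would compare frame-level feature vectors. The function name and the toy data are assumptions, not taken from the paper.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1.0, 2.0, 3.0]                        # stored reference pattern
slow_utterance = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]   # same shape, spoken slower
different = [3.0, 2.0, 1.0]                       # different pattern

same_score = dtw_distance(template, slow_utterance)
other_score = dtw_distance(template, different)
```

The time-stretched utterance aligns to the template at zero cost, while the different pattern does not. This tolerance to speaking rate is why template matching of this kind suits a speaker-dependent, small-vocabulary setting.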

Efficiency Algorithm of Multispectral Image Compression in Wavelet Domain (웨이브릿 영역에서 다분광 화상데이터의 효율적인 압축 알고리듬)

  • Ban, Seong-Won;Seok, Jeong-Yeop;Kim, Byeong-Ju;Park, Gyeong-Nam;Kim, Yeong-Chun;Jang, Jong-Guk;Lee, Geon-Il
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.4
    • /
    • pp.362-370
    • /
    • 2001
  • In this paper, we propose a multispectral image compression method using CIP (classified inter-channel prediction) and SVQ (selective vector quantization) in the wavelet domain. First, the multispectral image is wavelet transformed and classified into one of three classes according to the reflection characteristics of the subband with the lowest resolution. Then, for a reference channel that has the highest correlation with, and the same resolution as, the other channels, the variable VQ is performed in the classified intra-channel to remove spatial redundancy. For the other channels, CIP is performed to remove spectral redundancy. Finally, the prediction error is reduced by performing SVQ. Experiments are carried out on a multispectral image. The results show that the proposed method reduces the bit rate at higher reconstructed image quality and improves compression efficiency compared to conventional methods. Index Terms: multispectral image compression, wavelet transform, classified inter-channel prediction, selective vector quantization, subband with lowest resolution.

The Design of Optimal Filters in Vector-Quantized Subband Codecs (벡터양자화된 부대역 코덱에서 최적필터의 구현)

  • 지인호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.1
    • /
    • pp.97-102
    • /
    • 2000
  • Subband coding divides the signal frequency band into a set of uncorrelated frequency bands by filtering and then encodes each subband using a bit allocation matched to the signal energy in that subband. The subband signals can be coded using waveform encoding techniques such as PCM, DPCM, and vector quantization (VQ) to obtain higher data compression. Most researchers have focused on the error in the quantizer, but not on the overall reconstruction error and its dependence on the filter bank. This paper provides a thorough analysis of subband codecs and further develops optimum filter bank design using a vector quantizer. We compute the mean squared reconstruction error (MSE), which depends on N, the number of entries in each codebook; k, the length of each codeword; and the filter bank coefficients. We express this MSE in terms of the equivalent quantization model and find the optimum FIR filter coefficients for each channel in the M-band structure for a given bit rate, filter length, and input signal correlation model. Specific design examples are worked out for a 4-tap filter in a 2-band paraunitary filter bank structure. These optimum paraunitary filter coefficients are obtained by Monte Carlo simulation. We expect the results of this work to contribute to the study of optimum subband codec design using vector quantizers.
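
To make the roles of N and k concrete, the sketch below computes the bit rate and per-sample quantization MSE of a plain vector quantizer applied to one signal. It covers only the quantizer term and deliberately leaves out the filter bank, whose effect on the overall reconstruction error is the paper's actual subject. The function name, codebook, and signal are illustrative assumptions.

```python
import numpy as np

def vq_rate_and_mse(signal, codebook):
    """Return (bits per sample, per-sample MSE) for nearest-codeword VQ.

    codebook has shape (N, k): N entries, each a codeword of length k.
    """
    N, k = codebook.shape
    # Group the signal into consecutive length-k blocks (drop any remainder).
    blocks = signal[: len(signal) // k * k].reshape(-1, k)
    # Squared distance from every block to every codeword.
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    mse = d.min(axis=1).mean() / k
    rate = np.log2(N) / k  # log2(N) bits index one block of k samples
    return rate, mse

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
signal = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 0.5])
rate, mse = vq_rate_and_mse(signal, codebook)
```

With N = 4 and k = 2 the rate is 1 bit per sample. Fixing the bit rate, as the abstract does, ties N and k together through log2(N)/k.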

Highband Coding Method Using Matching Pursuit Estimation and CELP Coding for Wideband Speech Coder (광대역 음성부호화기를 위한 매칭퍼슈잇 알고리즘과 CELP 방법을 이용한 고대역 부호화 방법)

  • Jeong Gyu-Hyeok;Ahn Yeong-Uk;Kim Jong-Hark;Shin Jae-Hyun;Seo Sang-Won;Hwang In-Kwan;Lee In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.1
    • /
    • pp.21-29
    • /
    • 2006
  • In this paper, a split-band wideband speech coder and its highband coding method are proposed. The coder uses a split-band approach, where the wideband input speech signal is split into two equal frequency bands, 0-4 kHz and 4-8 kHz. The lowband and the highband are coded by the 11.8 kb/s G.729 Annex E and the proposed coding method, respectively. After LPC analysis, the highband is coded in one of two modes according to the properties of the signal. In stationary mode, the highband signals are compressed by a mixed excitation model combining the CELP algorithm and the MP (Matching Pursuit) algorithm. Other signals are coded by the CELP algorithm alone. We compare the performance of the new wideband speech coder with that of G.722 48 kbps SB-ADPCM and G.722.2 12.85 kbps in subjective tests. The simulation results show that the proposed wideband speech coder performs better than 48 kbps G.722 but no better than 12.85 kbps G.722.2.