• Title/Summary/Keyword: Vector Quantization(VQ)

Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique (수정된 MAP 적응 기법을 이용한 음성 데이터 자동 군집화)

  • Ban, Sung Min;Kang, Byung Ok;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.6 no.1 / pp.77-83 / 2014
  • This paper proposes a speaker and environment clustering method to overcome the degradation of speech recognition performance caused by various noise and speaker characteristics. Instead of using the distance between Gaussian mixture model (GMM) weight vectors, as in Google's approach, the distance between mean vectors adapted with a modified maximum a posteriori (MAP) adaptation is used as the distance measure for vector quantization (VQ) clustering. In experiments on simulated data generated by adding noise to clean speech, the proposed clustering method yields an error rate reduction of 10.6% compared with the baseline speaker-independent (SI) model, slightly better than Google's approach.
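The clustering idea above (adapt GMM means per utterance, then VQ-cluster the adapted means) can be sketched as follows. This is a minimal illustration, not the paper's method: it uses standard relevance-MAP mean adaptation rather than the authors' *modified* MAP, a hypothetical 2-component UBM, an assumed relevance factor `r = 16`, plain Euclidean distance between concatenated mean supervectors, and k-means with farthest-first initialization as the VQ clustering step.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """Standard relevance-MAP adaptation of GMM means (NOT the paper's modified MAP).
    Returns the adapted means concatenated into one supervector."""
    K, D = len(means), len(means[0])
    n = [0.0] * K                      # soft counts n_k
    s = [[0.0] * D for _ in range(K)]  # sums of gamma_k(t) * x_t
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        z = sum(post)
        for k in range(K):
            g = post[k] / z            # posterior of component k for frame x
            n[k] += g
            for d in range(D):
                s[k][d] += g * x[d]
    # mu_hat_k = (sum_t gamma_k(t) x_t + r * mu_k) / (n_k + r)
    return [(s[k][d] + relevance * means[k][d]) / (n[k] + relevance)
            for k in range(K) for d in range(D)]

def vq_cluster(vectors, n_clusters, iters=20):
    """K-means (VQ) clustering of supervectors with Euclidean distance."""
    # Farthest-first initialisation: start from the first vector, then keep
    # adding the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < n_clusters:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical UBM and six synthetic "utterances" from two acoustic conditions.
ubm_w, ubm_mu, ubm_var = [0.5, 0.5], [[-1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]
rng = random.Random(1)

def utterance(shift):
    return [[rng.choice([-1.0, 1.0]) + rng.gauss(0, 0.3), shift + rng.gauss(0, 0.3)]
            for _ in range(50)]

utts = [utterance(+1.0) for _ in range(3)] + [utterance(-1.0) for _ in range(3)]
supervectors = [map_mean_supervector(u, ubm_w, ubm_mu, ubm_var) for u in utts]
labels, _ = vq_cluster(supervectors, n_clusters=2)
```

Utterances from the same synthetic condition should end up with the same cluster label. In a recognizer, each cluster would then get its own adapted acoustic model.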

Content-based music retrieval using temporal characteristics (Temporal 특성을 이용한 내용기반 음악 정보 검색)

  • Park Chuleui;Park Mansoo;Kim Sungtak;Kim Hoi-Rin;Kang Kyeongok
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.299-302 / 2004
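  • In this paper, we propose a content-based music information retrieval method that uses the temporal characteristics of music. For application in a broadcasting environment, the search scope is limited to OST albums used as background music for dramas and films. MFCCs (Mel Frequency Cepstral Coefficients) are used as the audio feature vectors, and the time-varying characteristics of the audio signal are represented by the codewords obtained by encoding these feature vectors with VQ (Vector Quantization). We compare the proposed method, which uses codeword sequences that reflect the temporal characteristics of music, with a pitch-histogram-based method and an MFCC codeword-histogram-based method, and show that it improves performance.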
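The difference between a codeword histogram and a codeword sequence can be shown with a toy codebook. This is a minimal sketch under assumptions: the 2-D codebook and frames are hypothetical stand-ins for MFCC vectors, and the two distance functions (L1 between normalized histograms, fraction of mismatched positions for sequences of equal length) are illustrative choices, not the matching scheme used in the paper.

```python
from collections import Counter

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword (squared Euclidean)."""
    return [min(range(len(codebook)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])))
            for f in frames]

def histogram_distance(seq_a, seq_b, n_codewords):
    """L1 distance between normalized codeword histograms; ignores temporal order."""
    ha, hb = Counter(seq_a), Counter(seq_b)
    return sum(abs(ha[i] / len(seq_a) - hb[i] / len(seq_b)) for i in range(n_codewords))

def sequence_distance(seq_a, seq_b):
    """Fraction of positions whose codewords differ; order-sensitive (equal lengths assumed)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]        # hypothetical 3-word codebook
frames_a = [[0.1, 0.0], [0.9, 1.1], [2.1, 0.1], [0.0, 0.1]]
frames_b = list(reversed(frames_a))                       # same frames, different order

seq_a = quantize(frames_a, codebook)
seq_b = quantize(frames_b, codebook)
hist_d = histogram_distance(seq_a, seq_b, len(codebook))
seq_d = sequence_distance(seq_a, seq_b)
```

Here the two clips have identical histograms (`hist_d` is 0) but different codeword sequences (`seq_d` is 0.5), which is the kind of temporal information a histogram discards.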

A Query-by-Speech Scheme for Photo Albuming (음성 질의 기반 디지털 사진 검색 기법)

  • Kim Tae-Sung;Suh Young-Joo;Lee Yong-Ju;Kim Hoi-Rin
    • MALSORI / no.57 / pp.99-112 / 2006
  • In this paper, we introduce two methods for retrieving photos annotated with speech documents. The speech query is compared with the speech documents recorded by digital cameras, similarity scores are computed, and the photos whose speech documents score highest are retrieved. The first approach uses phoneme recognition as a preprocessor for pattern matching; the second applies vector quantization (VQ) and dynamic time warping (DTW) to match the speech query with the documents directly in the signal domain. Experimental results show that the first approach is fast, but its performance depends heavily on the accuracy of phoneme recognition. The second method improves performance substantially; although DTW makes it slower than the first method, the processing time can be reduced with approximate methods.
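The signal-domain matching step can be sketched with a basic DTW. This is an illustrative sketch, not the authors' implementation: the local cost is Euclidean distance between feature frames, the photo names and 1-D features are hypothetical, and the optional `band` parameter (a Sakoe-Chiba window on |i - j|) is shown only as one common example of an approximation that trades accuracy for speed.

```python
import math

def dtw_distance(query, doc, band=None):
    """DTW distance between two feature sequences with Euclidean local cost.
    band: optional Sakoe-Chiba half-width limiting |i - j| to speed things up;
    if the lengths differ by more than band, the result is inf."""
    n, m = len(query), len(doc)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = 1, m
        if band is not None:
            lo, hi = max(1, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            cost = math.dist(query[i - 1], doc[j - 1])
            # Allowed steps: insertion, deletion, match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Hypothetical speech documents attached to photos (1-D features for brevity).
query = [[0.0], [1.0], [2.0], [3.0]]
docs = {
    "photo1": [[0.0], [0.0], [1.0], [2.0], [3.0]],  # same pattern, spoken slower
    "photo2": [[3.0], [2.0], [1.0], [0.0]],         # reversed pattern
}
scores = {name: dtw_distance(query, d) for name, d in docs.items()}
best = min(scores, key=scores.get)
```

DTW absorbs the extra frame in `photo1`, so it scores 0 and is retrieved first. The band version skips cells far from the diagonal, which is where the time savings come from.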

A Study on Design and Implementation of Embedded System for Speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.201-206 / 2004
  • This study develops a speech recognition module for a wheelchair for physically handicapped users. The proposed module uses a TMS320C32 as the main processor and applies 12th-order mel-cepstral features in the preprocessing stage to increase the recognition rate in noisy environments. Dynamic time warping (DTW) is used for speaker-dependent recognition and proves to perform well. To use this algorithm more effectively, the reference data are compressed to 1/12 of their original size with vector quantization to reduce memory usage. The paper also covers the additional techniques needed to run the system in real time, such as end-point detection and DMA processing.
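One way a 1/12 compression of reference templates can arise is sketched below: each 12-coefficient mel-cepstral frame is replaced by a single codeword index. The storage model is an assumption, not taken from the paper: a coefficient and an index are each counted as one memory word, and the codebook is shared by all templates, so the ratio approaches 1/12 only when the codebook overhead is small. The codebook and frames are hypothetical.

```python
import math

def nearest(frame, codebook):
    """Index of the codeword closest to frame (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))

def compress_template(frames, codebook):
    """Store a reference template as one codeword index per frame."""
    return [nearest(f, codebook) for f in frames]

def decompress_template(indices, codebook):
    """Rebuild an approximate template by codebook lookup, e.g. before running DTW."""
    return [list(codebook[i]) for i in indices]

def storage_ratio(n_templates, frames_per_template, order, codebook_size):
    """Compressed/raw storage under the assumed word-per-value model, including
    the shared codebook (codebook_size x order words)."""
    raw = n_templates * frames_per_template * order
    compressed = n_templates * frames_per_template + codebook_size * order
    return compressed / raw

ORDER = 12                                           # 12th-order mel-cepstrum
codebook = [[0.0] * ORDER, [1.0] * ORDER]            # hypothetical 2-word codebook
frames = [[0.1] * ORDER, [0.9] * ORDER, [0.2] * ORDER]
indices = compress_template(frames, codebook)
restored = decompress_template(indices, codebook)
ratio_small = storage_ratio(100, 50, ORDER, 64)      # about 0.096
ratio_large = storage_ratio(10**6, 50, ORDER, 64)    # close to 1/12
```

The lossy step is `decompress_template`: DTW then compares the query against codewords instead of the original frames.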

On a robust text-dependent speaker identification over telephone channels (전화음성에 강인한 문장종속 화자인식에 관한 연구)

  • Jung, Eu-Sang;Choi, Hong-Sub
    • Speech Sciences / v.2 / pp.57-66 / 1997
  • This paper studies the effect of cepstral mean subtraction (CMS), which compensates for part of the speech distortion caused by telephone channels, on the performance of a text-dependent speaker identification system. The system is based on vector quantization (VQ) and hidden Markov models (HMMs) and uses LPC-cepstrum and mel-cepstrum feature vectors extracted from speech transmitted over telephone channels, so the correct identification rates obtained with the two feature types can be compared. The experimental results show that the mel-cepstrum outperforms the LPC-cepstrum and that compensating for the telephone channel with CMS improves recognition performance by about 10%.
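CMS works because a stationary linear channel multiplies the speech spectrum, which becomes (approximately) a constant additive offset in the cepstral domain; subtracting the per-utterance mean of each coefficient removes that offset. The sketch below assumes the channel is exactly such a constant cepstral offset; the frame values and channel vector are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, mean)] for frame in frames]

clean = [[1.0, 2.0, -0.5], [3.0, -1.0, 0.25], [0.5, 0.5, 1.0]]
channel = [0.7, -0.3, 0.2]   # hypothetical stationary channel offset in the cepstral domain
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

out_clean = cepstral_mean_subtraction(clean)
out_distorted = cepstral_mean_subtraction(distorted)
```

After CMS, the clean and channel-distorted features coincide (up to rounding). Note that the utterance's own long-term cepstral mean is removed as well, which is the usual price of this simple compensation.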

Study on the searching of images via clustering (이미지 데이타 클러스터링을 이용한 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.97-100 / 2002
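  • Multimedia data such as images, video, and audio are difficult to search because, compared with text-based data, they are large and unstructured. In addition, because features of multimedia data are represented as matrices or vectors, similarity search rather than exact-match search must be performed to find images similar to the one the user wants. In this study, we propose a similarity search method that applies clustering and indexing together to multimedia data retrieval: similar images are clustered on adjacent disk locations, and an index for accessing these clusters is built so that searches run quickly. By combining a cluster-generation algorithm with hash-based indexing, the proposed method achieves higher recall and precision than VQ (Vector Quantization).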
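The cluster-plus-hash-index idea can be sketched in a few lines. This is a hypothetical sketch, not the paper's design: the class name `ClusterIndex`, the fixed centroids (assumed to come from some clustering step), and the in-memory `dict` standing in for both the hash index and the "adjacent disk" cluster storage are all illustrative assumptions. A query is routed to its nearest cluster and only that bucket is scanned.

```python
import math
from collections import defaultdict

class ClusterIndex:
    """Hypothetical cluster-based similarity index: a hash table (dict) maps a
    cluster id to the images stored together in that cluster."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = defaultdict(list)   # cluster id -> [(image_id, feature)]

    def _nearest_cluster(self, feature):
        return min(range(len(self.centroids)),
                   key=lambda c: math.dist(feature, self.centroids[c]))

    def add(self, image_id, feature):
        self.buckets[self._nearest_cluster(feature)].append((image_id, feature))

    def search(self, query, k=1):
        """Return up to k nearest images, scanning only the query's cluster."""
        bucket = self.buckets.get(self._nearest_cluster(query), [])
        ranked = sorted(bucket, key=lambda item: math.dist(query, item[1]))
        return [image_id for image_id, _ in ranked[:k]]

index = ClusterIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
for image_id, feature in [("sunset", [0.5, 0.2]), ("beach", [1.0, 1.0]),
                          ("city", [9.0, 9.5]), ("night", [10.5, 10.0])]:
    index.add(image_id, feature)

top1 = index.search([0.4, 0.3], k=1)
top2 = index.search([9.2, 9.4], k=2)
```

Probing a single cluster is what makes the search fast; the trade-off is that a true nearest neighbor sitting just across a cluster boundary can be missed, so practical systems often probe a few nearby clusters.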

A Study on Angiography Coding (심장조영상 부호화에 관한 연구)

  • Park, Sang-Hui;Han, Young-Oh;Park, Hyun-Soo;Kim, Hyung-Suk;Shin, Joong-In
    • Journal of Biomedical Engineering Research / v.14 no.2 / pp.177-183 / 1993
  • High-resolution medical images are coded for archiving and communication in MPACS. In this paper, we study the coding of cardio-angiography images using subband vector quantization, an irreversible (lossy) coding method. Its advantages are freedom from blocking artifacts and edge degradation, adaptability to the abrupt image changes caused by dye injection, and fast decoding. We obtained good results on cardio-angiography data, but more sophisticated motion estimation and VQ techniques remain to be studied.
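The subband-VQ principle (split the signal into frequency bands, vector-quantize a band, reconstruct) can be illustrated on a single image row. This is a simplified sketch with assumptions that are not from the paper: a one-level 1-D Haar split instead of a 2-D filter bank, a hypothetical 3-word codebook for 2-sample blocks of the high band, and a low band kept unquantized for clarity.

```python
def haar_analysis(signal):
    """One-level Haar split into low (average) and high (difference) bands.
    Assumes an even-length signal."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: x[2i] = L + H, x[2i+1] = L - H."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

def vq_encode(values, codebook, dim):
    """Quantize consecutive dim-sample blocks to nearest-codeword indices."""
    blocks = [values[i:i + dim] for i in range(0, len(values), dim)]
    return [min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(blk, codebook[c])))
            for blk in blocks]

def vq_decode(indices, codebook):
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out

row = [10, 13, 11, 9, 50, 52, 51, 49]             # hypothetical pixel row
high_codebook = [[0.0, 0.0], [-1.0, 1.0], [1.0, -1.0]]

low, high = haar_analysis(row)
indices = vq_encode(high, high_codebook, dim=2)   # two indices replace four values
recon = haar_synthesis(low, vq_decode(indices, high_codebook))
max_error = max(abs(a - b) for a, b in zip(row, recon))
```

The reconstruction differs from the input by at most 0.5 here, showing why the scheme is irreversible. Because the error is spread through the synthesis filter rather than confined to block edges, a subband coder avoids the blocking artifacts of block-based coding.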

A Study on Discrete Hidden Markov Model for Vibration Monitoring and Diagnosis of Turbo Machinery (터보회전기기의 진동모니터링 및 진단을 위한 이산 은닉 마르코프 모델에 관한 연구)

  • Lee, Jong-Min;Hwang, Yo-ha;Song, Chang-Seop
    • The KSFM Journal of Fluid Machinery / v.7 no.2 s.23 / pp.41-49 / 2004
  • Condition monitoring is very important for turbo machinery because a single failure can cause critical damage to the plant, so automatic fault recognition has been one of the main research topics in this area. We apply a relatively new fault recognition method for mechanical systems, the hidden Markov model (HMM). HMMs have been widely used in speech recognition, but despite their potential, their application to fault recognition from mechanical signals has been very limited. In this paper, discrete HMMs (DHMMs) are used to recognize faults in a rotor system in order to study their fault recognition ability. We set up a rotor kit under unbalance and oil whirl conditions and sampled the vibration signals of these two failure conditions. A DHMM for each failure condition was trained on the sampled signals. Next, we changed the setup and the rotating speed of the rotor kit, sampled new vibration signals, and applied each DHMM to these data. DHMMs trained on data from a single rotating speed showed good fault recognition ability despite the limited training data, but DHMMs trained on data from four different rotating speeds were more robust.
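Recognition with one DHMM per fault class amounts to scoring an observation sequence under each model and picking the most likely class. The sketch below assumes the vibration features have already been vector-quantized into symbol indices; the two 2-state, 3-symbol models and their probabilities are hypothetical, not trained values from the paper. Likelihoods are computed with the scaled forward algorithm to avoid numerical underflow.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.
    pi[i]: initial prob, A[i][j]: transition prob, B[i][k]: prob of symbol k in state i."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)           # scaling factor c_t
        log_lik += math.log(scale)   # log P(obs) = sum_t log c_t
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(obs, models):
    """Return the fault class whose DHMM gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical models over 3 VQ symbols: (pi, A, B).
models = {
    "unbalance": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
                  [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]),
    "oil_whirl": ([0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]],
                  [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]]),
}
label_a = classify([0, 0, 1, 0, 0], models)
label_b = classify([2, 2, 1, 2], models)
```

In practice, the model parameters would be estimated with Baum-Welch on symbol sequences from each fault condition before `classify` is used.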

Entropy-Constrained Sample-Adaptive Product Quantizer Design for the High Bit-Rate Quantization (고 전송률 양자화를 위한 엔트로피 제한 표본 적응 프로덕트 양자기 설계)

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.1 / pp.11-18 / 2012
  • In this paper, an entropy-constrained vector quantizer for high bit rates is proposed. The sample-adaptive product quantizer (SAPQ), which is based on product codebooks, is employed, and a design algorithm for the entropy-constrained sample-adaptive product quantizer (ECSAPQ) is proposed. The proposed ECSAPQ outperforms the entropy-constrained vector quantizer by 0.5 dB. It is also shown that the ECSAPQ distortion curve, which is based on the scalar quantizer, lies below the high-rate theoretical curve of the entropy-constrained scalar quantizer, which is 1.53 dB away from Shannon's lower bound.
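Two pieces of background help when reading this abstract. The first is the generic entropy-constrained encoding rule (as in ECVQ by Chou, Lookabaugh, and Gray): pick the codeword that minimizes distortion plus λ times its code length, here taken as -log2 of its probability. This is the general rule, not the paper's ECSAPQ design algorithm, and the codebook, probabilities, and λ values below are hypothetical. The second is the 1.53 dB figure, which equals 10·log10(πe/6), or about 0.254 bit per sample.

```python
import math

def ecvq_encode(x, codebook, probs, lam):
    """Entropy-constrained encoding: minimize squared error + lam * code length,
    with the code length of codeword i taken as -log2(probs[i])."""
    def cost(i):
        d = sum((a - b) ** 2 for a, b in zip(x, codebook[i]))
        return d + lam * (-math.log2(probs[i]))
    return min(range(len(codebook)), key=cost)

codebook = [[0.0], [1.0]]
probs = [0.9, 0.1]                                 # hypothetical codeword probabilities
idx_no_rate = ecvq_encode([0.6], codebook, probs, lam=0.0)    # pure nearest neighbor
idx_with_rate = ecvq_encode([0.6], codebook, probs, lam=0.5)  # cheaper codeword wins

# Gap between the high-rate entropy-coded scalar quantizer and Shannon's lower bound.
gap_db = 10 * math.log10(math.pi * math.e / 6)
gap_bits = 0.5 * math.log2(math.pi * math.e / 6)
```

With λ = 0 the input goes to the nearer codeword; with λ = 0.5 the rate penalty makes the more probable (shorter-code) codeword cheaper, which is the trade-off an entropy-constrained design optimizes.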

Cyber Character Implementation with Recognition and Synthesis of Speech/Image (음성/영상의 인식 및 합성 기능을 갖는 가상캐릭터 구현)

  • Choe, Gwang-Pyo;Lee, Du-Seong;Hong, Gwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.5 / pp.54-63 / 2000
  • In this paper, we implement a cyber character capable of speech recognition, speech synthesis, motion tracking, and 3D animation. For speech recognition, we use a discrete HMM algorithm with 128-level K-means vector quantization and MFCC feature vectors. For speech synthesis, we use the TD-PSOLA algorithm with demi-syllable units. For PC-based motion tracking, we present a fast optical-flow-like method. To animate the 3D model, we use vertex interpolation with Direct3D retained mode. Finally, we integrate these subsystems into a cyber character that plays a multiplication-table game with the user and, using the motion tracking system, always looks at the user.
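The K-means vector quantization step that feeds a discrete HMM can be sketched as Lloyd iterations over training vectors, followed by encoding frames to codeword indices. This is a minimal sketch under assumptions: the abstract uses 128 levels and MFCC vectors, while the example uses a hypothetical 2-word codebook on four 2-D points so the result is easy to check; the random initialization seed is arbitrary.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iters=25, seed=0):
    """K-means (Lloyd) codebook training: alternate nearest-codeword assignment
    and centroid update. Empty cells keep their previous codeword."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            i = min(range(size), key=lambda c: sq_dist(v, codebook[c]))
            cells[i].append(v)
        for c, cell in enumerate(cells):
            if cell:
                codebook[c] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def encode(frames, codebook):
    """Turn feature frames into the discrete symbols a DHMM consumes."""
    return [min(range(len(codebook)), key=lambda c: sq_dist(f, codebook[c]))
            for f in frames]

training = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = train_codebook(training, size=2)          # size=128 in the abstract
symbols = encode([[0.2, 0.4], [9.8, 10.9]], codebook)
```

Each iteration can only lower (or keep) the average distortion, but K-means may settle in a local optimum, which is why codebook initialization (random sampling here, LBG splitting in many speech systems) matters in practice.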
