• Title/Summary/Keyword: VQ


Entropy-Coded Lattice Vector Quantization Based on the Sample-Adaptive Product Quantizer and its Performance for the Memoryless Gaussian Source (표본 적응 프로덕트 양자기에 기초한 격자 벡터 양자화의 엔트로피 부호화와 무기억성 가우시언 분포에 대한 성능 분석)

  • Kim, Dong Sik
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.67-75 / 2012
  • At high bit rates, the optimal quantizers for entropy-constrained quantization have a lattice structure. The regular structure makes the quantization process simple, and various quantization algorithms have been proposed for different lattices. Such a lattice vector quantizer (VQ) can be implemented using the sample-adaptive product quantizer (SAPQ), and its output can be easily entropy encoded. This paper proposes an SAPQ-based entropy encoding scheme for the lattice VQ and compares the performance of the resulting entropy-coded lattice VQ asymptotically as the rate increases. Experiments show that the gain for the memoryless Gaussian source also approaches the theoretical gain for the uniform density case.
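
As a rough illustration of entropy-coded lattice quantization at high rate, the sketch below quantizes a memoryless unit-variance Gaussian source with the simplest lattice (the scaled integer lattice, i.e. a uniform scalar quantizer) and estimates the rate an ideal entropy coder would need. It does not implement SAPQ or the paper's coder; the function names, the step size, and the sample count are assumptions made for this example.

```python
import math
import random
from collections import Counter


def empirical_entropy_bits(symbols):
    """First-order entropy estimate (bits per symbol) of the lattice indices."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def simulate(step, n_samples=20000, seed=0):
    """Quantize N(0, 1) samples to the lattice step * Z; return (rate, mse)."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Nearest lattice point: round each sample to an integer multiple of step.
    indices = [round(s / step) for s in samples]
    mse = sum((s - i * step) ** 2 for s, i in zip(samples, indices)) / n_samples
    return empirical_entropy_bits(indices), mse


if __name__ == "__main__":
    step = 0.25  # assumed step size, small relative to the unit variance
    rate, mse = simulate(step)
    # High-rate approximations for comparison: D ~ step^2 / 12 and
    # R ~ h(X) - log2(step), with h(X) = 0.5 * log2(2 * pi * e) for N(0, 1).
    print("rate (bits/sample):", rate)
    print("mse:", mse, "vs step^2/12 =", step ** 2 / 12)
```

For a small step the measured distortion should sit close to step²/12 and the measured rate close to the differential entropy of the source minus log2(step), which is the high-rate regime the abstract refers to.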

Designing a Quantizer of LPC Parameters for the Narrowband Speech Coder using Block-Constrained Trellis Coded Quantization (블록 제한 트렐리스 부호화 양자화 기법을 이용한 협대역 음성 부호화기용 LPC 계수 양자화기 설계)

  • Jun, Ja-Kyoung;Park, Sang-Kuk;Kang, Sang-Won
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.3C / pp.234-240 / 2007
  • In this paper, low-complexity block-constrained trellis coded quantization (BC-TCQ) structures are introduced, and a predictive BC-TCQ encoding method is developed for quantizing line spectrum frequency (LSF) parameters in narrowband speech coding applications. Trellis-coded quantization (TCQ) is a form of VQ that builds the VQ codebook from interleaved constituent scalar quantization codebooks. The performance is compared with that of other VQ schemes, demonstrating a reduction in spectral distortion and a significant reduction in encoding complexity. In terms of spectral distortion, the predictive BC-TCQ is about 0.47107 dB better than the IS-641 split-VQ at 26 bits/frame. In terms of complexity, the BC-TCQ requires 64.54%, 76.93%, and 2.35% of the additions, multiplications, and comparisons of the IS-641 split-VQ, respectively.

A Modified Multistage Vector Quantizer Using a Hybrid Structure for Image Compression (영상 압축을 위한 혼합형 구조를 이용한 변형된 다단계 벡터 앙자화기)

  • Lee, Sang-Un;Lee, Doo-Soo;Lim, In-Chil
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.6 / pp.127-136 / 1998
  • This paper proposes a new MVQ (Multistage Vector Quantizer) using a hybrid structure. In a conventional MVQ, the quantizers of all stages encode every input signal; here we introduce a quantizer whose second stage is applied selectively. The proposed hybrid quantizer consists of an FSVQ (Finite-State Vector Quantizer) for the first stage and an ordinary VQ (Vector Quantizer) for the second stage. An input block is first encoded by the FSVQ of the first stage. If the Euclidean distortion between the original signal and the codevector selected from the state codebook of the FSVQ is less than a prespecified value, only the FSVQ is used for image coding. Otherwise, both the first-stage FSVQ and the second-stage ordinary VQ are used. The conventional MVQ has lower encoding complexity than the ordinary VQ, but it is suboptimal with respect to the performance measure and cannot reduce the bit rate. The proposed method achieves both a bit rate reduction and a performance improvement.


Music Identification Using Pitch Histogram and MFCC-VQ Dynamic Pattern (피치 히스토그램과 MFCC-VQ 동적 패턴을 사용한 음악 검색)

  • Park Chuleui;Park Mansoo;Kim Sungtak;Kim Hoirin
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.178-185 / 2005
  • This paper presents a new music identification method using the probabilistic and dynamic characteristics of melody. The proposed method uses pitch and MFCC parameters as feature vectors for the characteristics of music notes and represents the melody pattern by a pitch histogram and a temporal sequence of codeword indices. We also propose a new pattern matching method for this hybrid representation. We tested the proposed algorithm on a small search space (drama OST) and a broad one (1,005 popular songs). On both search spaces, the proposed method outperformed conventional methods, with average error reduction rates of $9.9\%$ and $10.2\%$, respectively.

Simulating flood inflow to multipurposed dam on 2020.8.7.~8.8 storm with ONE model (ONE 모형에 의한 2020.8.7.~8.8. 호우의 댐 유입량 모의)

  • Noh, Jaekyoung
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.120-120 / 2021
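  • The heavy rainfall of August 7 to 8, 2020 caused massive flood damage in the basins downstream of Yongdam Dam, Seomjingang Dam, and Hapcheon Dam. Reliable simulation of the inflow to these multipurpose dams is essential for dam operation during the flood season and for flood analysis of the downstream rivers. This study presents the results of applying the ONE model, which was developed for daily runoff simulation, at 10-minute and 1-hour time steps. Flood simulation is usually performed event by event, but here the results for the flood event are extracted from a continuous simulation running from January 1 to December 31. The flood event for the three multipurpose dams was defined as the five days from August 6 to August 10. For Yongdam, Seomjingang, and Hapcheon Dams, respectively, the watershed areas are 930 km², 763 km², and 925 km²; total rainfall was 490.7 mm, 451.9 mm, and 452.4 mm; peak inflow was 4,872.7 m³/s, 3,533.7 m³/s, and 2,776.0 m³/s at the 10-minute step and 4,394.9 m³/s, 3,401.8 m³/s, and 2,745.6 m³/s at the 1-hour step; and total inflow was 388.36 million m³, 313.24 million m³, and 328.16 million m³. The results shown are those simulated with the parameters for which the relative error in peak inflow is 0, evaluated by the relative error in total inflow (Vq), R², RMSE, and NSE. For Yongdam Dam, the 10-minute simulation gave a maximum areal rainfall of 7.3 mm, peak inflow of 4,872.4 m³/s, total inflow of 381.38 million m³, Vq 1.9%, R² 0.968, RMSE 207.347, and NSE 0.978; the 1-hour simulation gave a maximum areal rainfall of 29.6 mm, peak inflow of 4,394.9 m³/s, total inflow of 401.57 million m³, Vq -8.4%, R² 0.970, RMSE 186.962, and NSE 0.982. For Seomjingang Dam, the 10-minute simulation gave a maximum areal rainfall of 9.2 mm, peak inflow of 3,533.3 m³/s, total inflow of 272.23 million m³, Vq 18.4%, R² 0.885, RMSE 808.296, and NSE 0.925; the 1-hour simulation gave a maximum areal rainfall of 37.9 mm, peak inflow of 3,401.6 m³/s, total inflow of 270.29 million m³, Vq 13.7%, R² 0.907, RMSE 285.544, and NSE 0.936. For Hapcheon Dam, the 10-minute simulation gave a maximum areal rainfall of 5.5 mm, peak inflow of 2,776.2 m³/s, total inflow of 336.67 million m³, Vq -2.7%, R² 0.941, RMSE 191.896, and NSE 0.965; the 1-hour simulation gave a maximum areal rainfall of 17.0 mm, peak inflow of 2,746.7 m³/s, total inflow of 313.33 million m³, Vq 4.5%, R² 0.965, RMSE 140.739, and NSE 0.981. Overall, the dam flood inflow simulations by the ONE model at 10-minute and 1-hour time steps showed high reliability.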


A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

  • Seokjin Lee
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.243-252 / 2024
  • Among the Foley sound generation models that have recently begun to be studied, sound generation using the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure together with a generative model such as Pixelsnail is one of the important research subjects. Meanwhile, in deep learning-based acoustic signal compression, residual vector quantization is reported to be more suitable than the conventional VQ-VAE structure. This paper therefore studies whether residual vector quantization can be applied effectively to Foley sound generation. To this end, the residual vector quantization technique is applied to the conventional VQ-VAE-based Foley sound generation model; in particular, a model is derived that remains compatible with existing models such as Pixelsnail and does not increase computational resource consumption. The model was evaluated in an experiment using DCASE2023 Task7 data. The results show that the proposed model improves the Fréchet audio distance by about 0.3. The improvement was limited, however, which is believed to be due to the reduced time-frequency resolution adopted to avoid increasing computational resource consumption.
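
The residual vector quantization idea mentioned in this abstract can be sketched in a few lines: each stage quantizes the residual left by the previous stages, and the decoder sums the selected codewords. This is a generic illustration with hand-written toy codebooks, not the paper's model; the function names and codebook values are assumptions made for the example.

```python
def nearest_index(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def rvq_encode(vector, codebooks):
    """Quantize stage by stage; each stage encodes the remaining residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    # Toy two-stage codebooks (assumed values): coarse first, fine second.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
        [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
    ]
    codes = rvq_encode([1.2, 0.8], codebooks)
    print(codes, rvq_decode(codes, codebooks))
```

With N stages of K codewords each, the effective codebook has up to K^N entries while each stage searches only K codewords, which is the usual motivation for the residual structure.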

A Study on VQ/HMM using Nonlinear Clustering and Smoothing Method (비선형 집단화와 완화기법을 이용한 VQ/HMM에 관한 연구)

  • 정희석;강철호
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.35-42 / 1999
  • In this paper, a modified clustering algorithm is proposed to improve the discrimination of the discrete HMM (Hidden Markov Model); it increases the recognition rate by 2.16% compared with the original HMM using the K-means or LBG algorithm. In addition, to prevent the recognition rate from dropping because of insufficient training data during HMM training, a modified probabilistic smoothing method is proposed, which increases the recognition rate by 3.07% for the speaker-independent case. In experiments applying both proposed algorithms, the average recognition rate increased by 4.66% for the speaker-independent case compared with the original VQ/HMM.


Speaker Identification Based on Vowel Classification and Vector Quantization (모음 인식과 벡터 양자화를 이용한 화자 인식)

  • Lim, Chang-Heon;Lee, Hwang-Soo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.8 no.4 / pp.65-73 / 1989
  • In this paper, we propose a text-independent speaker identification algorithm based on VQ (vector quantization) and vowel classification, and we study its performance in comparison with a conventional speaker identification algorithm using VQ. The proposed algorithm consists of three processes: vowel segmentation, vowel recognition, and average distortion calculation. Vowel segmentation is performed automatically using RMS energy, BTR (Back-to-Total cavity volume Ratio), and SFBR (Signed Front-to-Back maximum area Ratio) extracted from the input speech signal. If the input speech signal is noisy, particularly when the SNR is around 20 dB, the proposed algorithm performs better than the reference speaker identification algorithm, provided that vowel segmentation is correct. The same result is obtained when a noisy telephone speech signal is used as the input.


GMM-based Emotion Recognition Using Speech Signal (음성 신호를 사용한 GMM기반의 감정 인식)

  • 서정태;김원구;강면구
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.235-241 / 2004
  • This paper studies pattern recognition algorithms and feature parameters for speaker- and context-independent emotion recognition. The KNN algorithm was used as the pattern matching technique for comparison, and VQ and GMM were used for speaker- and context-independent recognition. The speech parameters used as features are pitch, energy, MFCC, and their first and second derivatives. Experimental results showed that the emotion recognizer using MFCC and its derivatives performed better than the one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM-based emotion recognizer was superior to the KNN- and VQ-based recognizers.

An Efficient Vector Quantization Codebook generation using a Triangle Inequality (삼각 부등식을 이용한 빠른 벡터 양자화 코드북 생성)

  • Lee, Hyun-Jin
    • Journal of Digital Contents Society / v.13 no.3 / pp.309-315 / 2012
  • Active data are the input data whose membership changes as the vector quantization (VQ) codebook generation algorithm proceeds. As the algorithm runs, the share of active data in the entire input data decreases. Therefore, if the active data can be identified accurately and codebook generation is performed only on them, the overall generation time can be reduced significantly. This paper presents a triangle inequality-based algorithm for selecting the active data. Experimental results show that the algorithm is superior to other methods in terms of VQ codebook generation time.
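
One well-known way to use the triangle inequality for this purpose is the bound from accelerated k-means: if a point lies within half the distance from its assigned codeword to the nearest other codeword, no other codeword can be closer, so the point cannot change membership. The sketch below applies that bound to pick candidate active points. It is an illustration of the general idea under that assumption, not necessarily the selection rule used in the paper, and the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def half_min_separation(codebook):
    """s[j] = half the distance from codeword j to its closest other codeword.

    Requires at least two codewords.
    """
    return [
        0.5 * min(dist(c, other) for k, other in enumerate(codebook) if k != j)
        for j, c in enumerate(codebook)
    ]


def select_active(data, assignment, codebook):
    """Indices of points whose membership might change.

    If d(x, c_j) <= s[j], the triangle inequality gives d(x, c_k) >= d(x, c_j)
    for every other codeword c_k, so x stays with c_j and can be skipped.
    """
    s = half_min_separation(codebook)
    return [
        i
        for i, (x, j) in enumerate(zip(data, assignment))
        if dist(x, codebook[j]) > s[j]
    ]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [4.0, 0.0]]
    data = [[0.5, 0.0], [1.9, 0.0], [2.5, 0.0], [4.2, 0.0]]
    assignment = [0, 0, 0, 1]  # current codeword index of each point
    print(select_active(data, assignment, codebook))
```

Only the points returned by `select_active` need a full nearest-codeword search in the next iteration; the others keep their current assignment.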