Search | Korea Science

A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

Seokjin Lee
- The Journal of the Acoustical Society of Korea, v.43 no.2, pp.243-252, 2024
Among the Foley sound generation models that have recently begun to be studied, sound generation using the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure together with a generation model such as Pixelsnail is one of the important research subjects. Meanwhile, in the field of deep learning-based acoustic signal compression, residual vector quantization is reported to be more suitable than the conventional VQ-VAE structure. This paper therefore studies whether residual vector quantization can be effectively applied to Foley sound generation. To this end, the paper applies residual vector quantization to the conventional VQ-VAE-based Foley sound generation model and, in particular, derives a model that is compatible with existing models such as Pixelsnail and does not increase computational resource consumption. The model was evaluated in an experiment using DCASE2023 Task7 data. The results show that the proposed model improves the Fréchet audio distance by about 0.3. The improvement was limited, however, which is believed to be due to the reduced time-frequency resolution needed to avoid increasing computational resource consumption.
https://doi.org/10.7776/ASK.2024.43.2.243
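
The residual vector quantization idea named in this abstract can be illustrated with a minimal sketch. It is not the paper's model: the two toy codebooks (`COARSE`, `FINE`) and the helper names are hypothetical, and a real system would learn the codebooks jointly with the VQ-VAE encoder. The sketch only shows the core mechanism, in which each stage quantizes the residual left by the previous stage and decoding sums the selected codewords.

```python
# Residual vector quantization (RVQ) sketch with hypothetical toy codebooks.

def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen codeword so the next stage sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Illustrative values only; not taken from the paper.
COARSE = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
FINE = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]

codes = rvq_encode([COARSE, FINE], [1.2, 0.9])
approx = rvq_decode([COARSE, FINE], codes)
```

Here the second stage refines the coarse estimate, so the two-stage reconstruction is closer to the input than the coarse codeword alone.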

Under the fading channel environment, performance evaluation of AF CR loop Due to the quantization effect (페이딩 채널 환경하에서의 양자화 특성에 의한 AF CR loop의 성능평가)

송재철;이경하;김선형;최형진
- The Journal of Korean Institute of Communications and Information Sciences, v.21 no.3, pp.737-746, 1996
In this paper, we present simulation results on the quantization effects of a new Angular Form Carrier Recovery Loop (AF CR loop) for PSK modulation. The AF CR loop includes a detected angle symbol and a multi-level hard limiter. In general, the detected angle is used in determining the symbol. Because the detected angle is also used to form the error signal at the phase detector output, hardware implementation of the AF CR loop is simpler than that of other loops. Before hardware implementation of the AF CR loop, the results of quantization effects should be investigated. To confirm the quantization effects on the AF CR loop, we evaluate its performance by Monte Carlo simulation. Under both AWGN and Jakes fading channel environments, we confirmed the characteristics of the AF CR loop in terms of RMS jitter due to quantization. A differential APSK modulation scheme is used in this paper. In particular, the Jakes fading channel is used as the channel model, and AGC (Automatic Gain Control) is used throughout the performance evaluation. We obtained reasonable results for the quantization effects on the AF CR loop. Based on this performance evaluation, we can expect the AF CR loop to operate reasonably well under fading channel environments.

Optimal Realization of a State-Space Digital Filter Using Singular Value Decomposition (특이치 분해를 이용한 상태 공간 디지틀 필터의 최적 실현)

문용선;박종안;김재민
- The Journal of Korean Institute of Communications and Information Sciences, v.15 no.2, pp.155-165, 1990
The problem of quantization errors in digital filter design arises from the practical necessity of finite-wordlength implementation. These errors are classified into coefficient quantization error and round-off error. In this paper, in order to analyze and reduce these errors, a minimum coefficient quantization error realization is derived directly from the impulse response design specification. Then, using the equivalent transform relation between the minimum coefficient quantization error and minimum round-off error realizations, we synthesize an optimal state-space digital filter realization. The technique is analyzed by simulating an approximated 3rd-order model, which shows that it is superior to direct or cascade state-space digital filters in terms of quantization errors.
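
As a small illustration of the coefficient quantization error mentioned in this abstract, the sketch below rounds a filter coefficient to a fixed number of fractional bits. The function name, the example coefficient, and the 4-bit wordlength are assumptions chosen for illustration; the paper's optimal state-space realization is not reproduced here.

```python
def quantize_coefficient(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed point)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

coeff = 0.7071                              # hypothetical filter coefficient
quantized = quantize_coefficient(coeff, 4)  # keep 4 fractional bits
error = coeff - quantized                   # coefficient quantization error
```

With rounding to the nearest level, the magnitude of the error is bounded by half a quantization step, here `2**-5`.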

Trade-off between Model Complexity and Performance in Intra-frame Predictive Vector Quantization of Wideband Speech (광대역 음성에 대한 프레임내 잔차 벡터 양자화에 있어서 모델 복잡도와 성능 사이의 교환관계)

Song, Geun-Bae;Hahn, Hern-Soo
- The Journal of Korea Robotics Society, v.5 no.1, pp.70-76, 2010
This paper addresses the design issue of the trade-off between model complexity and performance in applying bandwidth extension (BWE) methods to the intra-frame predictive vector quantization of wideband speech. It discusses model-based linear and non-linear prediction methods and compares them in terms of prediction gain. Experiments show a general trend of performance saturation as model complexity increases. In particular, no significant difference is observed between HMM-based and GMM-based BWE functions.

Digital Implementation of Backing up control of Truck-trailer type Mobile Robots (트럭-트레일러 타입의 모바일로봇을 위한 귀환 제어기 설계)

Ku, Ja-Yl;Park, Chang-Woo
- 전자공학회논문지 IE, v.46 no.2, pp.33-45, 2009
This paper presents the implementation of backward movement control of a truck-trailer type mobile robot using a fuzzy model-based control scheme that considers two practical constraints: computing time-delay and quantization. We propose a fuzzy feedback controller whose output is delayed by one sampling period and predicted. Analysis and design under computing time-delay become very easy because the proposed controller is synchronized with the sampling time. A stability analysis is also carried out for the case where quantization exists in the implementation of the fuzzy control architecture; it is shown that if the trivial solution of the fuzzy control system without quantization is asymptotically stable, then the solutions of the fuzzy control system with quantization are uniformly ultimately bounded. Experimental results verify the effectiveness of the proposed scheme.

Development of a Robust Multiple Audio Watermarking Using Improved Quantization Index Modulation and Support Vector Machine (개선된 QIM과 SVM을 이용한 공격에 강인한 다중 오디오 워터마킹 알고리즘 개발)

Seo, Ye-Jin;Cho, San-Gjin;Chong, Ui-Pil
- Journal of the Institute of Convergence Signal Processing, v.16 no.2, pp.63-68, 2015
This paper proposes a robust multiple audio watermarking algorithm using improved QIM (quantization index modulation), with a step size that adapts to the signal power, and an SVM (support vector machine) decoding model. The proposed algorithm embeds watermarks into both the frequency magnitude response and the frequency phase response using QIM. This multiple embedding achieves complementary robustness. The SVM decoding model improves the detection rate when it is uncertain whether the extracted data are watermarks. To evaluate robustness, 11 attacks are employed. The proposed algorithm outperforms the previous multiple watermarking algorithm, which is identical to the proposed one but lacks the SVM decoding model, in PSNR and BER. Notably, the proposed algorithm achieves improvements of up to 7 dB in PSNR and 10% in BER.
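
The basic QIM step that this abstract builds on can be sketched as follows. This is plain scalar QIM with a fixed step size, not the paper's improved variant: the adaptive step size, the magnitude and phase embedding, and the SVM decoder are all omitted, and the function names are hypothetical. Each bit selects one of two interleaved quantization lattices, and extraction picks the lattice whose nearest point is closest to the received value.

```python
def qim_embed(value, bit, step):
    """Quantize value onto the lattice belonging to bit (0 or 1)."""
    # Bit 1 uses the lattice shifted by half a step relative to bit 0.
    offset = bit * step / 2.0
    return round((value - offset) / step) * step + offset

def qim_extract(value, step):
    """Return the bit whose lattice point lies closest to value."""
    errors = [abs(value - qim_embed(value, bit, step)) for bit in (0, 1)]
    return 0 if errors[0] <= errors[1] else 1

marked = qim_embed(3.3, 1, 1.0)               # host sample carrying bit 1
recovered = qim_extract(marked + 0.2, 1.0)    # extraction after mild distortion
```

Under these assumptions the embedded bit survives any distortion smaller than a quarter of the step size, which is why a larger step trades audibility for robustness.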

Adaptive Skin Segmentation based on Region Histogram of Color Quantization Map (칼라 양자화 맵의 영역 히스토그램에 기반한 조명 적응적 피부색 영역 분할)

Cho, Seong-Sik;Bae, Jung-Tae;Lee, Seong-Whan
- Journal of KIISE:Software and Applications, v.36 no.1, pp.54-61, 2009
This paper proposes a skin segmentation method based on region histograms of the color quantization map. First, we build a quantization map of the image using the JSEG algorithm and detect skin pixels. For skin region detection, similar neighboring regions are established for each region of the color quantization map according to the similarity of size and location between the previous frame and the present frame. Then we compare the color distribution of each quantized region with the skin color model using the histogram distance. The skin region is selected using an automatically calculated threshold. The skin model is updated with the skin color information from the selected result. The proposed algorithm was compared with previous algorithms on the ECHO database and on continuous images captured under time-varying illumination for the adaptation test. Our approach outperforms previous approaches in skin color segmentation and in adaptation to varying illumination.

Adaptive Digital Watermarking Based on Wavelet Transform Using Successive Subband Quantization and Perceptual Model

Kim, Ju-Young;Kwon, Seong-geun;Hwang, Hee-Chul;Kwon, Ki-Ryong;Kim, Duk-Gyoo
- Proceedings of the IEEK Conference, 2002.07b, pp.1240-1243, 2002
In this paper, we propose an adaptive digital image watermarking algorithm in the wavelet domain using successive subband quantization (SSQ) and a perceptual model. The watermark is embedded into the perceptually significant coefficients (PSCs) of the image. The PSCs in the baseband are selected according to the amplitude of the coefficients, and those in the high-frequency subbands are selected by SSQ. To embed the watermark, we use a perceptual model based on the computation of the noise visibility function (NVF), which embeds stronger watermarks in texture and edge regions.

Adaptive Watermarking Using Successive Subband Quantization and Perceptual Model Based on Multiwavelet Transform Domain (멀티웨이브릿 변환 영역 기반의 연속 부대역 양자화 및 지각 모델을 이용한 적응 워터마킹)

권기룡;이준재
- Journal of Korea Multimedia Society, v.6 no.7, pp.1149-1158, 2003
This paper proposes a content-adaptive watermark embedding algorithm using a stochastic image model in the multiwavelet transform domain. A watermark is embedded into the perceptually significant coefficients (PSCs) of each subband using the multiwavelet transform. The PSCs in the high-frequency subbands are selected by SSQ, that is, by setting the threshold to one half of the largest coefficient in each subband. The perceptual model is applied with a stochastic approach based on the noise visibility function (NVF), which captures local image properties for watermark embedding. The model uses a stationary Generalized Gaussian model because the watermark has noise properties. Watermark estimation uses the shape parameter and variance of each subband region, from which content-adaptive criteria are derived for edge, texture, and flat regions. Experimental results show that the proposed multiwavelet-based watermark embedding method achieves excellent invisibility and robustness.
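
The threshold rule stated in this abstract, one half of the largest coefficient in each subband, can be sketched as below. The function name and the sample coefficients are hypothetical, and two details are assumptions not given in the abstract: the comparison uses coefficient magnitudes, and a coefficient must strictly exceed the threshold to be selected.

```python
def select_significant(subband):
    """Return indices of coefficients whose magnitude exceeds half the peak."""
    # Assumption: "largest coefficient" is taken as the largest magnitude.
    threshold = max(abs(c) for c in subband) / 2.0
    return [i for i, c in enumerate(subband) if abs(c) > threshold]

# Hypothetical high-frequency subband coefficients, flattened to a list.
subband = [0.5, -8.0, 3.9, 4.1, -6.0]
selected = select_significant(subband)
```

The selected indices mark the positions where a watermark would be embedded; the embedding strength itself would come from the perceptual model, which this sketch leaves out.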

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

Chung, Kyungyong;Oh, SangYeob
- Journal of Digital Convergence, v.16 no.11, pp.335-340, 2018
Commercialized speech recognition systems with accurate recognition rates use a learning model built from speaker-dependent isolated data. However, their speech recognition performance decreases with the quantity of data in noisy environments. In this paper, we propose a vector quantization-based speech recognition performance improvement using maximum log likelihood in a Gaussian distribution. The proposed method is the best learning model configuration method for increasing the accuracy of speech recognition for similar speech, combining vector quantization and maximum log likelihood with a speech feature extraction method. Speech features are extracted based on the hidden Markov model. The proposed system can improve the accuracy of inaccurate speech models produced by the existing system and can thus constitute a robust model for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
https://doi.org/10.14400/JDC.2018.16.11.335

Search Result 227, Processing Time 0.03 seconds
