Search | Korea Science

Multi Mode Harmonic Transform Coding for Speech and Music

Kim, Jonghark;Shin, Jae-Hyun;Lee, Insung
- The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.101-109 / 2003
A multi-mode harmonic transform coding (MMHTC) scheme for speech and music signals is proposed. It is structured as a linear prediction model driven by harmonic and transform-based excitation. The proposed coder also uses harmonic prediction and an improved quantizer for the excitation signal. To quantize the excitation of music signals efficiently, the modulated lapped transform (MLT) is introduced. In other words, the coder combines time-domain (linear prediction) and frequency-domain techniques to achieve the best perceptual quality. At a bit rate of 4 kbps, the proposed coder showed better speech quality than the 8 kbps QCELP coder.

A Speech Coder using the Simplified Multi-mode Method (단순화된 다중 모드 방법을 이용한 음성 부호화기)

강홍구
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.146-149 / 1995
This paper proposes an SM-CELP speech coder that applies different excitation signals according to the characteristics of the speech segment at bit rates below 4 kbps. The speech signal is classified into two modes, stationary voiced speech and everything else, using the average energy of the short-time speech and the residual signal after long-term prediction. A structured multi-pulse method is used for the excitation in mode A, and a Gaussian or pulse-like codebook in mode B. The 4.8 kbps DoD-CELP coder is used as the reference for evaluating the proposed coder. The proposed method shows a 1 to 2 dB higher segmental signal-to-noise ratio and better subjective quality without increasing the computational load.
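The segmental signal-to-noise ratio quoted in this abstract is commonly computed as the average of per-frame SNR values in dB. The sketch below is only an illustration of that common definition, not the authors' evaluation code: the function name `segmental_snr_db`, the non-overlapping framing, and the choice to skip frames with zero signal or zero noise energy are all assumptions.

```python
import math


def segmental_snr_db(reference, decoded, frame_len):
    """Average per-frame SNR in dB (hypothetical helper, not from the paper).

    Frames are non-overlapping; frames with zero signal energy or zero
    noise energy are skipped, which is an assumed convention.
    """
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal_energy > 0.0 and noise_energy > 0.0:
            snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not snrs:
        raise ValueError("no frame with nonzero signal and noise energy")
    return sum(snrs) / len(snrs)
```

Because the average is taken over frames in the log domain, low-energy frames count as much as loud ones, which is why this measure is often preferred over a single whole-signal SNR for speech.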

Efficient Harmonic-CELP Based Low Bit Rate Speech Coder (효율적인 하모닉-CELP 구조를 갖는 저 전송률 음성 부호화기)

최용수;김경민;윤대희
- The Journal of the Acoustical Society of Korea / v.20 no.5 / pp.35-47 / 2001
This paper describes an efficient harmonic-CELP speech coder that combines the advantages of harmonic and CELP coders. Depending on the frame voicing decision, the proposed harmonic-CELP coder uses the RP-VSELP coder as a fast CELP for unvoiced frames, or an improved harmonic coder for voiced frames. The main features of the proposed coder are: simple pitch detection, fast harmonic estimation, variable-dimension harmonic vector quantization, perceptual weighting reflecting frequency resolution, fast harmonic synthesis, naturalness control using band voicing, and multi-mode operation. These features give the proposed coder very low complexity compared with the HVXC coder. To demonstrate the performance of the proposed coder, a 2.4 kbps coder was implemented and compared with reference coders. In informal listening tests, the proposed coder showed good quality while requiring low delay and complexity.

Method of a Multi-mode Low Rate Speech Coder Using a Transient Coding at the Rate of 2.4 kbit/s (전이구간 부호화를 이용한 2.4 kbit/s 다중모드 음성 부호화 방법)

Ahn Yeong-uk;Kim Jong-hak;Lee Insung;Kwon Oh-ju;Bae Mun-Kwan
- Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.2 s.302 / pp.131-142 / 2005
Low-rate speech coders below 4 kbit/s are based on sinusoidal transform coding (STC) or multiband excitation (MBE). Because harmonic coders cannot efficiently reconstruct transient segments of speech such as onsets, offsets, and non-periodic signals, they do not provide natural speech quality. This paper proposes an efficient transient model and a multi-mode low-rate coder at 2.4 kbit/s that uses a harmonic model for voiced speech, a stochastic model for unvoiced speech, and a model using aperiodic pulse location tracking (APPT) for transient segments. The APPT utilizes the harmonic model. The proposed method selects among these models depending on the characteristics of the LPC residual signal. In addition, it can efficiently combine the excitation synthesized by CELP coding in the time domain with that synthesized by harmonic coding in the frequency domain. The proposed coder shows better speech quality than the 2.4 kbit/s version of the mixed excitation linear prediction (MELP) coder, which is a U.S. Federal Standard speech coder.

Design of video encoder using Multi-dimensional DCT (다차원 DCT를 이용한 비디오 부호화기 설계)

Jeon, S.Y.;Choi, W.J.;Oh, S.J.;Jeong, S.Y.;Choi, J.S.;Moon, K.A.;Hong, J.W.;Ahn, C.B.
- Journal of Broadcast Engineering / v.13 no.5 / pp.732-743 / 2008
In H.264/AVC, a 4$\times$4 block transform is used for intra and inter prediction instead of an 8$\times$8 block transform. By coding with a small block size, H.264/AVC obtains high temporal prediction efficiency, but it is limited in exploiting spatial redundancy. Motivated by these points, we propose a multi-dimensional transform that achieves both accurate temporal prediction and effective use of spatial redundancy. In preliminary experiments, the proposed multi-dimensional transform achieves higher energy compaction than the 2-D DCT used in H.264. We designed an integer-based transform and quantization coder for the multi-dimensional coder. Several additional methods for the multi-dimensional coder are also proposed: cube forming, scan order, mode decision, and parameter updating. The Context-based Adaptive Variable-Length Coding (CAVLC) used in H.264 was employed as the entropy coder. Simulation results show that the performance of the multi-dimensional codec is similar to that of H.264 at lower bit rates, although the rate-distortion curves of the multi-dimensional DCT measured by entropy and by the number of non-zero coefficients show remarkably higher performance than those of H.264/AVC. This implies that a more efficient entropy coder optimized to the statistics of multi-dimensional DCT coefficients, together with rate-distortion optimization, is needed to take full advantage of the multi-dimensional DCT. Many issues remain as future work for improving the coding efficiency of the multi-dimensional coder over H.264/AVC.
https://doi.org/10.5909/JBE.2008.13.5.732

Enhancement of Super-wideband Coder by Considering Audio Feature in MDCT Domain (MDCT 도메인에서 오디오 신호 특징을 고려한 초광대역 코덱 개선)

Hong, Ki-Bong;Jeong, Gyu-Hyeok;Lee, In-Sung
- Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.5 / pp.129-136 / 2011
This paper presents a multi-mode coding method that improves the efficiency of audio codecs by exploiting the features of the audio signal. The recently developed super-wideband extension codec based on the G.718 wideband codec switches between two modes, Generic and Sinusoidal, so that it can efficiently encode audio signals in the super-wideband range. However, the codec is not efficient at coding the harmonic components of wind and string instruments or the individual-line components of percussion instruments. The proposed method models and encodes multiple-pitch and individual-line features using multi-mode coding. For performance evaluation, SNR in the MDCT domain was used as the objective test and the MUSHRA test as the subjective test. The proposed method outperformed the G.718 super-wideband codec in both SNR and the MUSHRA test.

A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

Min, Byung-Jae;Park, Dong-Chul
- The Journal of Korean Institute of Communications and Information Sciences / v.33 no.9C / pp.691-696 / 2008
A new Code Excited Linear Predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and is well suited to systems such as OGM (outgoing message) and TTS (text to speech), which need only one person's speech. To enhance the compression rate of the coder, a new Line Spectral Pairs (LSP) codebook is designed using the Centroid Neural Network (CNN) algorithm. Compared with the original AMR 7.4 kbit/s coder, the new coder shows a 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score (MOS).

A New Variable Bit Rate Scheme for Waveform Interpolative Coders (파형보간 코더에서 파라미터간 거리차를 이용한 가변비트율 기법)

Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
- MALSORI / no.65 / pp.81-91 / 2008
In this paper, we propose a new variable bit-rate speech coder based on the waveform interpolation concept. After the coder extracts all parameters, it measures, for each parameter, the distortion between the current value and a predicted value estimated by extrapolation from the past two values. A parameter is transmitted only if the distortion exceeds a preset threshold. At the decoder, a non-transmitted parameter is reconstructed by extrapolation from the past two parameters used to synthesize the signal. In this way, the total bit rate is reduced by 26% while keeping the speech quality degradation below 0.1 PESQ.
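The transmit-or-skip rule in this abstract can be illustrated with a minimal sketch. Several details are assumptions and are not stated in the abstract: parameters are treated as scalars, the extrapolation is linear (`2 * prev1 - prev2`), the distortion is an absolute difference, and the encoder predicts from the values the decoder will actually hold so that both sides stay in sync. The names `extrapolate`, `encode`, and `decode` are hypothetical.

```python
def extrapolate(prev2, prev1):
    """Linear extrapolation from the two most recent values (assumed predictor)."""
    return 2.0 * prev1 - prev2


def encode(params, threshold):
    """Return (sent, value) pairs; value is None when the parameter is skipped."""
    history = []  # values the decoder will hold, mirrored at the encoder
    stream = []
    for x in params:
        if len(history) < 2:
            # Not enough history to predict yet: always transmit.
            stream.append((True, x))
            history.append(x)
            continue
        pred = extrapolate(history[-2], history[-1])
        if abs(x - pred) > threshold:
            stream.append((True, x))       # distortion too large: transmit
            history.append(x)
        else:
            stream.append((False, None))   # decoder will extrapolate instead
            history.append(pred)
    return stream


def decode(stream):
    """Rebuild the parameter track, extrapolating wherever nothing was sent."""
    history = []
    for sent, value in stream:
        if sent:
            history.append(value)
        else:
            history.append(extrapolate(history[-2], history[-1]))
    return history
```

With `params = [1.0, 2.0, 3.0, 4.1, 7.0, 10.0]` and `threshold = 0.5`, only three of the six values are transmitted, and the decoder's track differs from the input only at the fourth value, where it holds the extrapolated 4.0 instead of 4.1.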

An Efficient Algebraic Codebook Search Method for AMR Speech Coder (적응형 다중 비트율 음성 부호화기를 위한 효율적인 대수코드북 검색법)

변경진;정희범;한민수
- The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.129-134 / 2003
In this paper, we efficiently implement the AMR speech coder by reducing the complexity of the algebraic codebook search. To reduce this computational complexity, we propose a fast algebraic codebook search method that improves on the conventional depth-first tree search used in the AMR speech coder algorithm. The proposed method reduces the search complexity by pruning the trees that are less likely to be selected as the optimum excitation. The method needs no additional computation to select the trees to be pruned, and it reduces the computational complexity considerably compared with the original depth-first tree search, with only slight degradation of speech quality. Applying our method to an implementation of the AMR speech coder in 12.2 kbps mode on the TeakLite DSP, we reduce the search complexity by about 40% compared with the conventional method.

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
- The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient Text-To-Speech (TTS) database compression. The proposed algorithm combines removal of the unnecessary common bit-stream (CBS) and parameter delta coding with speaker-dependent Huffman coding to reduce the required bit rate without any quality degradation. We also propose lossy coding schemes that maximize the bit-rate reduction with negligible quality degradation. The proposed lossless algorithm, including CBS removal, reduces the bit rate by 12.40% without quality degradation compared with the 12.65 kbps AMR-WB mode. The proposed lossy algorithm reduces the bit rate by 20.00% with 0.12 PESQ degradation.
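To show why delta coding pairs well with Huffman coding, here is a minimal generic sketch. It is not the paper's algorithm: the integer parameter track, the helper names `delta_encode`, `delta_decode`, and `huffman_code_lengths`, and training the code on the same data it encodes are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def delta_encode(values):
    """Replace each value with its difference from the previous one (first vs. 0)."""
    prev = 0
    deltas = []
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For a slowly varying track such as `[10, 11, 11, 12, 12, 12, 13, 13]`, the deltas concentrate on a few small values, so the Huffman code assigns them short codewords. Both steps are lossless: `delta_decode(delta_encode(values))` returns the original list.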
https://doi.org/10.7776/ASK.2009.28.3.290
