• Title/Summary/Keyword: Speech codec

Real-time Implementation of the 2.4 kbps EHSX Speech Coder Using a TMS320C6701™ DSP Core (TMS320C6701™을 이용한 2.4kbps EHSX 음성 부호화기의 실시간 구현)

  • 양용호;이인성;권오주
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.7C / pp.962-970 / 2004
  • This paper presents an efficient implementation of the 2.4 kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder on a TMS320C6701™ floating-point digital signal processor. The EHSX speech codec models the excitation signal with harmonic coding for voiced frames and CELP (Code Excited Linear Prediction) coding for unvoiced frames. We present optimization methods that reduce complexity for real-time implementation. Filtering in the CELP algorithm accounts for the main part of the EHSX complexity, and this cost can be reduced by converting the program from floating-point to fixed-point variables. We also present efficient optimization methods, including code allocation that takes the DSP architecture into account and a low-complexity harmonic/pitch search algorithm in the encoder. Finally, we obtained a MOS of 3.28 in a speech quality test using PESQ (Perceptual Evaluation of Speech Quality, ITU-T Recommendation P.862) and achieved the goal of real-time operation of the EHSX codec.
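
The floating-point to fixed-point conversion of the filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Q15 number format, the function names `to_q15` and `fir_q15`, and the rounding rule are all assumptions chosen for illustration.

```python
Q15_ONE = 1 << 15  # scale factor for Q15 fixed-point values (assumed format)


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    v = int(round(x * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, v))


def fir_q15(coeffs_q15, samples_q15):
    """FIR filter using integer multiply-accumulate only (Q15 in, Q15 out)."""
    out = []
    for n in range(len(samples_q15)):
        acc = 0  # wide accumulator holding Q30 products
        for k, c in enumerate(coeffs_q15):
            if n - k >= 0:
                acc += c * samples_q15[n - k]
        y = (acc + (1 << 14)) >> 15  # round, then rescale Q30 -> Q15
        out.append(max(-Q15_ONE, min(Q15_ONE - 1, y)))  # saturate to 16 bits
    return out


coeffs = [to_q15(0.5), to_q15(0.25)]
samples = [to_q15(0.5), to_q15(0.0), to_q15(-0.5)]
filtered = fir_q15(coeffs, samples)
```

The inner loop uses only integer multiplies, adds, and shifts, which is the kind of arithmetic a fixed-point conversion aims for.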

Optimized Wiener Filter for Noise Reduction in VoIP Environments (VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터)

  • Jeong, Sang-Bae;Lee, Sung-Doke;Hahn, Min-Soo
    • MALSORI / no.64 / pp.105-119 / 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. Its performance is evaluated with PESQ under various noisy conditions. The proposed algorithm is applied to G.711, G.723.1, and G.729A, all of which are VoIP speech codecs. The PESQ results show that the proposed noise reduction scheme outperforms the noise suppression in the IS-127 EVRC and in the ETSI standard for the advanced distributed speech recognition front-end.
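
For orientation, a textbook Wiener gain computed from an estimated SNR can be sketched as follows. This is the standard form G = SNR / (1 + SNR), not the optimized filter proposed in the paper; the function name `wiener_gain`, the power-subtraction SNR estimate, and the `gain_floor` parameter are assumptions made for this example.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.1):
    """Per-bin Wiener gain G = SNR / (1 + SNR) from power estimates.

    The SNR is estimated by power subtraction, max(P_y / P_n - 1, 0).
    gain_floor (hypothetical parameter) limits the attenuation per bin.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_n <= 0:
            gains.append(1.0)  # no noise estimate: pass the bin unchanged
            continue
        snr = max(p_y / p_n - 1.0, 0.0)
        gains.append(max(snr / (1.0 + snr), gain_floor))
    return gains


# Two frequency bins: one dominated by speech, one at the noise level.
gains = wiener_gain([4.0, 1.0], [1.0, 1.0])
```

Each gain would multiply the corresponding spectral bin of the noisy frame before the frame is passed to the speech encoder.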

Adaptive Kernel Function of SVM for Improving Speech/Music Classification of 3GPP2 SMV

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • ETRI Journal / v.33 no.6 / pp.871-879 / 2011
  • Because a wide variety of multimedia services are provided through personal wireless communication devices, the demand for efficient bandwidth utilization becomes stronger. This demand naturally results in the introduction of the variable bitrate speech coding concept. One exemplary work is the selectable mode vocoder (SMV) that supports speech/music classification. However, because it has severe limitations in its classification performance, a couple of works to improve speech/music classification by introducing support vector machines (SVMs) have been proposed. While these approaches significantly improved classification accuracy, they did not consider correlations commonly found in speech and music frames. In this paper, we propose a novel and orthogonal approach to improve the speech/music classification of SMV codec by adaptively tuning SVMs based on interframe correlations. According to the experimental results, the proposed algorithm yields improved results in classifying speech and music within the SMV framework.

Design of EVRC LSP Codebooks with Korean (한국어에 의한 EVRC LSP 코드북 설계)

  • 이진걸
    • The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.167-172 / 2002
  • The EVRC (Enhanced Variable Rate Codec) is currently in service as a speech codec in digital cellular systems in North America and Korea. In the EVRC, the LSPs (Line Spectral Pairs), which relate to the energy distribution of speech signals in the frequency domain, are coded by weighted split vector quantization. Since the LSP codebooks were presumably trained with English or with the language of the country where they were developed, codebooks trained with Korean are expected to improve performance for communication in Korean. In this paper, the EVRC LSP codebooks are designed with Korean speech using vector quantization based on the LBG algorithm. The improvement in vector quantization performance is demonstrated by spectral distortion measurements, and the accompanying improvement in speech quality by SNR and SegSNR measurements.
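
The SNR and SegSNR measures named in this abstract can be sketched as below. The frame length of 160 samples (20 ms at 8 kHz) and the choice to skip frames with zero error or zero signal energy are assumptions for illustration; the paper's exact settings are not given here.

```python
import math


def snr_db(ref, test):
    """Overall SNR in dB between a reference and a processed signal."""
    sig = sum(x * x for x in ref)
    err = sum((x - y) ** 2 for x, y in zip(ref, test))
    if err == 0:
        return float("inf")
    if sig == 0:
        return float("-inf")
    return 10.0 * math.log10(sig / err)


def seg_snr_db(ref, test, frame_len=160):
    """Segmental SNR: mean of per-frame SNRs, skipping non-finite frames."""
    vals = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        v = snr_db(ref[start:start + frame_len], test[start:start + frame_len])
        if math.isfinite(v):
            vals.append(v)
    return sum(vals) / len(vals) if vals else float("nan")


reference = [1.0, 1.0, 1.0, 1.0]
decoded = [0.9, 0.9, 0.9, 0.9]
overall = snr_db(reference, decoded)
segmental = seg_snr_db(reference, decoded, frame_len=2)
```

Because SegSNR averages per-frame values in dB, low-energy frames weigh as much as loud ones, which is why it is often reported alongside the overall SNR.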

Detection of Underwater Transient Signals Using Noise Suppression Module of EVRC Speech Codec (EVRC 음성부호화기의 잡음억제단을 이용한 수중 천이신호 검출)

  • Kim, Tae-Hwan;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.301-305 / 2007
  • In this paper, we propose a simple algorithm for detecting underwater transient signals, based on the fact that their frequency range is similar to the audio frequency range. For this, we use the preprocessing module of the EVRC speech codec, a standard speech codec for mobile communications. When a signal is fed into the EVRC noise suppression module, the module yields parameters such as the update flag, the energy of each channel, the noise-suppressed signal, the energy of the input signal, the energy of the background noise, and the energy of the enhanced signal. The energy of the enhanced signal, normalized by the energy of the background noise, is compared with a predefined detection threshold to detect the transient signal. The detection threshold is updated using its previous value during noisy periods. Experimental results show that the proposed algorithm has an error rate of 0 to 4% in AWGN and colored noise environments.
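
The detection rule described in this abstract can be sketched roughly as follows. The abstract does not give the threshold update formula, so the recursive smoothing used here, along with the names `detect_transients`, `alpha`, and `margin` and their default values, is purely an assumption for illustration.

```python
def detect_transients(enhanced_energy, noise_energy,
                      threshold=2.0, alpha=0.9, margin=1.5):
    """Flag frames whose normalized enhanced energy exceeds a threshold.

    In frames judged to be noise only, the threshold is updated from its
    previous value (assumed rule: first-order recursive smoothing).
    """
    flags = []
    for e_s, e_n in zip(enhanced_energy, noise_energy):
        ratio = e_s / e_n if e_n > 0 else float("inf")
        detected = ratio > threshold
        flags.append(detected)
        if not detected:
            # Noisy period: track the noise-level ratio with some headroom.
            threshold = alpha * threshold + (1.0 - alpha) * margin * ratio
    return flags


flags = detect_transients([1.0, 1.0, 10.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

In a real system the two energy sequences would come from the EVRC noise suppression module's per-frame outputs.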

Developing a Low Power BWE Technique Based on the AMR Coder (AMR 기반 저 전력 인공 대역 확장 기술 개발)

  • Koo, Bon-Kang;Park, Hee-Wan;Ju, Yeon-Jae;Kang, Sang-Won
    • The Journal of the Acoustical Society of Korea / v.30 no.4 / pp.190-196 / 2011
  • Bandwidth extension is a technique to improve speech quality and intelligibility by extending 300-3400 Hz narrowband speech to 50-7000 Hz wideband speech. This paper designs an artificial bandwidth extension (ABE) module embedded in the AMR (adaptive multi-rate) decoder, reducing the LPC/LSP analysis and the algorithm delay of the ABE module. We also introduce a fast search codebook mapping method for ABE, and design a low power BWE technique based on the AMR decoder. Compared to the traditional DTE (decode then extend) method, the proposed ABE method reduces the computational complexity by 28% and the algorithm delay by 20 ms. We also introduce a weighted classified codebook mapping method for constructing the spectral envelope of the wideband speech signal.

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo;Park, Sangjun;Jeong, Sangbae;Hahn, Minsoo
    • Phonetics and Speech Sciences / v.5 no.1 / pp.11-16 / 2013
  • This paper proposes robust feature extraction for accurate voice activity detection (VAD). VAD is one of the principal modules for speech signal processing such as speech codec, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises causing the accuracy of the VAD to drastically decline because the fluctuation of features in the noise intervals results in increased false alarm rates. In this paper, in order to improve the VAD performance, harmonic-weighted energy is proposed. This feature extraction method focuses on voiced speech intervals and weighted harmonic-to-noise ratios to determine the amount of the harmonicity to frame energy. For performance evaluation, the receiver operating characteristic curves and equal error rate are measured.

Very Low Bit Rate Speech Coder of Analysis by Synthesis Structure Using ZINC Function Excitation (ZINC 함수 여기신호를 이용한 분석-합성 구조의 초 저속 음성 부호화기)

  • Seo, Sang-Won;Kim, Young-Jun;Kim, Jong-Hak;Kim, Young-Ju;Lee, In-Sung
    • Proceedings of the IEEK Conference / 2006.06a / pp.349-350 / 2006
  • This paper presents a very low bit rate speech coder, ZFE-CELP (ZINC Function Excitation-Code Excited Linear Prediction). The ZFE-CELP speech codec models the excitation signal with a ZINC function for voiced frames and with CELP for unvoiced frames. The paper also suggests strategies to improve the speech quality of the very low bit rate speech coder.

Comparison of Noise Suppression Methods in Voice CODEC (음성코덱에서의 잡음제거 방식 비교)

  • Lee, Jin-Geol
    • The Journal of Engineering Research / v.3 no.1 / pp.43-46 / 1998
  • Considerable research in the last three decades has examined the problem of enhancing speech degraded by additive background noise. We compare traditional methods, such as spectral subtraction and the Wiener filter, with recently proposed methods based on psychoacoustic models, such as the perceptual filter and the noise suppression in EVRC, in terms of performance and complexity.

Performance Comparison of AMR Codec Mode Allocations in Downlink WCDMA System (순방향 WCDMA 채널에서 AMR 음성 코덱 모드 할당방식에 대한 성능 비교)

  • Jeong, S.H.;Hong, J.W.;Lee, S.C.;Lie, C.H.
    • Journal of Korean Institute of Industrial Engineers / v.31 no.4 / pp.349-357 / 2005
  • The Adaptive Multi-Rate (AMR) speech codec is mandatory for voice service in WCDMA systems. The AMR codec can be used efficiently to provide a balanced trade-off between capacity and voice quality by adjusting its service rates. In this paper, three AMR mode allocation schemes for the downlink of a WCDMA system are evaluated. To evaluate user satisfaction efficiently, a new system performance measure and analytic models are proposed. The proposed analytic models can be applied to obtain optimal mode allocations while considering system capacity and voice quality. Numerical examples illustrate how to find optimal parameters for given traffic loads, and the performance of the three mode allocation schemes is compared.