• Title/Summary/Keyword: Speech Code

Search Result 118, Processing Time 0.03 seconds

VQ Codebook Index Interpolation Method for Frame Erasure Recovery of CELP Coders in VoIP

  • Lim Jeongseok;Yang Hae Yong;Lee Kyung Hoon;Park Sang Kyu
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.9C
    • /
    • pp.877-886
    • /
    • 2005
  • Various frame recovery algorithms have been suggested to overcome the communication quality degradation problem due to Internet-typical impairments on Voice over IP(VoIP) communications. In this paper, we propose a new receiver-based recovery method which is able to enhance recovered speech quality with almost free computational cost and without an additional increment of delay and bandwidth consumption. Most conventional recovery algorithms try to recover the lost or erroneous speech frames by reconstructing missing coefficients or speech signal during speech decoding process. Thus they eventually need to modify the decoder software. The proposed frame recovery algorithm tries to reconstruct the missing frame itself, and does not require the computational burden of modifying the decoder. In the proposed scheme, the Vector Quantization(VQ) codebook indices of the erased frame are directly estimated by referring the pre-computed VQ Codebook Index Interpolation Tables(VCIIT) using the VQ indices from the adjacent(previous and next) frames. We applied the proposed scheme to the ITU-T G.723.1 speech coder and found that it improved reconstructed speech quality and outperforms conventional G.723.1 loss recovery algorithm. Moreover, the suggested simple scheme can be easily applicable to practical VoIP systems because it requires a very small amount of additional computational cost and memory space.

A Training Algorithm for the Transform Trellis Code with Applications to Stationary Gaussian Sources and Speech (정상 가우시안 소오스와 음성 신호용 변환 격자 코드에 대한 훈련 알고리즘 개발)

  • Kim, Dong-Youn;Park, Yong-Seo;Whang, Keum-Chan;Pearlman, William A.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.1
    • /
    • pp.22-34
    • /
    • 1992
  • There exists a transform trellis code that is optimal for stationary Gaussian sources and the squared-error distortion measure at all rates. In this paper, we train an asymptotically optimal version of such a code to obtain one which is matched better to the statistics of real world data. The training algorithm uses the M algorithm to search the trellis codebook and the LBG algorithm to update the trellis codebook. We investigate the trained transform trellis coding scheme for the first-order AR(autoregressive) Gaussian source whose correlation coefficient is 0.9 and actual speech sentences. For the first-order AR source, the achieved SNR for the test sequence is from 0.6 to 1.4 dB less than the maximum achievable SNR as given by Shannon's rate-distortion function for this source, depending on the rate and surpasses all previous known results for this source. For actual speech data, to achieve improved performance, we use window functions and gain adaptation at rate 1.0 bits/sample.

  • PDF

Real-time Implementation of a GSM-EFR Speech Coder on a 16 Bit Fixed-point DSP (16 비트 고정 소수점 DSP를 이용한 GSM-EFR 음성 부호화기의 실시간 구현)

  • 최민석;변경진;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.7
    • /
    • pp.42-47
    • /
    • 2000
  • This paper describes a real-time implementation of a GSM-EFR (Global System for Mobil communications Enhanced Full Rate) speech coder using OakDSP core; a 16bit fixed-point Digital Signal Processor (DSP) by DSP Group, Inc. The real-time implemented speech coder required about 24MIPS for computation and 7.06K words and 12.19K words for code and data memory, respectively. The implemented GSM-EFR speech coder passes all of test vectors provided by ETSI (European Telecommunication Standard Institute), and perceptual speech quality measurement using MNB algorithm shows that the quality of the GSM-EFR speech coder is similar to the one of 32kbps ADPCM. The real-time implemented GSM-EFR speech coder which is the highest bit-rate mode of the GSM-AMR speech coder will be used as the basic structure of the GSM-AMR speech coder which is embedded in MODEM ASIC of IMT2000 asynchronous mode mobile station.

  • PDF

A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

  • Min, Byung-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.9C
    • /
    • pp.691-696
    • /
    • 2008
  • A new vocoder of Code Excited Linear Predictive (CELP) based on Adaptive Multi Rate (AMR) 7.4kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in an environment of Speaker Dependent Coding System (SDSC) and is efficiently used for systems, such as OGM(Outgoing message) and TTS(Text To Speech), which needs only one person's speech. In order to enhance the compression rate of a coder, a new Line Spectral Pairs(LSP) code-book is employed by using Centroid Neural Network (CNN) algorithm. In comparison with original(traditional) AMR 7.4 Kbit/s coder, the new coder shows 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score(MOS).

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.391-394
    • /
    • 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5kbytes for program code. Maximum required time of 29.2ms for processing a frame of 32ms of speech validates real-time operation of the implemented system.

  • PDF

Implementation of 16Kpbs ADPCM by DSK50 (DSK50을 이용한 16kbps ADPCM 구현)

  • Cho, Yun-Seok;Han, Kyong-Ho
    • Proceedings of the KIEE Conference
    • /
    • 1996.07b
    • /
    • pp.1295-1297
    • /
    • 1996
  • CCITT G.721, G.723 standard ADPCM algorithm is implemented by using TI's fixed point DSP start kit (DSK). ADPCM can be implemented on a various rates, such as 16K, 24K, 32K and 40K. The ADPCM is sample based compression technique and its complexity is not so high as the other speech compression techniques such as CELP, VSELP and GSM, etc. ADPCM is widely applicable to most of the low cost speech compression application and they are tapeless answering machine, simultaneous voice and fax modem, digital phone, etc. TMS320C50 DSP is a low cost fixed point DSP chip and C50 DSK system has an AIC (analog interface chip) which operates as a single chip A/D and D/A converter with 14 bit resolution, C50 DSP chip with on-chip memory of 10K and RS232C interface module. ADPCM C code is compiled by TI C50 C-compiler and implemented on the DSK on-chip memory. Speech signal input is converted into 14 bit linear PCM data and encoded into ADPCM data and the data is sent to PC through RS232C. The ADPCM data on PC is received by the DSK through RS232C and then decoded to generate the 14 bit linear PCM data and converted into the speech signal. The DSK system has audio in/out jack and we can input and out the speech signal.

  • PDF

A Nonlinear Regression Analysis Method for Frame Erasure Concealment in VoIP Networks (VoIP 망에서의 프레임손실은닉을 위한 비선형 회귀분석 기법)

  • Choi, Seung-Ho;Sung, Ho-Sang
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.5
    • /
    • pp.129-132
    • /
    • 2009
  • Frame erasure is one of the most difficult problems in voice over IP (VoIP) networks and is a major source of speech quality degradation. In this paper, a frame erasure concealment algorithm based on nonlinear regression analysis is presented to minimize speech quality deterioration in code-excited linear prediction (CELP) based coders. We applied the proposed scheme to the ITU-T G.729 standard and obtained improved perceptual evaluation of speech quality (PESQ) scores compared to the conventional methods.

  • PDF

A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.12
    • /
    • pp.819-829
    • /
    • 2017
  • In this paper, we describe the current state of speech recognition technology and identify the basic speech recognition technology and algorithms first, and then explain the code flow of API necessary for speech recognition technology. We use the application programming interface (API) of Google, Naver, and Daum KaKao, which have the most famous search engine among the speech recognition APIs, to create a voice recognition app in the Android studio tool. Then, we perform a speech recognition experiment on people's standard words and dialects according to gender, age, and region, and then organize the recognition rates into a table. Experiments were conducted on the Gyeongsang-do, Chungcheong-do, and Jeolla-do provinces where the degree of tongues was severe. And Comparative experiments were also conducted on standardized dialects. Based on the resultant sentences, the accuracy of the sentence is checked based on spacing of words, final consonant, postposition, and words and the number of each error is represented by a number. As a result, we aim to introduce the advantages of each API according to the speech recognition rate, and to establish a basic framework for the most efficient use.

Robust Tree Coding Combined with Harmonic Scaling of Speech at 4.8 Kbps (견실한 배음 축척과 결합된 4.8KBPS 트리 음성부호기)

  • 강상원;이인성;한경호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1806-1814
    • /
    • 1993
  • Efficient speech coders using tree coding combined with harmonic scaling are designed at the rate of 4.8 kilobitts/sec (kbps). A time domain harmonic scaling algorithm (TDHS) is used to compress input speech by a factor of two. This process allows the tree coder have 1.5 bits/sample for 4.8 kbps in the case of a 6.4 kHz sampling rate. In the backward adaptive tree coder, there are three components of the code generator, including a hybrid adaptive quantizer, a short-term predictor and a pitch predictor. The robustness of the tree coder is achieved by carefully choosing the input of the short term predictor adaptation. Also, inclusion of a smoother in the pitch predictor improves the error performance of tree coder in the noisy channel. Subjectively, tree coding combined with TDHS provides good quality speech at 4.8 kbps.

  • PDF

A Preliminary Study on Serious Game for C Language Study of Beginners : freCman (초보자를 위한 C 언어 학습 기능성 게임 개발 사례 : 프레C맨)

  • Hwang, Kitae;Jung, Inhwan
    • Journal of Korea Game Society
    • /
    • v.15 no.4
    • /
    • pp.199-206
    • /
    • 2015
  • This paper introduces a serious game called freCman developed for C programming language beginners. Since key words, syntax, and programming structure of C programming language are unfamiliar for them, they feel uneasy and have many difficulties to study. We developed three games such as shooting star C, finding hidden errors, unscrambling C codes through which C beginners can study C language easily. Also we developed CTS(Code to Speech) which speeches C source codes like English statements so that C beginners can be familiar with C key words and statements. To prove effectiveness of the freCman, some experiments have been conducted with C language beginners. Experiment results show that the freCman helps beginners studying C programming language much.