• Title/Summary/Keyword: speech codec

Search Result 128, Processing Time 0.022 seconds

Implementation of Speaker Independent Speech Recognizer in Noise Environment based on DSP (DSP기반의 잡음환경에 강인한 화자 독립 음성 인식기 구현)

  • 박진영;권호민;박정원;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.69-72
    • /
    • 2003
  • 본 논문에서는 범용 DSP를 이용한 잡음환경에 강인한 음성인식 시스템을 구현하였다. 구현된 시스템은 TI사의 범용 DSP인 TMS320C32를 이용하였고, 실시간 음성 입력을 위한 음성 Codec과 외부 인터페이스를 확장하여 인식결과를 출력하도록 구성하였다. 또한, 기존의 음성 인식 시스템에 사용한 파라메터에 대한 고찰과 ICA를 이용하여 잡음 환경에 강인한 음성 특징 파라메터를 제안하고 성능 비교 실험을 하였다. 제안된 ICA 파라메터를 적용하여 음성인식 시스템을 구현하였다. 그리고, 독립적으로 동작 가능한 음성인식 시스템의 응용 예로 무선자동차에 적용시켜 실험했다.

  • PDF

Development of G.723.1 Speech Codec Using a Fixed-point DSP(ADSP-2181) (ADSP-2181 DSP를 이용한 G.723.1 음성부호화기 개발)

  • 박정재
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.121-126
    • /
    • 1998
  • 고정 소수점 DSP 인 analog devices 사의 ADSP-2181을 이용하여 실시간 G.723.1 음성부호화기를 개발한 사례이다. G.723.1은 ITU에서 개발한 세계 표준 음성 부호화기로 낮은 전송율에서 고음질을 얻을 수 있다. 본 논문에서는 고정 소수점 DSP를 이용하여 부호화기를 갭라하는데 필요한 사항들을 제시하였다. 먼저 1절에서는 DAM성 부호화기의 특성에 대한 개괄을 설명하고, 2절에서는 G.723.1 부호화기의 특징을, 3절에서는 고정소수점 DSP를 이용하여 개발하는 과정을, 4절에서는 구현결과를 분석하였으며, 마지막으로 5절에서 결론을 맺는다.

  • PDF

The Implementation of Smartphone Application servicing HD(High Definition)-Voice (HD 음성 서비스를 제공하는 스마트폰 어플리케이션의 구현)

  • Choi, Seung-Han;Kim, Do-Young;Seo, Chang-Ho
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.23 no.4
    • /
    • pp.609-615
    • /
    • 2013
  • This paper represents the development of the HD-Voice application with G.711.1 coder-the latest wideband codec standard from ITU-T-for smartphone based on android platform. The work also includes the structure of the HD-voice application and the result of speech quality of HD-Voice application with G.711.1 coder. The paper shows that the speech quality of HD-Voice application with G.711.1 coder is excellent.

Design of pitch parameter search architecture for a speech coder using dual MACs (Dual MAC을 이용한 음성 부호화기용 피치 매개변수 검색 구조 설계)

  • 박주현;심재술;김영민
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.5
    • /
    • pp.172-179
    • /
    • 1996
  • In the paper, QCELP (qualcomm code excited linear predictive), CDMA (code division multiple access)'s vocoder algorithm, was analyzed. And then, a ptich parameter seaarch architecture for 16-bit programmable DSP(digital signal processor) for QCELP was designed. Because we speed up the parameter search through high speed DSP using two MACs, we can satisfy speech codec specifiction for the digital celluar. Also, we implemented in FIFO(first-in first-out) memory using register file to increase the access time of data. This DSP was designed using COMPASS, ASIC design tool, by top-down design methodology. Therefore, it is possible to cope with rapid change at mobile communication market.

  • PDF

Improvement of Packet Loss Concealment Algorithm by Using state gain control and fixed codebook estimation (상태별 이득 제어 및 fixed codebook estimation을 이용한 G.729에서의 Packet Loss Concealment 알고리즘 개선)

  • Moon Kwang;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.109-112
    • /
    • 2003
  • In real time packetized voice applications, missing frames is a major source of voice quality degradation. Thus packet loss concealment(PLC) algorithms are needed to guarantee the QoS of the VoIP. Still current speech codecs for VoIP work poor when consecutive packet losses are issued. In this paper, we proposed a new PLC algorithm for the G.729 codec. Our algorithm works better especially when the consecutive packet loss occurs mainly because it adopts an adaptive gain controller utilizing the number of missing packet information combined with a fixed codebook vector estimation algorithm and LPC bandwidth expansion.

  • PDF

Robust Speech Decoding Using Channel-Adaptive Parameter Estimation.

  • Lee, Yun-Keun;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.1E
    • /
    • pp.3-6
    • /
    • 1999
  • In digital mobile communication system, the transmission errors affect the quality of output speech seriously. There are many error concealment techniques using a posteriori probability which provides information about any transmitted parameter. They need knowledge about channel transition probability as well as the 1st order Markov transition probability of codec parameters for estimation of transmitted parameters. However, in applications of mobile communication systems, the channel transition probability varies depending on nonstationary channel characteristics. The mismatch of designed channel transition probability of the estimator to actual channel transition probability degrades the performance of the estimator. In this paper, we proposed a new parameter estimator which adapts to the channel characteristics using short time average of maximum a posteriori probabilities(MAPs). The proposed scheme, when applied to the LSP parameter estimation, performed better than the conventional estimator which do not adapt to the channel characteristics.

  • PDF

Adaptive TCX Windowing Technology for Unified Structure MPEG-D USAC

  • Lee, Tae-Jin;Beack, Seung-Kwon;Kang, Kyeong-Ok;Kim, Whan-Woo
    • ETRI Journal
    • /
    • v.34 no.3
    • /
    • pp.474-477
    • /
    • 2012
  • The MPEG-D unified speech and audio coding (USAC) standardization process was initiated by MPEG to develop an audio codec that is able to provide consistent quality for mixed speech and music contents. The current USAC reference model structure consists of frequency domain (FD) and linear prediction domain (LPD) core modules and is controlled using a signal classifier tool. In this letter, we propose an LPD single-mode USAC structure using an adaptive widowing-based transform-coded excitation module. We tested our system using official test items for all mono-evaluation modes. The results of the experiment show that the objective and subjective performances of the proposed single-mode USAC system are better than those of the FD/LPD dual-mode USAC system.

Coding History Detection of Speech Signal using Deep Neural Network (심층 신경망을 이용한 음성 신호의 부호화 이력 검출)

  • Cho, Hyo-Jin;Jang, Won;Shin, Seong-Hyeon;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.23 no.1
    • /
    • pp.86-92
    • /
    • 2018
  • In this paper, we propose a method for coding history detection of digital speech signal. In digital speech communication and storage, the signal is encoded to reduce the number of bits. Therefore, when a speech signal waveform is given, we need to detect its coding history so that we can determine whether the signal is an original or an coded one, and if coded, determine the number of times of coding. In this paper, we propose a coding history detection method for 12.2kbps AMR codec in terms of original, single coding, and double coding. The proposed method extracts a speech-specific feature vector from the given speech, and models the feature vector using a deep neural network. We confirm that the proposed feature vector provides better performance in coding history detection than the feature vector computed from the general spectrogram.

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

  • Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.2
    • /
    • pp.149-154
    • /
    • 2004
  • In this research, we selected the speech recognition to implement the electric wheelchair system as a method to control it by only using the speech and used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among the speech recognitions. However, it has to have small memory and fast process speed performance under consideration of real-time. Thus, we introduced VQ (Vector Quantization) which is widely used as a compression algorithm of speaker-independent recognition, to secure fast recognition and small memory. However, we found that the recognition rate decreased after using VQ. To improve the recognition rate, we applied ART2 (Adaptive Reason Theory 2) algorithm as a post-process algorithm to obtain about 5% recognition rate improvement. To utilize ART2, we have to apply an error range. In case that the subtraction of the first distance from the second distance for each distance obtained to apply DTW is 20 or more, the error range is applied. Likewise, ART2 was applied and we could obtain fast process and high recognition rate. Moreover, since this system is a moving object, the system should be implemented as an embedded one. Thus, we selected TMS320C32 chip, which can process significantly many calculations relatively fast, to implement the embedded system. Considering that the memory is speech, we used 128kbyte-RAM and 64kbyte ROM to save large amount of data. In case of speech input, we used 16-bit stereo audio codec, securing relatively accurate data through high resolution capacity.

An Efficient Transcoding Algorithm For G.723.1 and EVRC Speech Coders (G.723.1 음성부호화기와 EVRC 음성부호화기의 상호 부호화 알고리듬)

  • 김경태;정성교;윤성완;박영철;윤대희;최용수;강태익
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.5C
    • /
    • pp.548-554
    • /
    • 2003
  • Interoperability is ole the most important factors for a successful integration of the speech network. To accomplish communication between endpoints employing different speech coders, decoder and encoder of each endpoint coder should be placed in tandem. However, tandem coder often produces problems such as poor speech quality, high computational load, and additional transmission delay. In this paper, we propose an efficient transcoding algorithm that can provide interoperability to the networks employing ITU-T G.723.1[1]and TIA IS-127 EVRC[2]speech coders. The proposed transcoding algorithm is composed of four parts: LSP conversion, open-loop pitch conversion, fast adaptive codebook search, and fast fixed codebook search. Subjective and objective quality evaluation confirmed that the speech quality produced by the proposed transcoding algorithm was equivalent to, or better than the tandem coding, while it had shorter processing delay and less computational complexity, which is certified implementing on TMS320C62x.