• Title/Summary/Keyword: speech signal compression (음성 신호 압축)

Search Result 91, Processing Time 0.036 seconds

Design of Voice processing module Using RTP in VoIP system (SIP기반의 VoIP시스템에서 RTP를 이용한 Voice 처리 모듈의 개발)

  • 윤원동;백은경;박일규;최양희
    • Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.292-294 / 2001
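  • VoIP (Voice over IP) systems are currently developing along two main lines. The first uses H.323, and the second uses SIP (Session Initiation Protocol). Because H.323 performs a large amount of signaling during call setup before any actual data is transmitted, it consumes more RTTs (round-trip times) than SIP. It is therefore very complex and, having been designed for LAN environments, also has several scalability problems. This paper therefore presents a system implementation that uses SIP for call processing and RTP (Real-Time Transport Protocol) and RTCP (RTP Control Protocol) for the actual voice transmission. RTP is a protocol that provides end-to-end delivery services for data with real-time characteristics, and it offers a framework suitable for any encoding. However, because RTP becomes a complete protocol only when it is accompanied by a payload format, the implemented system generates files from the voice signal using several compression techniques such as PCM (Pulse Code Modulation) and ADPCM (Adaptive Differential PCM) and transmits them in real time using RTP and RTCP.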
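The relationship between RTP and its payload format can be illustrated with a minimal sketch that prepends the 12-byte fixed RTP header defined in RFC 3550 to a block of audio bytes. This is not the authors' implementation: the function name `build_rtp_packet`, the SSRC value, and the 160-byte frame (20 ms of 8 kHz, 8-bit samples) are assumptions chosen for illustration. Payload type 0 is the static assignment for G.711 mu-law PCM in RFC 3551.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP fixed header (RFC 3550) to an audio payload."""
    version = 2
    # First byte: version in the top two bits; padding, extension and
    # CSRC count are all left at zero in this sketch.
    byte0 = version << 6
    # Second byte: marker bit followed by the 7-bit payload type.
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


# Hypothetical frame: 160 bytes of silence standing in for encoded audio.
packet = build_rtp_packet(bytes(160), seq=1, timestamp=160, ssrc=0x12345678)
```

The sequence number and timestamp let the receiver reorder packets and reconstruct timing; RTCP reports, which are not shown here, would carry the reception statistics.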

The Seismic Multipulse Deconvolution (다중펄스 방법을 이용한 디컨벌루션)

  • Shon, Howoong
    • Economic and Environmental Geology / v.28 no.5 / pp.487-491 / 1995
  • The multipulse model of linear predictive coding (LPC), which has been used successfully to compress speech signals into an impulse excitation, is applied here to seismic data that contain multiples. Multiples arise from successive reflections between layers and make seismic interpretation difficult. In this paper, the author applies the enhanced multipulse method to seismic traces to compress source wavelets into spikes and to eliminate or reduce multiples. Applied to seismic traces, the enhanced multipulse method extracts the amplitudes and locations of the reflectivity function, which depicts the subsurface configuration, by iterative computation with an autoregressive (AR) estimation method.

Two-Channel Multiwavelet Transform and Pre/Post-Filtering for Image Compression (영상 데이터 압축을 위한 2-채널 멀티웨이브렛 변환과 전후처리 필터의 적용)

  • Heo, Ung;Choi, Jae-Ho
    • Journal of the Korea Computer Industry Society / v.5 no.5 / pp.737-746 / 2004
  • A two-channel multiwavelet system is investigated for image compression in this paper. Multiwavelets are generally known for their superb capability of compressing non-stationary signals such as voice. However, multiwavelet systems have a critical problem in processing and compressing image data because of mesh-grid visual artifacts. In our two-channel multiwavelet system, we investigate adding pre- and post-filtering to the multiwavelet transform and compression system to alleviate the visual artifacts inherent in the multiwavelet transform. In addition, to quantify the image compression performance of the proposed multiwavelet system, computer simulations were performed on various image data. For bit allocation and quantization, the Lagrange multiplier technique, which considers data rate versus distortion, and a nonlinear companding method are applied equally to all systems considered here. The simulation results show a 1 to 2 dB compression improvement over scalar wavelet systems. If more advanced compression methods such as SPIHT and run-length channel coding were adopted for the proposed multiwavelet system, a much higher compression gain could be obtained.

Real-time Implementation of AMR-WB Speech Coder Using TMS320C5509 DSP (TMS320C5509 DSP를 이용한 AMR-WB 음성부호화기의 실시간 구현)

  • Choi Song-In;Jee Deock-Gu
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.52-57 / 2005
  • The adaptive multirate wideband (AMR-WB) speech coder has an extended audio bandwidth from 50 Hz to 7 kHz and operates at nine speech coding bit rates from 6.6 to 23.85 kbit/s. In this paper, we present a real-time implementation of the AMR-WB speech coder on the 16-bit fixed-point TMS320C5509, which has dual MAC units. We first implemented the AMR-WB speech coder at the C language level using intrinsics and then optimized it in assembly language. The computational complexity of the implemented AMR-WB coder at 23.85 kbit/s is 42.9 Mclocks. The coder requires 15.1 kwords of program memory, 9.2 kwords of data ROM, and 13.9 kwords of data RAM.
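As a quick arithmetic check on the bit rates quoted above, the sketch below converts a coding rate into the number of speech bits per frame. The 20 ms frame length is taken from the AMR-WB standard, not from the abstract, and the helper name `bits_per_frame` is hypothetical.

```python
FRAME_MS = 20  # AMR-WB frame length in milliseconds (assumption from the standard)


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Speech bits carried by one frame at the given coding rate.

    kbit/s multiplied by ms gives bits directly, since the factors of
    1000 cancel; round() removes floating-point noise.
    """
    return round(rate_kbps * frame_ms)


lowest = bits_per_frame(6.6)     # lowest rate quoted in the abstract
highest = bits_per_frame(23.85)  # highest rate quoted in the abstract
frames_per_second = 1000 // FRAME_MS
```

Under this assumption, the coder has to finish encoding and decoding one frame within each 20 ms period to run in real time.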

Sound Enhancement of low Sample rate Audio Using LMS in DWT Domain (DWT영역에서 LMS를 이용한 저 샘플링 비율 오디오 신호의 음질 향상)

  • 백수진;윤원중;박규식
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.54-60 / 2004
  • To mitigate the storage space and network bandwidth demands of full CD-quality audio, current digital audio is always restricted in sampling rate and bandwidth. This restriction normally results in low-sample-rate audio or calls for a data compression scheme such as MP3. However, both can reproduce only a lower frequency range than regular CD quality because of the Nyquist sampling theorem. Consequently, they lose the rich spatial information embedded in the high frequencies. The purpose of this paper is to propose an efficient high-frequency enhancement method for low-sample-rate audio using adaptive filtering and DWT analysis and synthesis. The proposed algorithm uses the LMS adaptive algorithm to estimate the missing high-frequency content in the DWT domain and then reconstructs the spectrally enhanced audio with the DWT synthesis procedure. Several experiments with real speech and audio are performed and compared with other algorithms. From the spectrogram and sonic test results, we confirm that the proposed algorithm outperforms the other algorithms and works reasonably well for most audio cases.
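The LMS adaptive algorithm named in the abstract can be sketched in its generic time-domain form. This is not the paper's DWT-domain method: the example only shows the LMS weight update, applied to a hypothetical system-identification task in which the "unknown" filter coefficients (0.5 and -0.3), the step size, and the function name `lms_filter` are all assumptions.

```python
import random


def lms_filter(x, d, num_taps=2, mu=0.1):
    """Adapt FIR weights w so that the filter output tracks d.

    x: input samples, d: desired samples, mu: step size.
    Returns the final weights and the per-sample error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps inputs, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))        # filter output
        e = d[n] - y                                    # estimation error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]  # LMS update
        errors.append(e)
    return w, errors


# Hypothetical test signal: white noise passed through a known 2-tap filter.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
weights, errors = lms_filter(x, d)
```

In this noiseless setting the weights converge toward the coefficients of the known filter; in an enhancement setting such as the one the abstract describes, the same update would instead be driven by subband signals.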

A Study on the Improvements of the Speech Quality by using Distribution Characteristics of LSP parameters in the EVRC(Enhanced Variable Rate Codec) (LSP 파라미터의 분포특성을 이용한 EVRC의 음질개선에 관한 연구)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.12 / pp.5843-5848 / 2011
  • To improve the efficiency of channel spectrum use and to reduce system power consumption, EVRC compresses and transmits the voice signal only when the user is speaking. In addition, voice frames are assigned one of three rates (1, 1/2, and 1/8), and each is handled differently. For example, the input is assumed to be a silence region if the 1/8 rate is used. In this paper, the signal is first separated into voiced speech, unvoiced speech, and silence regions by using the distribution characteristics of LSP parameters. The paper then proposes encoding voiced speech at rate 1, unvoiced speech at rate 1/2, and silence at rate 1/8. In other words, the conventional transmission method is used when EVRC sends a full-rate frame. When a half-rate frame is to be sent, however, the frame is first classified as voiced or unvoiced. If it is classified as voiced, it is converted to full rate before transmission. If it is classified as silence, EVRC's basic rate is applied. Experimental results measuring SNR, ASDM, and transmission bit rate demonstrate that voice quality is improved by the proposed algorithm.

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification (WSOLA 기반의 음성 시간축 변환을 위한 고속의 정규상호상관도 계산)

  • Lim, Sangjun;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.427-434 / 2012
  • The waveform similarity overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time scaling of speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculation that evaluates the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which NCC is obtained from pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations in adjacent regions. While the denominator of NCC has much redundancy regardless of the time-scale factor, the numerator has less redundancy, and the amount depends on both the time-scale factor and the optimal shift value, so it requires a more sophisticated algorithm for fast computation. The simulation results show that the proposed method reduces WSOLA execution time by about 40%, 47%, and 52% for time-scale compression and for 2-times and 3-times time-scale expansion, respectively, while maintaining exactly the same speech quality as conventional WSOLA.
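The idea of a pre-calculated sum table can be sketched for the denominator of NCC only. A running table of squared samples gives the energy of any window with one subtraction, so it does not have to be re-summed for every candidate shift. This is an illustration under assumptions, not the paper's algorithm: the numerator is still computed directly here, and the names `energy_table`, `ncc`, and `best_shift` as well as the sample values are hypothetical.

```python
import math


def energy_table(x):
    """table[i] holds the sum of x[0]**2 .. x[i-1]**2 (table[0] is 0)."""
    table = [0.0]
    for v in x:
        table.append(table[-1] + v * v)
    return table


def ncc(x, start, template, table, template_energy):
    """Normalized cross-correlation of template with x[start:start+len(template)]."""
    n = len(template)
    numerator = sum(x[start + i] * template[i] for i in range(n))
    # Window energy from the table: one subtraction instead of n multiply-adds.
    segment_energy = table[start + n] - table[start]
    denominator = math.sqrt(segment_energy * template_energy)
    return numerator / denominator if denominator > 0 else 0.0


def best_shift(x, template, first, last):
    """Return (shift, score) maximizing NCC over shifts first..last inclusive."""
    table = energy_table(x)
    template_energy = sum(v * v for v in template)
    scores = [(ncc(x, s, template, table, template_energy), s)
              for s in range(first, last + 1)]
    score, shift = max(scores)
    return shift, score


signal = [0, 1, 3, -2, 4, 1, -1, 2, 5, -3, 0, 2]
template = signal[5:9]
shift, score = best_shift(signal, template, 0, len(signal) - len(template))
```

Because the template is cut from the signal itself, the search recovers the original offset with a score of 1; in WSOLA the template would be the natural continuation of the previously output segment.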

A New MPEG Reference Model for Unified Speech and Audio Coding (통합 음성/오디오 부호화를 위한 새로운 MPEG 참조 모델)

  • Song, Jeong-Ook;Oh, Hyen-O;Kang, Hong-Goo
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.74-80 / 2010
  • Speech and audio codecs have been developed with different types of coding technologies because their signal characteristics and applications differ. As broadcasting and telecommunication systems converge, international standardization organizations such as 3GPP and ISO/IEC MPEG have tried to compress and transmit multimedia signals using unified codecs. MPEG recently initiated an activity to standardize USAC (Unified Speech and Audio Coding). However, the USAC RM (Reference Model) software has been problematic because it has a complex hierarchy, a large amount of unnecessary source code, and a poor-quality encoder. To solve these problems, this paper introduces new RM software designed with an open-source paradigm. It was presented at the MPEG meeting in April 2010, and the source code was released in June.

Optimize the Acoustic Environment Using a Sound Masking Effects of the Audio Signal Compression Principle (음성신호의 압축원리를 이용한 사운드 마스킹 효과로 음향 환경 최적화)

  • Ann, Sook-Hyang
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.28 no.11 / pp.748-751 / 2015
  • Sound masking system technology artificially generates a constant shielding sound that is uniform across all bands, so that noise generated indoors, or speech that other people might hear or recognize, is masked. By controlling the audible frequency band (20 Hz to 20 kHz) and continuously emitting a constant, uniform shielding sound over it, the technology prevents speech from being heard or recognized and so protects its content. It interferes with, or delays the decoding of, interception by laser eavesdropping, internal eavesdropping, and recording. When a perceived noise disturbance index was measured, the average stress index was 10.16 before the system was applied and 3.07 after it was applied; the noise stress level was significantly lower, indicating improved noise conditions.

The Study of the Sensorineural Hearing Loss Compensation Algorithm using Psychoacoustics Model (심리음향모델을 적용한 난청 보정 알고리즘의 연구)

  • 노형철;김헌중;한헌수;차형태
    • Proceedings of the IEEK Conference / 2000.09a / pp.189-192 / 2000
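  • To provide a better hearing-aid environment for hearing-impaired listeners, this paper proposes an algorithm that compensates for sensorineural hearing loss by applying a psychoacoustic model to the hearing loss. The type of hearing loss addressed is sensorineural hearing loss, which arises from disorders of the sensory and neural systems extending from the inner ear to the central brain. In the frequency domain, the MTH (minimum hearing threshold) rises non-uniformly, which narrows the audible range; to address this problem, a multiband compression algorithm is applied to each frequency band. In this case, however, the spectral shape is distorted by the different audible ranges of the individual frequency bands, and the resulting spectral contrast reduction and altered masking characteristics limit speech discrimination. This is caused by masking from neighboring frequency components, so hearing aids should be developed through analysis of the signal in the perceptual domain experienced by the hearing-impaired listener and through psychoacoustic model parameters; this paper applies such an algorithm.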
