• Title/Summary/Keyword: perceptual audio

Robust Audio Identification Using Spectro-Temporal Subband Centroids (부밴드 스펙트럼의 무게중심을 이용한 강인한 오디오 인식기)

  • Seo, Jin-Soo;Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea / v.27 no.5 / pp.239-243 / 2008
  • This paper proposes a new audio identification method based on a combination of the instantaneous and dynamic spectral features of the audio spectrum. In particular, we propose spectro-temporal subband centroids, which are easy to compute and summarize the instantaneous and dynamic spectral variations effectively. Experimental results demonstrate that combining the spectral and temporal subband centroids greatly improves identification performance.
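
The spectral half of the feature can be illustrated with a minimal sketch. It assumes a subband centroid is the power-weighted mean bin index within each band; the abstract does not give the exact definition, and the function name, band edges, and empty-band fallback below are hypothetical.

```python
def subband_centroids(power, band_edges):
    """Return the power-weighted mean bin index of each subband.

    power      -- power spectrum of one frame (list of non-negative floats)
    band_edges -- bin indices delimiting the subbands (assumed layout)
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        total = sum(power[lo:hi])
        if total == 0.0:
            # Empty band: fall back to the band center (an assumption).
            centroids.append((lo + hi - 1) / 2.0)
        else:
            centroids.append(sum(k * power[k] for k in range(lo, hi)) / total)
    return centroids


power = [0.0, 1.0, 3.0, 0.0, 2.0, 2.0, 0.0, 0.0]
print(subband_centroids(power, [0, 4, 8]))  # -> [1.75, 4.5]
```

A temporal counterpart would presumably apply the same weighting along the time axis, but the abstract does not specify how.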

Performance Evaluation of Frame Erasure Concealment Algorithms in VoIP Coders (VoIP 코더들의 프레임손실은닉 알고리즘 성능평가)

  • Han, Seung-Ho;Moon, Kwang;Han, Min-Soo
    • Proceedings of the KSPS conference / 2004.05a / pp.235-238 / 2004
  • Frame erasures degrade speech quality in wireless and packet networks, and the degradation worsens when consecutive frames are erased. Speech coders have a frame erasure concealment (FEC) mechanism to compensate for frame erasures, so it is worthwhile to evaluate how these mechanisms perform under the erasures that occur in communication networks. In this paper, various frame erasure patterns are designed, and the FEC algorithms of speech coders are evaluated and analyzed with the Perceptual Evaluation of Speech Quality (PESQ). The performance is found to vary with frame erasure type, frame erasure rate, and utterance length.

A Lossless and Lossy Audio Compression using Prediction Model and Wavelet Transform

  • Park, Se-Yil;Park, Se-Hyoung;Lim, Dae-Sik;Jaeho Shin
    • Proceedings of the IEEK Conference / 2002.07c / pp.2063-2066 / 2002
  • In this paper, we propose a structure for a lossless audio coding method. A prediction model is used in the wavelet transform domain. After the DWT, the wavelet coefficients are quantized and decorrelated by prediction modeling. The DWT can be constructed to match the critical bands. We can obtain a lower data rate representation of the audio signal with quality comparable to that of perceptual coding. The prediction errors are then efficiently coded by the Golomb coding method. The prediction coefficients are fixed to reduce the computational burden of finding them.
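
The Golomb coding of prediction errors can be sketched as follows. This is not the paper's coder: it assumes the Rice special case (Golomb parameter m = 2^k) and a zigzag mapping of signed residuals to non-negative integers, neither of which the abstract states. All function names are hypothetical.

```python
def zigzag(e):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * e if e >= 0 else -2 * e - 1


def rice_encode(n, k):
    """Encode non-negative n: quotient in unary (ones, then a zero), remainder in k bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k > 0:
        bits += format(r, "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode one codeword produced by rice_encode with the same k."""
    q = bits.index("0")  # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


code = rice_encode(zigzag(-3), 2)
print(code, rice_decode(code, 2))  # -> 1001 5
```

Small residuals get short codewords, which is why such codes suit prediction errors concentrated around zero.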

Performance Improvement of Perceptual Filter Using Noise Energy Control (잡음 에너지 제어를 통한 지각 필터 성능 개선)

  • Seo Joung-Kook;Cha Hyung-Tai
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.43-51 / 2005
  • In this paper, we propose an algorithm that improves the tone quality of a noisy audio signal by enhancing the performance of a perceptual filter through noise energy control. Most previously proposed algorithms apply a filter using the noise energy acquired from a silent range. In this case, the improvement in tone quality decreases if the noise energy changes with the magnitude or environment variation within a signal frame. The proposed method instead provides a means of finding a good noise estimate by controlling the energy of the estimated noise obtained from a silent range. Unlike other methods, it also enhances tone quality in the low frequency band. To show the performance of the proposed algorithm, input signals with signal-to-noise ratios (SNR) of 5 dB, 10 dB, 15 dB and 20 dB were tested. With the proposed algorithm, we confirmed the enhancement of tone quality in terms of segmental SNR (SSNR), noise-to-mask ratio (NMR) and mean opinion score (MOS) tests.
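
One of the metrics named above, segmental SNR, can be sketched in its common textbook form: the mean of per-frame SNR values in dB. The frame length, the skipping of degenerate frames, and the function name are assumptions; the paper's exact variant (for example, any clamping of per-frame values) is not given in the abstract.

```python
import math


def segmental_snr(clean, processed, frame_len):
    """Mean of per-frame SNRs in dB over non-overlapping frames."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        sig = sum(v * v for v in s)
        err = sum((a - b) ** 2 for a, b in zip(s, y))
        if sig == 0.0 or err == 0.0:
            continue  # skip silent or perfectly matched frames (an assumption)
        snrs.append(10.0 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")


clean = [10.0, 0.0, 10.0, 0.0]
processed = [9.0, 0.0, 0.0, 0.0]
print(segmental_snr(clean, processed, 2))  # frames give 20 dB and 0 dB -> 10.0
```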

Audio Quality Enhancement at a Low-bit Rate Perceptual Audio Coding (저비트율로 압축된 오디오의 음질 개선 방법)

  • 서정일;서진수;홍진우;강경옥
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.566-575 / 2002
  • Low-bitrate audio coding makes a number of Internet and mobile multimedia streaming services more efficient. With the help of next-generation mobile telephone technologies and digital audio/video compression algorithms, we can enjoy real-time multimedia content on our mobile devices (cellular phone, PDA, notebook, etc.). But the limited bandwidth of mobile communication networks prohibits transmitting high-quality AV content, and most of that bandwidth is assigned to video. In this paper, we design a novel and simple method for reproducing high frequency components. The spectrum of the high frequency components, which are lost by down-sampling, is modeled by its energy ratio to the low frequency band on the Bark scale, and these values are multiplexed with the conventional coded bitstream. At the decoder side, the high frequency components are reconstructed by duplicating the low frequency band spectrum, scaled by the decoded energy ratios. Segmental SNR and MOS tests confirmed that the proposed method enhances the subjective sound quality with only 10% to 20% additional bits. In addition, the proposed method can be applied to all kinds of frequency domain audio compression algorithms, such as MPEG-1/2, AAC, and AC-3.
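
The reconstruction idea can be sketched in a heavily simplified form. The sketch assumes a single energy ratio for the whole high band and a high band as wide as the low band, whereas the abstract describes per-band ratios on the Bark scale. Function names and the split point are hypothetical.

```python
import math


def band_energy(spec, lo, hi):
    """Sum of squared spectral coefficients in bins [lo, hi)."""
    return sum(c * c for c in spec[lo:hi])


def encode_energy_ratio(spec, split):
    """Encoder side: ratio of high-band energy to low-band energy."""
    low = band_energy(spec, 0, split)
    return band_energy(spec, split, len(spec)) / low if low > 0.0 else 0.0


def reconstruct_high_band(low_spec, ratio):
    """Decoder side: copy the low band upward, scaled to the signalled energy."""
    gain = math.sqrt(ratio)
    return low_spec + [gain * c for c in low_spec]


spec = [4.0, 2.0, 1.0, 0.5]
ratio = encode_energy_ratio(spec, 2)           # side information sent with the bitstream
rebuilt = reconstruct_high_band(spec[:2], ratio)
print(ratio, rebuilt)  # -> 0.0625 [4.0, 2.0, 1.0, 0.5]
```

Only the ratio travels as side information, which fits the abstract's report of a small bit overhead.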

A Common Synthesis Filter for MPEG-2 BC/AAC Audio Using Recursive Structure (Recursive 구조를 이용한 MPEG-2 BC/AAC 오디오 공용 합성 필터)

  • 강명수;박세기;오신범;이채욱
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.6C / pp.874-882 / 2004
  • The MPEG audio compression algorithm is the international standard for digital compression of high quality audio, using perceptual coding based on psychoacoustic masking. It is necessary to discuss the constraints on designing common filter banks, which map audio signals from the time domain into the frequency domain, for MPEG-2 BC and MPEG-2 AAC decoder systems, which has not yet been done. In this paper, we present an architecture for a common synthesis filter, using a recursive structure, which can be used for both MPEG-2 BC and MPEG-2 AAC decoders. The proposed algorithm is based on a recursive architecture that effectively performs the common computation.

Enhanced Spectral Hole Substitution for Improving Speech Quality in Low Bit-Rate Audio Coding

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.3E / pp.131-139 / 2010
  • This paper proposes a novel spectral hole substitution technique for low bit-rate audio coding. The spectral holes frequently occurring in relatively weak energy bands due to zero bit quantization result in severe quality degradation, especially for harmonic signals such as speech vowels. The enhanced aacPlus (EAAC) audio codec artificially adjusts the minimum signal-to-mask ratio (SMR) to reduce the number of spectral holes, but it still produces noisy sound. The proposed method selectively predicts the spectral shapes of hole bands using either intra-band correlation, i.e. harmonically related coefficients nearby or inter-band correlation, i.e. previous frames. For the bands that have low prediction gain, only the energy term is quantized and spectral shapes are replaced by pseudo random values in the decoding stage. To minimize perceptual distortion caused by spectral mismatching, the criterion of the just noticeable level difference (JNLD) and spectral similarity between original and predicted shapes are adopted for quantizing the energy term. Simulation results show that the proposed method implemented into the EAAC baseline coder significantly improves speech quality at low bit-rates while keeping equivalent quality for mixed and music contents.
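
The fallback path for bands with low prediction gain, where only the energy is transmitted and the decoder substitutes pseudo-random values, can be sketched as follows. The uniform distribution, the seed handling, and the function name are assumptions for illustration; the abstract does not specify the random generator or the energy quantizer.

```python
import math
import random


def fill_hole_band(width, energy, seed=0):
    """Fill a band with pseudo-random coefficients scaled to the decoded energy."""
    rng = random.Random(seed)
    shape = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(c * c for c in shape))
    if norm == 0.0:
        return [0.0] * width
    gain = math.sqrt(energy) / norm
    return [gain * c for c in shape]


band = fill_hole_band(8, 2.5)
print(sum(c * c for c in band))  # band energy, approximately 2.5
```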

Performance Analysis of Audio Data Hiding Method based on Phase Information with Various Window Length (주파수 변환의 길이에 따른 위상 기반 오디오 정보 은닉 기술의 음질 및 성능 분석)

  • Cho, Kiho;Kim, Nam Soo
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.12 / pp.232-237 / 2013
  • The window length of the time-frequency transformation plays an important role in audio data hiding methods that utilize phase information. In this paper, experiments on our audio data hiding method were conducted to evaluate audio quality and robustness against reverberant environments. The results showed that a longer window tends to give worse audio quality but better robustness. The main cause of the quality degradation was pre-echo, which flattens percussive sounds. The results also indicated that wireless communication theory related to the length of the time-frequency transform can be applied to audio data hiding and acoustic data transmission.

Robust Music Identification Using Long-Term Dynamic Modulation Spectrum

  • Kim, Hyoung-Gook;Eom, Ki-Wan
    • The Journal of the Acoustical Society of Korea / v.25 no.2E / pp.69-73 / 2006
  • In this paper, we propose a robust music audio fingerprinting system for automatic music retrieval. The fingerprint feature is extracted from the long-term dynamic modulation spectrum (LDMS) estimation in the perceptual compressed domain. The major advantage of this feature is its significant robustness against severe background noise from the street and cars. Furthermore, fast searching is performed by looking up a hash table with 32-bit hash values. The hash value bits are quantized from the logarithmic scale modulation frequency coefficients. Experiments illustrate that the LDMS fingerprint has the advantages of high scalability, robustness and small fingerprint size. Moreover, the performance is improved remarkably under severe recording-noise conditions compared with other power spectrum-based robust fingerprints.
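
The hash-table search step can be sketched as follows. The abstract says only that the hash bits are quantized from modulation frequency coefficients; the one-bit-per-coefficient rule with a zero threshold, the index layout, and all names here are hypothetical.

```python
from collections import defaultdict


def hash32(coeffs):
    """Pack 32 coefficients into an integer, one bit each (1 if the coefficient is > 0)."""
    if len(coeffs) != 32:
        raise ValueError("expected 32 coefficients")
    value = 0
    for c in coeffs:
        value = (value << 1) | (1 if c > 0 else 0)
    return value


def build_index(tracks):
    """Map each hash value to the (track_id, frame) positions where it occurs."""
    index = defaultdict(list)
    for track_id, frames in tracks.items():
        for i, coeffs in enumerate(frames):
            index[hash32(coeffs)].append((track_id, i))
    return index


def lookup(index, coeffs):
    """Return candidate positions whose stored hash equals the query hash."""
    return index.get(hash32(coeffs), [])


index = build_index({"song": [[1.0] * 32, [-1.0] * 31 + [1.0]]})
print(lookup(index, [0.5] * 32))  # -> [('song', 0)]
```

Exact-match lookup stays fast as the database grows, but it depends on the query bits surviving noise, so a practical system would also need to tolerate bit errors.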

FPGA Implementation of MDCT/IMDCT for MPEG Audio Signal Processing (MPEG 오디오 신호처리를 위한 MDCT/IMDCT의 FPGA 구현)

  • 노진수;이강현
    • Proceedings of the Korea Multimedia Society Conference / 2003.05b / pp.69-73 / 2003
  • Audio compression commonly exploits the characteristics of the human auditory system, an approach that derives from the psychoacoustic model. Using this model, perceptual audio coding leaves uncoded those signal components that listeners cannot perceive. Perceptual audio coding encodes with an analysis filter and decodes with a synthesis filter, forming a subband coder implemented as a filter bank. In this paper, the MDCT (Modified Discrete Cosine Transform) and IMDCT (Inverse Modified Discrete Cosine Transform) used in the analysis and synthesis filters are implemented on an FPGA.
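
As a plain software reference for the transform pair (not the paper's FPGA design), the sketch below uses the direct O(N^2) MDCT/IMDCT definitions with a sine window on both the analysis and synthesis sides, reconstructing by 50% overlap-add. The window choice, the 2/N inverse scaling, and the function names are assumptions.

```python
import math


def sine_window(n2):
    """Sine window of length 2N; it satisfies w[t]^2 + w[t+N]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / n2) for t in range(n2)]


def mdct(block):
    """2N windowed samples -> N coefficients (direct form)."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n)) for k in range(n)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n)) for t in range(2 * n)]


def analysis_synthesis(signal, n):
    """Window, transform, inverse-transform, window again, overlap-add with hop n."""
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * w[t] for t in range(2 * n)]
        y = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += y[t] * w[t]
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
out = analysis_synthesis(signal, 4)
# Samples covered by two overlapping blocks are recovered; the edges stay aliased.
print(max(abs(out[i] - signal[i]) for i in range(4, 12)))
```

The time-domain aliasing in each block cancels only where two blocks overlap, so the first and last N samples are not reconstructed in this sketch.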