• Title/Summary/Keyword: perceptual audio


Bandwidth Expansion Method Using Spline Codebook Based Spectral Folding (Spline 코드북 기반의 spectral folding을 이용한 대역폭 확장 방법)

  • Park, Ji-Hoon;Han, Seung-Ho;Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2006.11a / pp.131-134 / 2006
  • The quality of narrowband speech $(0{\sim}4\,\mathrm{kHz})$ can be enhanced by bandwidth expansion, in which the high-band components are estimated. This paper proposes a bandwidth expansion method using spline codebook based spectral folding. For performance evaluation, PESQ (Perceptual Evaluation of Speech Quality) scores are measured as the objective measure. In addition, MOS (Mean Opinion Score) and preference tests are performed as the subjective measures. The results show that the proposed method outperforms the existing spline-based one.


An Enhancement of the MPEG-2 Audio Encoder Using General DSPs (범용 DSP를 이용한 MPEG-2 오디오 부호화기의 성능 개선)

  • 오현오;김성윤;윤대희;차일환;이준용
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1997.11a / pp.63-67 / 1997
  • The ISO (International Organization for Standardization) has standardized MPEG-2 audio. The MPEG-2 audio compression algorithm is based on subband analysis and exploits human auditory characteristics to achieve a low bit rate with minimal perceptual loss of audio quality. This thesis presents an enhanced MPEG-2 audio encoder using multiple TMS320C30 general-purpose DSPs. The developed system is made up of five slave boards and one master board. Each slave board performs subband analysis and psychoacoustic parameter calculation for one channel, and the master board manages bit allocation, quantization, and bit-stream formatting for all channels. Parallel processing and pipelining are used in the hardware structure, and fast algorithms are applied in each subroutine to achieve real-time processing. The implemented system supports multichannel audio up to 5.1 and various bitrates.


A Perceptual Audio Coder Based on Temporal-Spectral Structure (시간-주파수 구조에 근거한 지각적 오디오 부호화기)

  • 김기수;서호선;이준용;윤대희
    • Journal of Broadcast Engineering / v.1 no.1 / pp.67-73 / 1996
  • In general, high quality audio coding (HQAC) combines conventional data compression techniques with models of human perception. The primary auditory characteristic applied in HQAC is the masking effect in the spectral domain. Therefore, spectral techniques such as subband coding or transform coding are widely used[1][2]. However, no effort has yet been made to apply the temporal masking effect and temporal redundancy removal in HQAC. The audio data compression method proposed in this paper eliminates statistical and perceptual redundancies in both the temporal and spectral domains. The transformed audio signal is divided into packets, each consisting of 6 frames. A packet contains 1536 samples ($256 \times 6$), and the redundancies in a packet reside in both the temporal and spectral domains. Both redundancies are eliminated at the same time in each packet. The psychoacoustic model has been improved to give finer results by taking into account temporal masking as well as fine spectral masking. For quantization, each packet is divided into subblocks designed to correspond to the nonlinear critical bands and to reflect the temporal auditory characteristics. Consequently, high quality of the reconstructed audio is preserved at low bit rates.


Enhanced source controlled variable bit-rate scheme in a waveform interpolation coder (Source controlled variable bit-rate scheme을 이용한 파형 보간 부호화기의 음질 개선 기법)

  • Cho, Keun-Seok;Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.315-318 / 2007
  • This paper proposes methods to enhance the speech quality of a source controlled variable bit-rate coder based on waveform interpolation. The methods estimate and generate the parameters that are not transmitted from the encoder to the decoder, using repetition and extrapolation schemes. For performance evaluation, PESQ (Perceptual Evaluation of Speech Quality) scores are measured. The experimental results show that the proposed method outperforms the conventional source controlled variable bit-rate coder. In particular, the extrapolation method performs better than the repetition method.


The Improved-Scheme of Audio Steganography using LSB Techniques (LSB 기법을 이용하는 개선된 오디오 스테가노그래피)

  • Ji, Seon-Su
    • Journal of Korea Society of Industrial Information Systems / v.17 no.5 / pp.37-42 / 2012
  • Audio steganography is quite similar to the procedure of modifying the least significant bit (LSB) of image media files. The most widely used technique today is hiding secret messages in a digitized audio signal. In this paper, I propose a new method for hiding messages from attackers that achieves a high data insertion rate. The method is based on LSB hiding and changes the bit positions of the digitized secret message, so that an encrypted stego medium can be sent safely to the destination.
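
As an illustration of the baseline technique the abstract builds on, the following is a minimal sketch of plain LSB hiding in integer PCM samples. It is not the author's improved scheme, whose bit-position changes and encryption step are not detailed in the abstract. The function names `embed_lsb` and `extract_lsb` are hypothetical, and the sketch assumes one message bit per sample.

```python
def embed_lsb(samples, message):
    """Hide `message` (bytes) in the least significant bit of each sample.

    Assumption: `samples` is a sequence of integer PCM values and one
    message bit is stored per sample, most significant bit first.
    """
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover signal is too short for this message")
    stego = list(samples)
    for index, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[index] = (stego[index] & ~1) | bit
    return stego


def extract_lsb(samples, n_bytes):
    """Read back `n_bytes` bytes from the LSBs of the first 8 * n_bytes samples."""
    recovered = bytearray()
    for byte_index in range(n_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (samples[byte_index * 8 + bit_index] & 1)
        recovered.append(value)
    return bytes(recovered)
```

Each sample changes by at most one quantization step, which is why plain LSB hiding tends to be hard to hear but easy to detect or destroy; that weakness is presumably what motivates schemes that rearrange or encrypt the message bits.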

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

  • Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
    • 한국정보컨버전스학회:학술대회논문집 / 2008.06a / pp.53-56 / 2008
  • This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); 2/ design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is a real-time question and answer speech-based system developed at LIMSI (RITEL). The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components of the system are as follows. RITEL, a question and answer system searching raw text, produces a text (the answer) and attitudinal information; this attitudinal information is then processed to deliver expressive tags; the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.


Implementation of a Person Tracking Based Multi-channel Audio Panning System for Multi-view Broadcasting Services (다시점 방송 서비스를 위한 사용자 위치추적 기반 다채널 오디오 패닝 시스템 구현)

  • Kim, Yong-Guk;Yang, Jong-Yeol;Lee, Young-Han;Kim, Hong-Kook
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.150-157 / 2009
  • In this paper, we propose a person tracking based multi-channel audio panning system for multi-view broadcasting services. Multi-view broadcasting renders video sequences captured from a set of cameras at different viewpoints, and multi-channel audio panning techniques are necessary for audio rendering in these services. To apply such a realistic audio technique to a multi-view broadcasting service, person tracking techniques that estimate the position of users are also necessary. For these reasons, the proposed method is composed of two parts. The first part is a person tracking method using ultrasonic satellites and a receiver. We could obtain the user's coordinates with a resolution of about 10 mm in about 150 ms. The second part is an MPEG Surround parameter-based multi-channel audio panning method, which obtains panned multi-channel audio by controlling the MPEG Surround spatial parameters. A MUSHRA test is conducted to evaluate the perceptual quality, and localization performance is measured using a dummy head. The experiments show that the proposed method provides better perceptual quality and localization performance than the conventional parameter-based audio panning method. In addition, we implement a prototype of a person tracking based multi-view broadcasting system by integrating the proposed methods with a multi-view display system.


A Correlation Study between Acoustic and Perceptual Parameters of the Singing Voice in Singing Students (성악 전공 학생의 가창 시 음성의 음향학적 매개 변수와 지각적 매개 변수사이의 상관 연구)

  • Jo, Sung-Mi;Lee, Sang-Ouk;Jeong, Ok-Ran
    • Proceedings of the KSPS conference / 2004.05a / pp.219-222 / 2004
  • The purpose of this study was to determine the correlation between acoustic and perceptual parameters of the singing voice in singing students, to compare the results with those of previous studies, and to identify more sensitive parameters for analyzing professional vocal usage. This study measured acoustic and perceptual parameters in 41 singing students. Digital audio recordings of sung vowels were made for acoustic analysis. Each sample was judged by 1 experienced singing teacher and 1 voice pathologist on two semantic bipolar 7-point scales (ringing-dull, rich-thin). The results showed that SPP1 (p<0.01), SPP2 (p<0.01), and P1 (p<0.01) had significant correlations with ringing and richness quality.


A Quality Improvement of MP3-Coded Audios Using Bandwidth Extension (대역 확장을 통한 MP3 오디오의 음질 향상)

  • Heo, So-Young;Kim, Rin-Chul
    • Journal of Broadcast Engineering / v.13 no.5 / pp.744-751 / 2008
  • In this paper, we investigate methods to enhance the perceptual quality of MP3-coded audio. Building on the high frequency reconstruction method by Liu, the proposed method adaptively determines the starting point of high frequency reconstruction. We also present an improved linear estimation method. For high frequency component generation, we compare two methods: one replicates low-frequency components, and the other inserts additive white Gaussian noise signals. Through subjective tests, we show that the proposed method can improve the perceptual quality of MP3-coded audio.
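
The two high-frequency generation strategies compared in the abstract can be sketched on a list of spectral coefficients. This is only a toy illustration under stated assumptions, not Liu's method or the authors' algorithm: the function names, the fixed `start` bin (the paper chooses it adaptively), and the noise level `sigma` are all hypothetical, and no envelope estimation is applied.

```python
import random


def extend_by_replication(spectrum, start):
    """Fill bins from `start` upward by cyclically copying the bins below `start`."""
    if not 0 < start <= len(spectrum):
        raise ValueError("start must lie inside the spectrum")
    extended = list(spectrum)
    for k in range(start, len(extended)):
        # Copy low-band bin (k - start) mod start into high-band bin k.
        extended[k] = spectrum[(k - start) % start]
    return extended


def extend_by_noise(spectrum, start, sigma, seed=0):
    """Fill bins from `start` upward with white Gaussian noise of std `sigma`."""
    if not 0 < start <= len(spectrum):
        raise ValueError("start must lie inside the spectrum")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    extended = list(spectrum)
    for k in range(start, len(extended)):
        extended[k] = rng.gauss(0.0, sigma)
    return extended
```

In a real decoder the generated high band would still need to be scaled to an estimated spectral envelope, which is presumably where the linear estimation mentioned in the abstract comes in.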

Quality Assessment and Predistortion Evaluation of the Multi-channel Audio Codec according to the bitrate changing (압축율 변화에 따른 멀티채널 오디오의 품질 및 Predistortion 의 영향 평가)

  • Cha, Kyung-Hwan;Jang, Dae-Young;Kim, Sung-Han;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.55-60 / 1996
  • This paper describes a subjective assessment of multi-channel audio quality as the bitrate changes and evaluates the effect of predistortion, which is used to avoid unmasked noise after the matrixing/dematrixing process in the transmission and regeneration of multi-channel audio. The simulation uses perceptual coding with the MPEG-2 Audio Layer II algorithm. We evaluate the quality improvement with and without predistortion at 384, 320, 256, and 128 kbps. In the double-blind subjective assessment, the score on the 5-grade impairment scale is under minus one down to 320 kbps, so the audio quality is judged to be perceptible but not annoying in the 3/2 channel configuration. Predistortion improves the quality by one level at 128 kbps, and the speech test material is improved more than the music test materials.
