• Title/Summary/Keyword: Audio Compression


A study on remote video transmit technique of mobile phone (모바일폰에서의 원격 영상 전송 기술에 관한 연구)

  • Jeong, Jong-Geun;Kim, Chul-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.10 / pp.1914-1919 / 2006
  • The main problems in video transfer on mobile phones are transfer speed and control. A compression technique is needed to transfer video, and the H.263 codec is used for compression. The system effectively controls cameras at remote locations and increases the number of users connected in real time. In this paper, we solve the problems of the existing RF-based approach and transfer the most suitable image and audio.

Digital Audio Watermarking Based on Psychoacoustic Model (심리음향모델 기반의 디지털 오디오 워터마킹)

  • Song, You-Su;Kim, Jong-Hwan;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.772-775 / 2005
  • This paper describes a digital watermarking algorithm used to confirm the copyright protection of digital audio data. A digital audio watermarking algorithm based on a psychoacoustic model is used to make the watermark data inaudible. The psychoacoustic model, which is a key algorithm in MP3 audio compression, is analyzed by MATLAB simulation and applied to digital audio watermark insertion.


Low-bitrate Multichannel Audio Coding (저비트율 멀티채널 오디오 부호화)

  • Jang, Inseon;Seo, Jeongil;Beak, Seungkwon;Kang, Kyeongok
    • Journal of Broadcast Engineering / v.10 no.3 / pp.328-338 / 2005
  • Technology for low-bitrate multichannel audio coding is being standardized owing to increasing consumer demand for multichannel audio content. In this paper we propose sound source location cue coding (SSLCC), which compresses multichannel audio heavily enough to suit narrow-bandwidth transmission environments. To improve on the compression capability of conventional binaural cue coding (BCC), SSLCC adopts virtual source location information (VSLI) as a spatial cue parameter, a symmetric uniform quantizer, and a Huffman coder. Objective and subjective assessment results show that SSLCC provides a lower bitrate and better audio quality than the conventional BCC method.

Study on data augmentation methods for deep neural network-based audio tagging (Deep neural network 기반 오디오 표식을 위한 데이터 증강 방법 연구)

  • Kim, Bum-Jun;Moon, Hyeongi;Park, Sung-Wook;Park, Young cheol
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.475-482 / 2018
  • In this paper, we present a study on data augmentation methods for DNN (Deep Neural Network)-based audio tagging. In this system, an audio signal is converted into a mel-spectrogram and used as the input to the DNN for audio tagging. To cope with the problem of a small amount of training data, we augment the training samples using time stretching, pitch shifting, dynamic range compression, and block mixing. We derive optimal parameters and combinations for the augmentation methods through audio tagging simulations.
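
As a rough illustration of one of the four augmentations named in this abstract, dynamic range compression can be sketched as a static compressor applied sample by sample. This is a minimal sketch, not the authors' implementation: the function name `compress_dynamic_range`, the -20 dB threshold, and the 4:1 ratio are assumptions chosen for illustration, and a practical compressor would also smooth the gain with attack and release times.

```python
import math
from typing import List, Sequence


def compress_dynamic_range(
    samples: Sequence[float],
    threshold_db: float = -20.0,  # assumed threshold, relative to full scale (1.0)
    ratio: float = 4.0,           # assumed compression ratio (4:1)
) -> List[float]:
    """Static compressor: levels above the threshold are reduced by `ratio` in dB."""
    threshold = 10 ** (threshold_db / 20)  # convert dB to linear amplitude
    out = []
    for x in samples:
        mag = abs(x)
        if mag > threshold:
            # How far the sample is above the threshold, in dB.
            over_db = 20 * math.log10(mag / threshold)
            # Keep only 1/ratio of that excess.
            mag = threshold * 10 ** (over_db / ratio / 20)
        out.append(math.copysign(mag, x))
    return out


# Loud samples are pulled toward the threshold; quiet samples pass through.
augmented = compress_dynamic_range([1.0, 0.05, -1.0])
```

Varying the threshold and ratio yields differently compressed copies of the same clip, which is the sense in which such a transform can serve as augmentation.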

Robust Audio Fingerprinting Using Compressed-Domain Features (압축 도메인 특징을 이용한 강인한 오디오 핑거프린팅)

  • Seo, Jin-Soo;Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.375-382 / 2009
  • This paper proposes a new audio fingerprinting method based on compressed-domain features. Working in the compressed domain greatly improves the computational efficiency of the proposed method. In particular, we deal with the MDCT domain, which is widely employed in audio compression, and extract three kinds of subband features: energy, centroid, and flatness. Binary audio fingerprints are obtained by taking the signs of each feature after differential filtering. The identification performance of the three kinds of fingerprints is compared experimentally. Among the compressed-domain subband features considered, the subband energy showed the best performance for fingerprinting.
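
The sign-after-differential-filtering step can be sketched as follows for the subband energy feature. The abstract does not specify the filter or the band layout, so this sketch makes two loudly labeled assumptions: the bands are equal-width groups of coefficients, and the differential filter takes the difference across adjacent bands and then across adjacent frames. The function names are hypothetical, and the input is assumed to be frames of already-decoded MDCT coefficients.

```python
from typing import List, Sequence


def subband_energies(coeffs: Sequence[float], n_bands: int) -> List[float]:
    """Sum of squared coefficients in equal-width bands (assumed band layout)."""
    width = len(coeffs) // n_bands
    if width == 0:
        raise ValueError("need at least one coefficient per band")
    return [
        sum(c * c for c in coeffs[b * width:(b + 1) * width])
        for b in range(n_bands)
    ]


def binary_fingerprint(
    frames: Sequence[Sequence[float]], n_bands: int = 5
) -> List[List[int]]:
    """One row of bits per frame transition, one bit per adjacent band pair."""
    energies = [subband_energies(f, n_bands) for f in frames]
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        row = []
        for m in range(n_bands - 1):
            # Assumed filter: difference across bands, then across frames.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if d > 0 else 0)  # keep only the sign
        bits.append(row)
    return bits


def hamming_distance(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> int:
    """Number of differing bits between two equally shaped fingerprints."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Two clips would then be compared by the Hamming distance between their bit matrices, with a small distance suggesting the same underlying recording.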

The Audio Watermarking method Using the MPEG-2 AAC Psychoacoustic Model (MPEG-2 AAC 심리음향 모델을 이용한 오디오 워터마킹 기법)

  • 성종수;강상구;신재호
    • Proceedings of the IEEK Conference / 1999.06a / pp.716-719 / 1999
  • In this paper, we present a method for embedding digital watermarks into digital audio signals. The watermark must be imperceptible and should be robust to attacks such as filtering and compression. In our method, we adaptively embed the watermarks by changing the scale factor, using spread spectrum and the MPEG-2 AAC psychoacoustic model.


Sound Enhancement of low Sample rate Audio Using LMS in DWT Domain (DWT영역에서 LMS를 이용한 저 샘플링 비율 오디오 신호의 음질 향상)

  • 백수진;윤원중;박규식
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.54-60 / 2004
  • To mitigate the storage space and network bandwidth problems of full CD-quality audio, current digital audio is restricted in sampling rate and bandwidth. This restriction normally results in low-sample-rate audio or calls for a data compression scheme such as MP3. However, such audio can only reproduce a lower frequency range than regular CD-quality audio because of the Nyquist sampling theorem, and consequently it loses the rich spatial information embedded in the high frequencies. The purpose of this paper is to propose an efficient high-frequency enhancement of low-sample-rate audio using adaptive filtering and DWT analysis and synthesis. The proposed algorithm uses the LMS adaptive algorithm to estimate the missing high-frequency content in the DWT domain and then reconstructs the spectrally enhanced audio using the DWT synthesis procedure. Several experiments with real speech and audio are performed and compared with another algorithm. From the spectrogram and sonic test results, we confirm that the proposed algorithm outperforms the other algorithm and works reasonably well in most audio cases.
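
The LMS update that this abstract relies on can be sketched generically. The example below is a plain time-domain LMS filter identifying a short unknown FIR system. It is not the DWT-domain estimator of the paper, and the function name `lms`, the tap count, the step size `mu`, and the toy target system are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def lms(
    x: Sequence[float],
    d: Sequence[float],
    n_taps: int = 2,   # assumed filter length
    mu: float = 0.05,  # assumed step size
) -> Tuple[List[float], List[float]]:
    """Adapt FIR weights so that the filtered input x tracks the desired signal d."""
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # most recent input sample first
    errors = []
    for x_n, d_n in zip(x, d):
        history = [x_n] + history[:-1]
        y_n = sum(w * h for w, h in zip(weights, history))  # filter output
        e_n = d_n - y_n                                     # estimation error
        # LMS update: step along the instantaneous gradient estimate.
        weights = [w + mu * e_n * h for w, h in zip(weights, history)]
        errors.append(e_n)
    return weights, errors


# Toy usage: the desired signal comes from a hypothetical system [0.5, -0.3].
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
weights, errors = lms(x, d)
```

In this noise-free toy case the weights approach the taps of the hypothetical system and the error shrinks toward zero. In the setting the abstract describes, the same kind of update would instead be driven by DWT subband signals.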

A Study on Digital Image Watermarking for Embedding Audio Logo (음성로고 삽입을 위한 디지털 영상 워터마킹에 관한 연구)

  • Cho, Gang-Seok;Koh, Sung-Shik
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.3 / pp.21-27 / 2002
  • Digital watermarking methods have been proposed as a solution to the problems of illegal copying and proof of ownership for multimedia data. However, the problem of protecting ownership of multimedia data such as digital images, digital video, and digital audio has still not been overcome. This paper describes a watermarking algorithm that non-linearly embeds audio logo watermark data, converted from the owner's audio signal, into the pixel intensity components of an original image, and that asserts ownership by playing the audio signal recovered from the extracted audio logo through a speaker. Experimental results show that the proposed audio logo algorithm is robust against attacks, particularly lossy JPEG image compression.

High Quality Audio Watermarking using Spread Spectrum and Psychoacoustic Model (대역확산과 심리음향 모델을 이용한 고음질 오디오 워터마킹)

  • Noh Jin-Soo;Rhee Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.5 s.311 / pp.48-56 / 2006
  • In this paper, we propose a high-quality audio watermarking algorithm using MDCT/IMDCT (Modified DCT/Inverse Modified DCT) with a psychoacoustic model. Generally, a digital audio watermark is embedded in the frequency domain after a frequency transform of the digital audio data, but watermarking affects the audio quality. In our scheme, the digital audio data is spread with a PN (Pseudo Noise) code, and the audio watermark is then embedded during MDCT processing that refers to the psychoacoustic model. In MDCT processing, according to the shape of the filter bank output, block switching selects a window sequence with an interval of 256, 1,024, or 2,048 points for high-quality audio. The authors confirm that when the watermark weight $\alpha$ is 2.5 or below, the watermark detection ratio satisfies the SDMI (Secure Digital Music Initiative) recommendation of 50% or above, and the SM is in the range of $50{\sim}68$ dB under four main kinds of attack (compression, cropping, FFT, and echo).
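
The spread-spectrum idea in this abstract can be illustrated with a toy example that embeds one bit by adding a scaled PN sequence and recovers it by correlation. This is a sketch under stated assumptions, not the authors' scheme: it works directly on time-domain samples, omits the MDCT and psychoacoustic model entirely, and uses hypothetical function names. The weight `alpha` is deliberately chosen larger than the host amplitude so that detection is guaranteed in the toy case. A practical system would rely on psychoacoustic shaping to keep a much weaker watermark inaudible.

```python
import random
from typing import List, Sequence


def pn_sequence(length: int, seed: int) -> List[int]:
    """Pseudo-noise sequence of +1/-1 chips; the seed acts as the shared key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed_bit(
    host: Sequence[float], bit: int, pn: Sequence[int], alpha: float
) -> List[float]:
    """Add the PN sequence, scaled by alpha and signed by the bit, to the host."""
    sign = 1 if bit else -1
    return [h + alpha * sign * c for h, c in zip(host, pn)]


def detect_bit(signal: Sequence[float], pn: Sequence[int]) -> int:
    """Correlate with the same PN sequence; the sign of the sum gives the bit."""
    corr = sum(s * c for s, c in zip(signal, pn))
    return 1 if corr > 0 else 0


# Toy usage with a bounded random host (|host| <= 0.5) and alpha = 1.0.
rng = random.Random(1)
host = [rng.uniform(-0.5, 0.5) for _ in range(1024)]
pn = pn_sequence(len(host), seed=42)
marked_one = embed_bit(host, 1, pn, alpha=1.0)
marked_zero = embed_bit(host, 0, pn, alpha=1.0)
```

Because the watermark contributes `alpha` times the sequence length to the correlation while the host contributes at most half of that here, `detect_bit` recovers both bits in this toy setting.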

Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea / v.28 no.3E / pp.109-117 / 2009
  • Reliable identity recognition in real environments is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure of the audio and visual modalities. We propose an extension of the modified reliability function that introduces optimizing parameters for both the audio and visual modalities. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was used on both modalities to reduce the dimension of the feature vector. We applied a swarm intelligence algorithm, particle swarm optimization, to optimize the parameters of the modified convection function. The overall person identification experiments were performed using the VidTimit DB. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under different illumination directions for the visual signal and the corresponding babble noise for the audio signal, respectively, compared with the best classifier system in the fusion system, while maintaining the modality reliability statistics in terms of performance; this verifies the consistency of the proposed extension.