• Title/Summary/Keyword: audio engineering

The Audio Signal Classification System Using Contents Based Analysis

  • Lee, Kwang-Seok;Kim, Young-Sub;Han, Hag-Yong;Hur, Kang-In
    • Journal of information and communication convergence engineering, v.5 no.3, pp.245-248, 2007
  • In this paper, we investigate content-based analysis and classification, built around a feature-parameter database for audio data, in order to implement an audio indexing and search system. Audio data are first classified into primitive auditory types. We describe the analysis and feature-extraction methods for the parameters available for audio classification, compose the feature-parameter database in index-group units, and then compare and analyze the audio data against the index criteria for each audio category. Based on these results, we compose feature vectors of the audio data according to the classification categories and simulate classification using a discriminant function.
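
As an illustration of the classification step mentioned in this abstract, the following is a minimal sketch of scoring clip-level feature vectors with a linear discriminant function. The 20-dimensional features and the category names are placeholders for illustration, not the paper's exact feature parameters.

```python
import numpy as np

# Minimal sketch: classify audio feature vectors with a linear discriminant function.
# Feature dimensionality and category names are illustrative assumptions.

def fit_discriminant(features, labels):
    """Estimate per-class means and a shared covariance from training vectors."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return classes, means, np.linalg.inv(cov)

def classify(x, classes, means, inv_cov):
    """Pick the class whose linear discriminant score is largest."""
    scores = {}
    for c in classes:
        mu = means[c]
        w = inv_cov @ mu
        scores[c] = w @ x - 0.5 * mu @ w
    return max(scores, key=scores.get)

# Toy usage: 20-dimensional feature vectors for three hypothetical audio categories.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(300, 20))
train_y = np.repeat(np.array(["speech", "music", "noise"]), 100)
classes, means, inv_cov = fit_discriminant(train_x, train_y)
print(classify(train_x[0], classes, means, inv_cov))
```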

Audio Fingerprinting Based Spatial Audio Reproduction System (오디오 핑거프린팅기반 입체음향 재현 시스템)

  • Ryu, Sang Hyeon;Kim, Hyoung-Gook
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.12, pp.217-223, 2013
  • This paper proposes a spatial audio reproduction system that combines audio fingerprinting with spatial audio processing. In the proposed system, a salient audio peak-pair fingerprint based on the modulation spectrum improves the accuracy of the audio fingerprinting system in real noisy environments, and spatial audio information delivered as metadata gives the listener the sensation of hearing the sound in the space where it was actually recorded.
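
A peak-pair fingerprint of the general kind mentioned in this abstract can be sketched as follows: pick salient spectrogram peaks and hash pairs of nearby peaks into (frequency, frequency, time-offset) triples. This is a generic landmark-style sketch, not the paper's modulation-spectrum variant; the window sizes, threshold, and fan-out are illustrative assumptions.

```python
import numpy as np

def peak_pair_fingerprints(x, sr, n_fft=1024, hop=512, fan_out=5):
    """Generic landmark-style sketch: hash pairs of salient spectrogram peaks."""
    # Magnitude spectrogram via a Hann-windowed STFT.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1))

    # Keep, per frame, the strongest bin well above the frame mean as a "peak".
    peaks = []
    for t, frame in enumerate(spec):
        f = int(np.argmax(frame))
        if frame[f] > frame.mean() * 5:
            peaks.append((t, f))

    # Pair each anchor peak with a few following peaks -> (f1, f2, dt) hashes.
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

# Toy usage: fingerprint one second of a synthetic two-tone signal.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
print(peak_pair_fingerprints(x, sr)[:3])
```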

Convolutional Neural Network based Audio Event Classification

  • Lim, Minkyu;Lee, Donghyun;Park, Hosung;Kang, Yoseb;Oh, Junseok;Park, Jeong-Sik;Jang, Gil-Jin;Kim, Ji-Hwan
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.6, pp.2748-2760, 2018
  • This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNNs have a great advantage in distinguishing complex shapes in images, and the proposed system uses features of the audio signal as an input image to a CNN. Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames, and the concatenated frames are treated as an input image. The output layer of the CNN generates probabilities of audio events (e.g. dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is taken as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
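
The pipeline this abstract describes, mel filter bank "images" over 40 consecutive frames fed to a CNN whose outputs are per-event probabilities accumulated over a segment, can be sketched roughly as below. The layer sizes, the 40 mel bands, and the 30-event output layer are illustrative assumptions; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

# Rough sketch of a CNN over mel filter bank "images" (40 bands x 40 frames).
# Layer sizes and the 30-event output are assumptions, not the paper's exact net.
class AudioEventCNN(nn.Module):
    def __init__(self, n_events=30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 10 * 10, n_events)

    def forward(self, x):          # x: (batch, 1, 40 mel bands, 40 frames)
        h = self.features(x).flatten(1)
        return torch.softmax(self.classifier(h), dim=1)

# Toy usage: accumulate event probabilities over the images of one audio segment
# and pick the event with the highest accumulated probability.
model = AudioEventCNN()
segment_images = torch.randn(8, 1, 40, 40)      # 8 images from one segment
accumulated = model(segment_images).sum(dim=0)  # sum probabilities per event
print("predicted event index:", int(accumulated.argmax()))
```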

Channel Expansion Technology in MPEG Audio (MPEG 오디오의 채널 확장 기술)

  • Pang, Hee-Suk
    • Journal of Broadcast Engineering, v.16 no.5, pp.714-721, 2011
  • MPEG audio uses the masking effect, high-frequency component synthesis based on spectral band replication, and channel expansion based on parametric stereo for efficient compression of audio signals. In this paper, we present an overview of the state-of-the-art channel expansion technology in MPEG audio. We also present technical overviews of, and examples of broadcasting-service applications for, HE-AAC v.2, MPEG Surround, spatial audio object coding (SAOC), and unified speech and audio coding (USAC), which are the MPEG audio codecs based on channel expansion technology.
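
The parametric-stereo idea behind channel expansion can be illustrated very roughly: encode a stereo signal as a mono downmix plus level-difference parameters, then re-synthesize stereo from the downmix and the parameters. This is a bare-bones sketch with a single full-band parameter per frame; real codecs such as HE-AAC v.2 and MPEG Surround work per frequency band and also transmit phase/correlation cues and use decorrelators.

```python
import numpy as np

# Bare-bones parametric-stereo idea: mono downmix + per-frame level difference.
# Real codecs work per frequency band and also transmit phase/correlation cues.

def encode(left, right, frame=1024, eps=1e-12):
    """Return the mono downmix and one channel-level-difference (dB) per frame."""
    mono = 0.5 * (left + right)
    cld = []
    for i in range(0, len(mono), frame):
        l_pow = np.sum(left[i:i + frame] ** 2) + eps
        r_pow = np.sum(right[i:i + frame] ** 2) + eps
        cld.append(10 * np.log10(l_pow / r_pow))
    return mono, np.array(cld)

def decode(mono, cld, frame=1024):
    """Re-create a stereo image from the downmix and the level differences."""
    left, right = np.zeros_like(mono), np.zeros_like(mono)
    for idx, i in enumerate(range(0, len(mono), frame)):
        ratio = 10 ** (cld[idx] / 20)           # amplitude ratio left/right
        g_l = ratio / np.sqrt(1 + ratio ** 2)   # keep overall energy roughly constant
        g_r = 1.0 / np.sqrt(1 + ratio ** 2)
        left[i:i + frame] = 2 * g_l * mono[i:i + frame]
        right[i:i + frame] = 2 * g_r * mono[i:i + frame]
    return left, right

# Toy usage: a source panned mostly to the left.
t = np.arange(44100) / 44100
src = np.sin(2 * np.pi * 440 * t)
mono, cld = encode(0.9 * src, 0.3 * src)
left_hat, right_hat = decode(mono, cld)
```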

Low-Delay, Low-Power, and Real-Time Audio Remote Transmission System over Wi-Fi

  • Hong, Jinwoo;Yoo, Jeongju;Hong, Jeongkyu
    • Journal of information and communication convergence engineering, v.18 no.2, pp.115-122, 2020
  • Audiovisual (AV) facilities such as TVs and signage are installed in various public places. However, their audio often cannot be played, in order to prevent noise and disturbance to other individuals, which results in a loss of concentration on and understanding of the AV content. To address this problem, this paper proposes a complete technique for remotely listening to the audio of audiovisual facilities with clean sound quality, while maintaining video and lip synchronization, through personal smart mobile devices. Experimental results verify that the proposed scheme reduces system power consumption by 8% to 16% and provides real-time processing with a low latency of 120 ms. The system described in this paper can provide remote audio services in various places, such as express buses, trains, wide-area and intercity buses, and public waiting rooms, as well as various application services, and will thus contribute to the activation of audio telehearing services.
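
Under heavy assumptions, the transport side of such a system can be sketched as sending short timestamped audio frames over UDP so that a receiver can play them with minimal buffering and align them to the video clock for lip sync. The 10 ms frame size, port, and packet layout below are illustrative choices only, not the paper's actual protocol.

```python
import socket
import struct
import time

# Illustrative sketch only: short timestamped audio frames over UDP.
# Frame size, port, and packet layout are assumptions, not the paper's protocol.
SAMPLE_RATE = 48000
FRAME_SAMPLES = 480            # 10 ms of 16-bit mono audio per packet
DEST = ("127.0.0.1", 50007)

def send_frame(sock, seq, pcm_bytes):
    """Prefix each frame with a sequence number and a capture timestamp (us)."""
    header = struct.pack("!IQ", seq, int(time.time() * 1_000_000))
    sock.sendto(header + pcm_bytes, DEST)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    silence = bytes(FRAME_SAMPLES * 2)          # placeholder PCM payload
    for seq in range(100):                      # ~1 second of frames
        send_frame(sock, seq, silence)
        time.sleep(FRAME_SAMPLES / SAMPLE_RATE) # pace packets in real time
```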

A Reversible Audio Watermarking Scheme

  • Kim, Hyoung-Joong;Sachnev, Vasiliy;Kim, Ki-Seob
    • Journal of The Institute of Information and Telecommunication Facilities Engineering, v.5 no.1, pp.37-42, 2006
  • A reversible audio watermarking algorithm is presented in this paper. The algorithm first transforms the audio signal with the integer wavelet transform in order to enhance the correlation between neighboring audio samples, since audio signals have low correlation between neighboring samples, which makes it difficult to apply a difference-expansion scheme directly. Second, a novel difference-expansion scheme is used to embed more data by reducing the size of the location map. The difference-expansion scheme used in this paper therefore theoretically secures high embedding capacity under low perceptual distortion. Experiments show that the scheme can hide a large number of information bits while keeping high perceptual quality.
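
The difference-expansion step itself can be sketched on integer sample pairs: keep the pair's average, expand the difference to make room for one payload bit, and let the extractor undo the expansion exactly. This is the classic Tian-style expansion on raw samples for illustration only; the paper applies it to integer wavelet coefficients and additionally reduces the location map, and overflow handling is omitted here.

```python
import numpy as np

# Classic difference-expansion sketch on integer sample pairs (Tian-style).
# The paper works on integer wavelet coefficients and compresses the location map.

def embed_pair(a, b, bit):
    """Hide one bit in the expanded difference of an integer pair."""
    l = (a + b) // 2
    h = 2 * (a - b) + bit          # expanded difference carries the payload bit
    return l + (h + 1) // 2, l - h // 2

def extract_pair(a_w, b_w):
    """Recover the bit and the original pair exactly (reversibility)."""
    l = (a_w + b_w) // 2
    h_w = a_w - b_w
    bit = h_w & 1
    h = h_w >> 1                   # undo the expansion
    return (l + (h + 1) // 2, l - h // 2), bit

# Toy usage: embed the bits 1, 0, 1 into three consecutive sample pairs.
samples = np.array([1000, 998, -512, -515, 30, 29])
bits = [1, 0, 1]
marked = []
for (a, b), bit in zip(samples.reshape(-1, 2), bits):
    marked.extend(embed_pair(int(a), int(b), bit))

recovered, payload = [], []
for a_w, b_w in np.array(marked).reshape(-1, 2):
    (a, b), bit = extract_pair(int(a_w), int(b_w))
    recovered.extend([a, b]); payload.append(bit)

assert recovered == samples.tolist() and payload == bits
```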

Defending and Detecting Audio Adversarial Example using Frame Offsets

  • Gong, Yongkang;Yan, Diqun;Mao, Terui;Wang, Donghua;Wang, Rangding
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.4, pp.1538-1552, 2021
  • Machine learning models are vulnerable to adversarial examples generated by adding a deliberately designed perturbation to a benign sample. In particular, for an automatic speech recognition (ASR) system, a benign audio clip that sounds normal could be decoded as a harmful command due to a potential adversarial attack. In this paper, we focus on countermeasures against audio adversarial examples. By analyzing the characteristics of ASR systems, we find that frame offsets, created by adding a silence clip at the beginning of the audio, can degenerate adversarial perturbations into normal noise. For various scenarios, we exploit frame offsets through different strategies: defending, detecting, and a hybrid of the two. Compared with previous methods, the proposed method can defend against audio adversarial examples in a simpler, more generic, and more efficient way. Evaluated against three state-of-the-art adversarial attacks on different ASR systems, the experimental results demonstrate that the proposed method effectively improves the robustness of ASR systems.
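
The core defensive idea here, shifting the frame alignment by adding a short silence clip at the start of the audio before it reaches the ASR front end, can be sketched as below. The 25 ms offset, the detection-by-disagreement logic, and the placeholder transcribe() function are assumptions for illustration; a real ASR decoder would take the transcriber's place.

```python
import numpy as np

def add_frame_offset(audio, sr, offset_ms=25):
    """Prepend a short silence clip so ASR frame boundaries no longer line up
    with the alignment the adversarial perturbation was optimized for."""
    silence = np.zeros(int(sr * offset_ms / 1000), dtype=audio.dtype)
    return np.concatenate([silence, audio])

def looks_adversarial(audio, sr, transcribe, offset_ms=25):
    """Detection sketch: an adversarial example tends to decode differently
    once the frame offset is applied, while benign speech stays stable."""
    original = transcribe(audio, sr)
    shifted = transcribe(add_frame_offset(audio, sr, offset_ms), sr)
    return original != shifted

# Toy usage with a placeholder transcriber (a real ASR system would go here).
def transcribe(audio, sr):
    return "hello world"   # placeholder output; assume some ASR decoder

sr = 16000
audio = np.random.default_rng(0).normal(scale=0.01, size=sr).astype(np.float32)
print(looks_adversarial(audio, sr, transcribe))
```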

Analysis of learning effects using audio-visual manual of SWAT (SWAT의 시청각 매뉴얼을 통한 학습 효과 분석)

  • Lee, Ju-Yeong;Kim, Tea-Ho;Ryu, Ji-Chul;Kang, Hyun-Woo;Kum, Dong-Hyuk;Woo, Won-Hee;Jang, Chun-Hwa;Choi, Jong-Dae;Lim, Kyoung-Jae
    • Korean Journal of Agricultural Science, v.38 no.4, pp.731-737, 2011
  • In modern society, GIS-based decision support systems have been used to evaluate environmental issues and changes, owing to the spatial and temporal analysis capabilities of GIS. However, without a proper manual for these systems, their intended goals cannot be achieved. In this study, an audio-visual SWAT tutorial system was developed and its effectiveness in learning the SWAT model was evaluated. Learning effects were analyzed after an in-class demonstration and a survey. The survey was conducted with third-grade students, with and without the audio-visual materials, using 30 questionnaire items: 3 items on respondent background, 5 items on the effects of the audio-visual materials, and 12 items on the effect of learning the model with and without the manual. The group without the audio-visual manual scored 2.98 out of 5, while the group with the audio-visual manual scored 4.05 out of 5, indicating better content delivery with the audio-visual learning materials. As shown in this study, audio-visual learning materials should be developed and used for various computer-based modeling systems.

A Study on the Digital Audio Watermarking for a High Quality Audio (고음질을 위한 디지털 오디오 워터마킹에 관한 연구)

  • Jo, Byeong-Rok;Jeong, Il-Yong;Park, Chang-Gyun;Lee, Gang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI, v.39 no.3, pp.53-61, 2002
  • In this paper, the authors propose a digital audio watermarking algorithm for high-quality audio. Today, digital watermarks are used for digital copyright protection, and digital audio, as well as digital images, is an active area of digital watermarking research. In particular, watermark insertion in digital audio deeply affects not only the robustness but also the audio quality of the watermarked audio data. Generally, the audio watermark is inserted in the frequency domain after an FFT, so the quality of the audio data is affected by the watermark insertion. Thus, inserting a robust watermark while maintaining high audio quality has become a hot issue. In this paper, the authors propose a digital audio watermarking algorithm using a psychoacoustic model and the MDCT/IMDCT (Modified Discrete Cosine Transform / Inverse Modified Discrete Cosine Transform). In the proposed scheme, the authors experimented on stereo audio files at 44.1 kHz and 128 kbps. When the audio data are processed by the MDCT, the watermark can be inserted into the frequency domain at intervals of 256, 1024, and 2048 samples. For a 50 ms RMS window, it was confirmed that the difference in RMS power between the original audio data and the watermarked audio data is 0.8 dB.
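
MDCT-domain embedding can be sketched as below: transform a block of samples with the MDCT, nudge one mid-frequency coefficient to encode a watermark bit, and transform back. The quantization-index trick, the 2048-sample block, the chosen coefficient index, and the step size are illustrative assumptions; the paper additionally uses a psychoacoustic model to keep the change inaudible, and a real implementation would apply proper windowing and 50% overlap-add.

```python
import numpy as np

# Sketch: embed one watermark bit per 2048-sample block in the MDCT domain.
# Block size, coefficient index, and quantization step are assumptions; the
# paper also uses a psychoacoustic model, windowing, and overlap-add (omitted).

def mdct(block):
    """MDCT of a 2N-sample block -> N coefficients."""
    N = len(block) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ block

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time samples (before overlap-add)."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (1.0 / N) * (basis @ coeffs)

def embed_bit(block, bit, index=300, step=0.05):
    """Quantize one mid-frequency coefficient to an even/odd multiple of `step`."""
    c = mdct(block)
    q = np.round(c[index] / step)
    if int(q) % 2 != bit:
        q += 1
    c[index] = q * step
    return imdct(c)

def extract_bit(block, index=300, step=0.05):
    return int(np.round(mdct(block)[index] / step)) % 2

# Toy usage on one 2048-sample block of a sine tone.
t = np.arange(2048) / 44100
block = 0.5 * np.sin(2 * np.pi * 440 * t)
marked = embed_bit(block, bit=1)
print(extract_bit(marked))   # prints 1
```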

Implementation of a 16-Bit Fixed-Point MPEG-2/4 AAC Decoder for Mobile Audio Applications

  • Kim, Byoung-Eul;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences, v.33 no.3C, pp.240-246, 2008
  • An MPEG-2/4 AAC decoder for a 16-bit fixed-point processor is presented in this paper. To meet audio quality criteria despite the small word length, special design methods for the 16-bit fixed-point AAC decoder were devised. This paper presents the particular algorithms for 16-bit AAC decoding, and we have implemented an efficient AAC decoder using the proposed algorithms. Audio content can be replayed by the decoder without quality degradation.
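
The kind of arithmetic such a decoder relies on can be illustrated with a small Q15 (16-bit fixed-point) sketch: multiplications keep only the high half of the 32-bit product, with rounding and saturation to avoid overflow. This is a generic fixed-point illustration, not the paper's specific decoder algorithms.

```python
# Generic Q15 (16-bit fixed-point) arithmetic sketch; not the paper's algorithms.
Q15_ONE = 1 << 15          # 1.0 in Q15
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def saturate(x):
    """Clamp to the signed 16-bit range, as a fixed-point DSP would."""
    return max(INT16_MIN, min(INT16_MAX, x))

def float_to_q15(x):
    return saturate(int(round(x * Q15_ONE)))

def q15_to_float(x):
    return x / Q15_ONE

def q15_mul(a, b):
    """Multiply two Q15 values: 32-bit product, round, shift back, saturate."""
    return saturate((a * b + (1 << 14)) >> 15)

# Toy usage: apply a -6 dB gain (0.5) to one 16-bit sample.
sample = float_to_q15(0.75)
gain = float_to_q15(0.5)
print(q15_to_float(q15_mul(sample, gain)))   # ~0.375
```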