• Title/Summary/Keyword: Audio preprocessing

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.3
    • /
    • pp.143-149
    • /
    • 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNN). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies, and then scale the data to three distributions. Using these data, we test four CNNs, VGG16, and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when the unscaled log mel spectrum is used as the audio feature. Although this result does not hold for all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
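
As a rough illustration of the preprocessing compared above, the four features can be extracted with librosa. The parameter values and file name below are assumptions for the sketch, not the authors' settings:

```python
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=22050)  # hypothetical input file

# Mel spectrogram and its log-scaled version
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# MFCC and delta MFCC
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
delta_mfcc = librosa.feature.delta(mfcc)

# One plausible example of the scalings tested: standardization
standardized = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)
```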

Preprocessing method for enhancing digital audio quality in speech communication system (음성통신망에서 디지털 오디오 신호 음질개선을 위한 전처리방법)

  • Song Geun-Bae;Ahn Chul-Yong;Kim Jae-Bum;Park Ho-Chong;Kim Austin
    • Journal of Broadcast Engineering
    • /
    • v.11 no.2 s.31
    • /
    • pp.200-206
    • /
    • 2006
  • This paper presents a preprocessing method that modifies the input audio signals of a speech coder so that enhanced signals are finally obtained at the decoder. For this purpose, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), where an audio input and its coding error are treated as a noisy signal and a noise, respectively. The coding error is suppressed from the input, and the suppressed input is then level-aligned to the original input by the following AGC operation. Consequently, this preprocessing redistributes the spectral energy of the music input over the spectral domain, so that the preprocessed music can be coded more effectively by the following coder. As a drawback, this procedure needs an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to many existing speech coders. Preference listening tests indicated that the proposed approach produces significant enhancements in perceived music quality.
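
The NS + AGC idea can be sketched as spectral subtraction of the coding error followed by level alignment. `codec_roundtrip`, the frame size, and the subtraction rule below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def preprocess(x, codec_roundtrip, frame=512):
    # One extra encoding pass: the coding error acts as the "noise"
    error = x - codec_roundtrip(x)
    y = np.zeros_like(x)
    for i in range(0, len(x) - frame + 1, frame):
        X = np.fft.rfft(x[i:i + frame])
        E = np.fft.rfft(error[i:i + frame])
        # Noise suppression: subtract the coding-error magnitude per bin
        mag = np.maximum(np.abs(X) - np.abs(E), 0.0)
        y[i:i + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(X)), n=frame)
    # AGC: align the suppressed signal's level to the original input
    gain = np.sqrt(np.mean(x ** 2) / (np.mean(y ** 2) + 1e-12))
    return gain * y
```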

Energy-Aware Data-Preprocessing Scheme for Efficient Audio Deep Learning in Solar-Powered IoT Edge Computing Environments (태양 에너지 수집형 IoT 엣지 컴퓨팅 환경에서 효율적인 오디오 딥러닝을 위한 에너지 적응형 데이터 전처리 기법)

  • Yeontae Yoo;Dong Kun Noh
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.4
    • /
    • pp.159-164
    • /
    • 2023
  • Solar energy harvesting IoT devices prioritize maximizing the utilization of collected energy, rather than minimizing energy consumption, because solar energy recharges periodically. Meanwhile, research on edge AI, which performs machine learning near the data source instead of in the cloud, is actively conducted for reasons such as data confidentiality and privacy, response time, and cost. One such research area performs various audio AI applications using audio data collected from multiple IoT devices in an IoT edge computing environment. However, in most studies, IoT devices only transmit sensing data to the edge server, and all processes, including data preprocessing, are performed on the edge server. This not only overloads the edge server but also causes network congestion by transmitting data unnecessary for learning. On the other hand, if data preprocessing is delegated to each IoT device to address this issue, blackout time increases due to energy shortages in the devices. In this paper, we aim to alleviate the increased blackout time of devices while mitigating the issues of server-centric edge AI environments by determining where the data are preprocessed based on the energy state of each IoT device. In the proposed method, an IoT device performs the preprocessing process, which includes sound discrimination and noise removal, and transmits the result to the server only if more energy is available than the threshold required for the basic operation of the device.
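
A minimal sketch of the energy-adaptive decision follows. All names, stubs, and the threshold policy are assumptions, since the abstract does not specify them:

```python
import numpy as np

BASE_ENERGY_THRESHOLD = 0.2   # assumed fraction of battery capacity

def is_sound(audio, level=0.01):
    """Sound-discrimination stub (assumption): simple energy check."""
    return np.mean(audio ** 2) > level

def remove_noise(audio, floor=0.005):
    """Noise-removal stub (assumption): naive sample thresholding."""
    return np.where(np.abs(audio) > floor, audio, 0.0)

def handle_audio(energy_level, audio, send):
    if energy_level > BASE_ENERGY_THRESHOLD:
        # Enough harvested energy: preprocess locally and send only
        # useful, cleaned clips (saves bandwidth and server load).
        if is_sound(audio):
            send(remove_noise(audio))
    else:
        # Low energy: skip local work to avoid blackout and offload the
        # raw audio so the edge server preprocesses instead (assumption).
        send(audio)
```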

An Efficient Audio Watermark Extraction in Time Domain

  • Kang, Hae-Won;Jung, Sung-Hwan
    • Journal of Information Processing Systems
    • /
    • v.2 no.1
    • /
    • pp.13-17
    • /
    • 2006
  • In this paper, we propose an audio watermark extraction method that decreases the influence of the original signal by modifying the watermark detection system proposed by P. Bassia et al. In the extraction of the watermark, we employ a simple mean filter, as a preprocessing step, to remove the influence of the original signal, together with repetitive insertion of the watermark. In experiments with about 20 kinds of actual audio data, we obtained a watermark detection rate of about 95 % and good performance even after various signal-processing attacks.
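
The mean-filter preprocessing can be sketched as follows; the filter size and the hard-decision rule are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def extract_bit(received, watermark):
    # The mean filter approximates the slowly varying original signal;
    # subtracting it leaves mostly the embedded watermark component.
    residual = received - uniform_filter1d(received, size=9)
    # Repetitive insertion lets us average correlations over repetitions
    reps = len(residual) // len(watermark)
    segs = residual[:reps * len(watermark)].reshape(reps, len(watermark))
    corr = np.mean(segs @ watermark)   # watermark: +/-1 sequence
    return 1 if corr > 0 else 0
```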

Digital Audio Watermarking Scheme Using Perceptual Modeling (지각 모델링을 이용한 디지털 오디오 워터마킹 방법)

  • Seok, Jong-Won;Hong, Jin-Woo
    • Journal of Broadcast Engineering
    • /
    • v.6 no.2
    • /
    • pp.195-202
    • /
    • 2001
  • Digital watermark technology is now drawing attention as a solution for copyright protection of digital multimedia contents. In this paper, we present two novel audio watermarking algorithms as a solution for protecting digital audio against unauthorized copying. The proposed watermarking schemes include the psychoacoustic model of MPEG audio coding, to achieve perceptual transparency after watermark embedding, and a preprocessing procedure before correlation in watermark detection, to extract copyright information without access to the original audio signal. Experimental results show that our watermarking scheme is robust to common signal-processing attacks and introduces no audible distortion after watermark insertion.
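
A hedged sketch of perceptually shaped embedding: a pseudo-random watermark is scaled below a per-bin masking bound before insertion. `masking_threshold` is a crude stand-in for an MPEG psychoacoustic model, and the scaling rule is an assumption:

```python
import numpy as np

def masking_threshold(frame):
    """Stand-in for a psychoacoustic model (assumption): an inaudibility
    bound proportional to the local spectral magnitude."""
    return 0.05 * np.abs(np.fft.rfft(frame))

def embed(frame, wm, alpha=0.5):
    # wm: +/-1 pseudo-random sequence, one value per spectral bin
    X = np.fft.rfft(frame)
    thr = masking_threshold(frame)
    X = X + alpha * thr * wm          # keep the watermark under the mask
    return np.fft.irfft(X, n=len(frame))
```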

Neural perceptron-based Training and Classification of Acoustic Signal

  • Kim, Yoon-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.1133-1136
    • /
    • 2005
  • The MPEG/audio standard results from three years of collaboration by an international committee of high-fidelity audio compression experts in the Moving Picture Experts Group (MPEG/audio). The MPEG standard is rigid only where necessary to ensure interoperability. In this paper, a new approach to the training and classification of acoustic signals is addressed. This is somewhat a field of application rather than a technical problem such as MPEG codecs or MIDI. In preprocessing, the acoustic signal is transformed using the Discrete Wavelet Transform (DWT) to extract feature parameters of sound such as loudness, pitch, bandwidth, and harmonicity. These acoustic parameters are used as the input vector of a neural perceptron. Experimental results showed that the proposed approach can be used for tuning dissonant chords.
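
DWT feature extraction feeding a perceptron might look like the following sketch; the wavelet, decomposition level, and energy summary are assumptions (using PyWavelets, which may differ from the authors' toolkit):

```python
import numpy as np
import pywt

def dwt_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Summarize each subband by its energy as a crude feature vector
    return np.array([np.sum(c ** 2) for c in coeffs])

class Perceptron:
    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def train(self, X, y, epochs=50):
        # Classic perceptron rule; y in {-1, +1}
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (xi @ self.w + self.b) <= 0:
                    self.w += self.lr * yi * xi
                    self.b += self.lr * yi
```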

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.754-771
    • /
    • 2021
  • In the task of continuous-dimension emotion recognition, the parts that highlight emotional expression are not the same in each modality, and the influences of different modalities on the emotional state also differ. Therefore, this paper studies the fusion of the two most important modalities in emotion recognition (voice and visual expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio signal and the video signal, respectively, the first step is to use prior knowledge to extract audio features. Then, facial expression features are extracted by the improved AlexNet network. Finally, the multimodal attention mechanism is used to fuse facial expression features and audio features, and an improved loss function is used to mitigate the modality-missing problem, so as to improve the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the two dimensions of arousal and valence were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
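
Attention-weighted fusion of the two modalities can be sketched in PyTorch as below; the dimensions and the gating form are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, audio_dim=128, face_dim=256, hidden=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.face_proj = nn.Linear(face_dim, hidden)
        self.attn = nn.Linear(hidden * 2, 2)   # one weight per modality
        self.head = nn.Linear(hidden, 2)       # arousal and valence

    def forward(self, audio_feat, face_feat):
        a = torch.tanh(self.audio_proj(audio_feat))
        f = torch.tanh(self.face_proj(face_feat))
        w = torch.softmax(self.attn(torch.cat([a, f], dim=-1)), dim=-1)
        fused = w[..., :1] * a + w[..., 1:] * f   # modality-weighted sum
        return self.head(fused)
```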

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

  • Jang, Inseon;Ahn, ChungHyun;Seo, Jeongil;Jang, Younseon
    • Journal of Broadcast Engineering
    • /
    • v.22 no.5
    • /
    • pp.632-642
    • /
    • 2017
  • In this paper, we propose a DNN-based speech detection system that uses the acoustic characteristics and context information of media audio. Speech detection, which discriminates between speech and non-speech in media audio, is a necessary preprocessing technique for effective speech processing. However, since the media audio signal includes various types of sound sources, it has been difficult to achieve high performance with conventional signal-processing techniques. The proposed method improves speech detection performance by separating the harmonic and percussive components of the media audio and constructing a DNN input vector that reflects the acoustic characteristics and context information of the media audio. To verify the performance of the proposed system, a speech detection data set was made from more than 20 hours of drama, and a publicly available 8-hour Hollywood movie data set was additionally acquired and used for the experiments. The experiments show that the proposed system outperforms the conventional method in cross-validation on both data sets.
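
The harmonic/percussive separation and context stacking can be sketched with librosa; the feature choice, file name, and context width are assumptions:

```python
import librosa
import numpy as np

y, sr = librosa.load("media_audio.wav", sr=16000)   # hypothetical input
harmonic, percussive = librosa.effects.hpss(y)

def feat(sig):
    return librosa.power_to_db(
        librosa.feature.melspectrogram(y=sig, sr=sr, n_mels=40))

frames = np.vstack([feat(harmonic), feat(percussive)])   # (80, T)

def with_context(frames, c=5):
    # Stack +/- c neighboring frames around each frame as context
    T = frames.shape[1]
    cols = [frames[:, np.clip(np.arange(T) + d, 0, T - 1)]
            for d in range(-c, c + 1)]
    return np.vstack(cols)          # (80 * (2c + 1), T)
```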

Pretreatment For The Problem Solution Of Contents-Based Music Retrieval (내용 기반 음악 검색의 문제점 해결을 위한 전처리)

  • Chung, Myoung-Beom;Sung, Bo-Kyung;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.6
    • /
    • pp.97-104
    • /
    • 2007
  • This paper presents the problems of the feature extraction techniques that have been used for content-based analysis, classification, and retrieval of audio data, and proposes a preprocessing step for a new content-based retrieval method. Because the feature vector changes with the sampling value, existing audio data analysis has the problem that the same music is judged to be different music. Therefore, we propose a method that extracts waveform information from PCM data for content-based retrieval of audio data in various formats. Using this method, we can determine that audio data sampled in various formats are the same data, and it may be applied in content-based music retrieval systems. To verify the performance of the method, an experiment compared feature extraction using the STFT with waveform-information extraction using PCM data. The results show that the proposed method is more effective.
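
One way to realize the idea, as a sketch: decode each file to PCM at a common rate, normalize the amplitude, and compare coarse waveform envelopes. The window size and threshold are assumptions:

```python
import numpy as np
import librosa

def envelope(path, sr=8000, win=400):
    y, _ = librosa.load(path, sr=sr)        # decode to PCM at a common rate
    y = y / (np.max(np.abs(y)) + 1e-12)     # amplitude normalization
    n = len(y) // win * win
    return np.abs(y[:n]).reshape(-1, win).mean(axis=1)

def same_music(path_a, path_b, thresh=0.9):
    a, b = envelope(path_a), envelope(path_b)
    m = min(len(a), len(b))
    corr = np.corrcoef(a[:m], b[:m])[0, 1]
    return corr > thresh
```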

Shooting sound analysis using convolutional neural networks and long short-term memory (합성곱 신경망과 장단기 메모리를 이용한 사격음 분석 기법)

  • Kang, Se Hyeok;Cho, Ji Woong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.312-318
    • /
    • 2022
  • This paper proposes a model that classifies the type of gun and information about the sound source location using a deep neural network. The proposed classification model is composed of convolutional neural networks (CNN) and long short-term memory (LSTM). For training and testing the model, we use the Gunshot Audio Forensic Dataset generated by the project supported by the National Institute of Justice (NIJ). The acoustic signals are transformed to mel spectrograms and provided as training and test data for the proposed model. The model is compared with a control model consisting of convolutional neural networks only. The proposed model shows a high accuracy of more than 90 %.
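
A CNN + LSTM classifier over mel-spectrogram input, in the spirit of the abstract, might be sketched as follows; the layer sizes and class count are assumptions:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 4), 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        z = self.cnn(x)                       # (batch, 32, n_mels/4, time/4)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, time/4, features)
        out, _ = self.lstm(z)
        return self.fc(out[:, -1])            # classify from last time step
```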