• Title/Summary/Keyword: Audio Event Detection


Acoustic Event Detection in Multichannel Audio Using Gated Recurrent Neural Networks with High-Resolution Spectral Features

  • Kim, Hyoung-Gook; Kim, Jin Young
    • ETRI Journal / v.39 no.6 / pp.832-840 / 2017
  • Recently, deep recurrent neural networks have achieved great success in various machine learning tasks and have also been applied to sound event detection. Detecting temporally overlapping sound events in realistic environments is much more challenging than the monophonic detection problem. In this paper, we present an approach to improving the accuracy of polyphonic sound event detection in multichannel audio, based on gated recurrent neural networks combined with auditory spectral features. In the proposed method, spatial and spectral-domain noise-reduced harmonic features based on human hearing perception are extracted from multichannel audio and used as high-resolution spectral inputs to train gated recurrent neural networks. This provides fast and stable convergence compared to long short-term memory recurrent neural networks. Our evaluation reveals that the proposed method outperforms conventional approaches.
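The gated recurrent unit underlying this approach can be sketched as a single time step in NumPy; the weight names and dimensions below are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step. The GRU has two gates instead of the
    LSTM's three, which is the usual explanation for the faster,
    more stable convergence the abstract mentions."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde         # interpolated new state
```

Unrolling `gru_step` over the frames of a spectral feature sequence gives the recurrent layer; in the paper this is trained end-to-end, which this sketch does not cover.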

Audio Event Detection Based on Attention CRNN (Attention CRNN에 기반한 오디오 이벤트 검출)

  • Kwak, Jin-Yeol; Chung, Yong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.3 / pp.465-472 / 2020
  • Recently, various methods based on deep neural networks have been proposed for audio event detection. In this study, we improve the performance of audio event detection by adding an attention mechanism to a baseline CRNN. We apply context gating at the input of the baseline CRNN and add an attention layer at its output. We further improve the attention-based CRNN by training on strongly labeled audio data at the frame level as well as weakly labeled data at the clip level. In audio event detection experiments using audio data from Task 4 of the DCASE 2018/2019 Challenges, the proposed attention-based CRNN achieved up to a 66% relative increase in F-score over the baseline CRNN.
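A minimal sketch of the two attention components named above, context gating at the input and attention pooling of frame-level outputs into a clip-level prediction for weak-label training, using plain NumPy tensors rather than the paper's actual CRNN (shapes and names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(x, W, b):
    """Context gating: scale each feature by a sigmoid gate
    computed from the input itself."""
    return x * sigmoid(x @ W + b)

def attention_pool(frame_probs, frame_scores):
    """Pool per-frame event probabilities (T x C) into one clip-level
    probability per class, weighting frames by softmax attention
    scores along the time axis."""
    w = np.exp(frame_scores - frame_scores.max(axis=0))
    w = w / w.sum(axis=0)                 # softmax over time, per class
    return (w * frame_probs).sum(axis=0)  # attention-weighted average
```

Clip-level outputs like this are what allow training on weakly labeled data, where only the presence of an event in the whole clip is annotated.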

Audio Event Detection Using Deep Neural Networks (깊은 신경망을 이용한 오디오 이벤트 검출)

  • Lim, Minkyu; Lee, Donghyun; Park, Hosung; Kim, Ji-Hwan
    • Journal of Digital Contents Society / v.18 no.1 / pp.183-190 / 2017
  • This paper proposes an audio event detection method using Deep Neural Networks (DNN). The proposed method applies a Feed-Forward Neural Network (FFNN) to generate output probabilities for twenty audio events in each frame. Mel-scale filter bank (FBANK) features are extracted from each frame, and five consecutive frames are combined into one vector, which is the input feature of the FFNN. The output layer of the FFNN produces audio event probabilities for each input feature vector. An audio event is detected when its probability exceeds the threshold for more than five consecutive frames, and a detected event is considered to continue as long as it is re-detected within one second. The proposed method achieves 71.8% accuracy on 20 classes from the UrbanSound8K and BBC Sound FX datasets.
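The frame-stacking and threshold-based detection steps described above can be sketched as follows; parameter values and array shapes are illustrative, and the one-second continuation rule is omitted for brevity:

```python
import numpy as np

def stack_context(feats, ctx=2):
    """Concatenate each frame with its +-ctx neighbours
    (5 frames total for ctx=2), edge-padding at the boundaries."""
    padded = np.pad(feats, ((ctx, ctx), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * ctx + 1)])

def detect_events(probs, thr=0.5, min_frames=5):
    """Return (start, end) frame spans where the event probability
    exceeds thr for more than min_frames consecutive frames."""
    events, run_start = [], None
    for t, p in enumerate(probs):
        if p > thr:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start > min_frames:
                events.append((run_start, t))
            run_start = None
    if run_start is not None and len(probs) - run_start > min_frames:
        events.append((run_start, len(probs)))
    return events
```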

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

  • Ko, Sang-Sun; Cho, Hye-Seung; Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.4 / pp.267-272 / 2017
  • In this paper, we propose an effective method of applying multichannel audio features to GRNNs (Gated Recurrent Neural Networks) for polyphonic sound event detection. Real-life sounds often overlap with each other, making them difficult to distinguish using mono-channel audio features. The proposed method improves polyphonic sound event detection performance by using multichannel audio features, and by applying a gated recurrent neural network, which is simpler than the LSTM (Long Short-Term Memory) network that currently shows the highest performance among recurrent neural networks. The experimental results show that the proposed method achieves better sound event detection performance than existing methods.

Performance Improvement of Mean-Teacher Models in Audio Event Detection Using Derivative Features (차분 특징을 이용한 평균-교사 모델의 음향 이벤트 검출 성능 향상)

  • Kwak, Jin-Yeol; Chung, Yong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.401-406 / 2021
  • Recently, mean-teacher models based on convolutional recurrent neural networks have been widely used in audio event detection. The mean-teacher model is an architecture consisting of two parallel CRNNs that can be trained effectively on weakly labeled and unlabeled audio data by applying a consistency loss between the outputs of the two networks. In this study, we improve the performance of the mean-teacher model by using additional derivative features of the log-mel spectrum. In audio event detection experiments using the training and test data from Task 4 of the DCASE 2018/2019 Challenges, the mean-teacher model using the proposed derivative features obtained up to an 8.1% relative decrease in error rate (ER).
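Derivative (delta) features of a log-mel spectrogram are conventionally computed with a regression formula over a +-N frame window; the sketch below shows that standard computation, not necessarily the paper's exact configuration:

```python
import numpy as np

def delta(feat, N=2):
    """First-order delta features over a +-N frame regression window.
    feat is a (frames x mel-bins) array; the result has the same shape
    and is typically concatenated with the static features."""
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(
        n * (padded[N + n:N + n + len(feat)] - padded[N - n:N - n + len(feat)])
        for n in range(1, N + 1)
    ) / denom
```

Applying `delta` twice yields second-order (delta-delta) features, another common addition to static log-mel inputs.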

Towards Low Complexity Model for Audio Event Detection

  • Saleem, Muhammad; Shah, Syed Muhammad Shehram; Saba, Erum; Pirzada, Nasrullah; Ahmed, Masood
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.175-182 / 2022
  • In daily life, we come across different types of information, for example in multimedia and text formats. We all need different types of information for common routines such as watching or reading the news, listening to the radio, and watching various videos. However, problems can arise when a certain type of information is required. For example, someone listening to the radio may want jazz, but all the radio channels play pop music mixed with advertisements; the listener is stuck with pop music and gives up searching for jazz. This problem can be solved with an automatic audio classification system. Deep Learning (DL) models can make such audio classification easy, but they are expensive and difficult to deploy on edge devices like the Nano BLE Sense or Raspberry Pi because they require substantial computational power, such as a graphics processing unit (GPU). To address this, we propose a low-complexity DL model for Audio Event Detection (AED). We extract mel-spectrograms of dimension 128×431×1 from audio signals and apply normalization. Three data augmentation methods are applied: frequency masking, time masking, and mixup. In addition, we design a Convolutional Neural Network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions inspired by VGGNet [1], and reduce the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 challenge. Our model achieved a validation loss of 0.33 and an accuracy of 90.34% with a model size of 132.50 KB.
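Two of the augmentation methods named above, mixup and time masking, can be sketched in NumPy as follows; parameter values (the Beta shape `alpha`, the mask width) are illustrative, not the paper's settings:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two examples and their one-hot labels with a
    weight drawn from a Beta(alpha, alpha) distribution."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def time_mask(spec, max_width=30, rng=None):
    """Time masking: zero out a random block of consecutive time
    frames in a (mel-bins x frames) spectrogram."""
    if rng is None:
        rng = np.random.default_rng()
    t = spec.shape[1]
    w = int(rng.integers(0, max_width))
    t0 = int(rng.integers(0, max(1, t - w)))
    out = spec.copy()
    out[:, t0:t0 + w] = 0.0
    return out
```

Frequency masking is the same operation applied along the mel-bin axis instead of the time axis.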

Intelligent User Pattern Recognition based on Vision, Audio and Activity for Abnormal Event Detections of Single Households

  • Jung, Ju-Ho; Ahn, Jun-Ho
    • Journal of the Korea Society of Computer and Information / v.24 no.5 / pp.59-66 / 2019
  • According to KT telecommunication statistics, people stay inside their houses for an average of 11.9 hours a day, and according to NSC statistics in the United States, people of all ages are injured for a variety of reasons in their homes. In this research, we investigate an abnormal event detection algorithm to classify infrequently occurring behaviors, such as accidents and health emergencies, in daily life. We propose a fusion method that combines three classification algorithms, based on vision, audio, and activity patterns, to detect unusual user events. The vision pattern algorithm identifies people and objects from video data collected through home CCTV. The audio and activity pattern algorithms classify user audio and activity behaviors using data collected from built-in sensors on users' smartphones. We evaluated the proposed individual pattern algorithms and the fusion method on multiple scenarios.

Retrieval of Player Event in Golf Videos Using Spoken Content Analysis (음성정보 내용분석을 통한 골프 동영상에서의 선수별 이벤트 구간 검색)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.674-679 / 2009
  • This paper proposes a method of player event retrieval that combines two functions: detection of player names in speech information and detection of sound events in audio information from golf videos. The system consists of an indexing module and a retrieval module. At indexing time, audio segmentation and noise reduction are applied to the audio stream demultiplexed from the golf videos. The noise-reduced speech is then fed into a speech recognizer, which outputs spoken descriptors. The player names and sound events are indexed by these spoken descriptors. At search time, a text query is converted into phoneme sequences, and the lists for each query term are retrieved through a description matcher to identify full and partial phrase hits. For player name retrieval, this paper compares the results of word-based, phoneme-based, and hybrid approaches.

Deep Learning-Based User Emergency Event Detection Algorithms Fusing Vision, Audio, Activity and Dust Sensors (영상, 음성, 활동, 먼지 센서를 융합한 딥러닝 기반 사용자 이상 징후 탐지 알고리즘)

  • Jung, Ju-ho; Lee, Do-hyun; Kim, Seong-su; Ahn, Jun-ho
    • Journal of Internet Computing and Services / v.21 no.5 / pp.109-118 / 2020
  • Recently, people are spending a lot of time inside their homes because of various diseases, and it is difficult for a single-person household resident who is injured in the house or infected with a disease to ask others for help. In this study, an algorithm is proposed to detect emergency events, that is, situations in which single-person households need help from others, such as injuries or disease infections, in their homes. It proposes a vision pattern detection algorithm using home CCTV, an audio pattern detection algorithm using artificial intelligence speakers, an activity pattern detection algorithm using the acceleration sensors in smartphones, and a dust pattern detection algorithm using air purifiers. For cases where home CCTV is difficult to use because of security issues, a fusion method combining the audio, activity, and dust pattern sensors is also proposed. Data for each algorithm were collected from YouTube and through experiments to measure accuracy.

Sound Event Detection based on Deep Neural Networks (딥 뉴럴네트워크 기반의 소리 이벤트 검출)

  • Chung, Suk-Hwan; Chung, Yong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.2 / pp.389-396 / 2019
  • In this paper, various deep neural network architectures were applied to sound event detection and their performances were compared on a common audio database. The FNN, CNN, RNN, and CRNN were implemented with hyper-parameters optimized for the database and for each network architecture. Among the implemented networks, the CRNN performed best under all testing conditions, followed by the CNN. Although the RNN has the merit of tracking temporal correlations in audio signals, it showed poor performance compared with the CNN and CRNN.