Towards Low Complexity Model for Audio Event Detection

  • Saleem, Muhammad (Department of Telecommunication Engineering, Mehran University of Engineering & Technology) ;
  • Shah, Syed Muhammad Shehram (Department of Software Engineering, Mehran University of Engineering & Technology) ;
  • Saba, Erum (Information Technology Centre, Sindh Agriculture University) ;
  • Pirzada, Nasrullah (Department of Telecommunication Engineering, Mehran University of Engineering & Technology) ;
  • Ahmed, Masood (Department of Computer System, Mehran University of Engineering & Technology)
  • Received : 2022.09.05
  • Published : 2022.09.30

Abstract

In daily life, we encounter many kinds of information, for example in multimedia and text formats. We rely on different types of information in our everyday routines, such as watching or reading the news, listening to the radio, and watching videos. However, finding a specific type of content can be difficult. For example, someone wants to listen to jazz on the radio, but all the available channels play pop music mixed with advertisements; the listener is stuck with pop music and gives up the search. An automatic audio classification system can solve this kind of problem. Deep Learning (DL) models can make audio classification practical, but such models are expensive and difficult to deploy on edge devices like the Nano BLE Sense or Raspberry Pi, because they typically require substantial computational power, such as a Graphics Processing Unit (GPU). To address this problem, we propose a low-complexity DL model for Audio Event Detection (AED). We extracted Mel-spectrograms of dimension 128×431×1 from the audio signals and applied normalization. Three data augmentation methods were applied: frequency masking, time masking, and mixup. We then designed a Convolutional Neural Network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions, inspired by VGGNet [1]. In addition, we reduced the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 challenge. Our model achieved a validation loss of 0.33 and an accuracy of 90.34% with a model size of 132.50 KB.
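The augmentation steps named in the abstract can be sketched in NumPy. This is a minimal illustrative sketch of SpecAugment-style frequency/time masking and mixup on a 128×431 Mel-spectrogram; the mask widths and the mixup alpha are assumed values for illustration, not the settings used in the paper.

```python
import numpy as np

def frequency_mask(spec, max_width=16, rng=None):
    """Zero out a random band of mel bins (frequency masking)."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, max(1, spec.shape[0] - width)))
    spec[start:start + width, :] = 0.0
    return spec

def time_mask(spec, max_width=32, rng=None):
    """Zero out a random span of time frames (time masking)."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, max(1, spec.shape[1] - width)))
    spec[:, start:start + width] = 0.0
    return spec

def mixup(spec_a, label_a, spec_b, label_b, alpha=0.2, rng=None):
    """Convex combination of two examples and their one-hot labels (mixup)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    spec = lam * spec_a + (1.0 - lam) * spec_b
    label = lam * label_a + (1.0 - lam) * label_b
    return spec, label

# Dummy 128x431 Mel-spectrogram matching the shape used in the paper
# (in practice it would come from e.g. librosa.feature.melspectrogram).
rng = np.random.default_rng(0)
spec = rng.random((128, 431), dtype=np.float32)
aug = time_mask(frequency_mask(spec, rng=rng), rng=rng)
mixed, y = mixup(spec, np.array([1.0, 0.0]), aug, np.array([0.0, 1.0]), rng=rng)
```

All three transforms preserve the input shape, so the augmented examples feed directly into a CNN expecting 128×431×1 inputs; the soft mixup label is trained against with a cross-entropy loss as usual.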

Keywords

References

  1. K. Ooi, S. Peksi, and W.-S. Gan, "Ensemble of Pruned Models for Low-Complexity Acoustic Scene Classification," in Detection and Classification of Acoustic Scenes and Events 2020, Jun. 2020.
  2. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770-778. DOI: 10.1109/CVPR.2016.90.
  3. Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, "Object recognition with gradient-based learning," in Lecture Notes in Computer Science, 1999, pp. 1-28.
  4. Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, "Hierarchical attention networks for document classification," in Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480-1489. DOI: 10.18653/v1/n16-1174.
  5. F. Visin, K. Kastner, K. Cho, M. Matteucci, A. Courville, and Y. Bengio, "ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks," arXiv:1505.00393, pp. 1-9, 2015.
  6. A. Sampathkumar and D. Kowerko, "Low complexity acoustic scene classification using AALNet-94," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.
  7. Y. Tokozume, Y. Ushiku, and T. Harada, "Learning from between-class examples for deep sound recognition," in International Conference on Learning Representations, 2018, pp. 1-13.
  8. S. Kahl et al., "Large-Scale Bird Sound Classification using Convolutional Neural Networks," in CLEF 2017 Working Notes, 2017, pp. 1-14.
  9. Y. Tokozume, Y. Ushiku, and T. Harada, "Learning from between-class examples for deep sound recognition," in International Conference on Learning Representations, 2018, pp. 1-13.
  10. J. Salamon, C. Jacoby, and J. P. Bello, "A dataset and taxonomy for urban sound research," in Proceedings of the ACM International Conference on Multimedia (MM '14), Orlando, FL, USA, Nov. 2014.
  11. B. McFee et al., "librosa: Audio and music signal analysis in python," in 14th Python in Science Conference, 2015, pp. 18-24.
  12. J. A. Lopez et al., "Low-memory convolutional neural networks for acoustic scene classification," in Detection and Classification of Acoustic Scenes and Events 2020, Nov. 2020, pp. 96-99.
  13. J. A. Lopez et al., "Low-memory convolutional neural networks for acoustic scene classification," in Detection and Classification of Acoustic Scenes and Events 2020, Nov. 2020, pp. 96-99.
  14. N. Pajusco, R. Huang, and N. Farrugia, "Lightweight Convolutional Neural Networks on Binaural Waveforms for Low Complexity Acoustic Scene Classification," Jun. 2020.
  15. N. Pajusco, R. Huang, and N. Farrugia, "Lightweight Convolutional Neural Networks on Binaural Waveforms for Low Complexity Acoustic Scene Classification," Jun. 2020.
  16. J. Zhang, C. Ren, and S. Li, "BUPT submissions to DCASE 2020: Low-complexity acoustic scene classification with post-training static quantization and prune," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.
  17. A. Singh, D. V. Devalraju, and P. Rajan, "Pruning and quantization for low-complexity acoustic scene classification," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.
  18. J. A. Lopez et al., "Low-memory convolutional neural networks for acoustic scene classification," in Detection and Classification of Acoustic Scenes and Events 2020, Nov. 2020, pp. 96-99.
  19. D. Ngo, L. Pham, A. Nguyen, and H. Hoang, "CNN-based framework for DCASE 2020 Task 1B challenge," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.
  20. N. Pajusco, R. Huang, and N. Farrugia, "Lightweight Convolutional Neural Networks on Binaural Waveforms for Low Complexity Acoustic Scene Classification," Jun. 2020.
  21. A. Singh, D. V. Devalraju, and P. Rajan, "Pruning and quantization for low-complexity acoustic scene classification," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.
  22. L. Pham, H. Tang, A. Jalali, A. Schindler, R. King, and I. McLoughlin, "A Low-Complexity Deep Learning Framework For Acoustic Scene Classification," Data Sci. - Anal. Appl., pp. 26-32, 2022, DOI: 10.1007/978-3-658-36295-9_4.
  23. A. Sampathkumar and D. Kowerko, "Low complexity acoustic scene classification using AALNet-94," Detect. Classif. Acoust. Scenes Events, pp. 1-5, 2020.