Research on Machine Learning Rules for Extracting Audio Sources in Noise

  • Kyoung-ah Kwon (Dept. of Global Media, Soong-sil Univ.)
  • Received : 2024.05.25
  • Accepted : 2024.09.01
  • Published : 2024.09.30

Abstract

This study presents five selection rules for training algorithms to extract audio sources from noise: Dynamics, Roots, Tonal Balance, Tonal-Noisy Balance, and Stereo Width. The suitability of each rule for sound extraction was assessed through spectrogram analysis of various sample sources, including environmental sounds, musical instruments, and the human voice, as well as white, brown, and pink noise combined with sine waves. The algorithm's training domain covers both melody and beat, and with these rules the algorithm can identify which specific audio sources are contained in a given noise signal and extract them. The results are expected to improve the accuracy of audio source extraction and to enable automated sound clip selection, offering a new methodology for sound processing and for generating audio sources from noise.
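
The abstract names the five rules but does not define them numerically. As an illustration only, the Python sketch below maps each rule name to a common signal descriptor and generates the kind of test material the study mentions (white, pink, and brown noise mixed with a sine wave). These mappings are assumptions, not the author's method: Dynamics as crest factor, Roots as the strongest spectral bin, Tonal Balance as the share of energy below an assumed 250 Hz edge, Tonal-Noisy Balance as spectral flatness, and Stereo Width as the side-to-total energy ratio. All function names, the 22,050 Hz sample rate, the 1024-point Hann frame, the 440 Hz tone, and the 0.1 noise gain are hypothetical.

```python
import numpy as np

SR = 22050    # assumed sample rate (Hz)
N_FFT = 1024  # assumed STFT frame length (samples)
HOP = 512     # assumed hop size (samples)


def make_noise(kind, n, rng):
    """Peak-normalized white, pink (power ~ 1/f) or brown (power ~ 1/f^2) noise."""
    white = rng.standard_normal(n)
    if kind == "white":
        return white / np.max(np.abs(white))
    exponent = {"pink": 0.5, "brown": 1.0}[kind]  # amplitude ~ f ** -exponent
    freqs = np.fft.rfftfreq(n, d=1.0 / SR)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    shaped = np.fft.irfft(np.fft.rfft(white) / freqs**exponent, n)
    return shaped / np.max(np.abs(shaped))


def power_spectrogram(x):
    """Hann-windowed STFT power, shape (frames, N_FFT // 2 + 1)."""
    win = np.hanning(N_FFT)
    frames = np.array([x[i:i + N_FFT] * win
                       for i in range(0, len(x) - N_FFT + 1, HOP)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


# Hypothetical stand-ins for the five rules (not the paper's definitions).

def dynamics_db(x):
    """Dynamics as crest factor: peak-to-RMS ratio in dB."""
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x**2)))


def root_hz(x):
    """Roots as the strongest non-DC bin of the mean spectrum (bin resolution only)."""
    mean_spec = power_spectrogram(x).mean(axis=0)
    k = 1 + np.argmax(mean_spec[1:])
    return k * SR / N_FFT


def tonal_balance(x, edge_hz=250.0):
    """Tonal Balance as the share of one-sided spectral energy below edge_hz."""
    spec = power_spectrogram(x).sum(axis=0)
    spec[1:-1] *= 2  # count mirrored negative-frequency bins (not DC or Nyquist)
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / SR)
    return spec[freqs < edge_hz].sum() / spec.sum()


def tonal_noisy_balance(x, eps=1e-12):
    """Tonal-Noisy Balance as mean spectral flatness (near 0 tonal, ~0.5 white noise)."""
    p = power_spectrogram(x) + eps
    flatness = np.exp(np.mean(np.log(p), axis=1)) / np.mean(p, axis=1)
    return float(np.mean(flatness))


def stereo_width(left, right):
    """Stereo Width as side energy over mid + side energy (0 for mono)."""
    mid, side = (left + right) / 2, (left - right) / 2
    return float(np.sum(side**2) / (np.sum(mid**2) + np.sum(side**2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(SR) / SR                # one second of samples
    sine = np.sin(2 * np.pi * 440.0 * t)  # hypothetical 440 Hz test tone
    for kind in ("white", "pink", "brown"):
        noise = make_noise(kind, SR, rng)
        mix = sine + 0.1 * noise          # sine wave in noise
        print(f"{kind:>5}: root={root_hz(mix):.1f} Hz, "
              f"flatness={tonal_noisy_balance(mix):.4f}, "
              f"low-band share={tonal_balance(noise):.3f}")
```

Spectral flatness is used here because it separates tone from noise with a single number (near 0 for a sine, around 0.5 for white noise). A fuller system could instead use a dedicated pitch estimator such as pYIN for Roots and learn decision thresholds from labeled sources rather than fixing them by hand.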

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2023-00240479).