• Title/Summary/Keyword: T-F masking

Performance comparison evaluation of real and complex networks for deep neural network-based speech enhancement in the frequency domain (주파수 영역 심층 신경망 기반 음성 향상을 위한 실수 네트워크와 복소 네트워크 성능 비교 평가)

  • Hwang, Seo-Rim; Park, Sung Wook; Park, Youngcheol
    • The Journal of the Acoustical Society of Korea, v.41 no.1, pp.30-37, 2022
  • This paper compares and evaluates Deep Neural Network (DNN)-based speech enhancement models in the frequency domain from two perspectives: the learning target and the network structure. Spectrum mapping and Time-Frequency (T-F) masking were used as learning targets, and a real network and a complex network were used as network structures. Model performance was evaluated at different dataset scales using two objective metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). The test results show that the appropriate amount of training data differs depending on the type of network and the type of dataset. They also show that, in some cases, a real network may be the more practical choice once the total number of parameters is considered, because, depending on the data size and the learning target, it performs relatively better than the complex network.
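
The two learning targets named in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation. The abstract does not say which mask the paper uses, so the ideal ratio mask (`irm`) and the complex ratio mask (`crm`) below are assumptions chosen as common examples of T-F masks. The sine-plus-noise signal, the helper `stft`, and all parameter values are likewise hypothetical stand-ins.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

# Synthetic stand-ins (assumption): a 440 Hz tone as "clean speech" plus white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
noise = 0.1 * rng.standard_normal(fs)
noisy = clean + noise

S, N, Y = stft(clean), stft(noise), stft(noisy)
eps = 1e-8  # guards against division by zero

# Spectrum mapping: the network regresses the clean magnitude spectrum directly.
mapping_target = np.abs(S)

# T-F masking (real-valued, assumed ideal ratio mask): the network predicts a
# mask in [0, 1] that is multiplied with the noisy magnitude.
irm = np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps))
masked_mag = irm * np.abs(Y)

# T-F masking (complex-valued, assumed complex ratio mask): the mask S / Y
# adjusts both magnitude and phase, the kind of target a complex network can fit.
crm = S * np.conj(Y) / (np.abs(Y) ** 2 + eps)

# Compare the magnitude error before and after applying the real-valued mask.
err_noisy = np.mean((np.abs(Y) - np.abs(S)) ** 2)
err_masked = np.mean((masked_mag - np.abs(S)) ** 2)
print(f"noisy MSE: {err_noisy:.4f}, masked MSE: {err_masked:.4f}")
```

In training, the clean and noise spectra are only available to compute the target; at inference, the network estimates either the clean spectrum or the mask from the noisy input alone.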