Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

Lee, Yun-Kyung; Park, Jeon Gue; Lee, Yun Keun; Kwon, Oh-Wook

doi:10.4218/etrij.14.2214.0039

ETRI Journal, Volume 36, Issue 5, Pages 721-729, 2014, pISSN 1225-6463, eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Lee, Yun-Kyung (SW.Content Research Laboratory, ETRI)
Park, Jeon Gue (SW.Content Research Laboratory, ETRI)
Lee, Yun Keun (SW.Content Research Laboratory, ETRI)
Kwon, Oh-Wook (School of Electronics Engineering, Chungbuk National University)

Received: 2014.01.29
Accepted: 2014.06.16
Published: 2014.10.01

Abstract

We propose a novel phase-based method for single-channel speech enhancement that uses phase information to extract and enhance the desired signal in noisy environments. The method estimates a phase-dependent a priori signal-to-noise ratio (SNR) in the log-mel spectral domain, so that both the magnitude and the phase of the input speech signal are used. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach, which recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation caused by the one-frame delay of the estimated phase-dependent a priori SNR by using minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimators. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator improves the output SNR by 2.6 dB over a conventional magnitude-based estimator, for both the MMSE-based and the MAP-based estimators.
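
For orientation, the conventional magnitude-based decision-directed recursion that the abstract refers to can be sketched roughly as follows. This is only the magnitude-only baseline under simplifying assumptions, not the phase-dependent log-mel estimator proposed in the paper. The function name, the smoothing factor `alpha`, the floor `xi_min`, and the use of a simple Wiener gain (in place of an MMSE-based or MAP-based gain) are illustrative choices that do not come from the source.

```python
import numpy as np


def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Sketch of a magnitude-based decision-directed a priori SNR estimate.

    noisy_power: array of shape (frames, bins) holding |Y|^2 per frame.
    noise_power: array of shape (bins,) holding a noise power estimate.
    alpha, xi_min: assumed smoothing factor and SNR floor (illustrative).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    n_frames, n_bins = noisy_power.shape

    xi = np.empty_like(noisy_power)    # a priori SNR per frame and bin
    gain = np.empty_like(noisy_power)  # amplitude gain per frame and bin
    prev_clean_power = np.zeros(n_bins)  # no clean estimate before frame 0

    for t in range(n_frames):
        # A posteriori SNR of the current frame.
        gamma = noisy_power[t] / noise_power
        # Recursive mix of the previous clean-speech estimate and the
        # instantaneous estimate max(gamma - 1, 0).
        xi[t] = (alpha * prev_clean_power / noise_power
                 + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
        xi[t] = np.maximum(xi[t], xi_min)
        # Wiener gain, used here as a simple stand-in for MMSE/MAP gains.
        gain[t] = xi[t] / (1.0 + xi[t])
        # Clean power estimate of this frame; it only enters the a priori
        # SNR of the next frame, which is the one-frame delay.
        prev_clean_power = gain[t] ** 2 * noisy_power[t]

    return xi, gain
```

Because the clean-speech estimate of frame t only feeds the a priori SNR of frame t + 1, the estimate lags sudden changes in the input by one frame, which is the delay the abstract mentions.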

Cited by

"Hard component detection of transient noise and its removal using empirical mode decomposition and wavelet-based predictive filter," vol.12, pp.7, 2014, https://doi.org/10.1049/iet-spr.2017.0167
