Non-Intrusive Speech Intelligibility Estimation Using Autoencoder Features with Background Noise Information

Jeong, Yue Ri; Choi, Seung Ho

doi:10.7236/IJIBC.2020.12.3.220

International Journal of Internet, Broadcasting and Communication

Vol. 12, No. 3 / pp. 220-225 / 2020 / pISSN 2288-4920 / eISSN 2288-4939

The Institute of Internet, Broadcasting and Communication


Jeong, Yue Ri (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)

Received: 2020.07.21
Reviewed: 2020.08.01
Published: 2020.08.31


Abstract

This paper investigates non-intrusive speech intelligibility estimation in noisy environments when the bottleneck feature of an autoencoder is used as the input to a neural network. The bottleneck-feature-based method suffers severe performance degradation when the noise environment changes. To overcome this problem, we propose a novel non-intrusive speech intelligibility estimation method that feeds noise environment information, together with the bottleneck feature, into a long short-term memory (LSTM) neural network. The network outputs a short-time objective intelligibility (STOI) score; STOI is a standard tool for measuring speech intelligibility intrusively, that is, with the clean reference speech signal. In experiments across various noise environments, the proposed method outperformed the conventional methods when the training and test noise environments were the same, and the improvement was significant when they were different. We therefore conclude that the proposed method can be used successfully to estimate speech intelligibility non-intrusively in various noise environments.
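The abstract describes the estimator only at block level: per-frame autoencoder bottleneck features, augmented with noise-environment information, pass through an LSTM whose output is a STOI estimate. The pure-Python sketch below illustrates that data flow under assumptions the abstract does not state: the noise information is a one-hot noise-type label repeated on every frame, the LSTM has a single layer, the score is read from the last hidden state through a sigmoid unit, and all sizes and labels (`NOISE_TYPES`, `BOTTLENECK_DIM`, `HIDDEN_DIM`) are hypothetical. The weights are random and untrained, so the printed score carries no meaning; training against STOI labels computed with the clean reference signal is omitted.

```python
import math
import random

# Hypothetical placeholders: the abstract gives no bottleneck size,
# noise-type inventory, or LSTM width.
NOISE_TYPES = ["babble", "car", "street", "white"]
BOTTLENECK_DIM = 8
HIDDEN_DIM = 16


def noise_one_hot(noise_type):
    """Assumed encoding: the noise environment as a one-hot vector."""
    vec = [0.0] * len(NOISE_TYPES)
    vec[NOISE_TYPES.index(noise_type)] = 1.0
    return vec


def build_inputs(bottleneck_frames, noise_type):
    """Append the same noise vector to every frame's bottleneck feature."""
    noise_vec = noise_one_hot(noise_type)
    return [list(frame) + noise_vec for frame in bottleneck_frames]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyLSTMRegressor:
    """Single-layer LSTM plus a sigmoid output unit (random, untrained weights)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_dim)]
            for gate in "ifgo"
        }
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim + 1)]

    @staticmethod
    def _affine(rows, z):
        return [sum(wi * zi for wi, zi in zip(row, z)) for row in rows]

    def predict(self, frames):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            z = list(x) + h + [1.0]
            i = [sigmoid(v) for v in self._affine(self.w["i"], z)]
            f = [sigmoid(v) for v in self._affine(self.w["f"], z)]
            g = [math.tanh(v) for v in self._affine(self.w["g"], z)]
            o = [sigmoid(v) for v in self._affine(self.w["o"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Squash the final hidden state into (0, 1), where STOI scores typically fall.
        logit = sum(w * v for w, v in zip(self.w_out, h + [1.0]))
        return sigmoid(logit)


# Usage: 50 frames of synthetic bottleneck features recorded in babble noise.
rng = random.Random(1)
frames = [[rng.gauss(0, 1) for _ in range(BOTTLENECK_DIM)] for _ in range(50)]
inputs = build_inputs(frames, "babble")
model = TinyLSTMRegressor(BOTTLENECK_DIM + len(NOISE_TYPES), HIDDEN_DIM)
score = model.predict(inputs)
print(f"estimated STOI-like score (untrained): {score:.3f}")
```

Repeating the noise vector on every frame lets the LSTM gates condition on the environment at each time step. A learned noise embedding or measured noise statistics could replace `noise_one_hot` without changing the rest of the pipeline.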


