Adaptive Speech Streaming Based on Packet Loss Prediction Using Support Vector Machine for Software-Based Multipoint Control Unit over IP Networks

  • Kang, Jin Ah (5G Giga Communication Research Laboratory, ETRI)
  • Han, Mikyong (5G Giga Communication Research Laboratory, ETRI)
  • Jang, Jong-Hyun (5G Giga Communication Research Laboratory, ETRI)
  • Kim, Hong Kook (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
  • Received: 2016.04.29
  • Accepted: 2016.11.03
  • Published: 2016.12.01

Abstract

This paper proposes an adaptive speech streaming method that improves the perceived speech quality of a software-based multipoint control unit (SW-based MCU) over IP networks. First, the method predicts whether the speech packet to be transmitted will be lost: it learns the pattern of packet losses in the IP network and uses that pattern to predict the loss of the packet to be transmitted over the same network. Second, the method classifies each speech frame as silence, unvoiced, speech onset, or voiced. Based on the results of packet loss prediction and speech classification, the method determines the amount and bitrate of the redundant speech data (RSD) sent with the primary speech data (PSD) to help the speech decoder restore the speech signals of lost packets. Specifically, when a packet is predicted to be lost, the amount and bitrate of the RSD are increased by reducing the bitrate of the PSD. In the evaluation, the packet loss pattern is learned with a support vector machine, and the different speech coding rates are assigned using the adaptive multi-rate narrowband (AMR-NB) codec. The results show that, compared with conventional methods for restoring lost speech signals, the proposed method remarkably improves the perceived speech quality of an SW-based MCU under various packet loss conditions in an IP network.
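The allocation step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed per-frame bit budget that is shared between the PSD and the RSD, and it treats the loss prediction (a support vector machine in the paper) as a ready-made boolean input. The `RSD_SHARE` table, the `allocate` function, and the 12.2 kbit/s default budget are hypothetical choices made for illustration; the abstract does not state the paper's actual decision rule, and the sketch covers only the RSD bitrate, not the amount of RSD. The only values taken from outside the abstract are the eight standard AMR-NB codec rates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Standard AMR-NB codec modes in kbit/s.
AMR_NB_MODES = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


class FrameClass(Enum):
    SILENCE = "silence"
    UNVOICED = "unvoiced"
    ONSET = "onset"
    VOICED = "voiced"


# Hypothetical fraction of the bit budget given to the RSD when a loss is
# predicted. These numbers are illustrative and are NOT taken from the paper.
RSD_SHARE = {
    FrameClass.SILENCE: 0.0,
    FrameClass.UNVOICED: 0.4,
    FrameClass.ONSET: 0.5,
    FrameClass.VOICED: 0.5,
}


@dataclass(frozen=True)
class Allocation:
    psd_kbps: float
    rsd_kbps: Optional[float]  # None means no redundant data is attached


def highest_mode_at_most(limit_kbps: float) -> Optional[float]:
    """Return the highest AMR-NB mode that fits within limit_kbps, or None."""
    fitting = [m for m in AMR_NB_MODES if m <= limit_kbps + 1e-9]
    return max(fitting) if fitting else None


def allocate(loss_predicted: bool, frame: FrameClass,
             budget_kbps: float = 12.2) -> Allocation:
    """Split the bit budget between PSD and RSD for one speech frame."""
    share = RSD_SHARE[frame] if loss_predicted else 0.0
    rsd = highest_mode_at_most(budget_kbps * share) if share > 0 else None
    if rsd is not None:
        # Raising the RSD bitrate is paid for by lowering the PSD bitrate.
        psd = highest_mode_at_most(budget_kbps - rsd)
        if psd is not None:
            return Allocation(psd_kbps=psd, rsd_kbps=rsd)
    # No loss predicted, or no room for RSD: spend the whole budget on PSD.
    psd = highest_mode_at_most(budget_kbps)
    if psd is None:
        raise ValueError("budget is below the lowest AMR-NB mode")
    return Allocation(psd_kbps=psd, rsd_kbps=None)


if __name__ == "__main__":
    for predicted in (False, True):
        for frame in FrameClass:
            print(predicted, frame.value, allocate(predicted, frame))
```

Under these assumed shares, a voiced frame with a predicted loss is split evenly between PSD and RSD, while a frame with no predicted loss keeps the full budget for the PSD. A real system would replace the fixed table with whatever rule the classifier and loss predictor support.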
