Enhanced Spectral Hole Substitution for Improving Speech Quality in Low Bit-Rate Audio Coding

  • Lee, Chang-Heon (Department of Electrical and Electronic Engineering, Yonsei University)
  • Kang, Hong-Goo (Department of Electrical and Electronic Engineering, Yonsei University)
  • Received : 2010.06.24
  • Accepted : 2010.08.17
  • Published : 2010.09.30

Abstract

This paper proposes a novel spectral hole substitution technique for low bit-rate audio coding. Spectral holes, which occur frequently in relatively weak-energy bands because those bands are quantized with zero bits, cause severe quality degradation, especially for harmonic signals such as speech vowels. The enhanced aacPlus (EAAC) audio codec artificially adjusts the minimum signal-to-mask ratio (SMR) to reduce the number of spectral holes, but it still produces noisy sound. The proposed method selectively predicts the spectral shapes of hole bands using either intra-band correlation, i.e., nearby harmonically related coefficients, or inter-band correlation, i.e., previous frames. For bands with low prediction gain, only the energy term is quantized, and the spectral shapes are replaced by pseudo-random values in the decoding stage. To minimize the perceptual distortion caused by spectral mismatch, the energy term is quantized using the just noticeable level difference (JNLD) criterion and the spectral similarity between the original and predicted shapes. Simulation results show that the proposed method, implemented in the EAAC baseline coder, significantly improves speech quality at low bit-rates while maintaining equivalent quality for mixed and music content.
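
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prediction-gain definition, the 3 dB threshold, and the names `prediction_gain_db` and `fill_hole_band` are assumptions made for illustration, and the JNLD-based energy quantization is not modeled (the band energy is passed through unquantized).

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def unit_energy(v):
    """Scale a vector to unit energy; an all-zero vector is returned as-is."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def prediction_gain_db(target, candidate):
    """Assumed gain measure: band energy over the residual energy left
    after subtracting the optimally scaled candidate, in dB."""
    energy = dot(target, target)
    cand_energy = dot(candidate, candidate)
    if energy == 0 or cand_energy == 0:
        return 0.0
    g = dot(target, candidate) / cand_energy
    err = sum((t - g * c) ** 2 for t, c in zip(target, candidate))
    return 10.0 * math.log10(energy / max(err, 1e-12))


def fill_hole_band(target, intra_candidate, prev_candidate,
                   gain_threshold_db=3.0, seed=0):
    """Pick a fill for one hole band (hypothetical helper).

    intra_candidate: harmonically related coefficients from the same frame.
    prev_candidate:  coefficients from a previous frame.
    Returns (mode, fill), where mode is "intra", "inter", or "noise".
    """
    energy = dot(target, target)
    candidates = {"intra": intra_candidate, "inter": prev_candidate}
    gains = {m: prediction_gain_db(target, c) for m, c in candidates.items()}
    best = max(gains, key=gains.get)

    if gains[best] >= gain_threshold_db:
        # Predicted shape: keep the candidate's shape, match the sign.
        cand = candidates[best]
        sign = 1.0 if dot(target, cand) >= 0 else -1.0
        shape = [sign * x for x in unit_energy(cand)]
        mode = best
    else:
        # Low prediction gain: pseudo-random shape, only energy is kept.
        rng = random.Random(seed)
        shape = unit_energy([rng.gauss(0.0, 1.0) for _ in target])
        mode = "noise"

    scale = math.sqrt(energy)
    return mode, [scale * x for x in shape]
```

In this sketch, a candidate that is a scaled copy of the band is selected as the predictor, while two candidates orthogonal to the band both yield 0 dB gain and the band falls back to noise filling at the original band energy.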

References

  1. J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE J. Select. Areas Commun., vol. 6, pp. 314-323, 1988. https://doi.org/10.1109/49.608
  2. 3GPP TS 26.403 v7.0.0, Enhanced aacPlus general audio codec; Encoder specification; Advanced audio coding (AAC) part, June, 2006.
  3. J. Herre and D. Schulz, "Extending the MPEG-4 AAC codec by perceptual noise substitution," AES 104th Convention, Amsterdam, May 1998.
  4. E. Zwicker and H. Fastl, Psychoacoustics, Facts and Models, Second Updated Edition. New York: Springer, 1999.
  5. D. Schulz, "Improving audio codecs by noise substitution," J. Audio Eng. Soc., vol. 44, no. 7/8, pp. 593-598, July/August, 1996.
  6. M. Neuendorf, P. Gournay, M. Multrus, J. Lecomte, B. Bessette, R. Geiger, S. Bayer, G. Fuchs, J. Hilpert, N. Rettelbach, F. Nagel, J. Robilliard, R. Salami, G. Schuller, R. Lefebvre, and B. Grill, "A novel scheme for low bitrate unified speech and audio coding-MPEG RM0," AES 126th Convention, Munich, Germany, May 2009.
  7. 3GPP TS 26.404 v6.0.0, Enhanced aacPlus general audio codec; Encoder specification; Spectral Band Replication (SBR) part, Sep., 2004.
  8. 3GPP TS 26.401 v6.2.0, Enhanced aacPlus general audio codec; General description, Mar., 2005.
  9. M. R. Schroeder, B. S. Atal, and J. L. Hall, "Optimizing digital speech coders by exploiting masking properties of the human ear," J. Acoust. Soc. Amer., vol. 66, pp. 1647-1652, 1979. https://doi.org/10.1121/1.383662
  10. T. F. Quatieri, Discrete-Time Speech Signal Processing, Principles and Practice. Prentice Hall PTR, 2002.
  11. T. Sporer, "Objective audio signal evaluation-applied psychoacoustics for modeling the perceived quality of digital audio," AES 103rd Convention, preprint 4280, 1997.
  12. J. H. Chen and A. Gersho, "Adaptive postfiltering for quality enhancement of coded speech," IEEE Trans. Speech Audio Processing, vol. 3, no. 1, January, 1995.
  13. ITU-R Rec. BS.1387-1, Method for objective measurements of perceived audio quality, 1999.
  14. ITU-R Rec. BS.1534-1, Method for the subjective assessment of intermediate quality level of coding systems, 2003.
  15. ISO/IEC JTC1/SC29/WG11 MPEG2007/N9095, Framework for Exploration of Speech and Audio Coding, San Jose, USA, Apr., 2007.
  16. ITU-T Rec. P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, Feb. 2001.